Neural Network Toolbox™ Getting Started Guide R2013a Mark Hudson Beale Martin T. Hagan Howard B. Demuth How to Contact MathWorks Web Newsgroup www.mathworks.com/contact_TS.html Technical Support www.mathworks.com comp.soft-sys.matlab [email protected] [email protected] [email protected] [email protected] [email protected] Product enhancement suggestions Bug reports Documentation error reports Order status, license renewals, passcodes Sales, pricing, and general information 508-647-7000 (Phone) 508-647-7001 (Fax) The MathWorks, Inc. 3 Apple Hill Drive Natick, MA 01760-2098 For contact information about worldwide offices, see the MathWorks Web site. Neural Network Toolbox™ Getting Started Guide © COPYRIGHT 1992–2013 by The MathWorks, Inc. The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc. FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government’s needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc. Trademarks MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders. Patents MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information. Revision History June 1992 April 1993 January 1997 July 1997 January 1998 September 2000 June 2001 July 2002 January 2003 June 2004 October 2004 October 2004 March 2005 March 2006 September 2006 March 2007 September 2007 March 2008 October 2008 March 2009 September 2009 March 2010 September 2010 April 2011 September 2011 March 2012 September 2012 March 2013 First printing Second printing Third printing Fourth printing Fifth printing Sixth printing Seventh printing Online only Online only Online only Online only Eighth printing Online only Online only Ninth printing Online only Online only Online only Online only Online only Online only Online only Tenth printing Online only Online only Online only Online only Online only Revised for Version 3 (Release 11) Revised for Version 4 (Release 12) Minor revisions (Release 12.1) Minor revisions (Release 13) Minor revisions (Release 13SP1) Revised for Version 4.0.3 (Release 14) Revised for Version 4.0.4 (Release 14SP1) Revised for Version 4.0.4 Revised for Version 4.0.5 (Release 14SP2) Revised for Version 5.0 (Release 2006a) Minor revisions (Release 2006b) Minor revisions (Release 2007a) Revised for Version 5.1 (Release 2007b) Revised for Version 6.0 (Release 2008a) Revised for Version 6.0.1 (Release 2008b) Revised for Version 6.0.2 (Release 2009a) Revised for Version 6.0.3 (Release 2009b) Revised for Version 6.0.4 (Release 2010a) Revised for Version 7.0 (Release 2010b) Revised for Version 7.0.1 (Release 2011a) Revised for Version 7.0.2 (Release 2011b) Revised for Version 7.0.3 (Release 2012a) Revised for Version 8.0 (Release 2012b) Revised for Version 8.0.1 (Release 2013a) _ Acknowledgments v Acknowledgments Acknowledgments The authors would like to thank the following people: Joe Hicklin of MathWorks for getting Howard into neural network research years ago at the University of Idaho, for encouraging Howard and Mark to write the toolbox, for providing crucial help in getting the first toolbox Version 1.0 out the door, for continuing to help with the toolbox in many ways, and for being such a good friend. Roy Lurie of MathWorks for his continued enthusiasm for the possibilities for Neural Network Toolbox™ software. Mary Ann Freeman of MathWorks for general support and for her leadership of a great team of people we enjoy working with. Rakesh Kumar of MathWorks for cheerfully providing technical and practical help, encouragement, ideas and always going the extra mile for us. Alan LaFleur of MathWorks for facilitating our documentation work. Stephen Vanreusel of MathWorks for help with testing. Dan Doherty of MathWorks for marketing support and ideas. Orlando De Jesús of Oklahoma State University for his excellent work in developing and programming the dynamic training algorithms described in “Time Series and Dynamic Systems” and in programming the neural network controllers described in “Neural Network Control Systems” in the Neural Network Toolbox User’s Guide. Martin T. Hagan, Howard B. Demuth, and Mark Hudson Beale for permission to include various problems, examples, and other material from Neural Network Design, January, 1996. vi Contents Acknowledgments Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi Getting Started 1 Product Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2 1-2 Neural Networks Overview . . . . . . . . . . . . . . . . . . . . . . . . . 1-3 Using the Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Automatic Script Generation . . . . . . . . . . . . . . . . . . . . . . . . 1-4 1-5 Neural Network Toolbox Applications . . . . . . . . . . . . . . . 1-6 Neural Network Design Steps . . . . . . . . . . . . . . . . . . . . . . . 1-8 Fitting a Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using the Neural Network Fitting Tool . . . . . . . . . . . . . . . . Using Command-Line Functions . . . . . . . . . . . . . . . . . . . . . 1-9 1-9 1-11 1-21 Recognizing Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using the Neural Network Pattern Recognition Tool . . . . . Using Command-Line Functions . . . . . . . . . . . . . . . . . . . . . 1-29 1-29 1-31 1-41 Clustering Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-50 1-50 vii Using the Neural Network Clustering Tool . . . . . . . . . . . . . Using Command-Line Functions . . . . . . . . . . . . . . . . . . . . . 1-51 1-60 Time Series Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using the Neural Network Time Series Tool . . . . . . . . . . . Using Command-Line Functions . . . . . . . . . . . . . . . . . . . . . 1-67 1-67 1-68 1-80 Parallel Computing on CPUs and GPUs . . . . . . . . . . . . . . Parallel Computing Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . Parallel CPU Workers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . GPU Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multiple GPU/CPU Computing . . . . . . . . . . . . . . . . . . . . . . Cluster Computing with MATLAB Distributed Computing Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Load Balancing, Large Problems, and Beyond . . . . . . . . . . 1-90 1-90 1-90 1-91 1-91 Sample Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-94 1-92 1-92 Glossary Index viii Contents 1 Getting Started • “Product Description” on page 1-2 • “Neural Networks Overview” on page 1-3 • “Using the Toolbox” on page 1-4 • “Neural Network Toolbox Applications” on page 1-6 • “Neural Network Design Steps” on page 1-8 • “Fitting a Function” on page 1-9 • “Recognizing Patterns” on page 1-29 • “Clustering Data” on page 1-50 • “Time Series Prediction” on page 1-67 • “Parallel Computing on CPUs and GPUs” on page 1-90 • “Sample Data Sets” on page 1-94 1 Getting Started Product Description Create, train, and simulate neural networks Neural Network Toolbox™ provides functions and apps for modeling complex nonlinear systems that are not easily modeled with a closed-form equation. Neural Network Toolbox supports supervised learning with feedforward, radial basis, and dynamic networks. It also supports unsupervised learning with self-organizing maps and competitive layers. With the toolbox you can design, train, visualize, and simulate neural networks. You can use Neural Network Toolbox for applications such as data fitting, pattern recognition, clustering, time-series prediction, and dynamic system modeling and control. To speed up training and handle large data sets, you can distribute computations and data across multicore processors, GPUs, and computer clusters using Parallel Computing Toolbox™. Key Features • Supervised networks, including multilayer, radial basis, learning vector quantization (LVQ), time-delay, nonlinear autoregressive (NARX), and layer-recurrent • Unsupervised networks, including self-organizing maps and competitive layers • Apps for data-fitting, pattern recognition, and clustering • Parallel computing and GPU support for accelerating training (using Parallel Computing Toolbox) • Preprocessing and postprocessing for improving the efficiency of network training and assessing network performance • Modular network representation for managing and visualizing networks of arbitrary size • Simulink® blocks for building and evaluating neural networks and for control systems applications 1-2 Neural Networks Overview Neural Networks Overview Neural networks are composed of simple elements operating in parallel. These elements are inspired by biological nervous systems. As in nature, the connections between elements largely determine the network function. You can train a neural network to perform a particular function by adjusting the values of the connections (weights) between elements. Typically, neural networks are adjusted, or trained, so that a particular input leads to a specific target output. The next figure illustrates such a situation. There, the network is adjusted, based on a comparison of the output and the target, until the network output matches the target. Typically, many such input/target pairs are needed to train a network. Neural networks have been trained to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems. Neural networks can also be trained to solve problems that are difficult for conventional computers or human beings. The toolbox emphasizes the use of neural network paradigms that build up to—or are themselves used in— engineering, financial, and other practical applications. This topic explains how to use four graphical tools for training neural networks to solve problems in function fitting, pattern recognition, clustering, and time series. Using these four tools will give you an excellent introduction to the use of the Neural Network Toolbox software. 1-3 1 Getting Started Using the Toolbox There are four ways you can use the Neural Network Toolbox software. The first way is through the four graphical user interfaces (GUIs). You can open these GUIs from a master GUI, which you can open with the command nnstart. These provide a quick and easy way to access the power of the toolbox for the following tasks: • Function fitting • Pattern recognition • Data clustering • Time series analysis The second way to use the toolbox is through basic command-line operations. The command-line operations offer more flexibility than the GUIs, but with some added complexity. If this is your first experience with the toolbox, the GUIs provide the best introduction. In addition, the GUIs can generate scripts of documented MATLAB® code to provide you with templates for creating your own customized command-line functions. The process of using the GUIs first, and then generating and modifying MATLAB scripts, is an excellent way to learn about the functionality of the toolbox. The third way to use the toolbox is through customization. This advanced capability allows you to create your own custom neural networks, while still having access to the full functionality of the toolbox. You can create networks with arbitrary connections, and you will still be able to train them using existing toolbox training functions (as long as the network components are differentiable). The fourth way to use the toolbox is through the ability to modify any of the functions contained in the toolbox. Every computational component is written in MATLAB code and is fully accessible. These four levels of toolbox usage span the novice to the expert - simple wizards guide the new user through specific applications, and network customization allows researchers to try novel architectures with minimal 1-4 Using the Toolbox effort. Whatever your level of neural network and MATLAB knowledge, there are toolbox features to suit your needs. Automatic Script Generation The GUIs described in this topic form an important part of the documentation for the Neural Network Toolbox software. The GUIs guide you through the process of designing neural networks to solve problems in four important application areas, without requiring any background in neural networks or sophistication in using MATLAB. In addition, the GUIs can automatically generate both simple and advanced MATLAB scripts that can reproduce the steps performed by the GUI, but with the option to override default settings. These scripts can provide you with a template for creating customized code, and they can aid you in becoming familiar with the command-line functionality of the toolbox. It is highly recommended that you use the automatic script generation facility of the GUIs. 1-5 1 Getting Started Neural Network Toolbox Applications It would be impossible to cover the total range of applications for which neural networks have provided outstanding solutions. The remaining sections of this topic describe only a few of the applications in function fitting, pattern recognition, clustering, and time series analysis. The following table provides an idea of the diversity of applications for which neural networks provide state-of-the-art solutions. 1-6 Industry Business Applications Aerospace High-performance aircraft autopilot, flight path simulation, aircraft control systems, autopilot enhancements, aircraft component simulation, and aircraft component fault detection Automotive Automobile automatic guidance system, and warranty activity analysis Banking Check and other document reading and credit application evaluation Defense Weapon steering, target tracking, object discrimination, facial recognition, new kinds of sensors, sonar, radar and image signal processing including data compression, feature extraction and noise suppression, and signal/image identification Electronics Code sequence prediction, integrated circuit chip layout, process control, chip failure analysis, machine vision, voice synthesis, and nonlinear modeling Entertainment Animation, special effects, and market forecasting Financial Real estate appraisal, loan advising, mortgage screening, corporate bond rating, credit-line use analysis, credit card activity tracking, portfolio trading program, corporate financial analysis, and currency price prediction Industrial Prediction of industrial processes, such as the output gases of furnaces, replacing complex and costly equipment used for this purpose in the past Neural Network Toolbox™ Applications Industry Business Applications Insurance Policy application evaluation and product optimization Manufacturing Manufacturing process control, product design and analysis, process and machine diagnosis, real-time particle identification, visual quality inspection systems, beer testing, welding quality analysis, paper quality prediction, computer-chip quality analysis, analysis of grinding operations, chemical product design analysis, machine maintenance analysis, project bidding, planning and management, and dynamic modeling of chemical process system Medical Breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization of transplant times, hospital expense reduction, hospital quality improvement, and emergency-room test advisement Oil and gas Exploration Robotics Trajectory control, forklift robot, manipulator controllers, and vision systems Securities Market analysis, automatic bond rating, and stock trading advisory systems Speech Speech recognition, speech compression, vowel classification, and text-to-speech synthesis Telecommunications Image and data compression, automated information services, real-time translation of spoken language, and customer payment processing systems Transportation Truck brake diagnosis systems, vehicle scheduling, and routing systems 1-7 1 Getting Started Neural Network Design Steps In the remaining sections of this topic, you will follow the standard steps for designing neural networks to solve problems in four application areas: function fitting, pattern recognition, clustering, and time series analysis. The work flow for any of these problems has seven primary steps. (Data collection in step 1, while important, generally occurs outside the MATLAB environment.) 1 Collect data 2 Create the network 3 Configure the network 4 Initialize the weights and biases 5 Train the network 6 Validate the network 7 Use the network You will follow these steps using both the GUI tools and command-line operations in the next four sections. 1-8 Fitting a Function Fitting a Function Neural networks are good at fitting functions. In fact, there is proof that a fairly simple neural network can fit any practical function. Suppose, for instance, that you have data from a housing application. You want to design a network that can predict the value of a house (in $1000s), given 13 pieces of geographical and real estate information. You have a total of 506 example homes for which you have those 13 items of data and their associated market values. You can solve this problem in two ways: • Use a graphical user interface, nftool, as described in “Using the Neural Network Fitting Tool” on page 1-11. • Use command-line functions, as described in “Using Command-Line Functions” on page 1-21. It is generally best to start with the GUI, and then to use the GUI to automatically generate command-line scripts. Before using either method, first define the problem by selecting a data set. Each GUI has access to many sample data sets that you can use to experiment with the toolbox (see “Sample Data Sets” on page 1-94). If you have a specific problem that you want to solve, you can load your own data into the workspace. The next section describes the data format. Defining a Problem To define a fitting problem for the toolbox, arrange a set of Q input vectors as columns in a matrix. Then, arrange another set of Q target vectors (the correct output vectors for each of the input vectors) into a second matrix (see “Data Structures” for a detailed description of data formatting for static and time series data). For example, you can define the fitting problem for a Boolean AND gate with four sets of two-element input vectors and one-element targets as follows: inputs = [0 1 0 1; 0 0 1 1]; targets = [0 0 0 1]; 1-9 1 Getting Started The next section shows how to train a network to fit a data set, using the neural network fitting tool GUI, nftool. This example uses the housing data set provided with the toolbox. 1-10 Fitting a Function Using the Neural Network Fitting Tool 1 Open the Neural Network Start GUI with this command: nnstart 2 Click Fitting Tool to open the Neural Network Fitting Tool. (You can also use the command nftool.) 1-11 1 Getting Started 3 Click Next to proceed. 4 Click Load Example Data Set in the Select Data window. The Fitting Data Set Chooser window opens. Note Use the Inputs and Targets options in the Select Data window when you need to load data from the MATLAB workspace. 1-12 Fitting a Function 5 Select House Pricing, and click Import. This returns you to the Select Data window. 6 Click Next to display the Validation and Test Data window, shown in the following figure. The validation and test data sets are each set to 15% of the original data. 1-13 1 Getting Started With these settings, the input vectors and target vectors will be randomly divided into three sets as follows: • 70% will be used for training. • 15% will be used to validate that the network is generalizing and to stop training before overfitting. • The last 15% will be used as a completely independent test of network generalization. (See “Dividing the Data” for more discussion of the data division process.) 7 Click Next. The standard network that is used for function fitting is a two-layer feedforward network, with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. The default number of hidden neurons is set to 10. You might want to increase this number later, if the network training performance is poor. 1-14 Fitting a Function 8 Click Next. 1-15 1 Getting Started 9 Click Train. The training continued until the validation error failed to decrease for six iterations (validation stop). 1-16 Fitting a Function 10 Under Plots, click Regression. This is used to validate the network performance. The following regression plots display the network outputs with respect to targets for training, validation, and test sets. For a perfect fit, the data should fall along a 45 degree line, where the network outputs are equal to the targets. For this problem, the fit is reasonably good for all data sets, with R values in each case of 0.93 or above. If even more accurate results were required, you could retrain the network by clicking Retrain in nftool. This will change the initial weights and biases of the network, 1-17 1 Getting Started and may produce an improved network after retraining. Other options are provided on the following pane. 11 View the error histogram to obtain additional verification of network performance. Under the Plots pane, click Error Histogram. 1-18 Fitting a Function The blue bars represent training data, the green bars represent validation data, and the red bars represent testing data. The histogram can give you an indication of outliers, which are data points where the fit is significantly worse than the majority of data. In this case, you can see that while most errors fall between -5 and 5, there is a training point with an error of 17 and validation points with errors of 12 and 13. These outliers are also visible on the testing regression plot. The first corresponds to the point with a target of 50 and output near 33. It is a good idea to check the outliers to determine if the data is bad, or if those data points are different than the rest of the data set. If the outliers are valid data points, but are unlike the rest of the data, then the network is extrapolating for these points. You should collect more data that looks like the outlier points, and retrain the network. 12 Click Next in the Neural Network Fitting Tool to evaluate the network. 1-19 1 Getting Started At this point, you can test the network against new data. If you are dissatisfied with the network’s performance on the original or new data, you can do one of the following: • Train it again. • Increase the number of neurons. • Get a larger training data set. If the performance on the training set is good, but the test set performance is significantly worse, which could indicate overfitting, then reducing the number of neurons can improve your results. If training performance is poor, then you may want to increase the number of neurons. 13 If you are satisfied with the network performance, click Next. 14 Use the buttons on this screen to generate scripts or to save your results. 1-20 Fitting a Function • You can click Simple Script or Advanced Script to create MATLAB code that can be used to reproduce all of the previous steps from the command line. Creating MATLAB code can be helpful if you want to learn how to use the command-line functionality of the toolbox to customize the training process. In “Using Command-Line Functions” on page 1-21, you will investigate the generated scripts in more detail. • You can also have the network saved as net in the workspace. You can perform additional tests on it or put it to work on new inputs. 15 When you have created the MATLAB code and saved your results, click Finish. Using Command-Line Functions The easiest way to learn how to use the command-line functionality of the toolbox is to generate scripts from the GUIs, and then modify them to customize the network training. As an example, look at the simple script that was created at step 14 of the previous section. % Solve an Input-Output Fitting problem with a Neural Network 1-21 1 Getting Started % Script generated by NFTOOL % % This script assumes these variables are defined: % % houseInputs - input data. % houseTargets - target data. inputs = houseInputs; targets = houseTargets; % Create a Fitting Network hiddenLayerSize = 10; net = fitnet(hiddenLayerSize); % Set up Division of Data for Training, Validation, Testing net.divideParam.trainRatio = 70/100; net.divideParam.valRatio = 15/100; net.divideParam.testRatio = 15/100; % Train the Network [net,tr] = train(net,inputs,targets); % Test the Network outputs = net(inputs); errors = gsubtract(outputs,targets); performance = perform(net,targets,outputs) % View the Network view(net) % Plots % Uncomment these lines to enable various plots. %figure, plotperform(tr) %figure, plottrainstate(tr) %figure, plotfit(targets,outputs) %figure, plotregression(targets,outputs) %figure, ploterrhist(errors) 1-22 Fitting a Function You can save the script, and then run it from the command line to reproduce the results of the previous GUI session. You can also edit the script to customize the training process. In this case, follow each step in the script. 1 The script assumes that the input vectors and target vectors are already loaded into the workspace. If the data are not loaded, you can load them as follows: load house_dataset inputs = houseInputs; targets = houseTargets; This data set is one of the sample data sets that is part of the toolbox (see “Sample Data Sets” on page 1-94). You can see a list of all available data sets by entering the command help nndatasets. The load command also allows you to load the variables from any of these data sets using your own variable names. For example, the command [inputs,targets] = house_dataset; will load the housing inputs into the array inputs and the housing targets into the array targets. 2 Create a network. The default network for function fitting (or regression) problems, fitnet, is a feedforward network with the default tan-sigmoid transfer function in the hidden layer and linear transfer function in the output layer. You assigned ten neurons (somewhat arbitrary) to the one hidden layer in the previous section. The network has one output neuron, because there is only one target value associated with each input vector. hiddenLayerSize = 10; net = fitnet(hiddenLayerSize); Note More neurons require more computation, and they have a tendency to overfit the data when the number is set too high, but they allow the network to solve more complicated problems. More layers require more computation, but their use might result in the network solving complex problems more efficiently. To use more than one hidden layer, enter the hidden layer sizes as elements of an array in the fitnet command. 1-23 1 Getting Started 3 Set up the division of data. net.divideParam.trainRatio = 70/100; net.divideParam.valRatio = 15/100; net.divideParam.testRatio = 15/100; With these settings, the input vectors and target vectors will be randomly divided, with 70% used for training, 15% for validation and 15% for testing. (See “Dividing the Data” for more discussion of the data division process.) 4 Train the network. The network uses the default Levenberg-Marquardt algorithm for training (trainlm). To train the network, enter: [net,tr] = train(net,inputs,targets); During training, the following training window opens. This window displays training progress and allows you to interrupt training at any point by clicking Stop Training. 1-24 Fitting a Function This training stopped when the validation error increased for six iterations, which occurred at iteration 23. If you click Performance in the training window, a plot of the training errors, validation errors, and test errors appears, as shown in the following figure. In this example, the result is reasonable because of the following considerations: • The final mean-square error is small. • The test set error and the validation set error have similar characteristics. • No significant overfitting has occurred by iteration 17 (where the best validation performance occurs). 1-25 1 Getting Started 5 Test the network. After the network has been trained, you can use it to compute the network outputs. The following code calculates the network outputs, errors and overall performance. outputs = net(inputs); errors = gsubtract(targets,outputs); performance = perform(net,targets,outputs) It is also possible to calculate the network performance only on the test set, by using the testing indices, which are located in the training record. (See “Post-Training Analysis (Network Validation)” for a full description of the training record.) tInd = tr.testInd; tstOutputs = net(inputs(tInd)); tstPerform = perform(net,targets(tInd),tstOutputs) 1-26 Fitting a Function 6 Perform some analysis of the network response. If you click Regression in the training window, you can perform a linear regression between the network outputs and the corresponding targets. The following figure shows the results. The output tracks the targets very well for training, testing, and validation, and the R-value is over 0.95 for the total response. If even more accurate results were required, you could try any of these approaches: • Reset the initial network weights and biases to new values with init and train again (see “Initializing Weights” (init)). 1-27 1 Getting Started • Increase the number of hidden neurons. • Increase the number of training vectors. • Increase the number of input values, if more relevant information is available. • Try a different training algorithm (see “Training Algorithms”). In this case, the network response is satisfactory, and you can now put the network to use on new inputs. 7 View the network diagram. view(net) This creates the following diagram of the network. To get more experience in command-line operations, try some of these tasks: • During training, open a plot window (such as the regression plot), and watch it animate. • Plot from the command line with functions such as plotfit, plotregression, plottrainstate and plotperform. (For more information on using these functions, see their reference pages.) Also, see the advanced script for more options, when training from the command line. 1-28 Recognizing Patterns Recognizing Patterns In addition to function fitting, neural networks are also good at recognizing patterns. For example, suppose you want to classify a tumor as benign or malignant, based on uniformity of cell size, clump thickness, mitosis, etc. You have 699 example cases for which you have 9 items of data and the correct classification as benign or malignant. As with function fitting, there are two ways to solve this problem: • Use the nprtool GUI, as described in “Using the Neural Network Pattern Recognition Tool” on page 1-31. • Use a command-line solution, as described in “Using Command-Line Functions” on page 1-60. It is generally best to start with the GUI, and then to use the GUI to automatically generate command-line scripts. Before using either method, the first step is to define the problem by selecting a data set. The next section describes the data format. Defining a Problem To define a pattern recognition problem, arrange a set of Q input vectors as columns in a matrix. Then arrange another set of Q target vectors so that they indicate the classes to which the input vectors are assigned (see “Data Structures” for a detailed description of data formatting for static and time series data). There are two approaches to creating the target vectors. One approach can be used when there are only two classes; you set each scalar target value to either 1 or 0, indicating which class the corresponding input belongs to. For instance, you can define the exclusive-or classification problem as follows: inputs = [0 1 0 1; 0 0 1 1]; targets = [0 1 1 0]; 1-29 1 Getting Started Alternately, target vectors can have N elements, where for each target vector, one element is 1 and the others are 0. This defines a problem where inputs are to be classified into N different classes. For example, the following lines show how to define a classification problem that divides the corners of a 5-by-5-by-5 cube into three classes: • The origin (the first input vector) in one class • The corner farthest from the origin (the last input vector) in a second class • All other points in a third class inputs = [0 0 0 0 5 5 5 5; 0 0 5 5 0 0 5 5; 0 5 0 5 0 5 0 5]; targets = [1 0 0 0 0 0 0 0; 0 1 1 1 1 1 1 0; 0 0 0 0 0 0 0 1]; Classification problems involving only two classes can be represented using either format. The targets can consist of either scalar 1/0 elements or two-element vectors, with one element being 1 and the other element being 0. The next section shows how to train a network to recognize patterns, using the neural network pattern recognition tool GUI, nprtool. This example uses the cancer data set provided with the toolbox. This data set consists of 699 nine-element input vectors and two-element target vectors. There are two elements in each target vector, because there are two categories (benign or malignant) associated with each input vector. 1-30 Recognizing Patterns Using the Neural Network Pattern Recognition Tool 1 If needed, open the Neural Network Start GUI with this command: nnstart 2 Click Pattern Recognition Tool to open the Neural Network Pattern Recognition Tool. (You can also use the command nprtool.) 1-31 1 Getting Started 3 Click Next to proceed. The Select Data window opens. 4 Click Load Example Data Set. The Pattern Recognition Data Set Chooser window opens. 1-32 Recognizing Patterns 5 Select Breast Cancer and click Import. You return to the Select Data window. 6 Click Next to continue to the Validation and Test Data window. 1-33 1 Getting Started Validation and test data sets are each set to 15% of the original data. With these settings, the input vectors and target vectors will be randomly divided into three sets as follows: • 70% are used for training. • 15% are used to validate that the network is generalizing and to stop training before overfitting. • The last 15% are used as a completely independent test of network generalization. (See “Dividing the Data” for more discussion of the data division process.) 7 Click Next. The standard network that is used for pattern recognition is a two-layer feedforward network, with sigmoid transfer functions in both the hidden layer and the output layer. The default number of hidden neurons is set to 10. You might want to come back and increase this number if the network does not perform as well as you expect. The number of output neurons 1-34 Recognizing Patterns is set to 2, which is equal to the number of elements in the target vector (the number of categories). 8 Click Next. 1-35 1 Getting Started 9 Click Train. 1-36 Recognizing Patterns The training continues for 55 iterations. 10 Under the Plots pane, click Confusion in the Neural Network Pattern Recognition Tool. The next figure shows the confusion matrices for training, testing, and validation, and the three kinds of data combined. The network outputs are very accurate, as you can see by the high numbers of correct responses in the green squares and the low numbers of incorrect responses in the red squares. The lower right blue squares illustrate the overall accuracies. 1-37 1 Getting Started 11 Plot the Receiver Operating Characteristic (ROC) curve. Under the Plots pane, click Receiver Operating Characteristic in the Neural Network Pattern Recognition Tool. 1-38 Recognizing Patterns The colored lines in each axis represent the ROC curves. The ROC curve is a plot of the true positive rate (sensitivity) versus the false positive rate (1 specificity) as the threshold is varied. A perfect test would show points in the upper-left corner, with 100% sensitivity and 100% specificity. For this problem, the network performs very well. 12 In the Neural Network Pattern Recognition Tool, click Next to evaluate the network. 1-39 1 Getting Started At this point, you can test the network against new data. If you are dissatisfied with the network’s performance on the original or new data, you can train it again, increase the number of neurons, or perhaps get a larger training data set. If the performance on the training set is good, but the test set performance is significantly worse, which could indicate overfitting, then reducing the number of neurons can improve your results. 13 When you are satisfied with the network performance, click Next. 1-40 Recognizing Patterns 14 Use the buttons on this screen to save your results. • You can click Simple Script or Advanced Script to create MATLAB code that can be used to reproduce all of the previous steps from the command line. Creating MATLAB code can be helpful if you want to learn how to use the command-line functionality of the toolbox to customize the training process. In “Using Command-Line Functions” on page 1-41, you will investigate the generated scripts in more detail. • You can also save the network as net in the workspace. You can perform additional tests on it or put it to work on new inputs. 15 When you have saved your results, click Finish. Using Command-Line Functions The easiest way to learn how to use the command-line functionality of the toolbox is to generate scripts from the GUIs, and then modify them to customize the network training. As an example, let’s look at the simple script that was created at step 14 of the previous section. 1-41 1 Getting Started % Solve a Pattern Recognition Problem with a Neural Network % Script generated by NPRTOOL % % This script assumes these variables are defined: % % cancerInputs - input data. % cancerTargets - target data. inputs = cancerInputs; inputs = cancerTargets; % Create a Pattern Recognition Network hiddenLayerSize = 10; net = patternnet(hiddenLayerSize); % Set up Division of Data for Training, Validation, Testing net.divideParam.trainRatio = 70/100; net.divideParam.valRatio = 15/100; net.divideParam.testRatio = 15/100; % Train the Network [net,tr] = train(net,inputs,targets); % Test the Network outputs = net(inputs); errors = gsubtract(targets,outputs); performance = perform(net,targets,outputs) % View the Network view(net) % Plots % Uncomment these lines to enable various plots. %figure, plotperform(tr) %figure, plottrainstate(tr) %figure, plotconfusion(targets,outputs) %figure, ploterrhist(errors) 1-42 Recognizing Patterns You can save the script, and then run it from the command line to reproduce the results of the previous GUI session. You can also edit the script to customize the training process. In this case, follow each step in the script. 1 The script assumes that the input vectors and target vectors are already loaded into the workspace. If the data are not loaded, you can load them as follows: load cancer_dataset inputs = cancerInputs; targets = cancerTargets; 2 Create the network. The default network for function fitting (or regression) problems, patternnet, is a feedforward network with the default tan-sigmoid transfer functions in both the hidden and output layers. You assigned ten neurons (somewhat arbitrary) to the one hidden layer in the previous section. • The network has two output neurons, because there are two target values (categories) associated with each input vector. • Each output neuron represents a category. • When an input vector of the appropriate category is applied to the network, the corresponding neuron should produce a 1, and the other neurons should output a 0. To create the network, enter these commands: hiddenLayerSize = 10; net = patternnet(hiddenLayerSize); Note The choice of network architecture for pattern recognition problems follows similar guidelines to function fitting problems. More neurons require more computation, and they have a tendency to overfit the data when the number is set too high, but they allow the network to solve more complicated problems. More layers require more computation, but their use might result in the network solving complex problems more efficiently. To use more than one hidden layer, enter the hidden layer sizes as elements of an array in the patternnet command. 1-43 1 Getting Started 3 Set up the division of data. net.divideParam.trainRatio = 70/100; net.divideParam.valRatio = 15/100; net.divideParam.testRatio = 15/100; With these settings, the input vectors and target vectors will be randomly divided, with 70% used for training, 15% for validation and 15% for testing. (See “Dividing the Data” for more discussion of the data division process.) 4 Train the network. The pattern recognition network uses the default Scaled Conjugate Gradient (trainscg) algorithm for training. To train the network, enter this command: [net,tr] = train(net,inputs,targets); During training, as in function fitting, the training window opens. This window displays training progress. To interrupt training at any point, click Stop Training. 1-44 Recognizing Patterns This training stopped when the validation error increased for six iterations, which occurred at iteration 24. 5 Test the network. After the network has been trained, you can use it to compute the network outputs. The following code calculates the network outputs, errors and overall performance. outputs = net(inputs); errors = gsubtract(targets,outputs); performance = perform(net,targets,outputs) It is also possible to calculate the network performance only on the test set, by using the testing indices, which are located in the training record. 1-45 1 Getting Started tInd = tr.testInd; tstOutputs = net(inputs(tInd)); tstPerform = perform(net,targets(tInd),tstOutputs) 6 View the network diagram. view(net) This creates the following diagram of the network. 7 Plot the training, validation, and test performance. figure, plotperform(tr) 1-46 Recognizing Patterns 8 Use the plotconfusion function to plot the confusion matrix. It shows the various types of errors that occurred for the final trained network. figure, plotconfusion(targets,outputs) The next figure shows the results. 1-47 1 Getting Started The diagonal cells show the number of cases that were correctly classified, and the off-diagonal cells show the misclassified cases. The blue cell in the bottom right shows the total percent of correctly classified cases (in green) and the total percent of misclassified cases (in red). The results show very good recognition. If you needed even more accurate results, you could try any of the following approaches: • Reset the initial network weights and biases to new values with init and train again. • Increase the number of hidden neurons. • Increase the number of training vectors. • Increase the number of input values, if more relevant information is available. • Try a different training algorithm (see “Training Algorithms”). 1-48 Recognizing Patterns In this case, the network response is satisfactory, and you can now put the network to use on new inputs. To get more experience in command-line operations, here are some tasks you can try: • During training, open a plot window (such as the confusion plot), and watch it animate. • Plot from the command line with functions such as plotroc and plottrainstate. Also, see the advanced script for more options, when training from the command line. 1-49 1 Getting Started Clustering Data Clustering data is another excellent application for neural networks. This process involves grouping data by similarity. For example, you might perform: • Market segmentation by grouping people according to their buying patterns • Data mining by partitioning data into related subsets • Bioinformatic analysis by grouping genes with related expression patterns Suppose that you want to cluster flower types according to petal length, petal width, sepal length, and sepal width. You have 150 example cases for which you have these four measurements. As with function fitting and pattern recognition, there are two ways to solve this problem: • Use the nctool GUI, as described in “Using the Neural Network Clustering Tool” on page 1-51. • Use a command-line solution, as described in “Using Command-Line Functions” on page 1-60. Defining a Problem To define a clustering problem, simply arrange Q input vectors to be clustered as columns in an input matrix (see “Data Structures” for a detailed description of data formatting for static and time series data). For instance, you might want to cluster this set of 10 two-element vectors: inputs = [7 0 6 2 6 5 6 1 0 1; 6 2 5 0 7 5 5 1 2 2] The next section shows how to train a network using the nctool GUI. 1-50 Clustering Data Using the Neural Network Clustering Tool 1 If needed, open the Neural Network Start GUI with this command: nnstart 2 Click Clustering Tool to open the Neural Network Clustering Tool. (You can also use the command nctool.) 1-51 1 Getting Started 3 Click Next. The Select Data window appears. 4 Click Load Example Data Set. The Clustering Data Set Chooser window appears. 1-52 Clustering Data 5 In this window, select Simple Clusters, and click Import. You return to the Select Data window. 6 Click Next to continue to the Network Size window, shown in the following figure. For clustering problems, the self-organizing feature map (SOM) is the most commonly used network, because after the network has been trained, there are many visualization tools that can be used to analyze the resulting clusters. This network has one layer, with neurons organized in a grid. (For more information on the SOM, see “Self-Organizing Feature Maps”.) When creating the network, you specify the numbers of rows and columns in the grid. Here, the number of rows and columns is set to 10. The total number of neurons is 100. You can change this number in another run if you want. 1-53 1 Getting Started 7 Click Next. The Train Network window appears. 1-54 Clustering Data 8 Click Train. 1-55 1 Getting Started The training runs for the maximum number of epochs, which is 200. 9 For SOM training, the weight vector associated with each neuron moves to become the center of a cluster of input vectors. In addition, neurons that are adjacent to each other in the topology should also move close to each other in the input space, therefore it is possible to visualize a high-dimensional inputs space in the two dimensions of the network topology. Investigate some of the visualization tools for the SOM. Under the Plots pane, click SOM Sample Hits. 1-56 Clustering Data The default topology of the SOM is hexagonal. This figure shows the neuron locations in the topology, and indicates how many of the training data are associated with each of the neurons (cluster centers). The topology is a 10-by-10 grid, so there are 100 neurons. The maximum number of hits associated with any neuron is 22. Thus, there are 22 input vectors in that cluster. 10 You can also visualize the SOM by displaying weight planes (also referred to as component planes). Click SOM Weight Planes in the Neural Network Clustering Tool. 1-57 1 Getting Started This figure shows a weight plane for each element of the input vector (two, in this case). They are visualizations of the weights that connect each input to each of the neurons. (Darker colors represent larger weights.) If the connection patterns of two inputs were very similar, you can assume that the inputs are highly correlated. In this case, input 1 has connections that are very different than those of input 2. 11 In the Neural Network Clustering Tool, click Next to evaluate the network. 1-58 Clustering Data At this point you can test the network against new data. If you are dissatisfied with the network’s performance on the original or new data, you can increase the number of neurons, or perhaps get a larger training data set. 12 When you are satisfied with the network performance, click Next. 13 Use the buttons on this screen to save your results. 1-59 1 Getting Started • You can click Simple Script or Advanced Script to create MATLAB code that can be used to reproduce all of the previous steps from the command line. Creating MATLAB code can be helpful if you want to learn how to use the command-line functionality of the toolbox to customize the training process. In “Using Command-Line Functions” on page 1-60, you will investigate the generated scripts in more detail. • You can also save the network as net in the workspace. You can perform additional tests on it or put it to work on new inputs. 14 When you have generated scripts and saved your results, click Finish. Using Command-Line Functions The easiest way to learn how to use the command-line functionality of the toolbox is to generate scripts from the GUIs, and then modify them to customize the network training. As an example, look at the simple script that was created in step 14 of the previous section. % Solve a Clustering Problem with a Self-Organizing Map % Script generated by NCTOOL 1-60 Clustering Data % % This script assumes these variables are defined: % % simpleclusterInputs - input data. inputs = simpleclusterInputs; % Create a Self-Organizing Map dimension1 = 10; dimension2 = 10; net = selforgmap([dimension1 dimension2]); % Train the Network [net,tr] = train(net,inputs); % Test the Network outputs = net(inputs); % View the Network view(net) % Plots % Uncomment these lines to enable various plots. %figure, plotsomtop(net) %figure, plotsomnc(net) %figure, plotsomnd(net) %figure, plotsomplanes(net) %figure, plotsomhits(net,inputs) %figure, plotsompos(net,inputs) You can save the script, and then run it from the command line to reproduce the results of the previous GUI session. You can also edit the script to customize the training process. In this case, let’s follow each of the steps in the script. 1 The script assumes that the input vectors are already loaded into the workspace. To show the command-line operations, you can use a different data set than you used for the GUI operation. Use the flower data set as an example. The iris data set consists of 150 four-element input vectors. 1-61 1 Getting Started load iris_dataset inputs = irisInputs; 2 Create a network. For this example, you use a self-organizing map (SOM). This network has one layer, with the neurons organized in a grid. (For more information, see “Self-Organizing Feature Maps”.) When creating the network with selforgmap, you specify the number of rows and columns in the grid: dimension1 = 10; dimension2 = 10; net = selforgmap([dimension1 dimension2]); 3 Train the network. The SOM network uses the default batch SOM algorithm for training. [net,tr] = train(net,inputs); 4 During training, the training window opens and displays the training progress. To interrupt training at any point, click Stop Training. 1-62 Clustering Data 5 Test the network. After the network has been trained, you can use it to compute the network outputs. outputs = net(inputs); 6 View the network diagram. view(net) This creates the following diagram of the network. 1-63 1 Getting Started 7 For SOM training, the weight vector associated with each neuron moves to become the center of a cluster of input vectors. In addition, neurons that are adjacent to each other in the topology should also move close to each other in the input space, therefore it is possible to visualize a high-dimensional inputs space in the two dimensions of the network topology. The default SOM topology is hexagonal; to view it, enter the following commands. figure, plotsomtop(net) In this figure, each of the hexagons represents a neuron. The grid is 10-by-10, so there are a total of 100 neurons in this network. There are four elements in each input vector, so the input space is four-dimensional. The weight vectors (cluster centers) fall within this space. Because this SOM has a two-dimensional topology, you can visualize in two dimensions the relationships among the four-dimensional cluster centers. One visualization tool for the SOM is the weight distance matrix (also called the U-matrix). 1-64 Clustering Data 8 To view the U-matrix, click SOM Neighbor Distances in the training window. In this figure, the blue hexagons represent the neurons. The red lines connect neighboring neurons. The colors in the regions containing the red lines indicate the distances between neurons. The darker colors represent larger distances, and the lighter colors represent smaller distances. A band of dark segments crosses from the lower-center region to the upper-right region. The SOM network appears to have clustered the flowers into two distinct groups. 1-65 1 Getting Started To get more experience in command-line operations, try some of these tasks: • During training, open a plot window (such as the SOM weight position plot) and watch it animate. • Plot from the command line with functions such as plotsomhits, plotsomnc, plotsomnd, plotsomplanes, plotsompos, and plotsomtop. (For more information on using these functions, see their reference pages.) Also, see the advanced script for more options, when training from the command line. 1-66 Time Series Prediction Time Series Prediction Dynamic neural networks are good at time series prediction. Suppose, for instance, that you have data from a pH neutralization process. You want to design a network that can predict the pH of a solution in a tank from past values of the pH and past values of the acid and base flow rate into the tank. You have a total of 2001 time steps for which you have those series. You can solve this problem in two ways: • Use a graphical user interface, ntstool, as described in “Using the Neural Network Time Series Tool” on page 1-68. • Use command-line functions, as described in “Using Command-Line Functions” on page 1-80. It is generally best to start with the GUI, and then to use the GUI to automatically generate command-line scripts. Before using either method, the first step is to define the problem by selecting a data set. Each GUI has access to many sample data sets that you can use to experiment with the toolbox. If you have a specific problem that you want to solve, you can load your own data into the workspace. The next section describes the data format. Defining a Problem To define a time series problem for the toolbox, arrange a set of TS input vectors as columns in a cell array. Then, arrange another set of TS target vectors (the correct output vectors for each of the input vectors) into a second cell array (see “Data Structures” for a detailed description of data formatting for static and time series data). However, there are cases in which you only need to have a target data set. For example, you can define the following time series problem, in which you want to use previous values of a series to predict the next value: targets = {1 2 3 4 5}; The next section shows how to train a network to fit a time series data set, using the neural network time series tool GUI, ntstool. This example uses the pH neutralization data set provided with the toolbox. 1-67 1 Getting Started Using the Neural Network Time Series Tool 1 If needed, open the Neural Network Start GUI with this command: nnstart 2 Click Time Series Tool to open the Neural Network Time Series Tool. (You can also use the command ntstool.) 1-68 Time Series Prediction Notice that this opening pane is different than the opening panes for the other GUIs. This is because ntstool can be used to solve three different kinds of time series problems. • In the first type of time series problem, you would like to predict future values of a time series y(t) from past values of that time series and past values of a second time series x(t). This form of prediction is called nonlinear autoregressive with exogenous (external) input, or NARX (see “NARX Network” (narxnet, closeloop)), and can be written as follows: y(t) = f(y(t – 1), ..., y(t – d), x(t – 1), ..., (t – d)) This model could be used to predict future values of a stock or bond, based on such economic variables as unemployment rates, GDP, etc. It could also be used for system identification, in which models are developed to represent dynamic systems, such as chemical processes, manufacturing systems, robotics, aerospace vehicles, etc. • In the second type of time series problem, there is only one series involved. The future values of a time series y(t) are predicted only from past values of that series. This form of prediction is called nonlinear autoregressive, or NAR, and can be written as follows: y(t) = f(y(t – 1), ..., y(t – d)) This model could also be used to predict financial instruments, but without the use of a companion series. • The third time series problem is similar to the first type, in that two series are involved, an input series x(t) and an output/target series y(t). Here you want to predict values of y(t) from previous values of x(t), but without knowledge of previous values of y(t). This input/output model can be written as follows: y(t) = f(x(t – 1), ..., x(t – d)) The NARX model will provide better predictions than this input-output model, because it uses the additional information contained in the previous values of y(t). However, there may be some applications in which the previous values of y(t) would not be available. Those are the 1-69 1 Getting Started only cases where you would want to use the input-output model instead of the NARX model. 3 For this example, select the NARX model and click Next to proceed. 4 Click Load Example Data Set in the Select Data window. The Time Series Data Set Chooser window opens. Note Use the Inputs and Targets options in the Select Data window when you need to load data from the MATLAB workspace. 1-70 Time Series Prediction 5 Select pH Neutralization Process, and click Import. This returns you to the Select Data window. 6 Click Next to open the Validation and Test Data window, shown in the following figure. The validation and test data sets are each set to 15% of the original data. 1-71 1 Getting Started With these settings, the input vectors and target vectors will be randomly divided into three sets as follows: • 70% will be used for training. • 15% will be used to validate that the network is generalizing and to stop training before overfitting. • The last 15% will be used as a completely independent test of network generalization. (See “Dividing the Data” for more discussion of the data division process.) 7 Click Next. 1-72 Time Series Prediction The standard NARX network is a two-layer feedforward network, with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. This network also uses tapped delay lines to store previous values of the x(t) and y(t) sequences. Note that the output of the NARX network, y(t), is fed back to the input of the network (through delays), since y(t) is a function of y(t – 1), y(t – 2), ..., y(t – d). However, for efficient training this feedback loop can be opened. Because the true output is available during the training of the network, you can use the open-loop architecture shown above, in which the true output is used instead of feeding back the estimated output. This has two advantages. The first is that the input to the feedforward network is more accurate. The second is that the resulting network has a purely feedforward architecture, and therefore a more efficient algorithm can be used for training. This network is discussed in more detail in “NARX Network” (narxnet, closeloop). 1-73 1 Getting Started The default number of hidden neurons is set to 10. The default number of delays is 2. Change this value to 4. You might want to adjust these numbers if the network training performance is poor. 8 Click Next. 9 Click Train. The training continued until the validation error failed to decrease for six iterations (validation stop). 1-74 Time Series Prediction 10 Under Plots, click Error Autocorrelation. This is used to validate the network performance. The following plot displays the error autocorrelation function. It describes how the prediction errors are related in time. For a perfect prediction model, there should only be one nonzero value of the autocorrelation function, and it should occur at zero lag. (This is the mean square error.) This would mean that the prediction errors were completely uncorrelated with each other (white noise). If there was significant correlation in the prediction errors, then it should be possible to improve the prediction perhaps by increasing the number of delays in the tapped delay lines. In this case, the correlations, except for the one at zero lag, fall approximately 1-75 1 Getting Started within the 95% confidence limits around zero, so the model seems to be adequate. If even more accurate results were required, you could retrain the network by clicking Retrain in ntstool. This will change the initial weights and biases of the network, and may produce an improved network after retraining. 11 View the input-error cross-correlation function to obtain additional verification of network performance. Under the Plots pane, click Input-Error Cross-correlation. 1-76 Time Series Prediction This input-error cross-correlation function illustrates how the errors are correlated with the input sequence x(t). For a perfect prediction model, all of the correlations should be zero. If the input is correlated with the error, then it should be possible to improve the prediction, perhaps by increasing the number of delays in the tapped delay lines. In this case, all of the correlations fall within the confidence bounds around zero. 12 Under Plots, click Time Series Response. This displays the inputs, targets and errors versus time. It also indicates which time points were selected for training, testing and validation. 1-77 1 Getting Started 13 Click Next in the Neural Network Time Series Tool to evaluate the network. 1-78 Time Series Prediction At this point, you can test the network against new data. If you are dissatisfied with the network’s performance on the original or new data, you can do any of the following: • Train it again. • Increase the number of neurons and/or the number of delays. • Get a larger training data set. If the performance on the training set is good, but the test set performance is significantly worse, which could indicate overfitting, then reducing the number of neurons can improve your results. 14 If you are satisfied with the network performance, click Next. 15 Use the buttons on this screen to generate scripts or to save your results. 1-79 1 Getting Started • You can click Simple Script or Advanced Script to create MATLAB code that can be used to reproduce all of the previous steps from the command line. Creating MATLAB code can be helpful if you want to learn how to use the command-line functionality of the toolbox to customize the training process. In “Using Command-Line Functions” on page 1-80, you will investigate the generated scripts in more detail. • You can also have the network saved as net in the workspace. You can perform additional tests on it or put it to work on new inputs. 16 After creating MATLAB code and saving your results, click Finish. Using Command-Line Functions The easiest way to learn how to use the command-line functionality of the toolbox is to generate scripts from the GUIs, and then modify them to customize the network training. As an example, look at the simple script that was created at step 15 of the previous section. % Solve an Autoregression Problem with External % Input with a NARX Neural Network 1-80 Time Series Prediction % Script generated by NTSTOOL % % This script assumes the variables on the right of % these equalities are defined: % % phInputs - input time series. % phTargets - feedback time series. inputSeries = phInputs; targetSeries = phTargets; % Create a Nonlinear Autoregressive Network with External Input inputDelays = 1:4; feedbackDelays = 1:4; hiddenLayerSize = 10; net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize); % Prepare the Data for Training and Simulation % The function PREPARETS prepares time series data % for a particular network, shifting time by the minimum % amount to fill input states and layer states. % Using PREPARETS allows you to keep your original % time series data unchanged, while easily customizing it % for networks with differing numbers of delays, with % open loop or closed loop feedback modes. [inputs,inputStates,layerStates,targets] = ... preparets(net,inputSeries,{},targetSeries); % Set up Division of Data for Training, Validation, Testing net.divideParam.trainRatio = 70/100; net.divideParam.valRatio = 15/100; net.divideParam.testRatio = 15/100; % Train the Network [net,tr] = train(net,inputs,targets,inputStates,layerStates); % Test the Network outputs = net(inputs,inputStates,layerStates); errors = gsubtract(targets,outputs); performance = perform(net,targets,outputs) 1-81 1 Getting Started % View the Network view(net) % Plots % Uncomment these lines to enable various plots. %figure, plotperform(tr) %figure, plottrainstate(tr) %figure, plotregression(targets,outputs) %figure, plotresponse(targets,outputs) %figure, ploterrcorr(errors) %figure, plotinerrcorr(inputs,errors) % Closed Loop Network % Use this network to do multi-step prediction. % The function CLOSELOOP replaces the feedback input with a direct % connection from the outout layer. netc = closeloop(net); netc.name = [net.name ' - Closed Loop']; view(netc) [xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries); yc = netc(xc,xic,aic); closedLoopPerformance = perform(netc,tc,yc) % Early Prediction Network % For some applications it helps to get the prediction a % timestep early. % The original network returns predicted y(t+1) at the same % time it is given y(t+1). % For some applications such as decision making, it would % help to have predicted y(t+1) once y(t) is available, but % before the actual y(t+1) occurs. % The network can be made to return its output a timestep early % by removing one delay so that its minimal tap delay is now % 0 instead of 1. The new network returns the same outputs as % the original network, but outputs are shifted left one timestep. nets = removedelay(net); nets.name = [net.name ' - Predict One Step Ahead']; view(nets) [xs,xis,ais,ts] = preparets(nets,inputSeries,{},targetSeries); 1-82 Time Series Prediction ys = nets(xs,xis,ais); earlyPredictPerformance = perform(nets,ts,ys) You can save the script, and then run it from the command line to reproduce the results of the previous GUI session. You can also edit the script to customize the training process. In this case, follow each of the steps in the script. 1 The script assumes that the input vectors and target vectors are already loaded into the workspace. If the data are not loaded, you can load them as follows: load pH_dataset inputSeries = phInputs; targetSeries = phTargets; 2 Create a network. The NARX network, narxnet, is a feedforward network with the default tan-sigmoid transfer function in the hidden layer and linear transfer function in the output layer. This network has two inputs. One is an external input, and the other is a feedback connection from the network output. (After the network has been trained, this feedback connection can be closed, as you will see at a later step.) For each of these inputs, there is a tapped delay line to store previous values. To assign the network architecture for a NARX network, you must select the delays associated with each tapped delay line, and also the number of hidden layer neurons. In the following steps, you assign the input delays and the feedback delays to range from 1 to 4 and the number of hidden neurons to be 10. inputDelays = 1:4; feedbackDelays = 1:4; hiddenLayerSize = 10; net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize); 1-83 1 Getting Started Note Increasing the number of neurons and the number of delays requires more computation, and this has a tendency to overfit the data when the numbers are set too high, but it allows the network to solve more complicated problems. More layers require more computation, but their use might result in the network solving complex problems more efficiently. To use more than one hidden layer, enter the hidden layer sizes as elements of an array in the fitnet command. 3 Prepare the data for training. When training a network containing tapped delay lines, it is necessary to fill the delays with initial values of the inputs and outputs of the network. There is a toolbox command that facilitates this process - preparets. This function has three input arguments: the network, the input sequence and the target sequence. The function returns the initial conditions that are needed to fill the tapped delay lines in the network, and modified input and target sequences, where the initial conditions have been removed. You can call the function as follows: [inputs,inputStates,layerStates,targets] = ... preparets(net,inputSeries,{},targetSeries); 4 Set up the division of data. net.divideParam.trainRatio = 70/100; net.divideParam.valRatio = 15/100; net.divideParam.testRatio = 15/100; With these settings, the input vectors and target vectors will be randomly divided, with 70% used for training, 15% for validation and 15% for testing. 5 Train the network. The network uses the default Levenberg-Marquardt algorithm (trainlm) for training. To train the network, enter: [net,tr] = train(net,inputs,targets,inputStates,layerStates); During training, the following training window opens. This window displays training progress and allows you to interrupt training at any point by clicking Stop Training. 1-84 Time Series Prediction This training stopped when the validation error increased for six iterations, which occurred at iteration 70. 6 Test the network. After the network has been trained, you can use it to compute the network outputs. The following code calculates the network outputs, errors and overall performance. Note that to simulate a network with tapped delay lines, you need to assign the initial values for these delayed signals. This is done with inputStates and layerStates provided by preparets at an earlier stage. 1-85 1 Getting Started outputs = net(inputs,inputStates,layerStates); errors = gsubtract(targets,outputs); performance = perform(net,targets,outputs) 7 View the network diagram. view(net) This creates the following diagram of the network. 8 Plot the performance training record to check for potential overfitting. figure, plotperform(tr) This creates the following figure, which shows that training, validation and testing errors all decreased until iteration 64. It does not appear that any overfitting has occurred, since neither testing or validation error increased before iteration 64. 1-86 Time Series Prediction 9 Close the loop on the NARX network. When the feedback loop is open on the NARX network, it is performing a one-step-ahead prediction. It is predicting the next value of y(t) from previous values of y(t) and x(t). With the feedback loop closed, it can be used to perform multi-step-ahead predictions. This is because predictions of y(t) will be used in place of actual future values of y(t). The following commands can be used to close the loop and calculate closed-loop performance netc = closeloop(net); netc.name = [net.name ' - Closed Loop']; view(netc) [xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries); yc = netc(xc,xic,aic); perfc = perform(netc,tc,yc) The following figure shows the closed loop network. 1-87 1 Getting Started 10 Remove a delay from the network, to get the prediction one time step early. nets = removedelay(net); nets.name = [net.name ' - Predict One Step Ahead']; view(nets) [xs,xis,ais,ts] = preparets(nets,inputSeries,{},targetSeries); ys = nets(xs,xis,ais); earlyPredictPerformance = perform(nets,ts,ys) From this figure, you can see that the network is identical to the previous open-loop network, except that one delay has been removed from each of the tapped delay lines. The output of the network is then y(t + 1) instead of y(t). This may sometimes be helpful when a network is deployed for certain applications. If the network performance is not satisfactory, you could try any of these approaches: 1-88 Time Series Prediction • Reset the initial network weights and biases to new values with init and train again (see “Initializing Weights” (init)). • Increase the number of hidden neurons or the number of delays. • Increase the number of training vectors. • Increase the number of input values, if more relevant information is available. • Try a different training algorithm (see “Training Algorithms”). To get more experience in command-line operations, try some of these tasks: • During training, open a plot window (such as the error correlation plot), and watch it animate. • Plot from the command line with functions such as plotresponse, ploterrcorr and plotperform. (For more information on using these functions, see their reference pages.) Also, see the advanced script for more options, when training from the command line. 1-89 1 Getting Started Parallel Computing on CPUs and GPUs In this section... “Parallel Computing Toolbox” on page 1-90 “Parallel CPU Workers” on page 1-90 “GPU Computing” on page 1-91 “Multiple GPU/CPU Computing” on page 1-91 “Cluster Computing with MATLAB Distributed Computing Server” on page 1-92 “Load Balancing, Large Problems, and Beyond” on page 1-92 Parallel Computing Toolbox Neural network training and simulation involves many parallel calculations. Multicore CPUs, graphical processing units (GPUs), and clusters of computers with multiple CPUs and GPUs can all take advantage of parallel calculations. Together, Neural Network Toolbox and Parallel Computing Toolbox enable the multiple CPU cores and GPUs of a single computer to speed up training and simulation of large problems. The following is a standard single-threaded training and simulation session. (While the benefits of parallelism are most visible for large problems, this example uses a small dataset that ships with Neural Network Toolbox.) [x,t] = house_dataset; net1 = feedforwardnet(10); net2 = train(net1,x,t); y = net2(x); Parallel CPU Workers Intel® processors ship with as many as eight cores. Workstations with two processors can have as many as 16 cores, with even more possible in the future. Using multiple CPU cores in parallel can dramatically speed up calculations. 1-90 Parallel Computing on CPUs and GPUs Open a MATLAB pool of parallel CPU workers and determine the number of workers in the pool. matlabpool open matlabpool('size') An “Undefined function or variable” appears when you do not have a license for Parallel Computing Toolbox. When a MATLAB pool of CPU workers is open, set the train function’s 'useParallel' option to 'yes' to specify that training and simulation be performed across the pool. net2 = train(net1,x,t,'useParallel','yes'); y = net2(x,'useParallel','yes'); GPU Computing GPUs can have as many as 3072 cores on a single card, and possibly more in the future. These cards are highly efficient on parallel algorithms like neural networks. Use gpuDeviceCount to check whether a supported GPU card is available in your system. Use the function gpuDevice to review the currently selected GPU information or to select a different GPU. gpuDeviceCount gpuDevice gpuDevice(2) % Select device 2, if available An “Undefined function or variable” error appears if you do not have a license for Parallel Computing Toolbox. When you have selected the GPU device, set the train or sim function’s 'useGPU' option to 'yes' to perform training and simulation on it. net2 = train(net1,x,t,'useGPU','yes'); y = net2(x,'useGPU','yes'); Multiple GPU/CPU Computing You can use multiple GPUs for higher levels of parallelism. 1-91 1 Getting Started After opening a MATLAB pool, set both 'useParallel' and 'useGPU' to 'yes' to harness all the GPUs and CPU cores on a single computer. Each worker associated with a unique GPU uses that GPU. The rest of the workers perform calculations on their CPU core. net2 = train(net1,x,t,'useParallel','yes','useGPU','yes'); y = net2(x,'useParallel','yes','useGPU','yes'); For some problems, using GPUs and CPUs together can result in the highest computing speed. For other problems, the CPUs might not keep up with the GPUs, and so using only GPUs is faster. Set 'useGPU' to 'only', to restrict the parallel computing to workers with unique GPUs. net2 = train(net1,x,t,'useParallel','yes','useGPU','only'); y = net2(x,'useParallel','yes','useGPU','only'); Cluster Computing with MATLAB Distributed Computing Server MATLAB Distributed Computing Server™ allows you to harness all the CPUs and GPUs on a network cluster of computers. To take advantage of a cluster, open a MATLAB pool with a cluster profile. Use the MATLAB Home tab Environment area Parallel menu to manage and select profiles. After opening a cluster pool, call train and the network with the 'useParallel' and 'useGPU' options. net2 = train(net1,x,t,'useParallel','yes'); y = net2(x,'useParallel','yes'); net2 = train(net1,x,t,'useParallel','yes'); y = net2(x,'useParallel','yes'); net2 = train(net1,x,t,'useParallel','yes','useGPU','only'); y = net2(x,'useParallel','yes','useGPU','only'); Load Balancing, Large Problems, and Beyond For more information on parallel computing with Neural Network Toolbox, see “Parallel and GPU Computing”, which introduces other topics, such as how to manually distribute data sets across CPU and GPU workers to best take advantage of differences in machine speed and memory. 1-92 Parallel Computing on CPUs and GPUs Distributing data manually also allows worker data to load sequentially, so that data sets are limited in size only by the total RAM of a cluster instead of the RAM of a single computer. This lets you apply neural networks to very large problems. 1-93 1 Getting Started Sample Data Sets The Neural Network Toolbox software contains a number of sample data sets that you can use to experiment with the functionality of the toolbox. To view the data sets that are available, use the following command: help nndatasets Neural Network Datasets ----------------------Function Fitting, Function approximation and Curve fitting. Function fitting is the process of training a neural network on a set of inputs in order to produce an associated set of target outputs. Once the neural network has fit the data, it forms a generalization of the input-output relationship and can be used to generate outputs for inputs it was not trained on. simplefit_dataset abalone_dataset bodyfat_dataset building_dataset chemical_dataset cho_dataset engine_dataset house_dataset - Simple fitting dataset. Abalone shell rings dataset. Body fat percentage dataset. Building energy dataset. Chemical sensor dataset. Cholesterol dataset. Engine behavior dataset. House value dataset ---------Pattern Recognition and Classification Pattern recognition is the process of training a neural network to assign the correct target classes to a set of input patterns. Once trained the network can be used to classify patterns it has not seen before. simpleclass_dataset cancer_dataset crab_dataset 1-94 - Simple pattern recognition dataset. - Breast cancer dataset. - Crab gender dataset. Sample Data Sets glass_dataset iris_dataset thyroid_dataset wine_dataset - Glass chemical dataset. Iris flower dataset. Thyroid function dataset. Italian wines dataset. ---------Clustering, Feature extraction and Data dimension reduction Clustering is the process of training a neural network on patterns so that the network comes up with its own classifications according to pattern similarity and relative topology. This is useful for gaining insight into data, or simplifying it before further processing. simplecluster_dataset - Simple clustering dataset. The inputs of fitting or pattern recognition datasets may also clustered. ---------Input-Output Time-Series Prediction, Forecasting, Dynamic modelling, Nonlinear autoregression, System identification and Filtering Input-output time series problems consist of predicting the next value of one time-series given another time-series. Past values of both series (for best accuracy), or only one of the series (for a simpler system) may be used to predict the target series. simpleseries_dataset simplenarx_dataset exchanger_dataset maglev_dataset ph_dataset pollution_dataset valve_dataset - Simple time-series prediction dataset. Simple time-series prediction dataset. Heat exchanger dataset. Magnetic levitation dataset. Solution PH dataset. Pollution mortality dataset. Valve fluid flow dataset. 1-95 1 Getting Started ---------Single Time-Series Prediction, Forecasting, Dynamic modelling, Nonlinear autoregression, System identification, and Filtering Single time-series prediction involves predicting the next value of a time-series given its past values. simplenar_dataset chickenpox_dataset ice_dataset laser_dataset oil_dataset river_dataset solar_dataset - Simple single series prediction dataset. Monthly chickenpox instances dataset. Gobal ice volume dataset. Chaotic far-infrared laser dataset. Monthly oil price dataset. River flow dataset. Sunspot activity dataset Notice that all of the data sets have file names of the form name_dataset. Inside these files will be the arrays nameInputs and nameTargets. You can load a data set into the workspace with a command such as load simplefit_dataset This will load simplefitInputs and simplefitTargets into the workspace. If you want to load the input and target arrays into different names, you can use a command such as [x,t] = simplefit_datasets; This will load the inputs and targets into the arrays x and t. You can get a description of a data set with a command such as help maglev_dataset 1-96 Glossary Glossary ADALINE Acronym for a linear neuron: ADAptive LINear Element. adaption Training method that proceeds through the specified sequence of inputs, calculating the output, error, and network adjustment for each input vector in the sequence as the inputs are presented. adaptive filter Network that contains delays and whose weights are adjusted after each new input vector is presented. The network adapts to changes in the input signal properties if such occur. This kind of filter is used in long distance telephone lines to cancel echoes. adaptive learning rate Learning rate that is adjusted according to an algorithm during training to minimize training time. architecture Description of the number of the layers in a neural network, each layer’s transfer function, the number of neurons per layer, and the connections between layers. backpropagation learning rule Learning rule in which weights and biases are adjusted by error-derivative (delta) vectors backpropagated through the network. Backpropagation is commonly applied to feedforward multilayer networks. Sometimes this rule is called the generalized delta rule. backtracking search Linear search routine that begins with a step multiplier of 1 and then backtracks until an acceptable reduction in performance is obtained. batch Matrix of input (or target) vectors applied to the network simultaneously. Changes to the network weights and biases are made just once for the entire set of vectors in the input matrix. (The term batch is being replaced by the more descriptive expression “concurrent vectors.”) Glossary-1 Glossary batching Process of presenting a set of input vectors for simultaneous calculation of a matrix of output vectors and/or new weights and biases. Bayesian framework Assumes that the weights and biases of the network are random variables with specified distributions. BFGS quasi-Newton algorithm Variation of Newton’s optimization algorithm, in which an approximation of the Hessian matrix is obtained from gradients computed at each iteration of the algorithm. bias Neuron parameter that is summed with the neuron’s weighted inputs and passed through the neuron’s transfer function to generate the neuron’s output. bias vector Column vector of bias values for a layer of neurons. Brent’s search Linear search that is a hybrid of the golden section search and a quadratic interpolation. cascade-forward network Layered network in which each layer only receives inputs from previous layers. Charalambous’ search Hybrid line search that uses a cubic interpolation together with a type of sectioning. classification Association of an input vector with a particular target vector. competitive layer Layer of neurons in which only the neuron with maximum net input has an output of 1 and all other neurons have an output of 0. Neurons compete with each other for the right to respond to a given input vector. Glossary-2 Glossary competitive learning Unsupervised training of a competitive layer with the instar rule or Kohonen rule. Individual neurons learn to become feature detectors. After training, the layer categorizes input vectors among its neurons. competitive transfer function Accepts a net input vector for a layer and returns neuron outputs of 0 for all neurons except for the winner, the neuron associated with the most positive element of the net input n. concurrent input vectors Name given to a matrix of input vectors that are to be presented to a network simultaneously. All the vectors in the matrix are used in making just one set of changes in the weights and biases. conjugate gradient algorithm In the conjugate gradient algorithms, a search is performed along conjugate directions, which produces generally faster convergence than a search along the steepest descent directions. connection One-way link between neurons in a network. connection strength Strength of a link between two neurons in a network. The strength, often called weight, determines the effect that one neuron has on another. cycle Single presentation of an input vector, calculation of output, and new weights and biases. dead neuron Competitive layer neuron that never won any competition during training and so has not become a useful feature detector. Dead neurons do not respond to any of the training vectors. decision boundary Line, determined by the weight and bias vectors, for which the net input n is zero. Glossary-3 Glossary delta rule See Widrow-Hoff learning rule. delta vector The delta vector for a layer is the derivative of a network’s output error with respect to that layer’s net input vector. distance Distance between neurons, calculated from their positions with a distance function. distance function Particular way of calculating distance, such as the Euclidean distance between two vectors. early stopping Technique based on dividing the data into three subsets. The first subset is the training set, used for computing the gradient and updating the network weights and biases. The second subset is the validation set. When the validation error increases for a specified number of iterations, the training is stopped, and the weights and biases at the minimum of the validation error are returned. The third subset is the test set. It is used to verify the network design. epoch Presentation of the set of training (input and/or target) vectors to a network and the calculation of new weights and biases. Note that training vectors can be presented one at a time or all together in a batch. error jumping Sudden increase in a network’s sum-squared error during training. This is often due to too large a learning rate. error ratio Training parameter used with adaptive learning rate and momentum training of backpropagation networks. error vector Difference between a network’s output vector in response to an input vector and an associated target output vector. Glossary-4 Glossary feedback network Network with connections from a layer’s output to that layer’s input. The feedback connection can be direct or pass through several layers. feedforward network Layered network in which each layer only receives inputs from previous layers. Fletcher-Reeves update Method for computing a set of conjugate directions. These directions are used as search directions as part of a conjugate gradient optimization procedure. function approximation Task performed by a network trained to respond to inputs with an approximation of a desired function. generalization Attribute of a network whose output for a new input vector tends to be close to outputs for similar input vectors in its training set. generalized regression network Approximates a continuous function to an arbitrary accuracy, given a sufficient number of hidden neurons. global minimum Lowest value of a function over the entire range of its input parameters. Gradient descent methods adjust weights and biases in order to find the global minimum of error for a network. golden section search Linear search that does not require the calculation of the slope. The interval containing the minimum of the performance is subdivided at each iteration of the search, and one subdivision is eliminated at each iteration. gradient descent Process of making changes to weights and biases, where the changes are proportional to the derivatives of network error with respect to those weights and biases. This is done to minimize network error. Glossary-5 Glossary hard-limit transfer function Transfer function that maps inputs greater than or equal to 0 to 1, and all other values to 0. Hebb learning rule Historically the first proposed learning rule for neurons. Weights are adjusted proportional to the product of the outputs of pre- and postweight neurons. hidden layer Layer of a network that is not connected to the network output (for instance, the first layer of a two-layer feedforward network). home neuron Neuron at the center of a neighborhood. hybrid bisection-cubic search Line search that combines bisection and cubic interpolation. initialization Process of setting the network weights and biases to their original values. input layer Layer of neurons receiving inputs directly from outside the network. input space Range of all possible input vectors. input vector Vector presented to the network. input weight vector Row vector of weights going to a neuron. input weights Weights connecting network inputs to layers. Glossary-6 Glossary Jacobian matrix Contains the first derivatives of the network errors with respect to the weights and biases. Kohonen learning rule Learning rule that trains a selected neuron’s weight vectors to take on the values of the current input vector. layer Group of neurons having connections to the same inputs and sending outputs to the same destinations. layer diagram Network architecture figure showing the layers and the weight matrices connecting them. Each layer’s transfer function is indicated with a symbol. Sizes of input, output, bias, and weight matrices are shown. Individual neurons and connections are not shown. (See Network Objects, Data and Training Styles in the Neural Network Toolbox User’s Guide.) layer weights Weights connecting layers to other layers. Such weights need to have nonzero delays if they form a recurrent connection (i.e., a loop). learning Process by which weights and biases are adjusted to achieve some desired network behavior. learning rate Training parameter that controls the size of weight and bias changes during learning. learning rule Method of deriving the next changes that might be made in a network or a procedure for modifying the weights and biases of a network. Levenberg-Marquardt Algorithm that trains a neural network 10 to 100 times faster than the usual gradient descent backpropagation method. It always computes the approximate Hessian matrix, which has dimensions n-by-n. Glossary-7 Glossary line search function Procedure for searching along a given search direction (line) to locate the minimum of the network performance. linear transfer function Transfer function that produces its input as its output. link distance Number of links, or steps, that must be taken to get to the neuron under consideration. local minimum Minimum of a function over a limited range of input values. A local minimum might not be the global minimum. log-sigmoid transfer function Squashing function of the form shown below that maps the input to the interval (0,1). (The toolbox function is logsig.) f (n) = 1 1 + e− n Manhattan distance The Manhattan distance between two vectors x and y is calculated as D = sum(abs(x-y)) maximum performance increase Maximum amount by which the performance is allowed to increase in one iteration of the variable learning rate training algorithm. maximum step size Maximum step size allowed during a linear search. The magnitude of the weight vector is not allowed to increase by more than this maximum step size in one iteration of a training algorithm. mean square error function Performance function that calculates the average squared error between the network outputs a and the target outputs t. Glossary-8 Glossary momentum Technique often used to make it less likely for a backpropagation network to get caught in a shallow minimum. momentum constant Training parameter that controls how much momentum is used. mu parameter Initial value for the scalar µ. neighborhood Group of neurons within a specified distance of a particular neuron. The neighborhood is specified by the indices for all the neurons that lie within a radius d of the winning neuron i*: Ni(d) = {j,dij ≤ d} net input vector Combination, in a layer, of all the layer’s weighted input vectors with its bias. neuron Basic processing element of a neural network. Includes weights and bias, a summing junction, and an output transfer function. Artificial neurons, such as those simulated and trained with this toolbox, are abstractions of biological neurons. neuron diagram Network architecture figure showing the neurons and the weights connecting them. Each neuron’s transfer function is indicated with a symbol. ordering phase Period of training during which neuron weights are expected to order themselves in the input space consistent with the associated neuron positions. output layer Layer whose output is passed to the world outside the network. Glossary-9 Glossary output vector Output of a neural network. Each element of the output vector is the output of a neuron. output weight vector Column vector of weights coming from a neuron or input. (See also outstar learning rule.) outstar learning rule Learning rule that trains a neuron’s (or input’s) output weight vector to take on the values of the current output vector of the postweight layer. Changes in the weights are proportional to the neuron’s output. overfitting Case in which the error on the training set is driven to a very small value, but when new data is presented to the network, the error is large. pass Each traverse through all the training input and target vectors. pattern A vector. pattern association Task performed by a network trained to respond with the correct output vector for each input vector presented. pattern recognition Task performed by a network trained to respond when an input vector close to a learned vector is presented. The network “recognizes” the input as one of the original target vectors. perceptron Single-layer network with a hard-limit transfer function. This network is often trained with the perceptron learning rule. perceptron learning rule Learning rule for training single-layer hard-limit networks. It is guaranteed to result in a perfectly functioning network in finite time, given that the network is capable of doing so. Glossary-10 Glossary performance Behavior of a network. performance function Commonly the mean squared error of the network outputs. However, the toolbox also considers other performance functions. Type nnets and look under performance functions. Polak-Ribiére update Method for computing a set of conjugate directions. These directions are used as search directions as part of a conjugate gradient optimization procedure. positive linear transfer function Transfer function that produces an output of zero for negative inputs and an output equal to the input for positive inputs. postprocessing Converts normalized outputs back into the same units that were used for the original targets. Powell-Beale restarts Method for computing a set of conjugate directions. These directions are used as search directions as part of a conjugate gradient optimization procedure. This procedure also periodically resets the search direction to the negative of the gradient. preprocessing Transformation of the input or target data before it is presented to the neural network. principal component analysis Orthogonalize the components of network input vectors. This procedure can also reduce the dimension of the input vectors by eliminating redundant components. quasi-Newton algorithm Class of optimization algorithm based on Newton’s method. An approximate Hessian matrix is computed at each iteration of the algorithm based on the gradients. Glossary-11 Glossary radial basis networks Neural network that can be designed directly by fitting special response elements where they will do the most good. radial basis transfer function The transfer function for a radial basis neuron is radbas(n) = e− n 2 regularization Modification of the performance function, which is normally chosen to be the sum of squares of the network errors on the training set, by adding some fraction of the squares of the network weights. resilient backpropagation Training algorithm that eliminates the harmful effect of having a small slope at the extreme ends of the sigmoid squashing transfer functions. saturating linear transfer function Function that is linear in the interval (-1,+1) and saturates outside this interval to -1 or +1. (The toolbox function is satlin.) scaled conjugate gradient algorithm Avoids the time-consuming line search of the standard conjugate gradient algorithm. sequential input vectors Set of vectors that are to be presented to a network one after the other. The network weights and biases are adjusted on the presentation of each input vector. sigma parameter Determines the change in weight for the calculation of the approximate Hessian matrix in the scaled conjugate gradient algorithm. sigmoid Monotonic S-shaped function that maps numbers in the interval (-∞,∞) to a finite interval such as (-1,+1) or (0,1). Glossary-12 Glossary simulation Takes the network input p, and the network object net, and returns the network outputs a. spread constant Distance an input vector must be from a neuron’s weight vector to produce an output of 0.5. squashing function Monotonically increasing function that takes input values between -∞ and +∞ and returns values in a finite interval. star learning rule Learning rule that trains a neuron’s weight vector to take on the values of the current input vector. Changes in the weights are proportional to the neuron’s output. sum-squared error Sum of squared differences between the network targets and actual outputs for a given input vector or set of vectors. supervised learning Learning process in which changes in a network’s weights and biases are due to the intervention of any external teacher. The teacher typically provides output targets. symmetric hard-limit transfer function Transfer that maps inputs greater than or equal to 0 to +1, and all other values to -1. symmetric saturating linear transfer function Produces the input as its output as long as the input is in the range -1 to 1. Outside that range the output is -1 and +1, respectively. Glossary-13 Glossary tan-sigmoid transfer function Squashing function of the form shown below that maps the input to the interval (-1,1). (The toolbox function is tansig.) f (n) = 1 1 + e− n tapped delay line Sequential set of delays with outputs available at each delay output. target vector Desired output vector for a given input vector. test vectors Set of input vectors (not used directly in training) that is used to test the trained network. topology functions Ways to arrange the neurons in a grid, box, hexagonal, or random topology. training Procedure whereby a network is adjusted to do a particular job. Commonly viewed as an offline job, as opposed to an adjustment made during each time interval, as is done in adaptive training. training vector Input and/or target vector used to train a network. transfer function Function that maps a neuron’s (or layer’s) net output n to its actual output. tuning phase Period of SOFM training during which weights are expected to spread out relatively evenly over the input space while retaining their topological order found during the ordering phase. Glossary-14 Glossary underdetermined system System that has more variables than constraints. unsupervised learning Learning process in which changes in a network’s weights and biases are not due to the intervention of any external teacher. Commonly changes are a function of the current network input vectors, output vectors, and previous weights and biases. update Make a change in weights and biases. The update can occur after presentation of a single input vector or after accumulating changes over several input vectors. validation vectors Set of input vectors (not used directly in training) that is used to monitor training progress so as to keep the network from overfitting. weight function Weight functions apply weights to an input to get weighted inputs, as specified by a particular function. weight matrix Matrix containing connection strengths from a layer’s inputs to its neurons. The element wi,j of a weight matrix W refers to the connection strength from input j to neuron i. weighted input vector Result of applying a weight to a layer’s input, whether it is a network input or the output of another layer. Widrow-Hoff learning rule Learning rule used to train single-layer linear networks. This rule is the predecessor of the backpropagation rule and is sometimes referred to as the delta rule. Glossary-15 Glossary Glossary-16 Index A F applications aerospace 1-6 automotive 1-6 banking 1-6 defense 1-6 electronics 1-6 entertainment 1-6 financial 1-6 industrial 1-6 insurance 1-7 manufacturing 1-7 medical 1-7 oil and gas exploration 1-7 robotics 1-7 securities 1-7 speech 1-7 telecommunications 1-7 transportation 1-7 fitting functions 1-9 fitting problems defining 1-9 solving with command-line functions 1-23 1-43 solving with nftool 1-11 1-51 functions fitting 1-9 Index C clustering 1-50 clustering problems defining 1-50 solving with command-line functions 1-61 solving with nctool 1-51 command-line functions solving clustering problems 1-61 solving fitting problems 1-9 solving time series problems 1-67 confusion matrix 1-37 1-47 I input-error cross-correlation function 1-76 N nctool solving clustering problems 1-51 neighbor distances plot 1-65 Neural Network Toolbox Clustering Tool See nctool. 1-51 Neural Network Toolbox Fitting Tool. See nftool. 1-11 1-31 Neural Network Toolbox Pattern Recognition Tool. See nprtool. 1-31 Neural Network Toolbox Time Series Tool. See ntstool. 1-68 neural networks definition 1-3 self-organizing feature map 1-53 nftool solving fitting problems 1-11 1-68 nprtool solving pattern recognition problems 1-31 E error autocorrelation plot 1-75 error histogram 1-18 P pattern recognition 1-29 pattern recognition problems defining 1-29 Index-1 Index R SOM topology 1-64 regression plots 1-17 ROC curve 1-38 T S sample hits plot 1-56 self-organizing feature map (SOFM) networks 1-53 neighbor distances plot 1-65 sample hits plot 1-56 SOM topology 1-64 weight planes plot 1-57 Index-2 time series 1-67 time series problems defining 1-67 solving with command-line functions 1-83 solving with ntstool 1-68 W weight planes plot 1-57

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF

advertisement