Neural Network Toolbox™
Getting Started Guide
R2013a
Mark Hudson Beale
Martin T. Hagan
Howard B. Demuth
How to Contact MathWorks

Web: www.mathworks.com
Newsgroup: comp.soft-sys.matlab
Technical Support: www.mathworks.com/contact_TS.html

[email protected]: Product enhancement suggestions
[email protected]: Bug reports
[email protected]: Documentation error reports
[email protected]: Order status, license renewals, passcodes
[email protected]: Sales, pricing, and general information

508-647-7000 (Phone)
508-647-7001 (Fax)
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098
For contact information about worldwide offices, see the MathWorks Web site.
Neural Network Toolbox™ Getting Started Guide
© COPYRIGHT 1992–2013 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used
or copied only under the terms of the license agreement. No part of this manual may be photocopied or
reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation
by, for, or through the federal government of the United States. By accepting delivery of the Program
or Documentation, the government hereby agrees that this software or documentation qualifies as
commercial computer software or commercial computer software documentation as such terms are used
or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and
conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern
the use, modification, reproduction, release, performance, display, and disclosure of the Program and
Documentation by the federal government (or other entity acquiring for or through the federal government)
and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the
government’s needs or is inconsistent in any respect with federal procurement law, the government agrees
to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand
names may be trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see
www.mathworks.com/patents for more information.
Revision History

June 1992        First printing
April 1993       Second printing
January 1997     Third printing
July 1997        Fourth printing
January 1998     Fifth printing     Revised for Version 3 (Release 11)
September 2000   Sixth printing     Revised for Version 4 (Release 12)
June 2001        Seventh printing   Minor revisions (Release 12.1)
July 2002        Online only        Minor revisions (Release 13)
January 2003     Online only        Minor revisions (Release 13SP1)
June 2004        Online only        Revised for Version 4.0.3 (Release 14)
October 2004     Online only        Revised for Version 4.0.4 (Release 14SP1)
October 2004     Eighth printing    Revised for Version 4.0.4
March 2005       Online only        Revised for Version 4.0.5 (Release 14SP2)
March 2006       Online only        Revised for Version 5.0 (Release 2006a)
September 2006   Ninth printing     Minor revisions (Release 2006b)
March 2007       Online only        Minor revisions (Release 2007a)
September 2007   Online only        Revised for Version 5.1 (Release 2007b)
March 2008       Online only        Revised for Version 6.0 (Release 2008a)
October 2008     Online only        Revised for Version 6.0.1 (Release 2008b)
March 2009       Online only        Revised for Version 6.0.2 (Release 2009a)
September 2009   Online only        Revised for Version 6.0.3 (Release 2009b)
March 2010       Online only        Revised for Version 6.0.4 (Release 2010a)
September 2010   Tenth printing     Revised for Version 7.0 (Release 2010b)
April 2011       Online only        Revised for Version 7.0.1 (Release 2011a)
September 2011   Online only        Revised for Version 7.0.2 (Release 2011b)
March 2012       Online only        Revised for Version 7.0.3 (Release 2012a)
September 2012   Online only        Revised for Version 8.0 (Release 2012b)
March 2013       Online only        Revised for Version 8.0.1 (Release 2013a)
Acknowledgments
The authors would like to thank the following people:
Joe Hicklin of MathWorks for getting Howard into neural network research
years ago at the University of Idaho, for encouraging Howard and Mark to
write the toolbox, for providing crucial help in getting the first toolbox Version
1.0 out the door, for continuing to help with the toolbox in many ways, and
for being such a good friend.
Roy Lurie of MathWorks for his continued enthusiasm for the possibilities
for Neural Network Toolbox™ software.
Mary Ann Freeman of MathWorks for general support and for her
leadership of a great team of people we enjoy working with.
Rakesh Kumar of MathWorks for cheerfully providing technical and
practical help, encouragement, ideas and always going the extra mile for us.
Alan LaFleur of MathWorks for facilitating our documentation work.
Stephen Vanreusel of MathWorks for help with testing.
Dan Doherty of MathWorks for marketing support and ideas.
Orlando De Jesús of Oklahoma State University for his excellent work in
developing and programming the dynamic training algorithms described in
“Time Series and Dynamic Systems” and in programming the neural network
controllers described in “Neural Network Control Systems” in the Neural
Network Toolbox User’s Guide.
Martin T. Hagan, Howard B. Demuth, and Mark Hudson Beale for
permission to include various problems, examples, and other material from
Neural Network Design, January, 1996.
Contents

Acknowledgments

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . .  vi

1  Getting Started

Product Description . . . . . . . . . . . . . . . . . . . . . . . . . .  1-2
   Key Features . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1-2
Neural Networks Overview . . . . . . . . . . . . . . . . . . . . . . .  1-3
Using the Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . .  1-4
   Automatic Script Generation . . . . . . . . . . . . . . . . . . . .  1-5
Neural Network Toolbox Applications . . . . . . . . . . . . . . . . . .  1-6
Neural Network Design Steps . . . . . . . . . . . . . . . . . . . . . .  1-8
Fitting a Function . . . . . . . . . . . . . . . . . . . . . . . . . .  1-9
   Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . .  1-9
   Using the Neural Network Fitting Tool . . . . . . . . . . . . . . .  1-11
   Using Command-Line Functions . . . . . . . . . . . . . . . . . . . .  1-21
Recognizing Patterns . . . . . . . . . . . . . . . . . . . . . . . . .  1-29
   Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . .  1-29
   Using the Neural Network Pattern Recognition Tool . . . . . . . . .  1-31
   Using Command-Line Functions . . . . . . . . . . . . . . . . . . . .  1-41
Clustering Data . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1-50
   Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . .  1-50
   Using the Neural Network Clustering Tool . . . . . . . . . . . . . .  1-51
   Using Command-Line Functions . . . . . . . . . . . . . . . . . . . .  1-60
Time Series Prediction . . . . . . . . . . . . . . . . . . . . . . . .  1-67
   Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . .  1-67
   Using the Neural Network Time Series Tool . . . . . . . . . . . . .  1-68
   Using Command-Line Functions . . . . . . . . . . . . . . . . . . . .  1-80
Parallel Computing on CPUs and GPUs . . . . . . . . . . . . . . . . . .  1-90
   Parallel Computing Toolbox . . . . . . . . . . . . . . . . . . . . .  1-90
   Parallel CPU Workers . . . . . . . . . . . . . . . . . . . . . . . .  1-90
   GPU Computing . . . . . . . . . . . . . . . . . . . . . . . . . . .  1-91
   Multiple GPU/CPU Computing . . . . . . . . . . . . . . . . . . . . .  1-91
   Cluster Computing with MATLAB Distributed Computing Server . . . . .  1-92
   Load Balancing, Large Problems, and Beyond . . . . . . . . . . . . .  1-92
Sample Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . .  1-94

Glossary

Index
1
Getting Started
• “Product Description” on page 1-2
• “Neural Networks Overview” on page 1-3
• “Using the Toolbox” on page 1-4
• “Neural Network Toolbox Applications” on page 1-6
• “Neural Network Design Steps” on page 1-8
• “Fitting a Function” on page 1-9
• “Recognizing Patterns” on page 1-29
• “Clustering Data” on page 1-50
• “Time Series Prediction” on page 1-67
• “Parallel Computing on CPUs and GPUs” on page 1-90
• “Sample Data Sets” on page 1-94
Product Description
Create, train, and simulate neural networks
Neural Network Toolbox™ provides functions and apps for modeling complex
nonlinear systems that are not easily modeled with a closed-form equation.
Neural Network Toolbox supports supervised learning with feedforward,
radial basis, and dynamic networks. It also supports unsupervised learning
with self-organizing maps and competitive layers. With the toolbox you can
design, train, visualize, and simulate neural networks. You can use Neural
Network Toolbox for applications such as data fitting, pattern recognition,
clustering, time-series prediction, and dynamic system modeling and control.
To speed up training and handle large data sets, you can distribute
computations and data across multicore processors, GPUs, and computer
clusters using Parallel Computing Toolbox™.
Key Features
• Supervised networks, including multilayer, radial basis, learning vector
quantization (LVQ), time-delay, nonlinear autoregressive (NARX), and
layer-recurrent
• Unsupervised networks, including self-organizing maps and competitive
layers
• Apps for data-fitting, pattern recognition, and clustering
• Parallel computing and GPU support for accelerating training (using
Parallel Computing Toolbox)
• Preprocessing and postprocessing for improving the efficiency of network
training and assessing network performance
• Modular network representation for managing and visualizing networks
of arbitrary size
• Simulink® blocks for building and evaluating neural networks and for
control systems applications
Neural Networks Overview
Neural networks are composed of simple elements operating in parallel.
These elements are inspired by biological nervous systems. As in nature, the
connections between elements largely determine the network function. You
can train a neural network to perform a particular function by adjusting the
values of the connections (weights) between elements.
Typically, neural networks are adjusted, or trained, so that a particular input
leads to a specific target output. The next figure illustrates such a situation.
There, the network is adjusted, based on a comparison of the output and the
target, until the network output matches the target. Typically, many such
input/target pairs are needed to train a network.
Neural networks have been trained to perform complex functions in various
fields, including pattern recognition, identification, classification, speech,
vision, and control systems.
Neural networks can also be trained to solve problems that are difficult for
conventional computers or human beings. The toolbox emphasizes the use
of neural network paradigms that build up to—or are themselves used in—
engineering, financial, and other practical applications.
This topic explains how to use four graphical tools for training neural
networks to solve problems in function fitting, pattern recognition, clustering,
and time series. Using these four tools will give you an excellent introduction
to the use of the Neural Network Toolbox software.
Using the Toolbox
There are four ways you can use the Neural Network Toolbox software.
The first way is through the four graphical user interfaces (GUIs). You can
open these GUIs from a master GUI, which you can open with the command
nnstart. These provide a quick and easy way to access the power of the
toolbox for the following tasks:
• Function fitting
• Pattern recognition
• Data clustering
• Time series analysis
The second way to use the toolbox is through basic command-line operations.
The command-line operations offer more flexibility than the GUIs, but with
some added complexity. If this is your first experience with the toolbox, the
GUIs provide the best introduction. In addition, the GUIs can generate
scripts of documented MATLAB® code to provide you with templates for
creating your own customized command-line functions. The process of using
the GUIs first, and then generating and modifying MATLAB scripts, is an
excellent way to learn about the functionality of the toolbox.
The third way to use the toolbox is through customization. This advanced
capability allows you to create your own custom neural networks, while still
having access to the full functionality of the toolbox. You can create networks
with arbitrary connections, and you will still be able to train them using
existing toolbox training functions (as long as the network components are
differentiable).
The fourth way to use the toolbox is through the ability to modify any of the
functions contained in the toolbox. Every computational component is written
in MATLAB code and is fully accessible.
These four levels of toolbox usage span from novice to expert: simple
wizards guide the new user through specific applications, and network
customization allows researchers to try novel architectures with minimal
effort. Whatever your level of neural network and MATLAB knowledge, there
are toolbox features to suit your needs.
Automatic Script Generation
The GUIs described in this topic form an important part of the documentation
for the Neural Network Toolbox software. The GUIs guide you through the
process of designing neural networks to solve problems in four important
application areas, without requiring any background in neural networks or
sophistication in using MATLAB. In addition, the GUIs can automatically
generate both simple and advanced MATLAB scripts that can reproduce the
steps performed by the GUI, but with the option to override default settings.
These scripts can provide you with a template for creating customized
code, and they can aid you in becoming familiar with the command-line
functionality of the toolbox. It is highly recommended that you use the
automatic script generation facility of the GUIs.
Neural Network Toolbox Applications
It would be impossible to cover the total range of applications for which neural
networks have provided outstanding solutions. The remaining sections of
this topic describe only a few of the applications in function fitting, pattern
recognition, clustering, and time series analysis. The following table provides
an idea of the diversity of applications for which neural networks provide
state-of-the-art solutions.
Industry             Business Applications

Aerospace            High-performance aircraft autopilot, flight path
                     simulation, aircraft control systems, autopilot
                     enhancements, aircraft component simulation, and
                     aircraft component fault detection

Automotive           Automobile automatic guidance system, and
                     warranty activity analysis

Banking              Check and other document reading and credit
                     application evaluation

Defense              Weapon steering, target tracking, object
                     discrimination, facial recognition, new kinds of
                     sensors, sonar, radar and image signal processing
                     including data compression, feature extraction and
                     noise suppression, and signal/image identification

Electronics          Code sequence prediction, integrated circuit chip
                     layout, process control, chip failure analysis,
                     machine vision, voice synthesis, and nonlinear
                     modeling

Entertainment        Animation, special effects, and market forecasting

Financial            Real estate appraisal, loan advising, mortgage
                     screening, corporate bond rating, credit-line use
                     analysis, credit card activity tracking, portfolio
                     trading program, corporate financial analysis, and
                     currency price prediction

Industrial           Prediction of industrial processes, such as the
                     output gases of furnaces, replacing complex and
                     costly equipment used for this purpose in the past

Insurance            Policy application evaluation and product
                     optimization

Manufacturing        Manufacturing process control, product design
                     and analysis, process and machine diagnosis,
                     real-time particle identification, visual quality
                     inspection systems, beer testing, welding quality
                     analysis, paper quality prediction, computer-chip
                     quality analysis, analysis of grinding operations,
                     chemical product design analysis, machine
                     maintenance analysis, project bidding, planning and
                     management, and dynamic modeling of chemical
                     process system

Medical              Breast cancer cell analysis, EEG and ECG analysis,
                     prosthesis design, optimization of transplant
                     times, hospital expense reduction, hospital quality
                     improvement, and emergency-room test advisement

Oil and gas          Exploration

Robotics             Trajectory control, forklift robot, manipulator
                     controllers, and vision systems

Securities           Market analysis, automatic bond rating, and stock
                     trading advisory systems

Speech               Speech recognition, speech compression, vowel
                     classification, and text-to-speech synthesis

Telecommunications   Image and data compression, automated information
                     services, real-time translation of spoken language,
                     and customer payment processing systems

Transportation       Truck brake diagnosis systems, vehicle scheduling,
                     and routing systems
Neural Network Design Steps
In the remaining sections of this topic, you will follow the standard steps
for designing neural networks to solve problems in four application areas:
function fitting, pattern recognition, clustering, and time series analysis.
The work flow for any of these problems has seven primary steps. (Data
collection in step 1, while important, generally occurs outside the MATLAB
environment.)
1 Collect data
2 Create the network
3 Configure the network
4 Initialize the weights and biases
5 Train the network
6 Validate the network
7 Use the network
You will follow these steps using both the GUI tools and command-line
operations in the next four sections.
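As a rough sketch, these seven steps map to toolbox functions as shown below. This assumes the built-in simplefit_dataset sample data; note that train configures and initializes a new network automatically, so steps 3 and 4 are listed only for completeness:

[x,t] = simplefit_dataset;   % 1 Collect data (here, a toolbox sample data set)
net = fitnet(10);            % 2 Create the network
net = configure(net,x,t);    % 3 Configure input/output sizes from the data
net = init(net);             % 4 Initialize the weights and biases
[net,tr] = train(net,x,t);   % 5 Train the network
y = net(x);                  % 6 Validate: compare outputs with targets
perf = perform(net,t,y)
ynew = net(x(:,1))           % 7 Use the network on new input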
Fitting a Function
Neural networks are good at fitting functions. In fact, there is proof that a
fairly simple neural network can fit any practical function.
Suppose, for instance, that you have data from a housing application. You
want to design a network that can predict the value of a house (in $1000s),
given 13 pieces of geographical and real estate information. You have a total
of 506 example homes for which you have those 13 items of data and their
associated market values.
You can solve this problem in two ways:
• Use a graphical user interface, nftool, as described in “Using the Neural
Network Fitting Tool” on page 1-11.
• Use command-line functions, as described in “Using Command-Line
Functions” on page 1-21.
It is generally best to start with the GUI, and then to use the GUI to
automatically generate command-line scripts. Before using either method,
first define the problem by selecting a data set. Each GUI has access to
many sample data sets that you can use to experiment with the toolbox
(see “Sample Data Sets” on page 1-94). If you have a specific problem that
you want to solve, you can load your own data into the workspace. The next
section describes the data format.
Defining a Problem
To define a fitting problem for the toolbox, arrange a set of Q input vectors
as columns in a matrix. Then, arrange another set of Q target vectors (the
correct output vectors for each of the input vectors) into a second matrix (see
“Data Structures” for a detailed description of data formatting for static
and time series data). For example, you can define the fitting problem
for a Boolean AND gate with four sets of two-element input vectors and
one-element targets as follows:
inputs = [0 1 0 1; 0 0 1 1];
targets = [0 0 0 1];
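As a minimal command-line sketch (the full workflow follows in the next sections), you could fit a small network to this AND data. The two-neuron size and the dividetrain setting are illustrative choices for such a tiny data set:

net = fitnet(2);                 % small fitting network with 2 hidden neurons
net.divideFcn = 'dividetrain';   % only 4 samples, so use all of them for training
net = train(net,inputs,targets);
outputs = net(inputs)            % should be close to [0 0 0 1]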
The next section shows how to train a network to fit a data set, using the
neural network fitting tool GUI, nftool. This example uses the housing data
set provided with the toolbox.
Using the Neural Network Fitting Tool
1 Open the Neural Network Start GUI with this command:
nnstart
2 Click Fitting Tool to open the Neural Network Fitting Tool. (You can also
use the command nftool.)
3 Click Next to proceed.
4 Click Load Example Data Set in the Select Data window. The Fitting
Data Set Chooser window opens.
Note Use the Inputs and Targets options in the Select Data window
when you need to load data from the MATLAB workspace.
5 Select House Pricing, and click Import. This returns you to the Select
Data window.
6 Click Next to display the Validation and Test Data window, shown in the
following figure.
The validation and test data sets are each set to 15% of the original data.
With these settings, the input vectors and target vectors will be randomly
divided into three sets as follows:
• 70% will be used for training.
• 15% will be used to validate that the network is generalizing and to stop
training before overfitting.
• The last 15% will be used as a completely independent test of network
generalization.
(See “Dividing the Data” for more discussion of the data division process.)
7 Click Next.
The standard network that is used for function fitting is a two-layer
feedforward network, with a sigmoid transfer function in the hidden layer
and a linear transfer function in the output layer. The default number of
hidden neurons is set to 10. You might want to increase this number later,
if the network training performance is poor.
8 Click Next.
9 Click Train.
The training continued until the validation error failed to decrease for six
iterations (validation stop).
10 Under Plots, click Regression. This is used to validate the network
performance.
The following regression plots display the network outputs with respect
to targets for training, validation, and test sets. For a perfect fit, the data
should fall along a 45 degree line, where the network outputs are equal
to the targets. For this problem, the fit is reasonably good for all data
sets, with R values in each case of 0.93 or above. If even more accurate
results were required, you could retrain the network by clicking Retrain
in nftool. This will change the initial weights and biases of the network,
and may produce an improved network after retraining. Other options
are provided on the following pane.
11 View the error histogram to obtain additional verification of network
performance. Under the Plots pane, click Error Histogram.
The blue bars represent training data, the green bars represent validation
data, and the red bars represent testing data. The histogram can give you
an indication of outliers, which are data points where the fit is significantly
worse than the majority of data. In this case, you can see that while most
errors fall between -5 and 5, there is a training point with an error of 17 and
validation points with errors of 12 and 13. These outliers are also visible on
the testing regression plot. The first corresponds to the point with a target
of 50 and output near 33. It is a good idea to check the outliers to determine
if the data is bad, or if those data points are different than the rest of the
data set. If the outliers are valid data points, but are unlike the rest of the
data, then the network is extrapolating for these points. You should collect
more data that looks like the outlier points, and retrain the network.
12 Click Next in the Neural Network Fitting Tool to evaluate the network.
At this point, you can test the network against new data.
If you are dissatisfied with the network’s performance on the original or
new data, you can do one of the following:
• Train it again.
• Increase the number of neurons.
• Get a larger training data set.
If the performance on the training set is good, but the test set performance
is significantly worse, which could indicate overfitting, then reducing the
number of neurons can improve your results. If training performance is
poor, then you may want to increase the number of neurons.
13 If you are satisfied with the network performance, click Next.
14 Use the buttons on this screen to generate scripts or to save your results.
• You can click Simple Script or Advanced Script to create MATLAB
code that can be used to reproduce all of the previous steps from the
command line. Creating MATLAB code can be helpful if you want
to learn how to use the command-line functionality of the toolbox to
customize the training process. In “Using Command-Line Functions” on
page 1-21, you will investigate the generated scripts in more detail.
• You can also have the network saved as net in the workspace. You can
perform additional tests on it or put it to work on new inputs.
15 When you have created the MATLAB code and saved your results, click
Finish.
Using Command-Line Functions
The easiest way to learn how to use the command-line functionality of
the toolbox is to generate scripts from the GUIs, and then modify them to
customize the network training. As an example, look at the simple script that
was created at step 14 of the previous section.
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by NFTOOL
%
% This script assumes these variables are defined:
%
%   houseInputs - input data.
%   houseTargets - target data.
inputs = houseInputs;
targets = houseTargets;
% Create a Fitting Network
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);
% Set up Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(outputs,targets);
performance = perform(net,targets,outputs)
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotfit(targets,outputs)
%figure, plotregression(targets,outputs)
%figure, ploterrhist(errors)
You can save the script, and then run it from the command line to reproduce
the results of the previous GUI session. You can also edit the script to
customize the training process. In this case, follow each step in the script.
1 The script assumes that the input vectors and target vectors are already
loaded into the workspace. If the data are not loaded, you can load them as
follows:
load house_dataset
inputs = houseInputs;
targets = houseTargets;
This data set is one of the sample data sets that is part of the toolbox (see
“Sample Data Sets” on page 1-94). You can see a list of all available data
sets by entering the command help nndatasets. The load command also
allows you to load the variables from any of these data sets using your own
variable names. For example, the command
[inputs,targets] = house_dataset;
will load the housing inputs into the array inputs and the housing targets
into the array targets.
2 Create a network. The default network for function fitting (or regression)
problems, fitnet, is a feedforward network with the default tan-sigmoid
transfer function in the hidden layer and linear transfer function in the
output layer. You assigned ten neurons (somewhat arbitrary) to the one
hidden layer in the previous section. The network has one output neuron,
because there is only one target value associated with each input vector.
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);
Note More neurons require more computation, and they have a tendency
to overfit the data when the number is set too high, but they allow the
network to solve more complicated problems. More layers require more
computation, but their use might result in the network solving complex
problems more efficiently. To use more than one hidden layer, enter the
hidden layer sizes as elements of an array in the fitnet command.
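For example, a sketch of a hypothetical two-hidden-layer variant (the sizes 20 and 10 are arbitrary and not used in this example):

net = fitnet([20 10]);   % first hidden layer: 20 neurons, second: 10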
3 Set up the division of data.
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
With these settings, the input vectors and target vectors will be randomly
divided, with 70% used for training, 15% for validation and 15% for testing.
(See “Dividing the Data” for more discussion of the data division process.)
4 Train the network. The network uses the default Levenberg-Marquardt
algorithm for training (trainlm). To train the network, enter:
[net,tr] = train(net,inputs,targets);
During training, the following training window opens. This window
displays training progress and allows you to interrupt training at any point
by clicking Stop Training.
This training stopped when the validation error increased for six iterations,
which occurred at iteration 23. If you click Performance in the training
window, a plot of the training errors, validation errors, and test errors
appears, as shown in the following figure. In this example, the result is
reasonable because of the following considerations:
• The final mean-square error is small.
• The test set error and the validation set error have similar
characteristics.
• No significant overfitting has occurred by iteration 17 (where the best
validation performance occurs).
5 Test the network. After the network has been trained, you can use it to
compute the network outputs. The following code calculates the network
outputs, errors and overall performance.
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
It is also possible to calculate the network performance only on the test
set, by using the testing indices, which are located in the training record.
(See “Post-Training Analysis (Network Validation)” for a full description of
the training record.)
tInd = tr.testInd;
tstOutputs = net(inputs(tInd));
tstPerform = perform(net,targets(tInd),tstOutputs)
6 Perform some analysis of the network response. If you click Regression
in the training window, you can perform a linear regression between the
network outputs and the corresponding targets.
The following figure shows the results.
The output tracks the targets very well for training, testing, and validation,
and the R-value is over 0.95 for the total response. If even more accurate
results were required, you could try any of these approaches:
• Reset the initial network weights and biases to new values with init
and train again (see “Initializing Weights” (init), and the sketch after this list).
• Increase the number of hidden neurons.
• Increase the number of training vectors.
• Increase the number of input values, if more relevant information is
available.
• Try a different training algorithm (see “Training Algorithms”).
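A minimal sketch of the first two options above, assuming the inputs and targets variables from step 1 are still in the workspace (the 20-neuron size is an arbitrary illustration):

net = init(net);                        % option 1: reinitialize weights and biases
[net,tr] = train(net,inputs,targets);   % then train again from the new starting point
net2 = fitnet(20);                      % option 2: a larger hidden layer (20 neurons)
[net2,tr2] = train(net2,inputs,targets);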
In this case, the network response is satisfactory, and you can now put
the network to use on new inputs.
7 View the network diagram.
view(net)
This creates the following diagram of the network.
To get more experience in command-line operations, try some of these tasks:
• During training, open a plot window (such as the regression plot), and
watch it animate.
• Plot from the command line with functions such as plotfit,
plotregression, plottrainstate and plotperform. (For more
information on using these functions, see their reference pages.)
Also, see the advanced script for more options, when training from the
command line.
Recognizing Patterns
In addition to function fitting, neural networks are also good at recognizing
patterns.
For example, suppose you want to classify a tumor as benign or malignant,
based on uniformity of cell size, clump thickness, mitosis, etc. You have 699
example cases for which you have 9 items of data and the correct classification
as benign or malignant.
As with function fitting, there are two ways to solve this problem:
• Use the nprtool GUI, as described in “Using the Neural Network Pattern
Recognition Tool” on page 1-31.
• Use a command-line solution, as described in “Using Command-Line
Functions” on page 1-60.
It is generally best to start with the GUI, and then to use the GUI to
automatically generate command-line scripts. Before using either method,
the first step is to define the problem by selecting a data set. The next section
describes the data format.
Defining a Problem
To define a pattern recognition problem, arrange a set of Q input vectors as
columns in a matrix. Then arrange another set of Q target vectors so that
they indicate the classes to which the input vectors are assigned (see “Data
Structures” for a detailed description of data formatting for static and time
series data). There are two approaches to creating the target vectors.
One approach can be used when there are only two classes; you set each
scalar target value to either 1 or 0, indicating which class the corresponding
input belongs to. For instance, you can define the exclusive-or classification
problem as follows:
inputs = [0 1 0 1; 0 0 1 1];
targets = [0 1 1 0];
Alternately, target vectors can have N elements, where for each target vector,
one element is 1 and the others are 0. This defines a problem where inputs
are to be classified into N different classes. For example, the following lines
show how to define a classification problem that divides the corners of a
5-by-5-by-5 cube into three classes:
• The origin (the first input vector) in one class
• The corner farthest from the origin (the last input vector) in a second class
• All other points in a third class
inputs = [0 0 0 0 5 5 5 5; 0 0 5 5 0 0 5 5; 0 5 0 5 0 5 0 5];
targets = [1 0 0 0 0 0 0 0; 0 1 1 1 1 1 1 0; 0 0 0 0 0 0 0 1];
Classification problems involving only two classes can be represented using
either format. The targets can consist of either scalar 1/0 elements or
two-element vectors, with one element being 1 and the other element being 0.
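For example, a small sketch of converting the exclusive-or targets above between the two equivalent formats:

targets = [0 1 1 0];               % scalar 1/0 format, one element per case
targets2 = [targets; 1-targets];   % two-element vector format, one column per case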
The next section shows how to train a network to recognize patterns, using
the neural network pattern recognition tool GUI, nprtool. This example uses
the cancer data set provided with the toolbox. This data set consists of 699
nine-element input vectors and two-element target vectors. There are two
elements in each target vector, because there are two categories (benign or
malignant) associated with each input vector.
Using the Neural Network Pattern Recognition Tool
1 If needed, open the Neural Network Start GUI with this command:
nnstart
2 Click Pattern Recognition Tool to open the Neural Network Pattern
Recognition Tool. (You can also use the command nprtool.)
3 Click Next to proceed. The Select Data window opens.
4 Click Load Example Data Set. The Pattern Recognition Data Set
Chooser window opens.
5 Select Breast Cancer and click Import. You return to the Select Data
window.
6 Click Next to continue to the Validation and Test Data window.
Validation and test data sets are each set to 15% of the original data.
With these settings, the input vectors and target vectors will be randomly
divided into three sets as follows:
• 70% are used for training.
• 15% are used to validate that the network is generalizing and to stop
training before overfitting.
• The last 15% are used as a completely independent test of network
generalization.
(See “Dividing the Data” for more discussion of the data division process.)
7 Click Next.
The standard network that is used for pattern recognition is a two-layer
feedforward network, with sigmoid transfer functions in both the hidden
layer and the output layer. The default number of hidden neurons is set to
10. You might want to come back and increase this number if the network
does not perform as well as you expect. The number of output neurons
is set to 2, which is equal to the number of elements in the target vector
(the number of categories).
8 Click Next.
9 Click Train.
The training continues for 55 iterations.
10 Under the Plots pane, click Confusion in the Neural Network Pattern
Recognition Tool.
The next figure shows the confusion matrices for training, testing, and
validation, and the three kinds of data combined. The network outputs are
very accurate, as you can see by the high numbers of correct responses in
the green squares and the low numbers of incorrect responses in the red
squares. The lower right blue squares illustrate the overall accuracies.
11 Plot the Receiver Operating Characteristic (ROC) curve. Under the Plots
pane, click Receiver Operating Characteristic in the Neural Network
Pattern Recognition Tool.
The colored lines in each axis represent the ROC curves. The ROC curve is
a plot of the true positive rate (sensitivity) versus the false positive rate
(1 - specificity) as the threshold is varied. A perfect test would show points in
the upper-left corner, with 100% sensitivity and 100% specificity. For this
problem, the network performs very well.
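You can also create this plot from the command line once the network outputs are available in the workspace (a one-line sketch using the toolbox plotroc function):

figure, plotroc(targets,outputs)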
12 In the Neural Network Pattern Recognition Tool, click Next to evaluate
the network.
At this point, you can test the network against new data.
If you are dissatisfied with the network’s performance on the original
or new data, you can train it again, increase the number of neurons, or
perhaps get a larger training data set. If the performance on the training
set is good, but the test set performance is significantly worse, which could
indicate overfitting, then reducing the number of neurons can improve
your results.
13 When you are satisfied with the network performance, click Next.
14 Use the buttons on this screen to save your results.
• You can click Simple Script or Advanced Script to create MATLAB
code that can be used to reproduce all of the previous steps from the
command line. Creating MATLAB code can be helpful if you want
to learn how to use the command-line functionality of the toolbox to
customize the training process. In “Using Command-Line Functions” on
page 1-41, you will investigate the generated scripts in more detail.
• You can also save the network as net in the workspace. You can perform
additional tests on it or put it to work on new inputs.
15 When you have saved your results, click Finish.
Using Command-Line Functions
The easiest way to learn how to use the command-line functionality of
the toolbox is to generate scripts from the GUIs, and then modify them to
customize the network training. As an example, let’s look at the simple script
that was created at step 14 of the previous section.
% Solve a Pattern Recognition Problem with a Neural Network
% Script generated by NPRTOOL
%
% This script assumes these variables are defined:
%
%   cancerInputs - input data.
%   cancerTargets - target data.
inputs = cancerInputs;
targets = cancerTargets;
% Create a Pattern Recognition Network
hiddenLayerSize = 10;
net = patternnet(hiddenLayerSize);
% Set up Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotconfusion(targets,outputs)
%figure, ploterrhist(errors)
You can save the script, and then run it from the command line to reproduce
the results of the previous GUI session. You can also edit the script to
customize the training process. In this case, follow each step in the script.
1 The script assumes that the input vectors and target vectors are already
loaded into the workspace. If the data are not loaded, you can load them as
follows:
load cancer_dataset
inputs = cancerInputs;
targets = cancerTargets;
2 Create the network. The default network for pattern recognition
problems, patternnet, is a feedforward network with the default
tan-sigmoid transfer functions in both the hidden and output layers. You
assigned ten neurons (somewhat arbitrary) to the one hidden layer in the
previous section.
• The network has two output neurons, because there are two target
values (categories) associated with each input vector.
• Each output neuron represents a category.
• When an input vector of the appropriate category is applied to the
network, the corresponding neuron should produce a 1, and the other
neurons should output a 0.
To create the network, enter these commands:
hiddenLayerSize = 10;
net = patternnet(hiddenLayerSize);
Note The choice of network architecture for pattern recognition problems
follows similar guidelines to function fitting problems. More neurons
require more computation, and they have a tendency to overfit the data
when the number is set too high, but they allow the network to solve more
complicated problems. More layers require more computation, but their use
might result in the network solving complex problems more efficiently. To
use more than one hidden layer, enter the hidden layer sizes as elements of
an array in the patternnet command.
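For example, a sketch of a hypothetical two-hidden-layer variant (sizes 20 and 10 are arbitrary, not used in this example):

net = patternnet([20 10]);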
3 Set up the division of data.
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
With these settings, the input vectors and target vectors will be randomly
divided, with 70% used for training, 15% for validation and 15% for testing.
(See “Dividing the Data” for more discussion of the data division process.)
4 Train the network. The pattern recognition network uses the default
Scaled Conjugate Gradient (trainscg) algorithm for training. To train
the network, enter this command:
[net,tr] = train(net,inputs,targets);
During training, as in function fitting, the training window opens. This
window displays training progress. To interrupt training at any point,
click Stop Training.
This training stopped when the validation error increased for six iterations,
which occurred at iteration 24.
5 Test the network. After the network has been trained, you can use it to
compute the network outputs. The following code calculates the network
outputs, errors and overall performance.
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
It is also possible to calculate the network performance only on the test set,
by using the testing indices, which are located in the training record.
tInd = tr.testInd;
tstOutputs = net(inputs(tInd));
tstPerform = perform(net,targets(tInd),tstOutputs)
6 View the network diagram.
view(net)
This creates the following diagram of the network.
7 Plot the training, validation, and test performance.
figure, plotperform(tr)
8 Use the plotconfusion function to plot the confusion matrix. It shows the
various types of errors that occurred for the final trained network.
figure, plotconfusion(targets,outputs)
The next figure shows the results.
The diagonal cells show the number of cases that were correctly classified,
and the off-diagonal cells show the misclassified cases. The blue cell in the
bottom right shows the total percent of correctly classified cases (in green)
and the total percent of misclassified cases (in red). The results show very
good recognition. If you needed even more accurate results, you could try any
of the following approaches:
• Reset the initial network weights and biases to new values with init and
train again.
• Increase the number of hidden neurons.
• Increase the number of training vectors.
• Increase the number of input values, if more relevant information is
available.
• Try a different training algorithm (see “Training Algorithms”).
In this case, the network response is satisfactory, and you can now put the
network to use on new inputs.
To get more experience in command-line operations, here are some tasks
you can try:
• During training, open a plot window (such as the confusion plot), and
watch it animate.
• Plot from the command line with functions such as plotroc and
plottrainstate.
Also, see the advanced script for more options, when training from the
command line.
Clustering Data
Clustering data is another excellent application for neural networks. This
process involves grouping data by similarity. For example, you might perform:
• Market segmentation by grouping people according to their buying patterns
• Data mining by partitioning data into related subsets
• Bioinformatic analysis by grouping genes with related expression patterns
Suppose that you want to cluster flower types according to petal length, petal
width, sepal length, and sepal width. You have 150 example cases for which
you have these four measurements.
As with function fitting and pattern recognition, there are two ways to solve
this problem:
• Use the nctool GUI, as described in “Using the Neural Network Clustering
Tool” on page 1-51.
• Use a command-line solution, as described in “Using Command-Line
Functions” on page 1-60.
Defining a Problem
To define a clustering problem, simply arrange Q input vectors to be clustered
as columns in an input matrix (see “Data Structures” for a detailed description
of data formatting for static and time series data). For instance, you might
want to cluster this set of 10 two-element vectors:
inputs = [7 0 6 2 6 5 6 1 0 1; 6 2 5 0 7 5 5 1 2 2]
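As a minimal command-line sketch of clustering these vectors (the 2-by-2 map size is an arbitrary illustration):

net = selforgmap([2 2]);         % a small SOM with a 2-by-2 grid of neurons
net = train(net,inputs);         % SOM training is unsupervised: no targets
classes = vec2ind(net(inputs))   % index of the cluster assigned to each vector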
The next section shows how to train a network using the nctool GUI.
Using the Neural Network Clustering Tool
1 If needed, open the Neural Network Start GUI with this command:
nnstart
2 Click Clustering Tool to open the Neural Network Clustering Tool. (You
can also use the command nctool.)
3 Click Next. The Select Data window appears.
4 Click Load Example Data Set. The Clustering Data Set Chooser window
appears.
5 In this window, select Simple Clusters, and click Import. You return
to the Select Data window.
6 Click Next to continue to the Network Size window, shown in the following
figure.
For clustering problems, the self-organizing feature map (SOM) is the most
commonly used network, because after the network has been trained, there
are many visualization tools that can be used to analyze the resulting
clusters. This network has one layer, with neurons organized in a grid. (For
more information on the SOM, see “Self-Organizing Feature Maps”.) When
creating the network, you specify the numbers of rows and columns in the
grid. Here, the number of rows and columns is set to 10. The total number
of neurons is 100. You can change this number in another run if you want.
7 Click Next. The Train Network window appears.
8 Click Train.
The training runs for the maximum number of epochs, which is 200.
9 For SOM training, the weight vector associated with each neuron moves
to become the center of a cluster of input vectors. In addition, neurons
that are adjacent to each other in the topology should also move close
to each other in the input space, so it is possible to visualize a
high-dimensional input space in the two dimensions of the network
topology. Investigate some of the visualization tools for the SOM. Under
the Plots pane, click SOM Sample Hits.
The default topology of the SOM is hexagonal. This figure shows the
neuron locations in the topology, and indicates how many of the training
data are associated with each of the neurons (cluster centers). The topology
is a 10-by-10 grid, so there are 100 neurons. The maximum number of
hits associated with any neuron is 22. Thus, there are 22 input vectors
in that cluster.
10 You can also visualize the SOM by displaying weight planes (also referred
to as component planes). Click SOM Weight Planes in the Neural
Network Clustering Tool.
This figure shows a weight plane for each element of the input vector (two,
in this case). They are visualizations of the weights that connect each input
to each of the neurons. (Darker colors represent larger weights.) If the
connection patterns of two inputs were very similar, you can assume that
the inputs are highly correlated. In this case, input 1 has connections that
are very different than those of input 2.
11 In the Neural Network Clustering Tool, click Next to evaluate the network.
At this point you can test the network against new data.
If you are dissatisfied with the network’s performance on the original or
new data, you can increase the number of neurons, or perhaps get a larger
training data set.
12 When you are satisfied with the network performance, click Next.
13 Use the buttons on this screen to save your results.
• You can click Simple Script or Advanced Script to create MATLAB
code that can be used to reproduce all of the previous steps from the
command line. Creating MATLAB code can be helpful if you want
to learn how to use the command-line functionality of the toolbox to
customize the training process. In “Using Command-Line Functions” on
page 1-60, you will investigate the generated scripts in more detail.
• You can also save the network as net in the workspace. You can perform
additional tests on it or put it to work on new inputs.
14 When you have generated scripts and saved your results, click Finish.
Using Command-Line Functions
The easiest way to learn how to use the command-line functionality of
the toolbox is to generate scripts from the GUIs, and then modify them to
customize the network training. As an example, look at the simple script that
was created in step 14 of the previous section.
% Solve a Clustering Problem with a Self-Organizing Map
% Script generated by NCTOOL
%
% This script assumes these variables are defined:
%
%   simpleclusterInputs - input data.
inputs = simpleclusterInputs;
% Create a Self-Organizing Map
dimension1 = 10;
dimension2 = 10;
net = selforgmap([dimension1 dimension2]);
% Train the Network
[net,tr] = train(net,inputs);
% Test the Network
outputs = net(inputs);
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotsomtop(net)
%figure, plotsomnc(net)
%figure, plotsomnd(net)
%figure, plotsomplanes(net)
%figure, plotsomhits(net,inputs)
%figure, plotsompos(net,inputs)
You can save the script, and then run it from the command line to reproduce
the results of the previous GUI session. You can also edit the script to
customize the training process. In this case, let’s follow each of the steps
in the script.
1 The script assumes that the input vectors are already loaded into the
workspace. To show the command-line operations, you can use a different
data set than you used for the GUI operation. Use the flower data set as an
example. The iris data set consists of 150 four-element input vectors.
load iris_dataset
inputs = irisInputs;
2 Create a network. For this example, you use a self-organizing map (SOM).
This network has one layer, with the neurons organized in a grid. (For
more information, see “Self-Organizing Feature Maps”.) When creating the
network with selforgmap, you specify the number of rows and columns
in the grid:
dimension1 = 10;
dimension2 = 10;
net = selforgmap([dimension1 dimension2]);
3 Train the network. The SOM network uses the default batch SOM
algorithm for training.
[net,tr] = train(net,inputs);
4 During training, the training window opens and displays the training
progress. To interrupt training at any point, click Stop Training.
5 Test the network. After the network has been trained, you can use it to
compute the network outputs.
outputs = net(inputs);
6 View the network diagram.
view(net)
This creates the following diagram of the network.
7 For SOM training, the weight vector associated with each neuron moves to
become the center of a cluster of input vectors. In addition, neurons that are
adjacent to each other in the topology should also move close to each other
in the input space, so it is possible to visualize a high-dimensional
input space in the two dimensions of the network topology. The default
SOM topology is hexagonal; to view it, enter the following commands.
figure, plotsomtop(net)
In this figure, each of the hexagons represents a neuron. The grid is
10-by-10, so there are a total of 100 neurons in this network. There are four
elements in each input vector, so the input space is four-dimensional. The
weight vectors (cluster centers) fall within this space.
Because this SOM has a two-dimensional topology, you can visualize in
two dimensions the relationships among the four-dimensional cluster
centers. One visualization tool for the SOM is the weight distance matrix
(also called the U-matrix).
8 To view the U-matrix, click SOM Neighbor Distances in the training
window.
In this figure, the blue hexagons represent the neurons. The red lines
connect neighboring neurons. The colors in the regions containing the red
lines indicate the distances between neurons. The darker colors represent
larger distances, and the lighter colors represent smaller distances. A band
of dark segments crosses from the lower-center region to the upper-right
region. The SOM network appears to have clustered the flowers into two
distinct groups.
To get more experience in command-line operations, try some of these tasks:
• During training, open a plot window (such as the SOM weight position
plot) and watch it animate.
• Plot from the command line with functions such as plotsomhits,
plotsomnc, plotsomnd, plotsomplanes, plotsompos, and plotsomtop.
(For more information on using these functions, see their reference pages.)
Also, see the advanced script for more options, when training from the
command line.
Time Series Prediction
Dynamic neural networks are good at time series prediction.
Suppose, for instance, that you have data from a pH neutralization process.
You want to design a network that can predict the pH of a solution in a tank
from past values of the pH and past values of the acid and base flow rate into
the tank. You have a total of 2001 time steps for which you have those series.
You can solve this problem in two ways:
• Use a graphical user interface, ntstool, as described in “Using the Neural
Network Time Series Tool” on page 1-68.
• Use command-line functions, as described in “Using Command-Line
Functions” on page 1-80.
It is generally best to start with the GUI, and then to use the GUI to
automatically generate command-line scripts. Before using either method,
the first step is to define the problem by selecting a data set. Each GUI has
access to many sample data sets that you can use to experiment with the
toolbox. If you have a specific problem that you want to solve, you can load
your own data into the workspace. The next section describes the data format.
Defining a Problem
To define a time series problem for the toolbox, arrange a set of TS input
vectors as columns in a cell array. Then, arrange another set of TS target
vectors (the correct output vectors for each of the input vectors) into a second
cell array (see “Data Structures” for a detailed description of data formatting
for static and time series data). However, there are cases in which you only
need to have a target data set. For example, you can define the following
time series problem, in which you want to use previous values of a series to
predict the next value:
targets = {1 2 3 4 5};
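As a command-line sketch of this target-only (NAR) case, assuming the toolbox sample series simplenar_dataset and the narnet and preparets functions:

T = simplenar_dataset;                    % sample time series, stored as a cell array
net = narnet(1:2,10);                     % predict y(t) from y(t-1) and y(t-2)
[Xs,Xi,Ai,Ts] = preparets(net,{},{},T);   % shift the series into inputs/targets
net = train(net,Xs,Ts,Xi,Ai);
Y = net(Xs,Xi,Ai);                        % one-step-ahead predictions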
The next section shows how to train a network to fit a time series data set,
using the neural network time series tool GUI, ntstool. This example uses
the pH neutralization data set provided with the toolbox.
Using the Neural Network Time Series Tool
1 If needed, open the Neural Network Start GUI with this command:
nnstart
2 Click Time Series Tool to open the Neural Network Time Series Tool.
(You can also use the command ntstool.)
Notice that this opening pane is different from the opening panes for the
other GUIs. This is because ntstool can be used to solve three different
kinds of time series problems.
• In the first type of time series problem, you would like to predict future
values of a time series y(t) from past values of that time series and past
values of a second time series x(t). This form of prediction is called
nonlinear autoregressive with exogenous (external) input, or NARX (see
“NARX Network” (narxnet, closeloop)), and can be written as follows:
y(t) = f(y(t – 1), ..., y(t – d), x(t – 1), ..., x(t – d))
This model could be used to predict future values of a stock or bond,
based on such economic variables as unemployment rates, GDP, etc.
It could also be used for system identification, in which models are
developed to represent dynamic systems, such as chemical processes,
manufacturing systems, robotics, aerospace vehicles, etc.
• In the second type of time series problem, there is only one series
involved. The future values of a time series y(t) are predicted only from
past values of that series. This form of prediction is called nonlinear
autoregressive, or NAR, and can be written as follows:
y(t) = f(y(t – 1), ..., y(t – d))
This model could also be used to predict financial instruments, but
without the use of a companion series.
• The third time series problem is similar to the first type, in that two
series are involved, an input series x(t) and an output/target series y(t).
Here you want to predict values of y(t) from previous values of x(t), but
without knowledge of previous values of y(t). This input/output model
can be written as follows:
y(t) = f(x(t – 1), ..., x(t – d))
The NARX model will provide better predictions than this input-output
model, because it uses the additional information contained in the
previous values of y(t). However, there may be some applications in
which the previous values of y(t) would not be available. Those are the
only cases where you would want to use the input-output model instead
of the NARX model. (A command-line sketch of all three model types
appears below.)
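At the command line, these three problem types correspond to three
network-creation functions: narxnet, narnet, and timedelaynet. A minimal
sketch, using arbitrary delays of 1:2 and 10 hidden neurons:

net1 = narxnet(1:2,1:2,10);    % NARX: predict y(t) from past y(t) and past x(t)
net2 = narnet(1:2,10);         % NAR: predict y(t) from past y(t) only
net3 = timedelaynet(1:2,10);   % input-output: predict y(t) from past x(t) only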
3 For this example, select the NARX model and click Next to proceed.
4 Click Load Example Data Set in the Select Data window. The Time
Series Data Set Chooser window opens.
Note Use the Inputs and Targets options in the Select Data window
when you need to load data from the MATLAB workspace.
5 Select pH Neutralization Process, and click Import. This returns you
to the Select Data window.
6 Click Next to open the Validation and Test Data window, shown in the
following figure.
The validation and test data sets are each set to 15% of the original data.
With these settings, the input vectors and target vectors will be randomly
divided into three sets as follows:
• 70% will be used for training.
• 15% will be used to validate that the network is generalizing and to stop
training before overfitting.
• The last 15% will be used as a completely independent test of network
generalization.
(See “Dividing the Data” for more discussion of the data division process.)
7 Click Next.
The standard NARX network is a two-layer feedforward network, with a
sigmoid transfer function in the hidden layer and a linear transfer function
in the output layer. This network also uses tapped delay lines to store
previous values of the x(t) and y(t) sequences. Note that the output of
the NARX network, y(t), is fed back to the input of the network (through
delays), since y(t) is a function of y(t – 1), y(t – 2), ..., y(t – d). However, for
efficient training this feedback loop can be opened.
Because the true output is available during the training of the network,
you can use the open-loop architecture shown above, in which the true
output is used instead of feeding back the estimated output. This has
two advantages. The first is that the input to the feedforward network
is more accurate. The second is that the resulting network has a purely
feedforward architecture, and therefore a more efficient algorithm can
be used for training. This network is discussed in more detail in “NARX
Network” (narxnet, closeloop).
The default number of hidden neurons is set to 10. The default number
of delays is 2; for this example, change it to 4. You might want to adjust
these numbers later if the network training performance is poor.
8 Click Next.
9 Click Train.
The training continued until the validation error failed to decrease for six
iterations (validation stop).
10 Under Plots, click Error Autocorrelation. This is used to validate the
network performance.
The following plot displays the error autocorrelation function. It describes
how the prediction errors are related in time. For a perfect prediction
model, there should only be one nonzero value of the autocorrelation
function, and it should occur at zero lag. (This is the mean square error.)
This would mean that the prediction errors were completely uncorrelated
with each other (white noise). If there were significant correlation in the
prediction errors, then it should be possible to improve the prediction,
perhaps by increasing the number of delays in the tapped delay lines. In
this case, the correlations, except for the one at zero lag, fall approximately
within the 95% confidence limits around zero, so the model seems to be
adequate. If even more accurate results were required, you could retrain
the network by clicking Retrain in ntstool. This will change the initial
weights and biases of the network, and may produce an improved network
after retraining.
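The same check can be run from the command line. A minimal sketch,
assuming a trained network net and the inputs, inputStates, layerStates,
and targets returned by preparets (as in the generated script shown later
in this section):

outputs = net(inputs,inputStates,layerStates);
errors = gsubtract(targets,outputs);
figure, ploterrcorr(errors)   % error autocorrelation with confidence limits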
11 View the input-error cross-correlation function to obtain additional
verification of network performance. Under the Plots pane, click
Input-Error Cross-correlation.
This input-error cross-correlation function illustrates how the errors are
correlated with the input sequence x(t). For a perfect prediction model, all
of the correlations should be zero. If the input is correlated with the error,
then it should be possible to improve the prediction, perhaps by increasing
the number of delays in the tapped delay lines. In this case, all of the
correlations fall within the confidence bounds around zero.
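Assuming the same variables as in the previous sketch, the corresponding
command-line plot is:

figure, plotinerrcorr(inputs,errors)   % input-error cross-correlation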
12 Under Plots, click Time Series Response. This displays the inputs,
targets and errors versus time. It also indicates which time points were
selected for training, testing and validation.
13 Click Next in the Neural Network Time Series Tool to evaluate the
network.
At this point, you can test the network against new data.
If you are dissatisfied with the network’s performance on the original or
new data, you can do any of the following:
• Train it again.
• Increase the number of neurons and/or the number of delays.
• Get a larger training data set.
If the performance on the training set is good but the test set performance
is significantly worse, overfitting may have occurred; in that case, reducing
the number of neurons can improve your results.
14 If you are satisfied with the network performance, click Next.
15 Use the buttons on this screen to generate scripts or to save your results.
• You can click Simple Script or Advanced Script to create MATLAB
code that can be used to reproduce all of the previous steps from the
command line. Creating MATLAB code can be helpful if you want
to learn how to use the command-line functionality of the toolbox to
customize the training process. In “Using Command-Line Functions” on
page 1-80, you will investigate the generated scripts in more detail.
• You can also have the network saved as net in the workspace. You can
perform additional tests on it or put it to work on new inputs, as in the
sketch below.
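For example, to apply the saved network to a new input series newX and
target series newY (hypothetical workspace variables), prepare the data for
the network's tapped delay lines first:

[xn,xin,ain,tn] = preparets(net,newX,{},newY);   % fill the delay states
yn = net(xn,xin,ain);                            % simulate on the new data
newPerformance = perform(net,tn,yn)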
16 After creating MATLAB code and saving your results, click Finish.
Using Command-Line Functions
The easiest way to learn how to use the command-line functionality of
the toolbox is to generate scripts from the GUIs, and then modify them to
customize the network training. As an example, look at the simple script that
was created at step 15 of the previous section.
% Solve an Autoregression Problem with External
% Input with a NARX Neural Network
% Script generated by NTSTOOL
%
% This script assumes the variables on the right of
% these equalities are defined:
%
%   phInputs - input time series.
%   phTargets - feedback time series.
inputSeries = phInputs;
targetSeries = phTargets;
% Create a Nonlinear Autoregressive Network with External Input
inputDelays = 1:4;
feedbackDelays = 1:4;
hiddenLayerSize = 10;
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
% Prepare the Data for Training and Simulation
% The function PREPARETS prepares time series data
% for a particular network, shifting time by the minimum
% amount to fill input states and layer states.
% Using PREPARETS allows you to keep your original
% time series data unchanged, while easily customizing it
% for networks with differing numbers of delays, with
% open loop or closed loop feedback modes.
[inputs,inputStates,layerStates,targets] = ...
preparets(net,inputSeries,{},targetSeries);
% Set up Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
[net,tr] = train(net,inputs,targets,inputStates,layerStates);
% Test the Network
outputs = net(inputs,inputStates,layerStates);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotregression(targets,outputs)
%figure, plotresponse(targets,outputs)
%figure, ploterrcorr(errors)
%figure, plotinerrcorr(inputs,errors)
% Closed Loop Network
% Use this network to do multi-step prediction.
% The function CLOSELOOP replaces the feedback input with a direct
% connection from the output layer.
netc = closeloop(net);
netc.name = [net.name ' - Closed Loop'];
view(netc)
[xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries);
yc = netc(xc,xic,aic);
closedLoopPerformance = perform(netc,tc,yc)
% Early Prediction Network
% For some applications it helps to get the prediction a
% timestep early.
% The original network returns predicted y(t+1) at the same
% time it is given y(t+1).
% For some applications such as decision making, it would
% help to have predicted y(t+1) once y(t) is available, but
% before the actual y(t+1) occurs.
% The network can be made to return its output a timestep early
% by removing one delay so that its minimal tap delay is now
% 0 instead of 1. The new network returns the same outputs as
% the original network, but outputs are shifted left one timestep.
nets = removedelay(net);
nets.name = [net.name ' - Predict One Step Ahead'];
view(nets)
[xs,xis,ais,ts] = preparets(nets,inputSeries,{},targetSeries);
ys = nets(xs,xis,ais);
earlyPredictPerformance = perform(nets,ts,ys)
You can save the script, and then run it from the command line to reproduce
the results of the previous GUI session. You can also edit the script to
customize the training process. In this case, follow each of the steps in the
script.
1 The script assumes that the input vectors and target vectors are already
loaded into the workspace. If the data are not loaded, you can load them as
follows:
load pH_dataset
inputSeries = phInputs;
targetSeries = phTargets;
2 Create a network. The NARX network, narxnet, is a feedforward network
with the default tan-sigmoid transfer function in the hidden layer and
linear transfer function in the output layer. This network has two inputs.
One is an external input, and the other is a feedback connection from
the network output. (After the network has been trained, this feedback
connection can be closed, as you will see at a later step.) For each of these
inputs, there is a tapped delay line to store previous values. To assign
the network architecture for a NARX network, you must select the delays
associated with each tapped delay line, and also the number of hidden
layer neurons. In the following steps, you assign the input delays and the
feedback delays to range from 1 to 4 and the number of hidden neurons
to be 10.
inputDelays = 1:4;
feedbackDelays = 1:4;
hiddenLayerSize = 10;
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
Note Increasing the number of neurons and the number of delays requires
more computation, and when the numbers are set too high the network has
a tendency to overfit the data, but higher numbers allow the network to
solve more complicated problems. More layers require more computation,
but their use might result in the network solving complex problems more
efficiently. To use more than one hidden layer, enter the hidden layer sizes
as elements of an array in the narxnet command.
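For instance, a minimal sketch of a two-hidden-layer NARX network,
assuming narxnet accepts a row vector of hidden layer sizes as the note
describes (the sizes 10 and 8 are arbitrary choices for illustration):

net = narxnet(1:4,1:4,[10 8]);   % two hidden layers: 10 and 8 neurons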
3 Prepare the data for training. When training a network containing tapped
delay lines, it is necessary to fill the delays with initial values of the inputs
and outputs of the network. There is a toolbox command that facilitates
this process: preparets. This function has three input arguments: the
network, the input sequence and the target sequence. The function returns
the initial conditions that are needed to fill the tapped delay lines in
the network, and modified input and target sequences, where the initial
conditions have been removed. You can call the function as follows:
[inputs,inputStates,layerStates,targets] = ...
preparets(net,inputSeries,{},targetSeries);
4 Set up the division of data.
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
With these settings, the input vectors and target vectors will be randomly
divided, with 70% used for training, 15% for validation and 15% for testing.
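Note that the default division function, dividerand, selects these points at
random. For time series data you might prefer contiguous segments; if so,
a block division can be swapped in (a judgment call, not part of the
generated script):

net.divideFcn = 'divideblock';   % contiguous 70/15/15 blocks instead of random points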
5 Train the network. The network uses the default Levenberg-Marquardt
algorithm (trainlm) for training. To train the network, enter:
[net,tr] = train(net,inputs,targets,inputStates,layerStates);
During training, the following training window opens. This window
displays training progress and allows you to interrupt training at any point
by clicking Stop Training.
This training stopped when the validation error increased for six iterations,
which occurred at iteration 70.
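You can confirm the stopping condition programmatically from the training
record tr returned by train. A minimal sketch (field names as returned in
R2013a):

tr.stop         % reason training stopped, e.g., a validation stop
tr.best_epoch   % iteration with the lowest validation error
tr.num_epochs   % total iterations run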
6 Test the network. After the network has been trained, you can use it to
compute the network outputs. The following code calculates the network
outputs, errors and overall performance. Note that to simulate a network
with tapped delay lines, you need to assign the initial values for these
delayed signals. This is done with inputStates and layerStates provided
by preparets at an earlier stage.
outputs = net(inputs,inputStates,layerStates);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
7 View the network diagram.
view(net)
This creates the following diagram of the network.
8 Plot the performance training record to check for potential overfitting.
figure, plotperform(tr)
This creates the following figure, which shows that training, validation and
testing errors all decreased until iteration 64. It does not appear that any
overfitting has occurred, since neither testing nor validation error increased
before iteration 64.
9 Close the loop on the NARX network. When the feedback loop is open
on the NARX network, it is performing a one-step-ahead prediction. It
is predicting the next value of y(t) from previous values of y(t) and x(t).
With the feedback loop closed, it can be used to perform multi-step-ahead
predictions. This is because predictions of y(t) will be used in place of actual
future values of y(t). The following commands can be used to close the loop
and calculate the closed-loop performance:
netc = closeloop(net);
netc.name = [net.name ' - Closed Loop'];
view(netc)
[xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries);
yc = netc(xc,xic,aic);
perfc = perform(netc,tc,yc)
The following figure shows the closed loop network.
10 Remove a delay from the network, to get the prediction one time step early.
nets = removedelay(net);
nets.name = [net.name ' - Predict One Step Ahead'];
view(nets)
[xs,xis,ais,ts] = preparets(nets,inputSeries,{},targetSeries);
ys = nets(xs,xis,ais);
earlyPredictPerformance = perform(nets,ts,ys)
From this figure, you can see that the network is identical to the previous
open-loop network, except that one delay has been removed from each of
the tapped delay lines. The output of the network is then y(t + 1) instead
of y(t). This may sometimes be helpful when a network is deployed for
certain applications.
If the network performance is not satisfactory, you could try any of these
approaches:
• Reset the initial network weights and biases to new values with init and
train again (see “Initializing Weights” (init) and the sketch after this list).
• Increase the number of hidden neurons or the number of delays.
• Increase the number of training vectors.
• Increase the number of input values, if more relevant information is
available.
• Try a different training algorithm (see “Training Algorithms”).
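For example, a minimal reinitialize-and-retrain sketch, reusing the prepared
data from the earlier steps:

net = init(net);   % new initial weights and biases
[net,tr] = train(net,inputs,targets,inputStates,layerStates);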
To get more experience in command-line operations, try some of these tasks:
• During training, open a plot window (such as the error correlation plot),
and watch it animate.
• Plot from the command line with functions such as plotresponse,
ploterrcorr and plotperform. (For more information on using these
functions, see their reference pages.)
Also, see the advanced script for more options when training from the
command line.
Parallel Computing on CPUs and GPUs
In this section...
“Parallel Computing Toolbox” on page 1-90
“Parallel CPU Workers” on page 1-90
“GPU Computing” on page 1-91
“Multiple GPU/CPU Computing” on page 1-91
“Cluster Computing with MATLAB Distributed Computing Server” on page
1-92
“Load Balancing, Large Problems, and Beyond” on page 1-92
Parallel Computing Toolbox
Neural network training and simulation involves many parallel calculations.
Multicore CPUs, graphical processing units (GPUs), and clusters of computers
with multiple CPUs and GPUs can all take advantage of parallel calculations.
Together, Neural Network Toolbox and Parallel Computing Toolbox enable
the multiple CPU cores and GPUs of a single computer to speed up training
and simulation of large problems.
The following is a standard single-threaded training and simulation session.
(While the benefits of parallelism are most visible for large problems, this
example uses a small dataset that ships with Neural Network Toolbox.)
[x,t] = house_dataset;
net1 = feedforwardnet(10);
net2 = train(net1,x,t);
y = net2(x);
Parallel CPU Workers
Intel® processors ship with as many as eight cores. Workstations with two
processors can have as many as 16 cores, with even more possible in the
future. Using multiple CPU cores in parallel can dramatically speed up
calculations.
Open a MATLAB pool of parallel CPU workers and determine the number of
workers in the pool.
matlabpool open
matlabpool('size')
An “Undefined function or variable” error appears if you do not have a
license for Parallel Computing Toolbox.
When a MATLAB pool of CPU workers is open, set the train function’s
'useParallel' option to 'yes' to specify that training and simulation be
performed across the pool.
net2 = train(net1,x,t,'useParallel','yes');
y = net2(x,'useParallel','yes');
GPU Computing
GPUs can have as many as 3072 cores on a single card, and possibly more
in the future. These cards are highly efficient on parallel algorithms like
neural networks.
Use gpuDeviceCount to check whether a supported GPU card is available in
your system. Use the function gpuDevice to review the currently selected
GPU information or to select a different GPU.
gpuDeviceCount
gpuDevice
gpuDevice(2) % Select device 2, if available
An “Undefined function or variable” error appears if you do not have a license
for Parallel Computing Toolbox.
When you have selected the GPU device, set the train or sim function’s
'useGPU' option to 'yes' to perform training and simulation on it.
net2 = train(net1,x,t,'useGPU','yes');
y = net2(x,'useGPU','yes');
Multiple GPU/CPU Computing
You can use multiple GPUs for higher levels of parallelism.
After opening a MATLAB pool, set both 'useParallel' and 'useGPU' to
'yes' to harness all the GPUs and CPU cores on a single computer. Each
worker associated with a unique GPU uses that GPU. The rest of the workers
perform calculations on their CPU core.
net2 = train(net1,x,t,'useParallel','yes','useGPU','yes');
y = net2(x,'useParallel','yes','useGPU','yes');
For some problems, using GPUs and CPUs together can result in the highest
computing speed. For other problems, the CPUs might not keep up with the
GPUs, and so using only GPUs is faster. Set 'useGPU' to 'only' to restrict
the parallel computing to workers with unique GPUs.
net2 = train(net1,x,t,'useParallel','yes','useGPU','only');
y = net2(x,'useParallel','yes','useGPU','only');
Cluster Computing with MATLAB Distributed
Computing Server
MATLAB Distributed Computing Server™ allows you to harness all the CPUs
and GPUs on a network cluster of computers. To take advantage of a cluster,
open a MATLAB pool with a cluster profile. On the MATLAB Home tab, use
the Parallel menu in the Environment area to manage and select profiles.
After opening a cluster pool, call train, and then simulate the network,
with the 'useParallel' and 'useGPU' options.
net2 = train(net1,x,t,'useParallel','yes');
y = net2(x,'useParallel','yes');
net2 = train(net1,x,t,'useParallel','yes','useGPU','only');
y = net2(x,'useParallel','yes','useGPU','only');
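With any of these calls, you can add the 'showResources' option to report
which workers, GPUs, or cores actually performed the calculations:

net2 = train(net1,x,t,'useParallel','yes','showResources','yes');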
Load Balancing, Large Problems, and Beyond
For more information on parallel computing with Neural Network Toolbox,
see “Parallel and GPU Computing”, which introduces other topics, such as
how to manually distribute data sets across CPU and GPU workers to best
take advantage of differences in machine speed and memory.
Distributing data manually also allows worker data to load sequentially, so
that data sets are limited in size only by the total RAM of a cluster instead
of the RAM of a single computer. This lets you apply neural networks to
very large problems.
Sample Data Sets
The Neural Network Toolbox software contains a number of sample data sets
that you can use to experiment with the functionality of the toolbox. To view
the data sets that are available, use the following command:
help nndatasets
Neural Network Datasets
-----------------------

Function Fitting, Function approximation and Curve fitting.
Function fitting is the process of training a neural network on a
set of inputs in order to produce an associated set of target
outputs. Once the neural network has fit the data, it forms a
generalization of the input-output relationship and can be used
to generate outputs for inputs it was not trained on.
simplefit_dataset - Simple fitting dataset.
abalone_dataset   - Abalone shell rings dataset.
bodyfat_dataset   - Body fat percentage dataset.
building_dataset  - Building energy dataset.
chemical_dataset  - Chemical sensor dataset.
cho_dataset       - Cholesterol dataset.
engine_dataset    - Engine behavior dataset.
house_dataset     - House value dataset.
----------

Pattern Recognition and Classification

Pattern recognition is the process of training a neural network
to assign the correct target classes to a set of input patterns.
Once trained, the network can be used to classify patterns it has
not seen before.

simpleclass_dataset - Simple pattern recognition dataset.
cancer_dataset      - Breast cancer dataset.
crab_dataset        - Crab gender dataset.
glass_dataset       - Glass chemical dataset.
iris_dataset        - Iris flower dataset.
thyroid_dataset     - Thyroid function dataset.
wine_dataset        - Italian wines dataset.
----------

Clustering, Feature extraction and Data dimension reduction

Clustering is the process of training a neural network on
patterns so that the network comes up with its own
classifications according to pattern similarity and relative
topology. This is useful for gaining insight into data, or
simplifying it before further processing.

simplecluster_dataset - Simple clustering dataset.

The inputs of fitting or pattern recognition datasets may also
be clustered.
----------

Input-Output Time-Series Prediction, Forecasting, Dynamic
modelling, Nonlinear autoregression, System identification
and Filtering

Input-output time series problems consist of predicting the
next value of one time-series given another time-series.
Past values of both series (for best accuracy), or only one
of the series (for a simpler system) may be used to predict the
target series.

simpleseries_dataset - Simple time-series prediction dataset.
simplenarx_dataset   - Simple time-series prediction dataset.
exchanger_dataset    - Heat exchanger dataset.
maglev_dataset       - Magnetic levitation dataset.
ph_dataset           - Solution pH dataset.
pollution_dataset    - Pollution mortality dataset.
valve_dataset        - Valve fluid flow dataset.
----------

Single Time-Series Prediction, Forecasting, Dynamic modelling,
Nonlinear autoregression, System identification, and Filtering

Single time-series prediction involves predicting the next value
of a time-series given its past values.

simplenar_dataset  - Simple single series prediction dataset.
chickenpox_dataset - Monthly chickenpox instances dataset.
ice_dataset        - Global ice volume dataset.
laser_dataset      - Chaotic far-infrared laser dataset.
oil_dataset        - Monthly oil price dataset.
river_dataset      - River flow dataset.
solar_dataset      - Sunspot activity dataset.
Notice that all of the data sets have file names of the form name_dataset.
Inside these files will be the arrays nameInputs and nameTargets. You can
load a data set into the workspace with a command such as
load simplefit_dataset
This will load simplefitInputs and simplefitTargets into the workspace.
If you want to load the inputs and targets into variables with different
names, you can use a command such as
[x,t] = simplefit_dataset;
This will load the inputs and targets into the arrays x and t. You can get a
description of a data set with a command such as
help maglev_dataset
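Putting these pieces together, here is a minimal sketch that loads one of the
sample data sets and fits a network to it, following the fitting workflow from
earlier in this chapter:

[x,t] = engine_dataset;   % inputs x and targets t
net = fitnet(10);         % fitting network with 10 hidden neurons
net = train(net,x,t);
y = net(x);
performance = perform(net,t,y)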
Glossary
ADALINE
Acronym for a linear neuron: ADAptive LINear Element.
adaption
Training method that proceeds through the specified sequence of inputs,
calculating the output, error, and network adjustment for each input
vector in the sequence as the inputs are presented.
adaptive filter
Network that contains delays and whose weights are adjusted after
each new input vector is presented. The network adapts to changes in
the input signal properties if such occur. This kind of filter is used in
long distance telephone lines to cancel echoes.
adaptive learning rate
Learning rate that is adjusted according to an algorithm during training
to minimize training time.
architecture
Description of the number of the layers in a neural network, each layer’s
transfer function, the number of neurons per layer, and the connections
between layers.
backpropagation learning rule
Learning rule in which weights and biases are adjusted by
error-derivative (delta) vectors backpropagated through the network.
Backpropagation is commonly applied to feedforward multilayer
networks. Sometimes this rule is called the generalized delta rule.
backtracking search
Linear search routine that begins with a step multiplier of 1 and then
backtracks until an acceptable reduction in performance is obtained.
batch
Matrix of input (or target) vectors applied to the network simultaneously.
Changes to the network weights and biases are made just once for
the entire set of vectors in the input matrix. (The term batch is being
replaced by the more descriptive expression “concurrent vectors.”)
batching
Process of presenting a set of input vectors for simultaneous calculation
of a matrix of output vectors and/or new weights and biases.
Bayesian framework
Assumes that the weights and biases of the network are random
variables with specified distributions.
BFGS quasi-Newton algorithm
Variation of Newton’s optimization algorithm, in which an
approximation of the Hessian matrix is obtained from gradients
computed at each iteration of the algorithm.
bias
Neuron parameter that is summed with the neuron’s weighted inputs
and passed through the neuron’s transfer function to generate the
neuron’s output.
bias vector
Column vector of bias values for a layer of neurons.
Brent’s search
Linear search that is a hybrid of the golden section search and a
quadratic interpolation.
cascade-forward network
Layered network in which each layer receives inputs from the network
input and from all previous layers.
Charalambous’ search
Hybrid line search that uses a cubic interpolation together with a type
of sectioning.
classification
Association of an input vector with a particular target vector.
competitive layer
Layer of neurons in which only the neuron with maximum net input
has an output of 1 and all other neurons have an output of 0. Neurons
compete with each other for the right to respond to a given input vector.
competitive learning
Unsupervised training of a competitive layer with the instar rule or
Kohonen rule. Individual neurons learn to become feature detectors.
After training, the layer categorizes input vectors among its neurons.
competitive transfer function
Accepts a net input vector for a layer and returns neuron outputs of 0
for all neurons except for the winner, the neuron associated with the
most positive element of the net input n.
concurrent input vectors
Name given to a matrix of input vectors that are to be presented to
a network simultaneously. All the vectors in the matrix are used in
making just one set of changes in the weights and biases.
conjugate gradient algorithm
In the conjugate gradient algorithms, a search is performed along
conjugate directions, which produces generally faster convergence than
a search along the steepest descent directions.
connection
One-way link between neurons in a network.
connection strength
Strength of a link between two neurons in a network. The strength,
often called weight, determines the effect that one neuron has on
another.
cycle
Single presentation of an input vector, calculation of output, and new
weights and biases.
dead neuron
Competitive layer neuron that never won any competition during
training and so has not become a useful feature detector. Dead neurons
do not respond to any of the training vectors.
decision boundary
Line, determined by the weight and bias vectors, for which the net
input n is zero.
delta rule
See Widrow-Hoff learning rule.
delta vector
The delta vector for a layer is the derivative of a network’s output error
with respect to that layer’s net input vector.
distance
Distance between neurons, calculated from their positions with a
distance function.
distance function
Particular way of calculating distance, such as the Euclidean distance
between two vectors.
early stopping
Technique based on dividing the data into three subsets. The first
subset is the training set, used for computing the gradient and updating
the network weights and biases. The second subset is the validation set.
When the validation error increases for a specified number of iterations,
the training is stopped, and the weights and biases at the minimum of
the validation error are returned. The third subset is the test set. It is
used to verify the network design.
epoch
Presentation of the set of training (input and/or target) vectors to a
network and the calculation of new weights and biases. Note that
training vectors can be presented one at a time or all together in a batch.
error jumping
Sudden increase in a network’s sum-squared error during training. This
is often due to too large a learning rate.
error ratio
Training parameter used with adaptive learning rate and momentum
training of backpropagation networks.
error vector
Difference between a network’s output vector in response to an input
vector and an associated target output vector.
feedback network
Network with connections from a layer’s output to that layer’s input.
The feedback connection can be direct or pass through several layers.
feedforward network
Layered network in which each layer only receives inputs from previous
layers.
Fletcher-Reeves update
Method for computing a set of conjugate directions. These directions are
used as search directions as part of a conjugate gradient optimization
procedure.
function approximation
Task performed by a network trained to respond to inputs with an
approximation of a desired function.
generalization
Attribute of a network whose output for a new input vector tends to be
close to outputs for similar input vectors in its training set.
generalized regression network
Approximates a continuous function to an arbitrary accuracy, given a
sufficient number of hidden neurons.
global minimum
Lowest value of a function over the entire range of its input parameters.
Gradient descent methods adjust weights and biases in order to find the
global minimum of error for a network.
golden section search
Linear search that does not require the calculation of the slope. The
interval containing the minimum of the performance is subdivided at
each iteration of the search, and one subdivision is eliminated at each
iteration.
gradient descent
Process of making changes to weights and biases, where the changes are
proportional to the derivatives of network error with respect to those
weights and biases. This is done to minimize network error.
hard-limit transfer function
Transfer function that maps inputs greater than or equal to 0 to 1, and
all other values to 0.
Hebb learning rule
Historically the first proposed learning rule for neurons. Weights
are adjusted proportional to the product of the outputs of pre- and
postweight neurons.
hidden layer
Layer of a network that is not connected to the network output (for
instance, the first layer of a two-layer feedforward network).
home neuron
Neuron at the center of a neighborhood.
hybrid bisection-cubic search
Line search that combines bisection and cubic interpolation.
initialization
Process of setting the network weights and biases to their original
values.
input layer
Layer of neurons receiving inputs directly from outside the network.
input space
Range of all possible input vectors.
input vector
Vector presented to the network.
input weight vector
Row vector of weights going to a neuron.
input weights
Weights connecting network inputs to layers.
Jacobian matrix
Contains the first derivatives of the network errors with respect to the
weights and biases.
Kohonen learning rule
Learning rule that trains a selected neuron’s weight vectors to take on
the values of the current input vector.
layer
Group of neurons having connections to the same inputs and sending
outputs to the same destinations.
layer diagram
Network architecture figure showing the layers and the weight matrices
connecting them. Each layer’s transfer function is indicated with a
symbol. Sizes of input, output, bias, and weight matrices are shown.
Individual neurons and connections are not shown. (See Network
Objects, Data and Training Styles in the Neural Network Toolbox User’s
Guide.)
layer weights
Weights connecting layers to other layers. Such weights need to have
nonzero delays if they form a recurrent connection (i.e., a loop).
learning
Process by which weights and biases are adjusted to achieve some
desired network behavior.
learning rate
Training parameter that controls the size of weight and bias changes
during learning.
learning rule
Method of deriving the next changes that might be made in a network
or a procedure for modifying the weights and biases of a network.
Levenberg-Marquardt
Algorithm that trains a neural network 10 to 100 times faster than the
usual gradient descent backpropagation method. It always computes
the approximate Hessian matrix, which has dimensions n-by-n.
line search function
Procedure for searching along a given search direction (line) to locate
the minimum of the network performance.
linear transfer function
Transfer function that produces its input as its output.
link distance
Number of links, or steps, that must be taken to get to the neuron
under consideration.
local minimum
Minimum of a function over a limited range of input values. A local
minimum might not be the global minimum.
log-sigmoid transfer function
Squashing function of the form f(n) = 1/(1 + e^(-n)), which maps the
input to the interval (0,1). (The toolbox function is logsig.)
Manhattan distance
The Manhattan distance between two vectors x and y is calculated as
D = sum(abs(x-y))
maximum performance increase
Maximum amount by which the performance is allowed to increase in
one iteration of the variable learning rate training algorithm.
maximum step size
Maximum step size allowed during a linear search. The magnitude of
the weight vector is not allowed to increase by more than this maximum
step size in one iteration of a training algorithm.
mean square error function
Performance function that calculates the average squared error between
the network outputs a and the target outputs t.
momentum
Technique often used to make it less likely for a backpropagation
network to get caught in a shallow minimum.
momentum constant
Training parameter that controls how much momentum is used.
mu parameter
Initial value for the scalar µ.
neighborhood
Group of neurons within a specified distance of a particular neuron.
The neighborhood is specified by the indices for all the neurons that lie
within a radius d of the winning neuron i*:
Ni(d) = {j,dij ≤ d}
net input vector
Combination, in a layer, of all the layer’s weighted input vectors with
its bias.
neuron
Basic processing element of a neural network. Includes weights and
bias, a summing junction, and an output transfer function. Artificial
neurons, such as those simulated and trained with this toolbox, are
abstractions of biological neurons.
neuron diagram
Network architecture figure showing the neurons and the weights
connecting them. Each neuron’s transfer function is indicated with a
symbol.
ordering phase
Period of training during which neuron weights are expected to order
themselves in the input space consistent with the associated neuron
positions.
output layer
Layer whose output is passed to the world outside the network.
output vector
Output of a neural network. Each element of the output vector is the
output of a neuron.
output weight vector
Column vector of weights coming from a neuron or input. (See also
outstar learning rule.)
outstar learning rule
Learning rule that trains a neuron’s (or input’s) output weight vector to
take on the values of the current output vector of the postweight layer.
Changes in the weights are proportional to the neuron’s output.
overfitting
Case in which the error on the training set is driven to a very small
value, but when new data is presented to the network, the error is large.
pass
Each traverse through all the training input and target vectors.
pattern
A vector.
pattern association
Task performed by a network trained to respond with the correct output
vector for each input vector presented.
pattern recognition
Task performed by a network trained to respond when an input vector
close to a learned vector is presented. The network “recognizes” the
input as one of the original target vectors.
perceptron
Single-layer network with a hard-limit transfer function. This network
is often trained with the perceptron learning rule.
perceptron learning rule
Learning rule for training single-layer hard-limit networks. It is
guaranteed to result in a perfectly functioning network in finite time,
given that the network is capable of doing so.
performance
Behavior of a network.
performance function
Commonly the mean squared error of the network outputs. However,
the toolbox also considers other performance functions. Type nnets and
look under performance functions.
Polak-Ribiére update
Method for computing a set of conjugate directions. These directions are
used as search directions as part of a conjugate gradient optimization
procedure.
positive linear transfer function
Transfer function that produces an output of zero for negative inputs
and an output equal to the input for positive inputs.
postprocessing
Converts normalized outputs back into the same units that were used
for the original targets.
Powell-Beale restarts
Method for computing a set of conjugate directions. These directions are
used as search directions as part of a conjugate gradient optimization
procedure. This procedure also periodically resets the search direction
to the negative of the gradient.
preprocessing
Transformation of the input or target data before it is presented to the
neural network.
principal component analysis
Orthogonalize the components of network input vectors. This procedure
can also reduce the dimension of the input vectors by eliminating
redundant components.
quasi-Newton algorithm
Class of optimization algorithm based on Newton’s method. An
approximate Hessian matrix is computed at each iteration of the
algorithm based on the gradients.
radial basis networks
Neural network that can be designed directly by fitting special response
elements where they will do the most good.
radial basis transfer function
The transfer function for a radial basis neuron is radbas(n) = e^(-n^2).
regularization
Modification of the performance function, which is normally chosen
to be the sum of squares of the network errors on the training set, by
adding some fraction of the squares of the network weights.
resilient backpropagation
Training algorithm that eliminates the harmful effect of having a small
slope at the extreme ends of the sigmoid squashing transfer functions.
saturating linear transfer function
Function that is linear in the interval (-1,+1) and saturates outside this
interval to -1 or +1. (The toolbox function is satlin.)
scaled conjugate gradient algorithm
Avoids the time-consuming line search of the standard conjugate
gradient algorithm.
sequential input vectors
Set of vectors that are to be presented to a network one after the other.
The network weights and biases are adjusted on the presentation of
each input vector.
sigma parameter
Determines the change in weight for the calculation of the approximate
Hessian matrix in the scaled conjugate gradient algorithm.
sigmoid
Monotonic S-shaped function that maps numbers in the interval (-∞,∞)
to a finite interval such as (-1,+1) or (0,1).
simulation
Takes the network input p, and the network object net, and returns the
network outputs a.
spread constant
Distance an input vector must be from a neuron’s weight vector to
produce an output of 0.5.
squashing function
Monotonically increasing function that takes input values between -∞
and +∞ and returns values in a finite interval.
star learning rule
Learning rule that trains a neuron’s weight vector to take on the values
of the current input vector. Changes in the weights are proportional to
the neuron’s output.
sum-squared error
Sum of squared differences between the network targets and actual
outputs for a given input vector or set of vectors.
supervised learning
Learning process in which changes in a network’s weights and biases
are due to the intervention of any external teacher. The teacher
typically provides output targets.
symmetric hard-limit transfer function
Transfer function that maps inputs greater than or equal to 0 to +1, and all other
values to -1.
symmetric saturating linear transfer function
Produces the input as its output as long as the input is in the range -1 to
1. Outside that range the output is -1 and +1, respectively.
tan-sigmoid transfer function
Squashing function of the form f(n) = 2/(1 + e^(-2n)) - 1, which maps the
input to the interval (-1,1). (The toolbox function is tansig.)
tapped delay line
Sequential set of delays with outputs available at each delay output.
target vector
Desired output vector for a given input vector.
test vectors
Set of input vectors (not used directly in training) that is used to test
the trained network.
topology functions
Ways to arrange the neurons in a grid, box, hexagonal, or random
topology.
training
Procedure whereby a network is adjusted to do a particular job.
Commonly viewed as an offline job, as opposed to an adjustment made
during each time interval, as is done in adaptive training.
training vector
Input and/or target vector used to train a network.
transfer function
Function that maps a neuron’s (or layer’s) net output n to its actual
output.
tuning phase
Period of SOFM training during which weights are expected to spread
out relatively evenly over the input space while retaining their
topological order found during the ordering phase.
underdetermined system
System that has more variables than constraints.
unsupervised learning
Learning process in which changes in a network’s weights and biases
are not due to the intervention of any external teacher. Commonly
changes are a function of the current network input vectors, output
vectors, and previous weights and biases.
update
Make a change in weights and biases. The update can occur after
presentation of a single input vector or after accumulating changes
over several input vectors.
validation vectors
Set of input vectors (not used directly in training) that is used to monitor
training progress so as to keep the network from overfitting.
weight function
Weight functions apply weights to an input to get weighted inputs, as
specified by a particular function.
weight matrix
Matrix containing connection strengths from a layer’s inputs to its
neurons. The element wi,j of a weight matrix W refers to the connection
strength from input j to neuron i.
weighted input vector
Result of applying a weight to a layer’s input, whether it is a network
input or the output of another layer.
Widrow-Hoff learning rule
Learning rule used to train single-layer linear networks. This rule is
the predecessor of the backpropagation rule and is sometimes referred
to as the delta rule.
Index

A
applications
  aerospace 1-6
  automotive 1-6
  banking 1-6
  defense 1-6
  electronics 1-6
  entertainment 1-6
  financial 1-6
  industrial 1-6
  insurance 1-7
  manufacturing 1-7
  medical 1-7
  oil and gas exploration 1-7
  robotics 1-7
  securities 1-7
  speech 1-7
  telecommunications 1-7
  transportation 1-7

C
clustering 1-50
clustering problems
  defining 1-50
  solving with command-line functions 1-61
  solving with nctool 1-51
command-line functions
  solving clustering problems 1-61
  solving fitting problems 1-9
  solving time series problems 1-67
confusion matrix 1-37 1-47

E
error autocorrelation plot 1-75
error histogram 1-18

F
fitting functions 1-9
fitting problems
  defining 1-9
  solving with command-line functions 1-23 1-43
  solving with nftool 1-11 1-51
functions
  fitting 1-9

I
input-error cross-correlation function 1-76

N
nctool
  solving clustering problems 1-51
neighbor distances plot 1-65
Neural Network Toolbox Clustering Tool. See nctool. 1-51
Neural Network Toolbox Fitting Tool. See nftool. 1-11 1-31
Neural Network Toolbox Pattern Recognition Tool. See nprtool. 1-31
Neural Network Toolbox Time Series Tool. See ntstool. 1-68
neural networks
  definition 1-3
  self-organizing feature map 1-53
nftool
  solving fitting problems 1-11 1-68
nprtool
  solving pattern recognition problems 1-31

P
pattern recognition 1-29
pattern recognition problems
  defining 1-29

R
regression plots 1-17
ROC curve 1-38

S
sample hits plot 1-56
self-organizing feature map (SOFM) networks 1-53
  neighbor distances plot 1-65
  sample hits plot 1-56
  SOM topology 1-64
  weight planes plot 1-57
SOM topology 1-64

T
time series 1-67
time series problems
  defining 1-67
  solving with command-line functions 1-83
  solving with ntstool 1-68

W
weight planes plot 1-57