Contents

1. Exercise 1: Model of a neuron and the learning process
   1.1. Model of a neuron
   1.2. Three neuron network
   1.3. Delta rule
2. Exercise 2: Associative memory
   2.1. Exercise preparation
   2.2. Directly forming the correlation matrix
   2.3. Finding the correlation matrix using supervised learning
3. Exercise 3: Perceptron
   3.1. Introduction
   3.2. Perceptron for data classification
   3.3. Classification of examples with Gaussian distribution
4. Exercise 4: LMS algorithm for stock price prediction
   4.1. Exercise preparation
   4.2. Stock price movement

1. Exercise 1: Model of a neuron and the learning process

1.1. Model of a neuron

Write a MATLAB function that calculates the output of a neuron. Assume the neuron model shown in Figure 1, with three inputs and a threshold. The threshold can be interpreted as an additional input with a fixed value of -1 and weight $w_{10}$. The output of the function must correspond to the output of the neuron.

Figure 1: Model of a neuron

Use the scalar product of the input vector $[x_0, x_1, x_2, x_3]$ (where $x_0 = -1$ is the threshold input) and the weight vector $[w_{10}, w_{11}, w_{12}, w_{13}]$ to calculate the neuron activation. The function must have an additional input argument that selects the nonlinear activation function.

The function should support the following nonlinear activation functions:

1. Step function

2. Piecewise linear function (ramp)

3. Sigmoid function, defined as $\varphi(v) = \frac{1}{1 + \exp(-av)}$, with $a = 1$.

1. Pick a random weight vector w. Write down the chosen weights and calculate the neuron response (for each activation function) for the following inputs: x1 = [0.5, 1, 0.7]'; x2 = [0, 0.8, 0.2]';
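The neuron function described above can be sketched in Python as follows (the exercise itself asks for MATLAB). The function names, the weights in the usage lines and the exact ramp shape are assumptions. Here the ramp rises linearly from 0 to 1 for -0.5 ≤ v ≤ 0.5, and the step outputs 1 for v ≥ 0.

```python
import math


def activation(v, kind, a=1.0):
    """Nonlinear activation selected by `kind` ("step", "ramp" or "sigmoid")."""
    if kind == "step":
        return 1.0 if v >= 0 else 0.0
    if kind == "ramp":
        # Assumed ramp: linear on [-0.5, 0.5], clipped to the range [0, 1]
        return min(1.0, max(0.0, v + 0.5))
    if kind == "sigmoid":
        return 1.0 / (1.0 + math.exp(-a * v))
    raise ValueError(f"unknown activation: {kind}")


def neuron_output(x, w, kind):
    """x: the three inputs; w: [w10, w11, w12, w13], where w10 weights the fixed -1 input."""
    inputs = [-1.0] + list(x)                       # threshold as an extra input of -1
    v = sum(wj * xj for wj, xj in zip(w, inputs))   # scalar product = neuron activation
    return activation(v, kind)


w = [0.2, 0.4, -0.3, 0.1]  # hypothetical "random" weight vector
for kind in ("step", "ramp", "sigmoid"):
    print(kind, neuron_output([0.5, 1, 0.7], w, kind), neuron_output([0, 0.8, 0.2], w, kind))
```

Selecting the nonlinearity with a string argument mirrors the "additional input" that the text asks for.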

1.2. Three neuron network

Write a function for the three-neuron network (Figure 2) using the function developed in section 1.1. Assume that all neurons use the sigmoid transfer function with a = 1 and that the weights are: w1 = [1, 0.5, 1, -0.4]; w2 = [0.5, 0.6, -1.5, -0.7]; w3 = [-0.5, -1.5, 0.6];

Remark: The first element of each weight vector is the threshold of the neuron and is shown as $w_{i0}$ in Figure 2.

Figure 2: Three neuron network

1. What is the output of the network for the input x = [0.3, 0.7, 0.9]'?

2. Does the output of the network depend on the neuron weights?
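A Python sketch of the network, written under one assumption because Figure 2 is not reproduced here. Neurons 1 and 2 are assumed to receive the three inputs, and neuron 3 is assumed to receive their two outputs; this matches w3 having one threshold weight plus two input weights. The threshold weight, stored first, multiplies the fixed input -1.

```python
import math


def sigmoid_neuron(inputs, w, a=1.0):
    """Sigmoid neuron; w[0] is the threshold weight applied to the fixed input -1."""
    v = -w[0] + sum(wj * xj for wj, xj in zip(w[1:], inputs))
    return 1.0 / (1.0 + math.exp(-a * v))


def three_neuron_network(x, w1, w2, w3):
    y1 = sigmoid_neuron(x, w1)           # first neuron, sees the network inputs
    y2 = sigmoid_neuron(x, w2)           # second neuron, sees the network inputs
    return sigmoid_neuron([y1, y2], w3)  # output neuron (assumed wiring)


w1 = [1, 0.5, 1, -0.4]
w2 = [0.5, 0.6, -1.5, -0.7]
w3 = [-0.5, -1.5, 0.6]
print(three_neuron_network([0.3, 0.7, 0.9], w1, w2, w3))
```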

1.3. Delta rule

The goal of this experiment is to better understand the learning process. We will implement the logical AND function using one neuron with two inputs and a threshold (see Figure 3), with the sigmoid activation function and a = 1.

For the learning phase, we define the following input-output pairs $(x_i, y_i)$ for the logical AND function. For the inputs x1 = [-1, 0, 0]', x2 = [-1, 0, 1]' and x3 = [-1, 1, 0]' the output y should be 0. For the input x4 = [-1, 1, 1]' the output y should be 1. The first component of every input vector is -1; multiplied by the weight $w_{10}$ shown in Figure 3, it implements the neuron threshold. At the beginning, the neuron weights are set to random values.

We use the delta rule to update the weights:

$$\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n), \qquad e_k(n) = d_k(n) - y_k(n),$$

where $d_k(n)$ is the expected (desired) neuron output and $y_k(n)$ is the obtained neuron output. The new weights are $w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$. This iterative procedure is repeated until the error is sufficiently small.

Figure 3: One neuron network

1. Experiment with different initial weights and learning rates. (If the learning becomes unstable, use a small learning rate, for example η = 0.05.) For each learning rate, plot the error (y-axis) against the number of iterations (x-axis).

a) What is the best learning rate? How does the learning rate affect the learning of the neural network?

b) How did you define the sufficiently small error used to terminate the algorithm?

c) After how many iterations does the algorithm terminate?
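A Python sketch of the delta-rule experiment, with the weights updated after every training pair. Several choices here are illustrative assumptions, not the prescribed ones:

- the error measure is the sum of squared errors over the four AND patterns after each pass;
- "sufficiently small" is taken as an SSE below 0.1;
- the initial weights are uniform in [-0.5, 0.5];
- the random seed is fixed so runs repeat.

```python
import math
import random


def sigmoid(v, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * v))


# AND training pairs; the first component -1 is the fixed threshold input
X = [[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]]
D = [0, 0, 0, 1]


def output(w, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def train_delta(eta=0.5, tol=0.1, max_epochs=50000, seed=0):
    """Online delta rule; returns the weights and the SSE after every pass over the data."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]   # random initial weights
    history = []
    for _ in range(max_epochs):
        for x, d in zip(X, D):
            e = d - output(w, x)                             # e_k(n) = d_k(n) - y_k(n)
            w = [wj + eta * e * xj for wj, xj in zip(w, x)]  # w_kj += eta * e_k * x_j
        sse = sum((d - output(w, x)) ** 2 for x, d in zip(X, D))
        history.append(sse)
        if sse < tol:                                        # "sufficiently small" error
            break
    return w, history


for eta in (0.05, 0.5, 1.0):
    w, history = train_delta(eta=eta)
    print(f"eta={eta}: {len(history)} passes, final SSE {history[-1]:.4f}")
```

Plotting `history` against its index gives the error-versus-iterations curve requested above.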


2. Exercise 2: Associative memory

2.1. Exercise preparation

1. Which operation does the operator ' perform on a vector? What is the output of the MATLAB function transpose() when applied to a vector? (Remark: There is a difference between the two!)

2. How do we analytically express the difference between two vectors?

3. Which conditions should a function satisfy in order to be used as an error function?

4. Suggest an error function.

5. What does the MATLAB function sse() do?

2.2. Directly forming the correlation matrix

In this part of the exercise we form the correlation matrix directly. A memory based on the correlation matrix should memorize input-output association pairs represented as vectors. For each input vector (key), the memory stores an output pattern, here a vector of ASCII codes. In this example we use 4-dimensional input and output vectors. The words (output vectors) to be memorized are 'vrat', 'kraj', 'cres' and 'otac'. The vectors $b_i$ that represent these words are formed as follows: b1 = real('vrat')'; b2 = real('kraj')'; b3 = real('cres')'; b4 = real('otac')';

2.2.1. Orthogonal input vectors

This experiment demonstrates how to create an associative memory. The orthonormal set of vectors a1 = [1,0,0,0]'; a2 = [0,1,0,0]'; a3 = [0,0,1,0]'; a4 = [0,0,0,1]'; is used as the set of input vectors (keys). The memory correlation matrix M is formed from the input-output pairs as follows:

M = b1 * a1' + b2 * a2' + b3 * a3' + b4 * a4';

To verify that the memory works properly, calculate the output for each input vector. For example, the output for the key a1 is obtained as follows: char(M * a1)'

1. What is the response for each key? Were all input-output pairs memorized correctly?

2. How many input-output pairs would be memorized if the vectors $a_i$ were not normalized?
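The construction above can be mirrored in Python. This is a sketch in which `to_vec` and `to_word` are hypothetical stand-ins for MATLAB's real(...)' and char(round(...))'. The memory is the sum of outer products $M = \sum_i b_i a_i^T$, queried with the unit keys.

```python
def to_vec(word):
    """ASCII codes of a word, like real('vrat')' in MATLAB."""
    return [float(ord(ch)) for ch in word]


def to_word(v):
    """Round each component and convert back to characters, like char(round(v))'."""
    return "".join(chr(round(x)) for x in v)


def correlation_matrix(keys, outputs):
    """M = sum_i b_i a_i^T (sum of outer products)."""
    rows, cols = len(outputs[0]), len(keys[0])
    return [[sum(b[r] * a[c] for a, b in zip(keys, outputs)) for c in range(cols)]
            for r in range(rows)]


def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


words = ["vrat", "kraj", "cres", "otac"]
keys = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # a1..a4
M = correlation_matrix(keys, [to_vec(w) for w in words])
print([to_word(matvec(M, a)) for a in keys])
```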

2.2.2. Correlation matrix properties

The goal of this experiment is to demonstrate the capacity of the obtained memory. In this part of the exercise we try to memorize one more (fifth) word, 'mrak'. A 4-dimensional vector space contains at most four linearly independent vectors, so the fifth key cannot be orthogonal to the first four. We therefore pick an arbitrary unit vector as the fifth key, for example: a5 = (a1 + a3) / sqrt(2);

Form the vectors b5 ('mrak') and a5 as explained and add them to the memory using the following expression:

M = M + b5 * a5';

1. Was the new association memorized properly?

2. Did the other associations remain correctly memorized?

(a) If not, which were not memorized correctly, and why?

(b) If yes, which were memorized correctly, and why?
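Recall computes $M a_j = \sum_i b_i (a_i^T a_j)$. A key therefore retrieves only its own output when it is orthogonal to every other stored key. The Python sketch below (helper names are illustrative) adds the fifth pair and reports, for each key, whether the rounded response equals the stored word.

```python
import math


def to_vec(word):
    return [float(ord(ch)) for ch in word]


def add_pair(M, a, b):
    """Return M + b a^T."""
    return [[M[r][c] + b[r] * a[c] for c in range(len(a))] for r in range(len(b))]


def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


words = ["vrat", "kraj", "cres", "otac"]
keys = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
M = [[0.0] * 4 for _ in range(4)]
for a, w in zip(keys, words):
    M = add_pair(M, a, to_vec(w))

a5 = [(x + y) / math.sqrt(2) for x, y in zip(keys[0], keys[2])]  # (a1 + a3) / sqrt(2)
M = add_pair(M, a5, to_vec("mrak"))

# True means the rounded response equals the stored word
results = {w: [round(x) for x in matvec(M, a)] == [ord(ch) for ch in w]
           for a, w in zip(keys + [a5], words + ["mrak"])}
print(results)
```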

2.2.3. Word pairs as associations

In this experiment we form an associative memory that memorizes word pairs. The associations to be memorized are ruka-vrat, kset-kraj, more-cres and mama-otac. Generate the input vectors (keys) as follows: a1 = real('ruka')'; a2 = real('kset')'; a3 = real('more')'; a4 = real('mama')';

The vectors $b_i$ do not have to be created again, since they are the same as in the first part of the exercise. Form the matrix M using the same procedure as in the first part of the exercise.

1. What is the response for each input key?

2. Which associations were memorized correctly?

3. Which associations were not memorized correctly and why?

4. How can we fix this problem?
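Expanding the recall for a word key gives $M a_j = b_j \|a_j\|^2 + \sum_{i \ne j} b_i (a_i^T a_j)$. With ASCII-code keys every overlap $a_i^T a_j$ is large and positive. The Python sketch below computes the responses and the overlaps. It prints numbers rather than characters, because the responses lie far outside the ASCII range.

```python
def to_vec(word):
    return [float(ord(ch)) for ch in word]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


keys = [to_vec(w) for w in ["ruka", "kset", "more", "mama"]]      # a1..a4
outputs = [to_vec(w) for w in ["vrat", "kraj", "cres", "otac"]]   # b1..b4

# M = sum_i b_i a_i^T
M = [[sum(b[r] * a[c] for a, b in zip(keys, outputs)) for c in range(4)] for r in range(4)]
overlaps = [[dot(ai, aj) for aj in keys] for ai in keys]          # a_i . a_j

for j, aj in enumerate(keys):
    recalled = [sum(M[r][c] * aj[c] for c in range(4)) for r in range(4)]
    print(f"a{j + 1}:", [round(x) for x in recalled], "overlaps:", overlaps[j])
```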

2.2.4. Input vector orthogonalization

In this experiment we build an associative memory whose keys are orthonormalized. First, form the matrix A from the vectors $a_i$:

A = [a1, a2, a3, a4];

Then perform the orthonormalization step:

C = orth(A);

(MATLAB's orth computes an orthonormal basis for the range of A from the singular value decomposition rather than with the Gram-Schmidt method; either procedure yields orthonormal keys.) Extract the individual orthonormal vectors $c_i$: c1 = C(1,:)', ..., c4 = C(4,:)';

Next, form a new matrix M using the vectors $c_i$ instead of the vectors $a_i$. Verify the responses of M with the vectors $c_i$ as inputs: char(round(M * c1))', ...

1. What is the effect of vector orthonormalization?

2. How many pairs were memorized correctly?

3. What can we expect when normalizing the vectors?

4. What can we expect when only orthogonalizing the vectors?

5. What can we expect if the vectors $c_i$ are linearly independent but not orthogonal?
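A Python sketch of the same experiment. It uses modified Gram-Schmidt instead of MATLAB's orth, which uses the SVD. The resulting orthonormal keys therefore differ from those returned by orth, but they play the same role. The memory is built from the orthonormal keys $c_i$ and queried with them.

```python
import math


def to_vec(word):
    return [float(ord(ch)) for ch in word]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(vectors, eps=1e-10):
    """Modified Gram-Schmidt: orthonormal vectors spanning the same space as `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            proj = dot(w, q)                  # project the partially reduced vector
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < eps:
            raise ValueError("keys are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


keys = gram_schmidt([to_vec(w) for w in ["ruka", "kset", "more", "mama"]])  # c1..c4
outputs = [to_vec(w) for w in ["vrat", "kraj", "cres", "otac"]]           # b1..b4
# M = sum_i b_i c_i^T
M = [[sum(b[r] * c[k] for c, b in zip(keys, outputs)) for k in range(4)] for r in range(4)]
recalled = ["".join(chr(round(sum(M[r][k] * c[k] for k in range(4)))) for r in range(4))
            for c in keys]
print(recalled)
```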

2.2.5. Finding the correlation matrix using matrix inversion

For the previously used word pairs (ruka-vrat, kset-kraj, more-cres, mama-otac), find the 4x4 correlation matrix as $M = BA^{-1}$, where A = [a1, a2, a3, a4] and the matrix B is defined as:

B = [b1, b2, b3, b4]

1. Were all associations memorized properly? Remark: Round the result to the nearest integer before comparison.
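A Python sketch of $M = BA^{-1}$. It uses exact rational arithmetic (`fractions.Fraction`) with a small Gauss-Jordan inverse written for illustration. Because the arithmetic is exact, $MA$ reproduces B without any rounding. In MATLAB's floating point, small residual errors are the reason the text asks for rounding.

```python
from fractions import Fraction


def columns(words):
    """One word per column, entries are ASCII codes (like [real('ruka')', ...])."""
    return [[Fraction(ord(w[r])) for w in words] for r in range(len(words[0]))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse(S):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(S)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(S)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


A = columns(["ruka", "kset", "more", "mama"])
B = columns(["vrat", "kraj", "cres", "otac"])
M = matmul(B, inverse(A))        # M = B A^-1
R = matmul(M, A)                 # responses to all four keys at once
print(["".join(chr(round(R[r][j])) for r in range(4)) for j in range(4)])
```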

2.2.6. Finding the correlation matrix using pseudo-inversion

A pseudo-inverse can be used to find the correlation matrix when the number of associations is larger than the dimensionality of the vectors representing them. In this case the correlation matrix is found as $M = BA^{+}$, where $A^{+}$ is the pseudo-inverse of A, defined as

$$A^{+} = A^{T}\left(AA^{T}\right)^{-1}.$$

When $AA^{T}$ is invertible, this choice of M minimizes the sum of squared errors $\|MA - B\|^{2}$ over all 4x4 matrices.

Assume that the vectors $a_i$ and $b_i$ have been defined previously (five associations in total). Find the pseudo-inverse matrix for this case.

1. Were all pairs memorized correctly?

2. If not, what is the error between the expected and the obtained values? (Remark: Use the error function defined in section 2.1.)
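A Python sketch of the pseudo-inverse solution, again using exact fractions. The text does not say which fifth pair to use here, so the pair auto-mrak from section 2.3.2 is a hypothetical choice. The resulting M satisfies the least-squares normal equations $(MA - B)A^T = 0$ exactly, and `sse` is the remaining squared error.

```python
from fractions import Fraction


def columns(words):
    """One word per column, entries are ASCII codes (like [real('w1')', ...])."""
    return [[Fraction(ord(w[r])) for w in words] for r in range(len(words[0]))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse(S):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(S)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(S)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical fifth pair auto-mrak (the pair used in 2.3.2)
A = columns(["ruka", "kset", "more", "mama", "auto"])   # 4 x 5 keys
B = columns(["vrat", "kraj", "cres", "otac", "mrak"])   # 4 x 5 outputs
At = transpose(A)
A_pinv = matmul(At, inverse(matmul(A, At)))             # A+ = A^T (A A^T)^-1, 5 x 4
M = matmul(B, A_pinv)                                   # 4 x 4 correlation matrix
R = matmul(M, A)                                        # reconstructed outputs
sse = sum((b - r) ** 2 for rb, rr in zip(B, R) for b, r in zip(rb, rr))
print(float(sse))
```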

2.3. Finding the correlation matrix using supervised learning

This experiment shows how to form the matrix M using supervised learning. In the following two experiments we use error-correction learning.

2.3.1. Learning with error correction

Form the matrices A and B, each containing 4 vectors stacked as columns, as explained in the previous experiments. Check the contents of the obtained matrices with the following operations: char(A)', char(B)'

To start the learning procedure, initialize the matrix M (for example, with random values uniformly distributed in the interval [-0.5, 0.5]):

M = -0.5 + rand(4, 4);

For the learning part, use the function trainlms, which implements the Widrow-Hoff LMS learning algorithm. The function is used as follows:

[M, e] = trainlms(ni,A,B,M,max_num_iter);

where max_num_iter is the number of iterations and ni is the learning rate. Find a suitable value of max_num_iter experimentally. For ni you can use: ni = 0.9999/max(eig(A*A'));

The function trainlms performs the learning until the SSE drops below 0.02 or the maximum number of iterations is reached. After the learning phase, look at the responses of the correlation matrix M: char(round(M*A))'

Typing round(M * A)' == B' shows which characters were reconstructed correctly. Positions with a correct reconstruction have the value 1, and all other positions have the value 0. Calling trainlms several times extends the learning process and may increase the number of memorized characters, but the proper way to extend the learning process is to increase max_num_iter. The error can be plotted against the number of iterations on logarithmic axes with the command: loglog(e)

Assignment:

1. Plot the number of memorized characters as a function of the number of iterations used. (Caution: When building the graph, start every simulation from the same initial matrix.)
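The text does not describe the internals of trainlms, which is supplied with the course material. The following is therefore a hypothetical batch Widrow-Hoff sketch in Python: $M \leftarrow M + \eta (B - MA)A^T$, stopping when the SSE drops below 0.02. It uses $\eta = 0.9999/\mathrm{trace}(AA^T)$ instead of max(eig(A*A')) to avoid an eigenvalue routine. The trace is at least the largest eigenvalue, so this step is smaller and still stable. Every run restarts from the same initial matrix, as the assignment requires.

```python
import random


def to_cols(words):
    """One word per column, ASCII codes as floats (like [real('ruka')', ...])."""
    return [[float(ord(w[r])) for w in words] for r in range(len(words[0]))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def train_lms_batch(eta, A, B, M, max_num_iter, goal=0.02):
    """Hypothetical stand-in for trainlms: batch Widrow-Hoff updates
    M <- M + eta * (B - M A) A^T until SSE < goal or max_num_iter is reached."""
    At = transpose(A)
    errors = []
    for _ in range(max_num_iter):
        E = [[b - y for b, y in zip(rb, ry)] for rb, ry in zip(B, matmul(M, A))]
        sse = sum(e * e for row in E for e in row)
        errors.append(sse)
        if sse < goal:
            break
        G = matmul(E, At)
        M = [[m + eta * g for m, g in zip(rm, rg)] for rm, rg in zip(M, G)]
    return M, errors


A = to_cols(["ruka", "kset", "more", "mama"])
B = to_cols(["vrat", "kraj", "cres", "otac"])
rng = random.Random(1)
M0 = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(4)]  # same start for every run
AAt = matmul(A, transpose(A))
eta = 0.9999 / sum(AAt[i][i] for i in range(4))  # trace >= largest eigenvalue: a safe, smaller step

for n in (100, 1000, 2000):
    M, e = train_lms_batch(eta, A, B, M0, n)
    memorized = sum(round(y) == b for ry, rb in zip(matmul(M, A), B) for y, b in zip(ry, rb))
    print(n, "iterations:", memorized, "characters memorized, SSE", e[-1])
```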

2.3.2. Effect of larger number of associations

This experiment demonstrates the capacity of the associative memory. What is the capacity of an associative memory based on a 4x4 correlation matrix?

For the additional pair 'auto'-'mrak', create the vectors a5 and b5 as explained in the previous part of the exercise. Create new matrices A and B of size 4 (rows) x 5 (columns) in the same way as before. Initialize the matrix M with random values and use the trainlms function as follows:

[M, e] = trainlms(ni,A,B,M,max_br_iter);

1. How many iterations did you use?

2. How many characters were memorized correctly?

3. What is the SSE error?

4. What happens if we call the function again from the beginning?

5. How many characters are memorized correctly now, and how large is the error? Is there any difference, and why?

6. Is it possible to train this network to memorize all five associations?

7. Why? (Explain the previous answer.)

3. Exercise 3: Perceptron

3.1. Introduction

Before solving the problems, download the zip file from the course website. It contains additional material required for solving the exercises.

1. Define a hyperplane.

2. What is the difference between the figure and plot functions?

3. What does the function initp do?

4. What are the inputs and outputs of the functions random and trainlms_p?

3.2. Perceptron for data classification

To demonstrate the main concepts behind the perceptron, we first define the input and output data. We use N two-dimensional vectors $a_i$ as input data, organized in a 2 × N matrix A (two rows and N columns):

A = [ax1, ax2, ax3, ..., axN; ...
     ay1, ay2, ay3, ..., ayN];

Here, N is the number of vectors, and $ax_i$ and $ay_i$ are the x and y coordinates of the i-th vector.

In this example we demonstrate how to classify vectors into two classes. Each vector belongs to exactly one of two possible classes, for example $C_0$ and $C_1$. The class of each example is defined by a MATLAB matrix C of dimensions 1 × N:

C = [c1, c2, ..., cN]

Each element $c_i$ has the value 0 if the vector $a_i$ belongs to class $C_0$ and the value 1 if it belongs to class $C_1$.

3.2.1. Classification of linearly separable examples in 2D space

In this experiment we show how to use the perceptron to classify vectors into two linearly separable classes. We use the following input vectors: a1 = [1, 1]', a2 = [1, 1]', a3 = [2, 0]', a4 = [1, 2]', a5 = [2, 1]';

Here, the vectors a1, a2 and a3 belong to class $C_0$ and the other vectors belong to class $C_1$. Form the matrices A and C as explained above and plot the vectors using the following syntax: plotpv(A,C)

Vectors belonging to the same class are drawn with the same symbol. The perceptron can be initialized as follows:

W = initp(A,C);

Here, W is the vector of network weights. Its first column, W(:,1), holds the threshold value.

The hyperplane can be visualized with the following command: plotpc(W(:,2:end),W(:,1));

(Remark: If you closed the figure created with plotpv, plot it again before calling plotpc.)

The initialized perceptron can be trained by calling the function trainlms_p until a correct (or satisfactory) division of the plane is achieved:

[M, e] = trainlms_p(ni,A,B,M,max_num_iter);

The last argument of the function defines the number of iterations used to train the perceptron. Before calling the function, the input matrix A must be expanded with the threshold inputs (-1). This can be done as follows:

A1 = [-ones(1,size(A,2)); A];

1. Plot the input vectors and the classification line in two cases: before and after training. Are the classes $C_0$ and $C_1$ separated correctly in both cases?

2. Plot the classification error against the training iteration.

3. Think of an experiment in which you use the perceptron to find a boundary in 2D space, and train the required perceptron.

4. Think of an experiment in which you use the perceptron to find a boundary in 3D space, and train the required perceptron.
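trainlms_p and initp come from the course material, and their internals are not described here. The sketch below is therefore a hypothetical stand-in that uses Rosenblatt's classical perceptron rule $w \leftarrow w + \eta (c - y)\,x$ on inputs expanded with the threshold input -1. It starts from zero weights rather than initp's initialization, and records the number of misclassified examples after each pass.

```python
def hardlim(v):
    return 1 if v >= 0 else 0


def predict(w, a):
    """Perceptron output for input a; w[0] multiplies the fixed threshold input -1."""
    return hardlim(sum(wi * xi for wi, xi in zip(w, [-1.0] + list(a))))


def train_perceptron(A, C, eta=1.0, max_epochs=1000):
    """Rosenblatt's rule w <- w + eta * (c - y) * x; returns weights and errors per epoch."""
    w = [0.0] * (len(A[0]) + 1)
    errors = []
    for _ in range(max_epochs):
        for a, c in zip(A, C):
            y = predict(w, a)
            w = [wi + eta * (c - y) * xi for wi, xi in zip(w, [-1.0] + list(a))]
        errors.append(sum(predict(w, a) != c for a, c in zip(A, C)))
        if errors[-1] == 0:      # all examples classified correctly
            break
    return w, errors


A = [(1, 1), (1, 1), (2, 0), (1, 2), (2, 1)]   # a1..a5 from 3.2.1 (one tuple per column of A)
C = [0, 0, 0, 1, 1]
w, errors = train_perceptron(A, C)
print(w, errors)
```

The same function accepts the XOR data of 3.2.2 and the 3D vectors of 3.2.3 unchanged, so the error curves for those experiments can be produced the same way.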

3.2.2. Linearly inseparable case in 2D

In this experiment we try to train a perceptron for two linearly inseparable classes. More precisely, we try to solve the logical XOR problem. The input vectors $a_i$ represent the function inputs, and the classes $C_0$ and $C_1$ represent the function values:

A = [0 0 1 1; 0 1 0 1];

C = [0 1 1 0];

1. Use the same training procedure as in the first experiment. Plot the obtained results (i.e., plot the input vectors together with the classification line before and after the training phase in the same window). Plot the error as well.

2. Did the perceptron learn to solve the XOR problem? Explain why.

3.2.3. Classification of linearly separable examples in 3D space

This experiment shows how to classify examples in 3D space. The input vectors are three-dimensional and belong to two linearly separable classes. The input vectors are: a1 = [0 0 0]'; a2 = [0 0 1]'; a3 = [0 1 0]'; a4 = [0 1 1]'; a5 = [1 0 0]';

Here, the vectors a1, a3 and a4 belong to class $C_0$ and the other vectors belong to class $C_1$.

1. Repeat the learning procedure from 3.2.1 and show the obtained results together with a plot of the error.

2. Change the class assignments of the vectors until the classes $C_0$ and $C_1$ become linearly inseparable. When does this happen?

3.3. Classification of examples with Gaussian distribution

The second part of this exercise shows how to classify examples drawn from Gaussian distributions, which often occur in real-life problems.

Suppose we have two classes of 2D vectors, where each class consists of realizations of a random vector with a Gaussian distribution. The first class has mean $E(C_0) = (10, 10)$ and standard deviation $S(C_0) = 2.5$ in each component. The second class has mean $E(C_1) = (20, 5)$ and standard deviation $S(C_1) = 2$. Create 100 vectors for each class as follows:

A1 = [random('norm', 10, 2.5, 1,100); random('norm', 10, 2.5, 1,100)]

A2 = [random('norm', 20, 2.0, 1,100); random('norm', 5, 2.0, 1,100)]

Next, construct the matrix A containing the vectors from A1 and A2. Then form the vector C, which states that the first 100 elements belong to class 0 and the remaining elements belong to class 1:

A = [A1 A2]

C = [zeros(1,100) ones(1,100)]

1. Repeat the training procedure from the first part of the exercise. Plot the obtained results.

2. How many examples were misclassified?

3. If the input vector is a = (10, 3), to which class would this example be assigned? Remark: To obtain the output of the network for an input vector a, use the function run_perc as follows: run_perc([-1, a], W)
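A Python sketch of this experiment under stated assumptions: `random.Random.gauss` replaces MATLAB's random('norm', ...), the seed is fixed, and the perceptron rule is the same hypothetical stand-in for trainlms_p as before. If the two clouds overlap, the perceptron rule never settles, so the sketch reports the error count of the final pass. Keeping the best weights seen so far (the "pocket" variant) would be a natural refinement.

```python
import random


def hardlim(v):
    return 1 if v >= 0 else 0


def predict(w, a):
    """Perceptron output; w[0] multiplies the fixed threshold input -1."""
    return hardlim(sum(wi * xi for wi, xi in zip(w, [-1.0] + list(a))))


def train_perceptron(A, C, eta=1.0, max_epochs=200):
    w = [0.0] * (len(A[0]) + 1)
    errors = []
    for _ in range(max_epochs):
        for a, c in zip(A, C):
            y = predict(w, a)
            w = [wi + eta * (c - y) * xi for wi, xi in zip(w, [-1.0] + list(a))]
        errors.append(sum(predict(w, a) != c for a, c in zip(A, C)))
        if errors[-1] == 0:
            break
    return w, errors


rng = random.Random(42)  # fixed seed so the run is repeatable
class0 = [(rng.gauss(10, 2.5), rng.gauss(10, 2.5)) for _ in range(100)]  # like A1
class1 = [(rng.gauss(20, 2.0), rng.gauss(5, 2.0)) for _ in range(100)]   # like A2
A = class0 + class1
C = [0] * 100 + [1] * 100

w, errors = train_perceptron(A, C)
print("misclassified after training:", errors[-1])
print("class of (10, 3):", predict(w, (10, 3)))
```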

3.3.1. Classification of examples using two perceptrons

The third part of the exercise shows how to use more than one perceptron to classify input vectors into a larger number of classes. Figure 4 shows a network with two perceptrons that can classify the examples into four classes; a single perceptron, with its binary output, can distinguish only two.

Figure 4: Two perceptrons for classification in four classes (outputs are binary coded)

Suppose we have 10 2D input vectors defined by the matrix A, where each column of the matrix represents one input vector:

A = [ 0.1, 0.7, 0.8, 0.8, 1.0, 0.3, 0.0, -0.3, -0.5, -1.5; ...
      1.2, 1.8, 1.6, 0.6, 0.8, 0.5, 0.2, 0.8, -1.5, -1.3]

The matrix C defines the class to which each input vector belongs:

C = [1 1 1 0 0 1 1 1 0 0; ...
     0 0 0 0 0 1 1 1 1 1]

Each column of the matrix C is a 2D vector whose two bits are the binary-coded class of the corresponding input vector. Two bits can encode four different values, which represent the classes $C_0$, $C_1$, $C_2$ and $C_3$.

This network is trained with the same procedure as the network with only one perceptron.

1. Train the network. Plot the obtained results together with a plot of the error.
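Because each output bit is produced by its own perceptron, training the two-perceptron network amounts to training one perceptron per row of C. This Python sketch does that with the same hypothetical perceptron rule as before (zero initial weights, threshold input -1). It does not decode the bit pairs into class names, since the text does not say which bit is the more significant one.

```python
def hardlim(v):
    return 1 if v >= 0 else 0


def train_perceptron(X, d, eta=1.0, max_epochs=5000):
    """Single perceptron with the fixed -1 threshold input; returns weights and per-epoch error counts."""
    w = [0.0] * (len(X[0]) + 1)
    errors = []
    for _ in range(max_epochs):
        for x, t in zip(X, d):
            xa = [-1.0] + list(x)
            y = hardlim(sum(wi * xi for wi, xi in zip(w, xa)))
            w = [wi + eta * (t - y) * xi for wi, xi in zip(w, xa)]
        errors.append(sum(hardlim(sum(wi * xi for wi, xi in zip(w, [-1.0] + list(x)))) != t
                          for x, t in zip(X, d)))
        if errors[-1] == 0:
            break
    return w, errors


A = [[0.1, 0.7, 0.8, 0.8, 1.0, 0.3, 0.0, -0.3, -0.5, -1.5],
     [1.2, 1.8, 1.6, 0.6, 0.8, 0.5, 0.2, 0.8, -1.5, -1.3]]
C = [[1, 1, 1, 0, 0, 1, 1, 1, 0, 0],
     [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]]
X = list(zip(*A))  # one (x, y) tuple per input vector (the columns of A)

# Each output bit (row of C) is learned by its own perceptron.
nets = [train_perceptron(X, row) for row in C]
for k, (w, errors) in enumerate(nets, start=1):
    print(f"perceptron {k}: {len(errors)} epochs, final errors {errors[-1]}")
```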


4. Exercise 4: LMS algorithm for stock price prediction

4.1. Introduction

Before solving the problems, download the zip file from the course website. It contains additional material required for solving the exercises.

1. What does the function trainlms do?

4.2. Stock price movement

In this experiment we use the LMS algorithm to predict the price of a stock, which we denote XXX-R-A. Load the data from the file xxx-r-a into the variable xxx (you can also load the file xxx-r-a.mat). The elements of the vector xxx show the movement of the average daily price of the XXX-R-A share over time. Plot the variable xxx using the following command: plot(xxx)

You should get the same plot as in Figure 5.

Figure 5: Price movement of XXX-R-A

The goal of this exercise is to use several (say N) previous share prices to predict today's share price. This is useful because we can buy or sell the share before its price rises or falls, and in that way increase our profit or reduce our loss. The first step is to find the input-output pairs that will be used to train the network. The size of this set is given by the variable i. The inputs are the vectors $a_i$, which we put in the matrix A, and the outputs are scalar values, which we put in the vector y.

Assignments:

1. Write the function memory, which, for a given day in the year (an index into the vector xxx), constructs a column vector a whose elements are the prices for the last N days, excluding today's price.

2. Using the function memory, write a function memorize which, for the given inputs (xxx, day, N, i), constructs the matrix A whose column vectors are the vectors $a_i$ for the previous days. The matrix A thus stores i memories, which we use to train the network. Remark: Construct the matrix A in a for loop, where column vectors are easily appended:

A = [ai,A];

where ai is calculated as in the previous assignment:

ai = memory(xxx, day-i,N);

Construct the matrix A using the command:

A = memorize(xxx, 151, 100, 50);

The output vector is constructed using the command: y = xxx(day-i+1:day);
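A Python sketch of memory and memorize that keeps MATLAB's 1-based day index. One alignment is assumed: column k of the result is the price history for the day whose price is y(k), with y = xxx(day-i+1:day). The loop in the text, ai = memory(xxx, day-i, N), depends on how its loop variable runs, so its exact offset may differ. The price series in the usage lines is hypothetical: the price on day t is simply t.

```python
def memory(xxx, day, N):
    """Prices of the N days before `day` (1-based, as in MATLAB), excluding `day` itself."""
    if day - N < 1:
        raise ValueError("not enough history before this day")
    return xxx[day - N - 1:day - 1]          # MATLAB: xxx(day-N:day-1)


def memorize(xxx, day, N, i):
    """List of i columns: memory() vectors for days day-i+1 .. day, aligned with y."""
    return [memory(xxx, d, N) for d in range(day - i + 1, day + 1)]


xxx = [float(p) for p in range(1, 201)]      # hypothetical prices: price on day t is t
A_cols = memorize(xxx, 151, 100, 50)          # 50 columns, each of length 100
y = xxx[151 - 50:151]                         # MATLAB: xxx(day-i+1:day)
print(len(A_cols), len(A_cols[0]), y[0], y[-1])
```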

We initialize the perceptron as follows:

W = initp(A,y);

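We expand the matrix A with thresholds as before:

A = [-ones(1,size(A,2)); A];

(size(A,2) is the number of columns, i.e. i. The function length(A) returns the larger dimension of A, which here is N = 100 rather than i = 50.)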

We train the perceptron using the command:

[W1, e] = trainlms(ni,A,y,W,max_num_iter);

The weights of the network are stored in the matrix W1. The value of ni should be found experimentally. Train the network for different values of i, N and max_num_iter.
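The course function trainlms is not described in the text. This is therefore a hypothetical sequential Widrow-Hoff LMS for a single linear neuron in Python, run on a synthetic sinusoidal series standing in for the XXX-R-A data. Each input holds the previous N prices plus the threshold input -1. The weights are updated once per training pair, and the recorded history is the sum of squared a-priori errors in each pass.

```python
import math


def lms_train(eta, X, y, w, max_num_iter):
    """Sequential LMS: w <- w + eta * e * x for every pair; returns weights and SSE per pass."""
    history = []
    for _ in range(max_num_iter):
        sse = 0.0
        for x, d in zip(X, y):
            e = d - sum(wj * xj for wj, xj in zip(w, x))      # a-priori error
            w = [wj + eta * e * xj for wj, xj in zip(w, x)]  # LMS update
            sse += e * e
        history.append(sse)
    return w, history


# Hypothetical smooth "price" series standing in for xxx
prices = [1.0 + 0.5 * math.sin(0.3 * t) for t in range(1, 201)]
N = 5
X = [[-1.0] + prices[t - N:t] for t in range(N, len(prices))]  # previous N prices + threshold
y = [prices[t] for t in range(N, len(prices))]                  # price to predict
w, history = lms_train(0.05, X, y, [0.0] * (N + 1), 200)
print(history[0], history[-1])
```

On real prices the learning rate has to be tuned to the scale of the data. With inputs around 1, eta = 0.05 keeps each update stable.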

We can visualize the output of the network using the following command: plot(1:length(W1*A),W1*A,'b',1:length(y),y,'r')

Here, blue shows the predicted values and red shows the real outputs.

1. Store different weight matrices (W1, W2, ...) obtained with different combinations of the following parameters: i = 30, 50 or 100; N = 20, 50 or 80; max_num_iter = 10000, 50000 or 500000. (Choose 6 different combinations.) For the selected parameter combinations, plot the obtained predictions together with the real values. Comment on the obtained results.

If we do not use any intelligence for stock price prediction and simply assume that tomorrow's price will be (almost) the same as today's, our error is:

a = xxx(day-i:day-1); y = xxx(day-i+1:day); err_oo = sum(abs(y - a));

The error of the network is calculated with the following command: err_nn = sum(abs(y - W1*A));

If we trade XXX-R-A shares every day, our error can be easily measured, and our potential profit can be calculated as follows: profit = err_oo - err_nn;
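A worked example of the error and profit arithmetic above, in Python with 1-based day indexing as in MATLAB. The prices and predictions are made up purely for illustration.

```python
def naive_and_model_errors(xxx, day, i, predictions):
    """err_oo: 'same as yesterday' baseline; err_nn: model error; both are sums of |y - prediction|.
    `day` is a 1-based index as in MATLAB."""
    a = xxx[day - i - 1:day - 1]   # MATLAB: xxx(day-i:day-1)
    y = xxx[day - i:day]           # MATLAB: xxx(day-i+1:day)
    err_oo = sum(abs(t - p) for t, p in zip(y, a))
    err_nn = sum(abs(t - p) for t, p in zip(y, predictions))
    return err_oo, err_nn, err_oo - err_nn  # the last value is the profit


xxx = [10.0, 11.0, 13.0, 12.0, 15.0]   # hypothetical prices
err_oo, err_nn, profit = naive_and_model_errors(xxx, day=5, i=3, predictions=[12.5, 12.5, 14.0])
print(err_oo, err_nn, profit)  # 6.0 2.0 4.0
```

Here the baseline misses by 2 + 1 + 3 = 6 and the hypothetical predictions miss by 0.5 + 0.5 + 1 = 2, so the profit measure is 4.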