ADAPTIVE NONLINEAR SYSTEM IDENTIFICATION

AND CHANNEL EQUALIZATION USING

FUNCTIONAL LINK ARTIFICIAL NEURAL NETWORK

A THESIS SUBMITTED IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Technology

in

Telematics and Signal Processing

By

AJIT KUMAR SAHOO

Department of Electronics and Communication Engineering

National Institute of Technology

Rourkela

2007

ADAPTIVE NONLINEAR SYSTEM IDENTIFICATION

AND CHANNEL EQUALIZATION USING

FUNCTIONAL LINK ARTIFICIAL NEURAL NETWORK

A THESIS SUBMITTED IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Technology

in

Telematics and Signal Processing

By

AJIT KUMAR SAHOO

Under the Guidance of

Prof. G. Panda

Department of Electronics and Communication Engineering

National Institute of Technology

Rourkela

2007

National Institute of Technology

Rourkela

CERTIFICATE

This is to certify that the thesis entitled, "Adaptive Nonlinear System Identification and Channel Equalization Using Functional Link Artificial Neural Network", submitted by Sri Ajit Kumar Sahoo in partial fulfillment of the requirements for the award of the Master of Technology Degree in Electronics & Communication Engineering with specialization in "Telematics and Signal Processing" at the National Institute of Technology, Rourkela (Deemed University) is an authentic work carried out by him under my supervision and guidance.

To the best of my knowledge, the matter embodied in the thesis has not been submitted to any other University / Institute for the award of any Degree or Diploma.

Date:

Prof. G. Panda

Dept. of Electronics & Communication Engg.

National Institute of Technology

Rourkela-769008

ACKNOWLEDGEMENTS

This project is by far the most significant accomplishment in my life, and it would have been impossible without the people who supported me and believed in me.

I would like to extend my gratitude and my sincere thanks to my honorable, esteemed supervisor Prof. G. Panda, Head, Department of Electronics and Communication Engineering. He is not only a great lecturer with deep vision but also, most importantly, a kind person. I sincerely thank him for his exemplary guidance and encouragement. His trust and support inspired me in the most important moments of making right decisions, and I am glad to have worked with him.

I want to thank all my teachers, Prof. G.S. Rath, Prof. K.K. Mahapatra, Prof. S.K. Patra and Prof. S.K. Meher, for providing a solid background for my studies and research thereafter. They have been great sources of inspiration to me and I thank them from the bottom of my heart.

I would like to thank all my friends and especially my classmates for all the thoughtful and mind stimulating discussions we had, which prompted us to think beyond the obvious. I’ve enjoyed their companionship so much during my stay at NIT, Rourkela.

I would like to thank all those who made my stay in Rourkela an unforgettable and rewarding experience.

Last but not least I would like to thank my parents, who taught me the value of hard work by their own example. They rendered me enormous support during the whole tenure of my stay in NIT Rourkela.

Ajit Kumar Sahoo

CONTENTS

Abstract
List of Figures
List of Tables
Abbreviations Used

Chapter 1. Introduction
    1.1. Introduction
    1.2. Motivation
    1.3. Thesis Layout

Chapter 2. Adaptive Modeling and System Identification
    2.1. Introduction
    2.2. Adaptive Filter
    2.3. Filter Structures
    2.4. Application of Adaptive Filters
        2.4.1. Direct Modeling
        2.4.2. Inverse Modeling
    2.5. Gradient Based Adaptive Algorithm
        2.5.1. General Form of Adaptive FIR Algorithm
        2.5.2. The Mean-Squared Error Cost Function
        2.5.3. The Wiener Solution
        2.5.4. The Method of Steepest Descent
    2.6. Least Mean Square (LMS) Algorithm
    2.7. System Identification
    2.8. Simulation Results
    2.9. Summary

Chapter 3. System Identification Using Artificial Neural Network (ANN)
    3.1. Introduction
    3.2. Single Neuron Structure
        3.2.1. Activation Functions and Bias
        3.2.2. Learning Process
    3.3. Multilayer Perceptron
        3.3.1. Back Propagation Algorithm
    3.4. Functional Link ANN (FLANN)
        3.4.1. Learning Algorithm
    3.5. Cascaded FLANN (CFLANN)
        3.5.1. Learning Algorithm
    3.6. Simulation Results
    3.7. Summary

Chapter 4. Pruning Using Genetic Algorithm (GA)
    4.1. Introduction
    4.2. Genetic Algorithm
        4.2.1. GA Operations
        4.2.2. Population Variable
        4.2.3. Chromosome Selection
        4.2.4. Gene Crossover
        4.2.5. Chromosome Mutation
    4.3. Parameters of GA
    4.4. Pruning Using GA
    4.5. Simulation Results
    4.6. Summary

Chapter 5. Channel Equalization
    5.1. Introduction
    5.2. Base Band Communication System
    5.3. Channel Interference
        5.3.1. Multipath Propagation
    5.4. Minimum and Nonminimum Phase Channels
    5.5. Inter Symbol Interference
        5.5.1. Symbol Overlap
    5.6. Channel Equalization
        5.6.1. Transversal Filter
    5.7. Simulation Results
    5.8. Summary

Chapter 6. Conclusions
    6.1. Conclusions
    6.2. Scope for Future Work

References

Abstract

In system theory, characterization and identification are fundamental problems. When the plant behavior is completely unknown, it may be characterized using a certain model and then its identification may be carried out with some artificial neural network (ANN), such as the multilayer perceptron (MLP) or the functional link artificial neural network (FLANN), using a learning rule such as the back propagation (BP) algorithm. These networks offer flexibility, adaptability and versatility, so that a variety of approaches may be used to meet a specific goal, depending upon the circumstances and the requirements of the design specifications. The primary aim of the present thesis is to provide a framework for the systematic design of adaptation laws for nonlinear system identification and channel equalization. While constructing an artificial neural network, the designer is often faced with the problem of choosing a network of the right size for the task. The advantages of using a smaller neural network are a cheaper cost of computation and better generalization ability. However, a network which is too small may never solve the problem, while a larger network may even have the advantage of a faster learning rate. Thus it makes sense to start with a large network and then reduce its size. For this reason a Genetic Algorithm (GA) based pruning strategy is reported. The GA is based upon the process of natural selection and does not require error gradient statistics. As a consequence, a GA is able to find a global error minimum.

Transmission bandwidth is one of the most precious resources in digital communication systems. Communication channels are usually modeled as band-limited linear finite impulse response (FIR) filters with a low pass frequency response. When the amplitude and the envelope delay response are not constant within the bandwidth of the filter, the channel distorts the transmitted signal, causing intersymbol interference (ISI). The addition of noise during propagation also degrades the quality of the received signal. All the signal processing methods used at the receiver's end to compensate for the introduced channel distortion and recover the transmitted symbols are referred to as channel equalization techniques.

When the nonlinearity associated with the system or the channel is high, the number of branches in the FLANN increases, and in some cases the performance becomes poor. To decrease the number of branches and improve the performance, a two-stage FLANN called the cascaded FLANN (CFLANN) is proposed.

This thesis presents a comprehensive study covering artificial neural network (ANN) implementations for nonlinear system identification and channel equalization. Three ANN structures, the MLP, FLANN and CFLANN, and their conventional gradient-descent training methods are extensively studied.

Simulation results demonstrate that the FLANN and CFLANN methods are directly applicable to a large class of nonlinear control systems and communication problems.

LIST OF FIGURES

Fig. 2.1        Types of adaptation
Fig. 2.2        General adaptive filtering
Fig. 2.3        Structure of an FIR filter
Fig. 2.4        Direct modeling
Fig. 2.5        Inverse modeling
Fig. 2.6        Block diagram of system identification
Fig. 2.7        Response and MSE plots for a linear system using the LMS algorithm
Fig. 2.8-2.11   Response and MSE plots for nonlinear systems using the LMS algorithm
Fig. 3.1        A single neuron structure
Fig. 3.2        Structure of the multilayer perceptron
Fig. 3.3        Neural network using the BP algorithm
Fig. 3.4        Structure of the FLANN model
Fig. 3.5        Structure of the CFLANN model
Fig. 3.6-3.10   Response comparison between MLP and FLANN
Fig. 3.11-3.12  Performance comparison between LMS, FLANN and CFLANN
Fig. 4.1        GA iteration cycle
Fig. 4.2        Biased roulette wheel for the selection of the mating pool
Fig. 4.3        Gene crossover
Fig. 4.4        Mutation operation in GA
Fig. 4.5        FLANN based identification model showing the pruning path
Fig. 4.6        Bit allocation scheme for pruning and weight updating
Fig. 4.7        Output plots for static and dynamic systems
Fig. 5.1        A baseband communication system
Fig. 5.2        Impulse response of a transmitted signal in a channel
Fig. 5.3        Interaction between two neighboring symbols
Fig. 5.4        Block diagram of channel equalization
Fig. 5.5        Linear transversal filter
Fig. 5.6        BER plot comparison between LMS, FLANN and CFLANN

LIST OF TABLES

Table 3.1   Common activation functions
Table 4.1   Comparison of computational complexity between FLANN and pruned FLANN structures for static systems
Table 4.2   Comparison of computational complexity between FLANN and pruned FLANN structures for dynamic systems

ABBREVIATIONS USED

ANN      Artificial Neural Network
BGA      Binary Coded Genetic Algorithm
BP       Back Propagation
CFLANN   Cascaded Functional Link Artificial Neural Network
DCR      Digital Cellular Radio
DSP      Digital Signal Processing
FIR      Finite Impulse Response
FLANN    Functional Link Artificial Neural Network
FPGA     Field Programmable Gate Array
GA       Genetic Algorithm
IIR      Infinite Impulse Response
ISDN     Integrated Services Digital Network
ISI      Inter Symbol Interference
LAN      Local Area Network
LMS      Least Mean Square
MLANN    Multilayer Artificial Neural Network
MLP      Multilayer Perceptron
MLSE     Maximum Likelihood Sequence Estimator
MSE      Mean Square Error
PPN      Polynomial Perceptron Network

Chapter 1

INTRODUCTION

1. INTRODUCTION

1.1. INTRODUCTION.

System identification is one of the most important areas in engineering because of its applicability to a wide range of problems. Mathematical system theory, which has in the past few decades evolved into a powerful scientific discipline of wide applicability, deals with the analysis and synthesis of systems. The best developed theory is for systems defined by linear operators, using well-established techniques based on linear algebra, complex variable theory and the theory of ordinary linear differential equations. Design techniques for dynamical systems are closely related to their stability properties. Necessary and sufficient conditions for the stability of linear time-invariant systems have been generated over the past century, and well-known design methods have been established for such systems. In contrast to this, the stability of nonlinear systems can be established for the most part only on a system-by-system basis.

In the past few decades major advances have been made in adaptive identification and control for identifying and controlling linear time-invariant plants with unknown parameters.

The choice of the identifier and the controller structures is based on well established results in linear systems theory. Stable adaptive laws for the adjustment of parameters in these structures, which assure the global stability of the relevant overall systems, are also based on properties of linear systems as well as stability results that are well known for such systems [1.1].

In recent years, with the growth of internet technologies, high speed and efficient data transmission over communication channels has gained significant importance. The rapidly increasing computer communication has necessitated higher speed data transmission over the widespread network of voice bandwidth channels. In digital communications the symbols are sent through linearly dispersive media such as telephone, cable and wireless channels. In bandwidth-efficient data transmission systems, the effect of each symbol transmitted over such a time-dispersive channel extends to the neighboring symbol intervals. The distortion caused by the resulting overlap of received data is called intersymbol interference (ISI) [1.2].

1.2. MOTIVATION

Adaptive filtering has proven to be useful in many contexts such as linear prediction, channel equalization, noise cancellation, and system identification. The adaptive filter attempts to iteratively determine an optimal model for the unknown system, or "plant", based on some function of the error between the output of the adaptive filter and the output of the plant. The optimal model or solution is attained when this function of the error is minimized. The adequacy of the resulting model depends on the structure of the adaptive filter, the algorithm used to update the adaptive filter parameters, and the characteristics of the input signal.

When the parameters of a physical system are not available or are time dependent, it is difficult to obtain a mathematical model of the system. In such situations, the system parameters should be obtained using a system identification procedure. The purpose of system identification is to construct a mathematical model of a physical system from input-output data.

Studies on linear system identification have been carried out for more than three decades [1.3]. However, identification of nonlinear systems is a promising research area. Nonlinear characteristics such as saturation, dead-zone, etc. are inherent in many real systems. In order to analyze and control such systems, identification of nonlinear systems is necessary. Hence, adaptive nonlinear system identification has become more challenging and has received much attention in recent years [1.4].

High speed data transmission over communication channels is subject to intersymbol interference (ISI) and noise. The intersymbol interference is usually the result of the restricted bandwidth allocated to the channel and/or the presence of multipath distortion in the medium through which the information is transmitted. Equalization is the process which reconstructs the transmitted data by jointly combating the ISI and the noise in the communication link. The simplest architecture in the class of equalizers making decisions on a symbol-by-symbol basis is the linear transversal filter. The field of digital data communications has experienced explosive growth in recent years, and demand has reached a peak as additional services are added to the existing infrastructure. The telephone networks were originally designed for voice communication but, in recent times, the advances in digital communications using the Integrated Services Digital Network (ISDN), data communications with computers, fax, video conferencing, etc. have pushed the use of these facilities far beyond the scope of their originally intended use. Similarly, the introduction of digital cellular radio (DCR) and wireless local area networks (LANs) has stretched the limited available radio spectrum capacity to the limits it can offer. These advances in digital communications have been made possible by the effective use of the existing communication channels with the aid of signal processing techniques. Nevertheless, these advances on the existing infrastructure have introduced a host of new, unanticipated problems. The conventional LMS algorithm [1.5] fails in the case of nonlinear channels. Hence nonlinear channel estimation is a key problem in communication systems. Several approaches based on Artificial Neural Networks (ANN) have been discussed recently for the estimation of nonlinear channels.

1.3. THESIS LAYOUT

In Chapter 2, the adaptive modeling and system identification problem is defined for linear and nonlinear plants. The conventional LMS algorithm and other gradient based algorithms for FIR systems are derived. Nonlinearity problems are discussed briefly and various methods are proposed for their solution.

In Chapter 3, the theory, structure and algorithms of various artificial neural networks are discussed. We focus on the Multilayer Perceptron (MLP), the Functional Link ANN (FLANN) and the Cascaded Functional Link ANN (CFLANN). We discuss the learning rule in each of the methods. Simulations are carried out to compare the ANN techniques with the conventional LMS method under different nonlinear conditions and noise.

Chapter 4 gives an introduction to evolutionary computing techniques and discusses in detail the genetic algorithm and its operators. It also discusses various selection schemes for population and crossover. In this chapter the Genetic Algorithm is used for simultaneous pruning and weight updating for efficient nonlinear system identification.

In Chapter 5, adaptive channel equalization is defined for linear and nonlinear channels. Different kinds of communication channels and intersymbol interference are discussed. The performance of the conventional LMS algorithm based equalizer and other ANN structures such as the FLANN and CFLANN equalizers are compared.

Chapter 6 summarizes the work done in this thesis and points to possible directions for future work.


Chapter 2

ADAPTIVE MODELING AND SYSTEM IDENTIFICATION

2. ADAPTIVE MODELING AND SYSTEM IDENTIFICATION

2.1. INTRODUCTION

Modeling and system identification is a very broad subject, of great importance in the fields of control systems, communications, and signal processing. Modeling is also important outside the traditional engineering disciplines, for example in social, economic, or biological systems. An adaptive filter can be used in modeling, that is, imitating the behavior of physical systems which may be regarded as unknown "black boxes" having one or more inputs and one or more outputs.

The essential and principal property of an adaptive system is its time-varying, self-adjusting performance. System identification [2.1, 2.2] is the experimental approach to process modeling. System identification includes the following steps:

Experiment design: Its purpose is to obtain good experimental data, and it includes the choice of the measured variables and of the character of the input signals.

Selection of model structure: A suitable model structure is chosen using prior knowledge and trial and error.

Choice of the criterion to fit: A suitable cost function is chosen, which reflects how well the model fits the experimental data.

Parameter estimation: An optimization problem is solved to obtain the numerical values of the model parameters.

Model validation: The model is tested in order to reveal any inadequacies.

Adaptive systems have the following characteristics:

1) They can automatically adapt (self-optimize) in the face of changing (nonstationary) environments and changing system requirements.

2) They can be trained to perform specific filtering and decision making tasks.

3) They can extrapolate a model of behavior to deal with new situations after being trained on a finite and often small number of training signals and patterns.

4) They can repair themselves to a limited extent.

5) They can be described as nonlinear systems with time varying parameters.

Adaptation is of two types:

(i) Open-loop adaptation

The open-loop adaptive process is shown in Fig. 2.1(a). It involves making measurements of input or environment characteristics, applying this information to a formula or to a computational algorithm, and using the results to set the adjustments of the adaptive system. The adaptation of the process parameters does not depend upon the output signal.

Fig. 2.1. Types of adaptation: (a) open-loop adaptation and (b) closed-loop adaptation

(ii) Closed-loop adaptation

Closed-loop adaptation, as shown in Fig. 2.1(b), on the other hand, involves automatic experimentation with these adjustments and knowledge of their outcome in order to optimize a measured system performance. The latter process may be called adaptation by "performance feedback". The adaptation of the process parameters depends upon the input as well as the output signal.

2.2. ADAPTIVE FILTER

An adaptive filter [2.3, 2.4] is a computational device that attempts to model the relationship between two signals in real time in an iterative manner. Adaptive filters are often realized either as a set of program instructions running on an arithmetical processing device such as a microprocessor or digital signal processing (DSP) chip, or as a set of logic operations implemented in a field-programmable gate array (FPGA).

However, ignoring any errors introduced by numerical precision effects in these implementations, the fundamental operation of an adaptive filter can be characterized independently of the specific physical realization that it takes. For this reason, we shall focus on the mathematical forms of adaptive filters as opposed to their specific realizations in software or hardware. An adaptive filter is defined by four aspects:

1. The signals being processed by the filter.

2. The structure that defines how the output signal of the filter is computed from its input signal.

3. The parameters within this structure that can be iteratively changed to alter the filter's input-output relationship.

4. The adaptive algorithm that describes how the parameters are adjusted from one time instant to the next.

By choosing a particular adaptive filter structure, one specifies the number and type of parameters that can be adjusted. The adaptive algorithm used to update the parameter values of the system can take on an infinite number of forms and is often derived as a form of optimization procedure that minimizes an error.

Fig. 2.2. General adaptive filtering

Fig. 2.2 shows a block diagram in which a sample from a digital input signal x(n) is fed into a device, called an adaptive filter, that computes a corresponding output signal sample y(n) at time n. For the moment, the structure of the adaptive filter is not important, except for the fact that it contains adjustable parameters whose values affect how y(n) is computed. The output signal is compared to a second signal d(n), called the desired response signal, by subtracting the two samples at time n. This difference signal, given by

    e(n) = d(n) - y(n)    (2.1)

is known as the error signal. The error signal is fed into a procedure which alters or adapts the parameters of the filter from time n to time (n + 1) in a well-defined manner. As the time index n is incremented, it is hoped that the output of the adaptive filter becomes a better and better match to the desired response signal through this adaptation process, such that the magnitude of e(n) decreases over time. In the adaptive filtering task, adaptation refers to the method by which the parameters of the system are changed from time index n to time index (n + 1). The number and types of parameters within this system depend on the computational structure chosen for the system. We now discuss different filter structures that have been proven useful for adaptive filtering tasks.

2.3. FILTER STRUCTURES

In general, any system with a finite number of parameters that affect how y(n) is computed from x(n) could be used for the adaptive filter in Fig. 2.2. Define the parameter or coefficient vector W(n) as

    W(n) = [w_0(n)  w_1(n)  ...  w_{L-1}(n)]^T    (2.2)

where {w_i(n)}, 0 ≤ i ≤ L-1, are the L parameters of the system at time n. With this definition, we could define a general input-output relationship for the adaptive filter as

    y(n) = f( W(n), y(n-1), y(n-2), ..., y(n-N), x(n), x(n-1), ..., x(n-M+1) )    (2.3)

where f(·) represents any well-defined linear or nonlinear function and M and N are positive integers. Implicit in this definition is the fact that the filter is causal, such that future values of x(n) are not needed to compute y(n). While non-causal filters can be handled in practice by suitably buffering or storing the input signal samples, we do not consider this possibility.

Although Equation (2.3) is the most general description of an adaptive filter structure, we are interested in determining the best linear relationship between the input and desired response signals for many problems. This relationship typically takes the form of a finite-impulse-response (FIR) or infinite-impulse-response (IIR) filter. Figure 2.3 shows the structure of a direct-form FIR filter, also known as a tapped-delay-line or transversal filter, where z^{-1} denotes the unit delay element and each w_i(n) is a multiplicative gain within the system. In this case, the parameters in W(n) correspond to the impulse response values of the filter at time n. We can write the output signal y(n) as

    y(n) = Σ_{i=0}^{L-1} w_i(n) x(n-i)    (2.4)

         = W^T(n) X(n)    (2.5)

where X(n) = [x(n)  x(n-1)  ...  x(n-L+1)]^T denotes the input signal vector and (·)^T denotes vector transpose. Note that this system requires L multiplies and L-1 adds to implement, and these computations are easily performed by a processor or circuit so long as L is not too large and the sampling period for the signals is not too short. It also requires a total of 2L memory locations to store the L input signal samples and the L coefficient values, respectively.

Fig. 2.3. Structure of an FIR filter
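To make (2.4)-(2.5) concrete, the following sketch in Python (assuming NumPy; variable names are illustrative) computes the FIR output as the inner product of the coefficient vector with the tapped delay line:

```python
import numpy as np

def fir_output(w, X):
    """Compute y(n) = W^T X(n), Eq. (2.5), for a direct-form FIR filter."""
    return np.dot(w, X)

# Example: a 3-tap filter driven by a short input sequence
w = np.array([0.26, 0.93, 0.26])         # coefficients w_0, w_1, w_2
x = np.array([0.1, -0.4, 0.25, 0.3])     # input samples x(0), x(1), ...
X = np.zeros(len(w))                     # tapped delay line X(n)
for n, sample in enumerate(x):
    X = np.roll(X, 1)                    # shift the delay line one sample
    X[0] = sample                        # newest sample x(n) enters in front
    print(f"y({n}) = {fir_output(w, X):.4f}")
```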

2.4. APPLICATION OF ADAPTIVE FILTERS.

Perhaps the most important driving forces behind the developments in adaptive filters throughout their history have been the wide range of applications in which such systems can be used. We now discuss the forms of these applications in terms of more-general problem classes that describe the assumed relationship between d(n) and x(n). Our discussion illustrates the key issues in selecting an adaptive filter for a particular task.

2.4.1. Direct Modeling (System Identification)

In this type of modeling the adaptive model is kept in parallel with the unknown plant. Modeling of a single-input, single-output system is illustrated in Fig. 2.4. Both the unknown system and the adaptive filter are driven by the same input. The adaptive filter adjusts itself in such a way that its output matches that of the unknown system. Upon convergence, the structure and parameter values of the adaptive system may or may not resemble those of the unknown system, but the input-output response relationship will match. In this sense, the adaptive system becomes a model of the unknown plant.

Fig. 2.4. Direct modeling

Let d(n) and y(n) represent the outputs of the unknown system and the adaptive model, with x(n) as the common input. Here, the task of the adaptive filter is to accurately represent the signal d(n) at its output. If y(n) = d(n), then the adaptive filter has accurately modeled or identified the portion of the unknown system that is driven by x(n).

Since the model typically chosen for the adaptive filter is a linear filter, the practical goal of the adaptive filter is to determine the best linear model that describes the input-output relationship of the unknown system. Such a procedure makes the most sense when the unknown system is also a linear model of the same structure as the adaptive filter, as it is possible that y(n) = d(n) for some set of adaptive filter parameters. For ease of discussion, let the unknown system and the adaptive filter both be FIR filters, such that

    d(n) = W_OPT^T(n) X(n)    (2.6)

where W_OPT(n) is an optimum set of filter coefficients for the unknown system at time n. In this problem formulation, the ideal adaptation procedure would adjust W(n) such that W(n) = W_OPT(n) as n → ∞. In practice, the adaptive filter can only adjust W(n) such that y(n) closely approximates d(n) over time.

The system identification task is at the heart of numerous adaptive filtering applications. We list several of these applications here [2.3]:

• Plant Identification

• Echo Cancellation for Long-Distance Transmission

• Acoustic Echo Cancellation

• Adaptive Noise Canceling


2.4.2. Inverse Modeling

We now consider the general problem of inverse modeling, as shown in Fig. 2.5. In this diagram, a source signal s(n) is fed into a plant that produces the input signal x(n) for the adaptive filter. The output of the adaptive filter is subtracted from a desired response signal that is a delayed version of the source signal, such that

    d(n) = s(n - Δ)    (2.7)

where Δ is a positive integer value. The goal of the adaptive filter is to adjust its characteristics such that the output signal is an accurate representation of the delayed source signal.

Fig. 2.5. Inverse modeling
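As a small illustration of (2.7), the sketch below (in Python, assuming NumPy; the two-tap plant and the delay value are purely illustrative) builds the training pair (x(n), d(n)) used to adapt an inverse model:

```python
import numpy as np

def make_inverse_modeling_data(s, plant_taps, delay):
    """Build (x, d) for inverse modeling per Fig. 2.5 and Eq. (2.7)."""
    x = np.convolve(s, plant_taps)[: len(s)]  # plant output becomes the filter input x(n)
    d = np.zeros(len(s))
    d[delay:] = s[: len(s) - delay]           # desired response d(n) = s(n - delay)
    return x, d

s = np.sign(np.random.randn(1000))            # random +/-1 source symbols s(n)
x, d = make_inverse_modeling_data(s, plant_taps=[1.0, 0.5], delay=4)
```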

2.5. GRADIENT BASED ADAPTIVE ALGORITHM

An adaptive algorithm is a procedure for adjusting the parameters of an adaptive filter to minimize a cost function chosen for the task at hand. In this section, we describe the general form of many adaptive FIR filtering algorithms and present a simple derivation of the LMS adaptive algorithm. In our discussion, we only consider an adaptive FIR filter structure, such that the output signal y(n) is given by (2.5). Such systems are currently more popular than adaptive IIR filters because

(1) The input-output stability of the FIR filter structure is guaranteed for any set of fixed coefficients, and

(2) The algorithms for adjusting the coefficients of FIR filters are simpler in general than those for adjusting the coefficients of IIR filters.


2.5.1. General Form of Adaptive FIR Algorithm

The general form of an adaptive FIR filtering algorithm is

    W(n+1) = W(n) + μ(n) G( e(n), X(n), φ(n) )    (2.8)

where G(·) is a particular vector-valued nonlinear function, μ(n) is a step size parameter, e(n) and X(n) are the error signal and input signal vector, respectively, and φ(n) is a vector of states that store pertinent information about the characteristics of the input and error signals and/or the coefficients at previous time instants. In the simplest algorithms, φ(n) is not used, and the only information needed to adjust the coefficients at time n are the error signal, input signal vector, and step size.

The step size is so called because it determines the magnitude of the change or "step" that is taken by the algorithm in iteratively determining a useful coefficient vector. Much research effort has been spent characterizing the role that μ(n) plays in the performance of adaptive filters in terms of the statistical or frequency characteristics of the input and desired response signals. Often, success or failure of an adaptive filtering application depends on how the value of μ(n) is chosen or calculated to obtain the best performance from the adaptive filter.

2.5.2. The Mean-Squared Error Cost Function

The form of G(·) in (2.8) depends on the cost function chosen for the given adaptive filtering task. We now consider one particular cost function that yields a popular adaptive algorithm. Define the mean-squared error (MSE) cost function as

    J_MSE(n) = (1/2) ∫_{-∞}^{∞} e²(n) p_n(e(n)) de(n)    (2.9)

             = (1/2) E{ e²(n) }    (2.10)

where p_n(e(n)) represents the probability density function of the error at time n and E{·} is shorthand for the expectation integral on the right-hand side of (2.10). The MSE cost function is useful for adaptive FIR filters because

• J_MSE(n) has a well-defined minimum with respect to the parameters in W(n);

• the coefficient values obtained at this minimum are the ones that minimize the power in the error signal e(n), indicating that y(n) has approached d(n); and

• J_MSE(n) is a smooth function of each of the parameters in W(n), such that it is differentiable with respect to each of the parameters in W(n).

The third point is important in that it enables us to determine both the optimum coefficient values given knowledge of the statistics of d(n) and x(n) as well as a simple iterative procedure for adjusting the parameters of an FIR filter.

2.5.3. The Wiener Solution.

For the FIR filter structure, the coefficient values in W(n) that minimize J_MSE(n) are well-defined if the statistics of the input and desired response signals are known. The formulation of this problem for continuous-time signals and the resulting solution was first derived by Wiener [2.5]. Hence, this optimum coefficient vector W_MSE(n) is often called the Wiener solution to the adaptive filtering problem. The extension of Wiener's analysis to the discrete-time case is attributed to Levinson [2.6]. To determine W_MSE(n), we note that the function J_MSE(n) in (2.10) is quadratic in the parameters {w_i(n)}, and the function is also differentiable. Thus, we can use a result from optimization theory that states that the derivatives of a smooth cost function with respect to each of the parameters are zero at a minimizing point on the cost function error surface. Thus, W_MSE(n) can be found from the solution to the system of equations

    ∂J_MSE(n) / ∂w_i(n) = 0,    0 ≤ i ≤ L-1    (2.11)

Taking derivatives of J_MSE(n) in (2.10) we obtain

    ∂J_MSE(n) / ∂w_i(n) = E{ e(n) ∂e(n)/∂w_i(n) }    (2.12)

                        = -E{ e(n) ∂y(n)/∂w_i(n) }    (2.13)

                        = -E{ e(n) x(n-i) }    (2.14)


                        = -E{ d(n) x(n-i) } + Σ_{j=0}^{L-1} w_j(n) E{ x(n-i) x(n-j) }    (2.15)

where we have used the definitions of e(n) and of y(n) for the FIR filter structure in (2.1) and (2.5), respectively, to expand the last result in (2.15). By defining the autocorrelation matrix R_XX(n) and the cross-correlation vector P_dx(n) as

    R_XX(n) = E{ X(n) X^T(n) }   and   P_dx(n) = E{ d(n) X(n) }    (2.16)

respectively, we can combine (2.11) and (2.15) to obtain the system of equations in vector form as

    R_XX(n) W_MSE(n) - P_dx(n) = 0    (2.17)

where 0 is the zero vector. Thus, so long as the matrix R_XX(n) is invertible, the optimum Wiener solution vector for this problem is

    W_MSE(n) = R_XX^{-1}(n) P_dx(n)    (2.18)
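The Wiener solution (2.18) can be approximated from finite data records by replacing the expectations in (2.16) with time averages. A minimal sketch in Python (assuming NumPy; names and the example system are illustrative):

```python
import numpy as np

def wiener_solution(x, d, L):
    """Estimate W_MSE = R_XX^{-1} P_dx from data records x(n), d(n).

    The expectations in (2.16) are replaced by sample averages.
    """
    N = len(x)
    R = np.zeros((L, L))                 # sample autocorrelation matrix R_XX
    p = np.zeros(L)                      # sample cross-correlation vector P_dx
    for n in range(L - 1, N):
        X = x[n - L + 1 : n + 1][::-1]   # X(n) = [x(n), ..., x(n-L+1)]
        R += np.outer(X, X)
        p += d[n] * X
    R /= (N - L + 1)
    p /= (N - L + 1)
    return np.linalg.solve(R, p)         # solve R W = p instead of forming R^{-1}

# Example: recover a known 3-tap system from noisy measurements
rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, 5000)
d = np.convolve(x, [0.26, 0.93, 0.26])[: len(x)] + 0.001 * rng.standard_normal(len(x))
print(wiener_solution(x, d, L=3))        # approximately [0.26, 0.93, 0.26]
```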

2.5.4. The Method of Steepest Descent

The method of steepest descent is a celebrated optimization procedure for minimizing the value of a cost function J(n) with respect to a set of adjustable parameters W(n). This procedure adjusts each parameter of the system according to

    w_i(n+1) = w_i(n) - μ(n) ∂J(n)/∂w_i(n)    (2.19)

In other words, the i-th parameter of the system is altered according to the derivative of the cost function with respect to the i-th parameter. Collecting these equations in vector form, we have

    W(n+1) = W(n) - μ(n) ∂J(n)/∂W(n)    (2.20)

where ∂J(n)/∂W(n) is a vector of the derivatives ∂J(n)/∂w_i(n).

Substituting the MSE gradient of (2.15) and (2.16) into (2.20) yields the update equation for W(n) as

    W(n+1) = W(n) + μ(n) ( P_dx(n) - R_XX(n) W(n) )    (2.21)

However, this steepest descent procedure depends on the statistical quantities E{d(n)x(n-i)} and E{x(n-i)x(n-j)} contained in P_dx(n) and R_XX(n), respectively. In practice, we only have measurements of both d(n) and x(n) to be used within the adaptation procedure. While suitable estimates of the statistical quantities needed for (2.21) could be determined from the signals x(n) and d(n), we instead develop an approximate version of the method of steepest descent that depends on the signal values themselves. This procedure is known as the LMS (least mean square) algorithm.

2.6. LMS ALGORITHM

The cost function J(n) chosen for the steepest descent algorithm of (2.19) determines the coefficient solution obtained by the adaptive filter. If the MSE cost function in (2.10) is chosen, the resulting algorithm depends on the statistics of x(n) and d(n) because of the expectation operation that defines this cost function. Since we typically only have measurements of d(n) and of x(n) available to us, we substitute an alternative cost function that depends only on these measurements. One such cost function is the least-squares cost function given by

    J_LS(n) = Σ_{k=0}^{n} α(k) ( d(k) - W^T(n) X(k) )²    (2.22)

where α(k) is a suitable weighting sequence for the terms within the summation.

This cost function, however, is complicated by the fact that it requires numerous computations to calculate its value as well as its derivatives with respect to each element of W(n), although efficient recursive methods for its minimization can be developed. Alternatively, we can propose the simplified cost function J_LMS(n) given by

    J_LMS(n) = (1/2) e²(n)    (2.23)

This cost function can be thought of as an instantaneous estimate of the MSE cost function, as J_MSE(n) = E{J_LMS(n)}. Although it might not appear to be useful, the resulting algorithm obtained when J_LMS(n) is used for J(n) in (2.19) is extremely useful for practical applications. Taking derivatives of J_LMS(n) with respect to the elements of W(n) and substituting the result into (2.19), we obtain the LMS adaptive algorithm given by

    W(n+1) = W(n) + μ(n) e(n) X(n)    (2.24)

Equation (2.24) requires only multiplications and additions to implement. In fact, the number and type of operations needed for the LMS algorithm is nearly the same as that of the FIR filter structure with fixed coefficient values, which is one of the reasons for the algorithm's popularity.
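A minimal implementation of the LMS recursion (2.24), shown here as a Python sketch (assuming NumPy; names, the step size and the example system are illustrative):

```python
import numpy as np

def lms_filter(x, d, L, mu):
    """Adapt an L-tap FIR filter with the LMS update of Eq. (2.24).

    Returns the final weight vector and the error sequence e(n).
    """
    W = np.zeros(L)                 # initial coefficients W(0) = 0
    X = np.zeros(L)                 # tapped delay line X(n)
    e = np.zeros(len(x))
    for n in range(len(x)):
        X = np.roll(X, 1)
        X[0] = x[n]                 # newest input sample
        y = W @ X                   # filter output, Eq. (2.5)
        e[n] = d[n] - y             # error signal, Eq. (2.1)
        W = W + mu * e[n] * X       # LMS update, Eq. (2.24)
    return W, e

# Identify a known 3-tap system
rng = np.random.default_rng(1)
x = rng.uniform(-0.5, 0.5, 2000)
d = np.convolve(x, [0.26, 0.93, 0.26])[: len(x)]
W, e = lms_filter(x, d, L=3, mu=0.1)
print(W)                            # converges toward [0.26, 0.93, 0.26]
```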

The behavior of the LMS algorithm has been widely studied, and numerous results concerning its adaptation characteristics under different situations have been developed. For now, we indicate its useful behavior by noting that the solution obtained by the LMS algorithm near its convergent point is related to the Wiener solution. In fact, analysis of the LMS algorithm under certain statistical assumptions about the input and desired response signals shows that

    lim_{n→∞} E{ W(n) } = W_MSE(n)    (2.25)

when the Wiener solution W_MSE(n) is a fixed vector. Moreover, the average behavior of the LMS algorithm is quite similar to that of the steepest descent algorithm in (2.21) that depends explicitly on the statistics of the input and desired response signals. In effect, the iterative nature of the LMS coefficient updates is a form of time-averaging that smoothes the errors in the instantaneous gradient calculations to obtain a more reasonable estimate of the true gradient.

The problem is that gradient descent is a local optimization technique, which is limited because it is unable to converge to the global optimum on a multimodal error surface if the algorithm is not initialized in the basin of attraction of the global optimum.

Several modifications exist for gradient based algorithms in an attempt to enable them to overcome local optima. One approach is to simply add a momentum term [2.3] to the gradient computation of the gradient descent algorithm to make it more likely to escape from a local minimum. This approach is only likely to be successful when the error surface is relatively smooth with minor local minima, or when some information can be inferred about the topology of the surface such that the additional gradient parameters can be assigned accordingly. Other approaches attempt to transform the error surface to eliminate or diminish the presence of local minima [2.16], which would ideally result in a unimodal error surface. The problem with these approaches is that the resulting minimum transformed error used to update the adaptive filter can be biased from the true minimum output error, and the algorithm may not be able to converge to the desired minimum error condition. These algorithms also tend to be complex, slow to converge, and may not be guaranteed to emerge from a local minimum. Some work has been done with regard to removing the bias of equation error LMS [2.7][2.8] and Steiglitz-McBride [2.9] adaptive IIR filters, which adds further complexity with varying degrees of success.

Another approach [2.10] attempts to locate the global optimum by running several LMS algorithms in parallel, initialized with different initial coefficients. The notion is that a larger, concurrent sampling of the error surface will increase the likelihood that one process will be initialized in the global optimum valley. This technique does have potential, but it is inefficient and may still suffer the fate of a standard gradient technique in that it will be unable to locate the global optimum. By using a similar congregational scheme, but one in which information is collectively exchanged between estimates and intelligent randomization is introduced, structured stochastic algorithms are able to hill-climb out of local minima. This enables the algorithms to achieve better, more consistent results using a smaller total number of estimates.

2.7. SYSTEM IDENTIFICATION

System identification concerns the determination of a system on the basis of input-output data samples. The identification task is to determine a suitable estimate of finite-dimensional parameters which completely characterize the plant. The selection of the estimate is based on a comparison between the actual output sample and a value predicted on the basis of the input data up to that instant. An adaptive automaton is a system whose structure is alterable or adjustable in such a way that its behavior or performance improves through contact with its environment.

Depending upon the input-output relation, the identification of systems falls into two groups:

A. Static System Identification

In this type of identification the output at any instant depends upon the input at that instant. These systems are described by algebraic equations. The system is essentially a memoryless one, and mathematically it is represented as y(n) = f[x(n)], where y(n) is the output at the nth instant corresponding to the input x(n).

B. Dynamic System Identification

In this type of identification the output at any instant depends upon the input at that instant as well as on past inputs and outputs. Dynamic systems are described by difference or differential equations. These systems have memory to store past values, and are mathematically represented as y(n) = f[x(n), x(n-1), x(n-2), ..., y(n-1), y(n-2), ...], where y(n) is the output at the nth instant corresponding to the input x(n).
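The two groups can be illustrated with toy models in Python (the forms and coefficients below are purely illustrative, not the systems used later in this chapter):

```python
import numpy as np

def static_system(x):
    """Static (memoryless) system: y(n) depends only on x(n)."""
    return np.tanh(x)

def dynamic_system(x):
    """Dynamic system: y(n) depends on the present input, a past input
    and a past output (a simple difference equation)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        x_prev = x[n - 1] if n >= 1 else 0.0
        y_prev = y[n - 1] if n >= 1 else 0.0
        y[n] = 0.5 * x[n] + 0.3 * x_prev + 0.2 * y_prev
    return y
```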

Fig. 2.6. Block diagram of system identification

A system identification structure is shown in Fig. 2.6. The model is placed in parallel with the nonlinear plant and the same input is given to the plant as well as the model. The impulse response of the linear segment of the plant is represented by h(n), which is followed by the nonlinearity (NL) associated with it. White Gaussian noise q(n), which accounts for measurement noise, is added to the nonlinear output. The desired output d(n) is compared with the estimated output y(n) of the identifier to generate the error e(n), which is used by some adaptive algorithm to update the weights of the model. The training of the filter weights is continued until the error becomes minimum and does not decrease further. At this stage the correlation between the input signal and the error signal is minimum. Then the training is stopped and the weights are stored for testing. For testing, new samples are passed through both the plant and the model and their responses are compared.

2.8. SIMULATION RESULTS

The performance of the LMS algorithm is tested for both linear and nonlinear systems. For identification purposes a tapped delay line filter with three taps is used. The parameter of the linear part of the plant is h(n) = [0.26 0.93 0.26]. The different types of nonlinearity considered here are

    (I)    b(n) = tanh( a(n) )

    (II)   b(n) = a(n) + 0.2 a²(n) - 0.1 a³(n)

    (III)  b(n) = a(n) - 0.9 a³(n)

    (IV)   b(n) = a(n) + 0.2 a²(n) - 0.1 a³(n) + 0.5 cos( π a(n) )    (2.26)

For the simulation, the initial parameters of the model were taken as zeros. Gaussian noise with a signal to noise ratio (SNR) of 30 dB was added, which accounts for measurement noise. The input to the plant was taken from a uniformly distributed random signal over the interval [-0.5, 0.5]. The adaptation is continued for 2000 iterations, and the results are ensemble-averaged over 50 runs. After training the filter weights remain fixed. For testing, 20 new samples are generated and passed through the plant as well as the model. The mean square error (MSE) and the responses are plotted for the linear and nonlinear systems.
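A sketch of this experiment in Python (assuming NumPy; all names are illustrative, and the nonlinearity shown is the assumed type (I) form from the low-confidence reconstruction of (2.26) above):

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.26, 0.93, 0.26])               # linear part of the plant
x = rng.uniform(-0.5, 0.5, 2000)               # training input
a = np.convolve(x, h)[: len(x)]                # linear plant output
b = np.tanh(a)                                 # nonlinearity (assumed type (I) form)

# additive white Gaussian measurement noise at 30 dB SNR
noise_std = np.sqrt(np.mean(b ** 2) / 10 ** (30 / 10))
d = b + noise_std * rng.standard_normal(len(b))

# 3-tap LMS model adapted in parallel with the plant, Eq. (2.24)
W = np.zeros(3)
X = np.zeros(3)
sq_err = np.zeros(len(x))
for n in range(len(x)):
    X = np.roll(X, 1)
    X[0] = x[n]
    e = d[n] - W @ X
    W += 0.1 * e * X
    sq_err[n] = e ** 2                         # averaged over 50 runs in the thesis
```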

(i) For the linear system:

Fig. 2.7. (a) MSE plot, (b) response plot

(ii) For nonlinearity (I):

Fig. 2.8. (a) MSE plot, (b) response plot

(iii) For nonlinearity (II):

Fig. 2.9. (a) MSE plot, (b) response plot

(iv) For nonlinearity (III):

Fig. 2.10. (a) MSE plot, (b) response plot

(v) For nonlinearity (IV):

Fig. 2.11. (a) MSE plot, (b) response plot

2.9. SUMMARY

Applications of adaptive filters and two types of modeling are described in this chapter. System identification deals with direct modeling. The LMS algorithm is used for system identification because of its simplicity. From Figs. 2.7 to 2.11 it is observed that for the linear system the LMS-based model gives the best result. As the nonlinearity associated with the system increases, the LMS-based model response deviates from the actual response. Taking different types of nonlinearity, the MSE and the responses are plotted. From Fig. 2.11 it is seen that the actual response and the LMS-based model response do not match anywhere. From this we conclude that LMS-based models are best suited to linear systems.


Chapter 3

SYSTEM IDENTIFICATION USING ANN

3. SYSTEM IDENTIFICATION USING ANN

3.1. INTRODUCTION

Because of their nonlinear signal processing and learning capability, Artificial Neural Networks (ANNs) have become a powerful tool for many complex applications including functional approximation, nonlinear system identification and control, pattern recognition and classification, and optimization. ANNs are capable of generating complex mappings between the input and the output space, and thus arbitrarily complex nonlinear decision boundaries can be formed by these networks. An artificial neuron basically consists of a computing element that performs the weighted sum of the input signals and the connecting weights. The sum is added with the bias or threshold and the resultant signal is then passed through a non-linear element of tanh(.) type. Each neuron is associated with three parameters whose learning can be adjusted; these are the connecting weights, the bias and the slope of the non-linear function. From the structural point of view, a neural network (NN) may be single layer or multi-layer. In a multi-layer structure, there are one or more artificial neurons in each layer, and for a practical case there may be a number of layers. Each neuron of one layer is connected to each and every neuron of the next layer.

A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experimental knowledge and making it available for use. It resembles the brain in two respects:

1. Knowledge is acquired by the network from its environment through a learning process.

2. Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.

Artificial Neural Networks (ANNs) have emerged as a powerful learning technique to perform complex tasks in highly nonlinear dynamic environments. Some of the prime advantages of using ANN models are their ability to learn based on optimization of an appropriate error function and their excellent performance for approximation of nonlinear functions [3.1]. At present, most of the work on system identification using neural networks is based on multilayer feed forward neural networks with back propagation learning or more efficient variations of this algorithm [3.2], [3.3]. On the other hand, the Functional Link ANN (FLANN), originally proposed by Pao [3.4], is a single layer structure with functionally mapped inputs. The performance of FLANN for identification of nonlinear systems has been reported [3.5] in the literature. Patra and Kot [3.6] have used Chebyshev expansions for nonlinear system identification and have shown that the identification performance is better than that offered by the multilayer ANN (MLANN) model. Wang and Chen [3.7] have presented a fully automated recurrent neural network (FARNN) that is capable of self-structuring its network in a minimal representation with satisfactory performance for unknown dynamic system identification and control.

3.2. SINGLE NEURON STRUCTURE

In 1958, Rosenblatt demonstrated some practical applications using the perceptron [3.8].

The perceptron is a single-level connection of McCulloch-Pitts neurons, sometimes called a single-layer feed-forward network. The network is capable of linearly separating the input vectors into pattern classes by a hyperplane. A linear associative memory is an example of a single-layer neural network. In such an application, the network associates an output pattern (vector) with an input pattern (vector), and information is stored in the network by virtue of modifications made to the synaptic weights of the network.

Fig. 3.1. A single neuron

The structure of a single neuron is presented in Fig. 3.1. An artificial neuron involves the computation of the weighted sum of inputs and threshold [3.9, 3.10]. The resultant signal is then passed through a non-linear activation function. The output of the neuron may be represented as

    y(n) = f( Σ_{j=1}^{N} w_j(n) x_j(n) + b(n) )    (3.1)

where b(n) is the threshold to the neuron, called the bias, w_j(n) is the weight associated with the j-th input, and N is the number of inputs to the neuron.
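A direct transcription of (3.1) as a Python sketch (assuming NumPy and a tanh(.) type activation, as described in Section 3.1; values are illustrative):

```python
import numpy as np

def neuron_output(w, x, b):
    """Single neuron of Eq. (3.1): weighted sum plus bias through tanh(.)."""
    return np.tanh(np.dot(w, x) + b)

w = np.array([0.5, -0.3, 0.8])   # weights w_j for N = 3 inputs
x = np.array([1.0, 0.2, -0.4])   # inputs x_j(n)
print(neuron_output(w, x, b=0.1))
```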


3.2.1. Activation Functions and Bias.

The perceptron's internal sum of the inputs is passed through an activation function, which can be any monotonic function. Linear functions can be used, but these will not contribute to a non-linear transformation within a layered structure, which defeats the purpose of using a neural filter implementation. A function that limits the amplitude range and limits the output strength of each perceptron of a layered network to a defined range in a non-linear manner will contribute to a nonlinear transformation. There are many forms of activation functions, which are selected according to the specific problem. All neural network architectures employ an activation function [3.1, 3.8], which defines the output of a neuron in terms of the activity level at its input (ranging from -1 to 1 or from 0 to 1). Table 3.1 summarizes the basic types of activation functions. The most practical activation functions are the sigmoid and the hyperbolic tangent functions, because they are differentiable.

The bias gives the network an extra variable, and networks with bias are more powerful than those without bias. A neuron without a bias always gives a net input of zero to the activation function when the network inputs are zero. This may not be desirable and can be avoided by the use of a bias.

Table 3.1 COMMON ACTIVATION FUNCTIONS

Name                  Definition

Linear                f(x) = kx

Step                  f(x) = β if x ≥ k;  f(x) = δ if x < k

Sigmoid               f(x) = 1 / (1 + e^{-αx}),  α > 0

Hyperbolic tangent    f(x) = tanh(γx) = (1 - e^{-γx}) / (1 + e^{-γx}),  γ > 0

Gaussian              f(x) = (1 / √(2πσ²)) exp( -(x - μ)² / (2σ²) )
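For illustration, the activation functions of Table 3.1 as a Python sketch (assuming NumPy; parameter names follow the table, and the default values are arbitrary):

```python
import numpy as np

def linear(x, k=1.0):
    return k * x

def step(x, k=0.0, beta=1.0, delta=0.0):
    return np.where(x >= k, beta, delta)

def sigmoid(x, alpha=1.0):
    return 1.0 / (1.0 + np.exp(-alpha * x))

def hyperbolic_tangent(x, gamma=1.0):
    return np.tanh(gamma * x)

def gaussian(x, mu=0.0, sigma=1.0):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
```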

3.2.2 Learning Processes

The property that is of primary significance for a neural network is the ability of the network to learn from its environment, and to improve its performance through learning. The improvement in performance takes place over time in accordance with some prescribed measure. A neural network learns about its environment through an interactive process of adjustments applied to its synaptic weights and bias levels. Ideally, the network becomes more knowledgeable about its environment after each iteration of the learning process. Hence we define learning as:

“It is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded.”

The processes used are classified into two categories as described in [3.1]:

(A) Supervised Learning (Learning With a Teacher)

(B) Unsupervised Learning (Learning Without a Teacher)

(A) Supervised Learning:

We may think of the teacher as having knowledge of the environment, with that knowledge being represented by a set of input-output examples. The environment is, however, unknown to the neural network of interest. Suppose now that the teacher and the neural network are both exposed to a training vector; by virtue of its built-in knowledge, the teacher is able to provide the neural network with a desired response for that training vector. Hence the desired response represents the optimum action to be performed by the neural network. The network parameters such as the weights and the thresholds are chosen arbitrarily and are updated during the training procedure to minimize the difference between the desired and the estimated signal. This updating is carried out iteratively in a step-by-step procedure with the aim of eventually making the neural network emulate the teacher. In this way knowledge of the environment available to the teacher is transferred to the neural network. When this condition is reached, we may then dispense with the teacher and let the neural network deal with the environment completely by itself. This is the form of supervised learning.

The update equation for the weights is derived as in the LMS algorithm [3.11]:

    Δw_j(n) = μ e(n) x_j(n)    (3.2)

where Δw_j(n) is the change in w_j in the nth iteration.

(B) Unsupervised Learning:

In unsupervised learning or self-supervised learning there is no teacher to oversee the learning process; rather, provision is made for a task-independent measure of the quality of the representation that the network is required to learn, and the free parameters of the network are optimized with respect to that measure. Once the network has become tuned to the statistical regularities of the input data, it develops the ability to form internal representations for encoding features of the input and thereby to create new classes automatically. In this learning the weights and biases are updated in response to the network input only. There are no desired outputs available. Most of these algorithms perform some kind of clustering operation; they learn to categorize the input patterns into some classes.

3.3. MULTILAYER PERCEPTRON

In the multilayer neural network or multilayer perceptron (MLP), the input signal propagates through the network in a forward direction, on a layer-by-layer basis. This network has been applied successfully to solve some difficult and diverse problems by training in a supervised manner with a highly popular algorithm known as the error backpropagation algorithm [3.1,3.9]. The scheme of MLP using four layers is shown in Fig.3.2.

In Fig. 3.2, $f_j(n)$ and $f_k(n)$ represent the outputs of the two hidden layers, and the connecting weights between the input and the first hidden layer, the first and the second hidden layer, and the second hidden layer and the output layer are represented by $w_{ij}$, $w_{jk}$ and $w_{kl}$ respectively.


Fig. 3.2 Structure of multilayer perceptron

If $P_1$ is the number of neurons in the first hidden layer, each element of the output vector of the first hidden layer may be calculated as

$f_j = \varphi_j\left(\sum_{i=1}^{N} w_{ij}\, x_i + b_j\right), \quad j = 1, 2, 3, \ldots, P_1$   (3.3)

where $\varphi_j(\cdot)$ is the nonlinear activation function of the first hidden layer, chosen from Table 3.1, and $b_j$ is the threshold to the neurons of the first hidden layer.

The time index n has been dropped to make the equations simpler. Let $P_2$ be the number of neurons in the second hidden layer. The output of this layer, $f_k$, may be written as

$f_k = \varphi_k\left(\sum_{j=1}^{P_1} w_{jk}\, f_j + b_k\right), \quad k = 1, 2, 3, \ldots, P_2$   (3.4)

where $b_k$ is the threshold to the neurons of the second hidden layer. The output of the final layer can be calculated as

$y_l = \varphi_l\left(\sum_{k=1}^{P_2} w_{kl}\, f_k + b_l\right), \quad l = 1, 2, 3, \ldots, P_3$   (3.5)

where $b_l$ is the threshold to the neurons of the final layer and $P_3$ is the number of neurons in the output layer. The output of the MLP may be expressed as

$y_l(n) = \varphi_l\left(\sum_{k=1}^{P_2} w_{kl}\, \varphi_k\left(\sum_{j=1}^{P_1} w_{jk}\, \varphi_j\left(\sum_{i=1}^{N} w_{ij}\, x_i + b_j\right) + b_k\right) + b_l\right)$   (3.6)
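A minimal sketch of this layer-by-layer forward computation (assumed code, not from the thesis; the function name mlp_forward, the tanh choice and the argument layout are illustrative), with W1 of size P1×N, W2 of size P2×P1 and W3 of size P3×P2:

import numpy as np

def mlp_forward(x, W1, b1, W2, b2, W3, b3, phi=np.tanh):
    # Forward pass of the four-layer MLP of Fig. 3.2
    f_j = phi(W1 @ x + b1)    # first hidden layer, Eq. (3.3)
    f_k = phi(W2 @ f_j + b2)  # second hidden layer, Eq. (3.4)
    y = phi(W3 @ f_k + b3)    # output layer, Eq. (3.5); combined, Eq. (3.6)
    return y, f_j, f_k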

3.3.1. Backpropagation Algorithm.


Fig. 3.3 Neural network using BP algorithm

An MLP network with 2-3-2-1 neurons (2, 3, 2 and 1 denote the number of neurons in the input layer, the first hidden layer, the second hidden layer and the output layer respectively) with the back-propagation (BP) learning algorithm is depicted in Fig. 3.3. The parameters of the neural network can be updated in both the sequential and the batch mode of operation. In the BP algorithm, the weights and the thresholds are initialized as very small random values. The intermediate and the final outputs of the MLP are calculated by using (3.3), (3.4) and (3.5) respectively.

The final output $y_l(n)$ is compared with the desired output $d_l(n)$, and the resulting error signal is

$e_l(n) = d_l(n) - y_l(n)$   (3.7)

The instantaneous value of the total error energy is obtained by summing the squared error signals over all neurons in the output layer, that is,

$\xi(n) = \frac{1}{2}\sum_{l=1}^{P_3} e_l^2(n)$   (3.8)

where $P_3$ is the number of neurons in the output layer.

This error signal is used to update the weights and thresholds of the hidden layers as well as the output layer. The reflected error components at each hidden layer are computed using the errors of the last layer and the connecting weights between the hidden and the last layer, and the error obtained at this stage is used to update the weights between the input and the hidden layer. The thresholds are also updated in a manner similar to that of the corresponding connecting weights. The weights and the thresholds are updated iteratively until the error signal becomes minimum. For measuring the degree of matching, the mean square error (MSE) is taken as the performance measure.

The updated weights are

$w_{kl}(n+1) = w_{kl}(n) + \Delta w_{kl}(n)$   (3.9)

$w_{jk}(n+1) = w_{jk}(n) + \Delta w_{jk}(n)$   (3.10)

$w_{ij}(n+1) = w_{ij}(n) + \Delta w_{ij}(n)$   (3.11)

where $\Delta w_{kl}(n)$, $\Delta w_{jk}(n)$ and $\Delta w_{ij}(n)$ are the changes in the weights of the second hidden layer-to-output layer, first hidden layer-to-second hidden layer and input layer-to-first hidden layer connections respectively. That is,

$\Delta w_{kl}(n) = -\frac{\mu}{2}\,\frac{d\xi(n)}{dw_{kl}(n)} = \mu\, e_l(n)\, \varphi'_l\left(\sum_{k=1}^{P_2} w_{kl}\, f_k + b_l\right) f_k(n)$   (3.12)

where $\mu$ is the convergence coefficient ($0 \le \mu \le 1$). Similarly, $\Delta w_{jk}(n)$ and $\Delta w_{ij}(n)$ can be computed [3.1].

The thresholds of each layer can be updated in a similar manner, i.e.

$b_l(n+1) = b_l(n) + \Delta b_l(n)$   (3.13)

$b_k(n+1) = b_k(n) + \Delta b_k(n)$   (3.14)

$b_j(n+1) = b_j(n) + \Delta b_j(n)$   (3.15)

where $\Delta b_l(n)$, $\Delta b_k(n)$ and $\Delta b_j(n)$ are the changes in the thresholds of the output layer and the two hidden layers respectively. The change in threshold is represented as

$\Delta b_l(n) = -\frac{\mu}{2}\,\frac{d\xi(n)}{db_l(n)} = \mu\, e_l(n)\, \varphi'_l\left(\sum_{k=1}^{P_2} w_{kl}\, f_k + b_l\right)$   (3.16)
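A minimal sketch of the output-layer portion of these updates (an assumed implementation, not the thesis code; it assumes $\varphi = \tanh$ so that $\varphi'$ can be evaluated as $1 - y^2$, and the name bp_output_update is illustrative):

import numpy as np

def bp_output_update(W3, b3, f_k, y, d, mu=0.1):
    # Output-layer BP step, Eqs. (3.9), (3.12), (3.13) and (3.16)
    e = d - y                            # error signal, Eq. (3.7)
    delta = e * (1.0 - y ** 2)           # e_l(n) * phi_l'(.) for tanh
    W3 = W3 + mu * np.outer(delta, f_k)  # Delta w_kl = mu * delta_l * f_k
    b3 = b3 + mu * delta                 # Delta b_l = mu * delta_l
    return W3, b3

The hidden-layer changes $\Delta w_{jk}$ and $\Delta w_{ij}$ follow the same pattern, with the deltas reflected back through the connecting weights as described above.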

3.4. FUNCTIONAL LINK ANN

FLANN, originally proposed by Pao, is a novel single-layer ANN structure capable of forming arbitrarily complex decision regions by generating nonlinear decision boundaries [3.4]. Here, the initial representation of a pattern is enhanced by using nonlinear functions, and thus the dimension of the pattern space is increased. The functional link acts on an element of a pattern or on the entire pattern itself by generating a set of linearly independent functions, and then evaluates these functions with the pattern as the argument. Hence separation of the patterns becomes possible in the enhanced space. The use of FLANN not only increases the learning rate but also involves less computational complexity [3.13]. Pao et al. [3.12] have investigated the learning and generalization characteristics of a random-vector FLANN and compared them with those attainable with an MLP structure trained with the backpropagation algorithm, by taking a few functional approximation problems. A FLANN structure with two inputs is shown in Fig. 3.4.


3.4.1. Learning Algorithm.

Let X be the input vector of size N×1 containing N elements; the kth element is given by

$X(k) = x_k, \quad k = 1, 2, \ldots, N$   (3.17)

Each element undergoes nonlinear expansion to form M elements such that the resultant matrix has the dimension N×M. The functional expansion of the element $x_k$ by power series expansion is carried out using

$s_i = \begin{cases} x_k & \text{for } i = 1 \\ x_k^{\,l} & \text{for } i = 2, 3, \ldots, M \end{cases}$   (3.18)

where $l = 2, 3, \ldots, M$.

For trigonometric expansion, the expanded elements are

$s_i = \begin{cases} x_k & \text{for } i = 1 \\ \sin(l\pi x_k) & \text{for } i = 2, 4, \ldots, M \\ \cos(l\pi x_k) & \text{for } i = 3, 5, \ldots, M+1 \end{cases}$   (3.19)

where $l = 1, 2, \ldots, M/2$. In matrix notation the expanded elements of the input vector are denoted by S, of size N×(M+1). The bias input is unity, so an extra unity value is padded with the S matrix and the dimension of the S matrix becomes N×Q, where $Q = M + 2$.
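A minimal sketch of this trigonometric expansion for one element (assumed code, not the thesis's; the function name trig_expand and the default M are illustrative):

import numpy as np

def trig_expand(x_k, M=4):
    # Expansion of Eq. (3.19):
    # [x, sin(pi x), cos(pi x), ..., sin((M/2) pi x), cos((M/2) pi x)]
    terms = [x_k]
    for l in range(1, M // 2 + 1):
        terms.append(np.sin(l * np.pi * x_k))
        terms.append(np.cos(l * np.pi * x_k))
    return np.array(terms)  # M + 1 values; the unity bias is padded separately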

Fig. 3.4 Structure of the FLANN model

The output of the model is given by

$y = \sum_{i=1}^{Q} s_i\, w_i$   (3.20)

In matrix notation the output can be written as

$Y = S \cdot W^{T}$   (3.21)

At the nth iteration the error signal is

$e(n) = d(n) - y(n)$   (3.22)

Let $\xi(n)$ denote the cost function at iteration n, given by

$\xi(n) = \frac{1}{2}\sum_{j=1}^{P} e_j^2(n)$   (3.23)

where P is the number of nodes at the output layer. The weight vector can be updated by the least mean square (LMS) algorithm as

$w(n+1) = w(n) - \frac{\mu}{2}\,\hat{\nabla}(n)$   (3.24)

where $\hat{\nabla}(n)$ is an instantaneous estimate of the gradient of $\xi$ with respect to the weight vector $w(n)$. Now

$\hat{\nabla}(n) = \frac{\partial \xi(n)}{\partial w(n)} = -e(n)\,\frac{\partial y(n)}{\partial w(n)} = -e(n)\, S(n)$   (3.25)

Substituting (3.25) in (3.24) we get

$w(n+1) = w(n) + \mu\, e(n)\, S(n)$   (3.26)

where $\mu$ denotes the step size ($0 < \mu < 1$), which controls the convergence speed of the LMS algorithm.
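Putting the expansion, the filtering of Eq. (3.20) and the LMS update of Eq. (3.26) together, one identification step might look like the following sketch (assumed code; the name flann_step, the step size and the expansion order are illustrative):

import numpy as np

def flann_step(w, x_k, d, mu=0.1, M=4):
    # One step of the FLANN identifier of Fig. 3.4
    s = [x_k]
    for l in range(1, M // 2 + 1):          # trigonometric expansion, Eq. (3.19)
        s += [np.sin(l * np.pi * x_k), np.cos(l * np.pi * x_k)]
    s = np.append(s, 1.0)                   # pad the unity bias; Q = M + 2 terms
    y = np.dot(w, s)                        # Eq. (3.20)
    e = d - y                               # Eq. (3.22)
    return w + mu * e * s, y, e             # Eq. (3.26)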

The functions used for the functional expansion should be linearly independent, and this may be achieved by the use of suitable orthogonal polynomials, examples of which include Legendre, Chebyshev and trigonometric polynomials. Some advantages of using trigonometric polynomials in the functional expansion are explained below. Of all the polynomials of Nth order with respect to an orthonormal system $\{\varphi_i\}_{i=1}^{N}$, the best approximation in the metric space $L_2$ is given by the Nth partial sum of the Fourier series with respect to this system. Thus, the trigonometric polynomial basis functions given by

$\{1, \cos(\pi u), \sin(\pi u), \cos(2\pi u), \sin(2\pi u), \ldots, \cos(N\pi u), \sin(N\pi u)\}$

provide a compact representation of the function in the mean square sense. However, when the outer product terms were used along with the trigonometric polynomials for functional expansion, better results were obtained in the case of learning of a two-variable function.

3.5. CASCADED FUNCTIONAL LINK ANN (CFLANN)

For the identification of highly nonlinear systems the number of branches in the FLANN increases, and in some cases the performance is still poor. To decrease the number of branches and increase the performance, a two-stage FLANN is proposed in which the output of the first stage again undergoes functional expansion. The weights of the cascaded FLANN are updated using the BP algorithm.

3.5.1. Learning Algorithm.

A two-stage CFLANN structure is shown in Fig. 3.5. Here x(n-1) and x(n-2) are the one-unit and two-unit time delays of the input signal x(n). Each term is expanded into three terms in the first stage, and y2(n), the output of the first stage, is again expanded into three terms. The number of expansions depends upon the nonlinearity associated with the system: for a highly nonlinear system the number of expansions is larger. For mathematical simplicity we have considered that each term is expanded into three terms; this can be extended to any number of terms.

In Fig. 3.5, $w_1(n), w_2(n), w_3(n), \ldots, w_9(n)$ are the weights of the first stage and $h_1(n), h_2(n), h_3(n)$ are the weights of the second stage.

$\varphi(x(n)) = [\,x(n)\ \ \sin(\pi x(n))\ \ \cos(\pi x(n))\ \ x(n-1)\ \ \sin(\pi x(n-1))\ \ \cos(\pi x(n-1))\ \ x(n-2)\ \ \sin(\pi x(n-2))\ \ \cos(\pi x(n-2))\,]$   (3.27)

$W(n) = [\,w_1(n)\ \ w_2(n)\ \ w_3(n)\ \ \ldots\ \ w_9(n)\,]$   (3.28)

$\psi(y_2(n)) = [\,y_2(n)\ \ \sin(\pi y_2(n))\ \ \cos(\pi y_2(n))\,]$   (3.29)

$H(n) = [\,h_1(n)\ \ h_2(n)\ \ h_3(n)\,]$   (3.30)

Here f1(.) and f2(.) are taken as tanh(.). The reason for choosing the hyperbolic tangent function at the output is twofold. First, the function has to be differentiable when using a BP algorithm to train the network; this ensures the possibility of calculating the gradients of the error function (also called the performance function). Secondly, one wants to choose a nonlinear function that is close to a binary-valued one to increase the speed of convergence. It turns out that tanh(.) is a good choice.

Fig. 3.5 Structure of the two-stage CFLANN model

Weight updation in the second stage:

$y_4(n) = \tanh(y_3(n))$   (3.31)

$y_3(n) = h_1(n)\, y_2(n) + h_2(n)\sin(\pi y_2(n)) + h_3(n)\cos(\pi y_2(n))$   (3.32)

The error at the nth iteration is

$e(n) = d(n) - y_4(n)$   (3.33)

where d(n) is the desired response. The cost function is

$\xi(n) = \frac{1}{2}\, e^2(n)$   (3.34)

We drop the time index (n) for simplicity. The change in weight $h_1$ is proportional to $\partial\xi/\partial h_1$:

$h_1 = h_1 - \eta\,\frac{\partial \xi}{\partial h_1}$   (3.35)

The use of the minus sign accounts for gradient descent in weight space.

By using the chain rule we can write

$\frac{\partial \xi}{\partial h_1} = \frac{\partial \xi}{\partial e}\,\frac{\partial e}{\partial y_4}\,\frac{\partial y_4}{\partial y_3}\,\frac{\partial y_3}{\partial h_1}$   (3.36)

From equation (3.34), $\frac{\partial \xi}{\partial e} = e$   (3.37)

From equation (3.33), $\frac{\partial e}{\partial y_4} = -1$   (3.38)

From equation (3.31), $\frac{\partial y_4}{\partial y_3} = 1 - y_4^2$   (3.39)

From equation (3.32), $\frac{\partial y_3}{\partial h_1} = y_2$   (3.40)

Now equation (3.36) becomes

$\frac{\partial \xi}{\partial h_1} = -e\,(1 - y_4^2)\, y_2$   (3.41)

From equations (3.35) and (3.41),

$h_1 = h_1 + \eta\, e\,(1 - y_4^2)\, y_2$   (3.42)


Similarly proceeding as above we get

$h_2 = h_2 + \eta\, e\,(1 - y_4^2)\sin(\pi y_2)$   (3.43)

$h_3 = h_3 + \eta\, e\,(1 - y_4^2)\cos(\pi y_2)$   (3.44)

In general,

$H^{T} = H^{T} + \eta\, e\,(1 - y_4^2)\, \psi(y_2)^{T}$   (3.45)

Weight updation in the first stage:

$y_2 = \tanh(y_1)$   (3.46)

$y_1 = w_1 x_1 + w_2 \sin(\pi x_1) + w_3 \cos(\pi x_1) + \cdots + w_9 \cos(\pi x_3)$   (3.47)

The change in weight $w_1$ is proportional to $\partial\xi/\partial w_1$:

$w_1 = w_1 - \eta\,\frac{\partial \xi}{\partial w_1}$   (3.48)

The use of the minus sign accounts for gradient descent in weight space.

By using the chain rule we can write

$\frac{\partial \xi}{\partial w_1} = \frac{\partial \xi}{\partial e}\,\frac{\partial e}{\partial y_4}\,\frac{\partial y_4}{\partial y_3}\,\frac{\partial y_3}{\partial y_2}\,\frac{\partial y_2}{\partial y_1}\,\frac{\partial y_1}{\partial w_1}$   (3.49)

From equation (3.32),

$\frac{\partial y_3}{\partial y_2} = h_1 + h_2\,\pi \cos(\pi y_2) - h_3\,\pi \sin(\pi y_2)$   (3.50)

From equation (3.46), $\frac{\partial y_2}{\partial y_1} = 1 - y_2^2$   (3.51)

From equation (3.47), $\frac{\partial y_1}{\partial w_1} = x_1$   (3.52)

Now equation (3.49) becomes

$\frac{\partial \xi}{\partial w_1} = -e\,(1 - y_4^2)\left(h_1 + h_2\,\pi \cos(\pi y_2) - h_3\,\pi \sin(\pi y_2)\right)(1 - y_2^2)\, x_1$   (3.53)

From equations (3.48) and (3.53),

$w_1 = w_1 + \eta\, e\,(1 - y_4^2)\left(h_1 + h_2\,\pi \cos(\pi y_2) - h_3\,\pi \sin(\pi y_2)\right)(1 - y_2^2)\, x_1$   (3.54)

Let $bp = h_1 + h_2\,\pi \cos(\pi y_2) - h_3\,\pi \sin(\pi y_2)$. Then

$w_1 = w_1 + \eta\, e\,(1 - y_4^2)\, bp\,(1 - y_2^2)\, x_1$   (3.55)

Similarly proceeding as above we get

$w_2 = w_2 + \eta\, e\,(1 - y_4^2)\, bp\,(1 - y_2^2)\sin(\pi x_1)$
⋮
$w_9 = w_9 + \eta\, e\,(1 - y_4^2)\, bp\,(1 - y_2^2)\cos(\pi x_3)$

In general,

$W^{T} = W^{T} + \eta\, e\,(1 - y_4^2)\, bp\,(1 - y_2^2)\, \varphi(x)^{T}$   (3.56)

In a similar way, by taking different functions for f1(.) and f2(.) and different numbers of expansions, the corresponding weight update algorithm can be derived.
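The whole two-stage update of Eqs. (3.45) and (3.56) can be sketched for one training sample as follows (assumed code, not the thesis's; the name cflann_step and the learning rate are illustrative, and it assumes the 3-input, 9-branch expansion of Eq. (3.27)):

import numpy as np

def cflann_step(W, H, x3, d, eta=0.03):
    # x3 = [x(n), x(n-1), x(n-2)]; W holds 9 first-stage weights, H holds 3
    fx = np.concatenate([[x, np.sin(np.pi * x), np.cos(np.pi * x)]
                         for x in x3])              # phi(x(n)), Eq. (3.27)
    y2 = np.tanh(np.dot(W, fx))                     # Eqs. (3.47), (3.46)
    psi = np.array([y2, np.sin(np.pi * y2), np.cos(np.pi * y2)])  # Eq. (3.29)
    y4 = np.tanh(np.dot(H, psi))                    # Eqs. (3.32), (3.31)
    e = d - y4                                      # Eq. (3.33)
    bp = (H[0] + H[1] * np.pi * np.cos(np.pi * y2)
               - H[2] * np.pi * np.sin(np.pi * y2))  # dy3/dy2, Eq. (3.50)
    H = H + eta * e * (1 - y4 ** 2) * psi            # second stage, Eq. (3.45)
    W = W + eta * e * (1 - y4 ** 2) * bp * (1 - y2 ** 2) * fx  # Eq. (3.56)
    return W, H, y4, e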

3.6. SIMULATION RESULTS

In this section different types of ANN models and systems are considered for comparison of their performances.

(A) Comparison between MLP and FLANN

Example 1: Here, different nonlinear systems are chosen to examine the approximation capabilities of the MLP and the FLANN. The structure considered for simulation is shown in Fig. 2.4. The unknown plant is described by one of the nonlinear functions given in equation (3.57), and the adaptive system is either an MLP or a FLANN. A three-layer MLP structure with 20 and 10 nodes (excluding the threshold unit) in the first and second hidden layers respectively, one input node and one output node was chosen for the identification, whereas the FLANN structure has 14 input nodes. Thus the FLANN has only 15 weights, including the threshold unit, which are to be updated in one iteration.

(a) $f_1(u) = u^3 + 0.3u^2 - 0.4u$

(b) $f_2(u) = 0.6\sin(\pi u) + 0.3\sin(3\pi u) + 0.1\sin(5\pi u)$

(c) $f_3(u) = \dfrac{4u^3 - 1.2u^2 + 1.2}{0.4u^5 + 0.8u^4 - 1.2u^3 + 0.2u^2 - 3}$

(d) $f_4(u) = u^3 - 2u + 2 + 0.1\cos(4\pi u)$   (3.57)

The input pattern was expanded using trigonometric polynomials, i.e., by using $\cos(n\pi u)$ and $\sin(n\pi u)$ for n = 0, 1, 2, …. In some cases, the cross-product terms were also included in the functional expansion. The nonlinearity used in a node of the MLP and the FLANN is the tanh() function. The BP algorithm was used to adapt the weights of both ANN structures. The input u was a random signal drawn from a uniform distribution in the interval [-1, 1]. Both the convergence parameter and the momentum term were set to 0.1. Both the MLP and the FLANN were trained for 30000 iterations, after which the weights of the ANN were stored for testing. For testing, the input signal is drawn from a uniform distribution in the interval [-1, 1]. The responses are given in Figs. 3.6 to 3.8.

(i) For nonlinearity (3.57(a)):

Fig. 3.6. Output plot (Example 1) (a) f1 using MLP (b) f1 using FLANN

(ii) For nonlinearity (3.57(b)):

Fig. 3.7. Output plot (Example 1) (a) f2 using MLP (b) f2 using FLANN

(iii) For nonlinearity (3.57(c)):

Fig. 3.8. Output plot (Example 1) (a) f3 using MLP (b) f3 using FLANN

Example 2: The plant is assumed to be of second order and is described by the following difference equation:

$y(n+1) = 0.3\,y(n) + 0.6\,y(n-1) + g[u(n)]$   (3.58)

The unknown function g was taken from the nonlinear functions of (3.57). To identify the plant, a model was considered which is governed by the difference equation

$\hat{y}(n+1) = 0.3\,y(n) + 0.6\,y(n-1) + N[u(n)]$   (3.59)

The MLP and FLANN structures are the same as in Example 1, and N[u(n)] in equation (3.59) represents the neural network with input u(n). The input to the plant was taken from a uniformly distributed random signal over the interval [-1, 1]. The adaptation continued for 30000 iterations, during which the series-parallel scheme of identification was used. Then the adaptation was stopped and the network was tested for identification using the parallel scheme. The testing of the network models was undertaken by presenting a sinusoidal input to the identified model, given by

$u(n) = \begin{cases} \sin\left(\dfrac{2\pi n}{250}\right) & \text{for } n \le 250 \\[2mm] 0.8\sin\left(\dfrac{2\pi n}{250}\right) + 0.2\sin\left(\dfrac{2\pi n}{25}\right) & \text{for } n > 250 \end{cases}$   (3.60)
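A short sketch of how such a test sequence might be generated (assumed code; the variable names are illustrative):

import numpy as np

n = np.arange(1, 501)
u = np.where(n <= 250,
             np.sin(2 * np.pi * n / 250),                           # Eq. (3.60), n <= 250
             0.8 * np.sin(2 * np.pi * n / 250)
             + 0.2 * np.sin(2 * np.pi * n / 25))                    # Eq. (3.60), n > 250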

The results of identifying (3.58) with the nonlinear functions f3 and f4 of (3.57) are shown in Fig. 3.9 and Fig. 3.10 respectively.

(i) For nonlinearity (3.57(c)):

Fig. 3.9. Output plot (Example 2) (a) f3 using MLP (b) f3 using FLANN

(ii) For nonlinearity (3.57(d)):

Fig. 3.10. Output plot (Example 2) (a) f4 using MLP (b) f4 using FLANN

(B) Comparison between FLANN and CFLANN

Extensive simulation studies were carried out with several examples of nonlinear systems. We have compared the performance of the two-stage CFLANN structure with that of the single-layer FLANN structure and LMS-based identifiers. The input to the system was taken from a uniformly distributed random signal over the interval [-0.5, 0.5], and white Gaussian noise was added to the output of the nonlinear system. The convergence factor was chosen as 0.03. The adaptation continued for 5000 iterations, during which the parallel scheme of identification was used; the adaptation was then stopped and the network was used for identification. The testing of the network model was done by applying new random samples to the input.

Example 3: We consider a system described by the transfer function

$H(z) = 0.26 + 0.93\,z^{-1} + 0.26\,z^{-2}$

The nonlinearity used is

$b(n) = a(n) + 0.2\,a^2(n) - 0.1\,a^3(n) + 0.5\cos(\pi a(n))$

For the identification, the structure considered is given in Fig. 2.6. The adaptive models considered here are the FIR filter, the FLANN and the CFLANN. In the FLANN structure each input is expanded into nine branches, of which one is the direct input and the other eight are the trigonometric expansions sin(πx), cos(πx), sin(2πx), cos(2πx), …, sin(4πx), cos(4πx), where x is the input to the identifier. The total number of weights used in this case is 28 (including a bias term). A white Gaussian noise of -30 dB is added to the output of the nonlinear system.

In the case of the CFLANN, in the first stage each input is expanded into five branches using trigonometric expansions, and the output of the first stage is again expanded into three branches. The total number of weights used is 20 (including two bias terms, one at each stage). The mean square error (MSE) and output responses are compared in Fig. 3.11. It is clear from the graphs that the performance of the CFLANN is better than that of the FLANN and LMS-based systems.

Example 4: The system is described by

$H(z) = 0.26 + 0.93\,z^{-1} + 0.26\,z^{-2}$

The nonlinearity used is $b(n) = a(n) + 0.2\,a^2(n) - 0.1\,a^3(n)$. All other conditions are the same as in Example 3. The mean square error (MSE) and output responses are compared in Fig. 3.12.

Fig. 3.11. Performance comparison between LMS, FLANN and CFLANN (Example 3): (a) MSE plot (b) Response plot

Fig. 3.12. Performance comparison between LMS, FLANN and CFLANN (Example 4): (a) MSE plot (b) Response plot

3.7. SUMMARY

Different types of ANN structures and their learning algorithms are discussed in this chapter. For identification purposes three ANN structures (MLP, FLANN and CFLANN) are used. In the FLANN, the input pattern is expanded using trigonometric or polynomial terms of the input vector. The functional expansion may be thought of as analogous to the nonlinear processing of signals in the hidden layer of an MLP. This functional expansion increases the dimensionality of the input pattern; thus the creation of nonlinear decision boundaries in the multidimensional space and the identification of complex nonlinear functions become simple with this network. From Fig. 3.6 to Fig. 3.10 it is evident that system identification with the FLANN structure is better than with the MLP. Since the hidden layer is absent in this structure, the computational complexity is less and the learning is faster in comparison to an MLP. Therefore, this structure may be implemented for on-line applications.

From Figs. 3.11 and 3.12 it is observed that the CFLANN model performs better than the FLANN and LMS-based models. From the response and MSE plots it can be observed that for a highly nonlinear system the performance of the CFLANN model is considerably better than that of the FLANN and LMS-based models. The CFLANN model exhibits two advantages: first, it involves a smaller number of expansions; secondly, it provides better learning and response-matching performance in nonlinear system identification.

Chapter

4

PRUNING USING GA

4. PRUNING USING GA

4.1. INTRODUCTION

System identification is a pre-requisite to analysis of a dynamic system and design of an appropriate controller for improving its performance. The more accurate the mathematical model identified for a system, the more effective will be the controller designed for it. In many identification processes, however, the obtainable model using available techniques is generally crude and approximate.

In conventional identification methods, a model structure is selected and the parameters of that model are calculated by optimizing an objective function. The methods typically used for optimization of the objective function are based on gradient descent techniques. The on-line system identification methods used to date are based on recursive implementations of off-line methods such as least squares, maximum likelihood or instrumental variables. These recursive schemes are in essence local search techniques: they go from one point in the search space to another at every sampling instant, as a new input-output pair becomes available. This process usually requires a large set of input/output data from the system, which is not always available. In addition, the obtained parameters may be only locally optimal.

Gradient-descent training algorithms are the most common form of training algorithms in signal processing today because they have a solid mathematical foundation and have been proven over the last five decades to work in many environments. Gradient-descent training, however, leads to suboptimal performance under nonlinear conditions. The Genetic Algorithm (GA) [4.1] has been widely used in many applications to produce a global optimal solution. This approach is a probabilistically guided optimization process which simulates genetic evolution. The algorithm cannot be trapped in local minima as it employs a random mutation procedure. In contrast to classical optimization algorithms, genetic algorithms are not guided in their search process by local derivatives. Through coding of the variables, populations with stronger fitness are identified and maintained while populations with weaker fitness are removed. This process ensures that better offspring are produced from their parents. The search process is stable and robust and can identify globally optimal parameters of a system. The underlying principles of GAs were first published by Holland in 1962 [4.2]. The GA has been used in many diverse areas such as function optimization [4.3], image processing [4.4], the traveling salesman problem [4.5], [4.6] and system identification [4.7]-[4.11].

In this thesis the GA is used for simultaneous pruning and weight updating. While constructing an artificial neural network, the designer is often faced with the problem of choosing a network of the right size for the task to be carried out. The advantage of using a reduced neural network is that it is less costly and faster in operation; however, a much-reduced network may not be able to solve the required problem, while a fully connected ANN may lead to an accurate solution. Choosing an appropriate ANN architecture for a learning task is therefore an important issue in training neural networks. Giles and Omlin [4.12] have applied the pruning strategy to recurrent networks. Markel [4.13] has employed the pruning technique in the FFT algorithm, eliminating those operations which do not contribute to the estimated output. Jearanaitanakij and Pinngern [4.14] have analyzed the minimum number of hidden units required for an ANN to recognize English capital letters. Thus, to achieve the cost and speed advantages, appropriate pruning of the ANN structure is required. In this chapter we have considered an adequately expanded FLANN model for the identification of a nonlinear plant and then used the Genetic Algorithm (GA) to train the filter weights as well as to obtain the pruned input paths based on their contributions. The procedure for simultaneous pruning and training of weights is carried out in subsequent sections to obtain a low-complexity reduced structure.

4.2. GENETIC ALGORITHM

In deterministic search, methods such as steepest-gradient methods are employed (using the gradient concept), whereas in the stochastic approach random variables are introduced. Whether the search is deterministic or stochastic, it is possible to improve the reliability of the results. GAs are stochastic search mechanisms that utilize a Darwinian criterion of population evolution. The GA has a robustness that allows its structural functionality to be applied to many different search problems [4.17]: once the search variables are encoded into a suitable format, the GA scheme can be applied in many environments. The process of natural selection, described by Darwin, is used to raise the effectiveness of a group of possible solutions to meet an environmental optimum [4.16].

Genetic algorithms are very different from most of the traditional optimization methods. Genetic algorithms need the design space to be converted into a genetic space, so genetic algorithms work with coded variables. The advantage of working with a coded variable space is that coding discretizes the search space even though the function may be continuous. A more striking difference between genetic algorithms and most traditional optimization methods is that the GA uses a population of points at one time, in contrast to the single-point approach of traditional optimization methods. This means that the GA processes a number of designs at the same time.

4.2.1. GA Operations

The GA operates on the basis that a population of possible solutions, called chromosomes, is used to assess the cost surface of the problem. The GA evolutionary process can be thought of as solution breeding in that it creates a new generation of solutions by crossing two chromosomes. The solution variables, or genes, that provide a positive contribution to the population will multiply and be passed through each subsequent generation until an optimal combination is obtained.

The population is updated after each learning cycle through three evolutionary processes: selection, crossover and mutation. These create the new generation of solution variables. From the population a pool of individuals is randomly selected, and some of these survive into the next iteration's population. A mating pool is randomly created and each individual is paired off; these pairs undergo the evolutionary operators to produce two new individuals that are added to the new population.

The selection function creates a mating pool of parent solution strings based upon the "survival of the fittest" criterion. From the mating pool the crossover operator exchanges gene information; this essentially crosses the more productive genes from within the solution population to create an improved, more productive generation. Mutation randomly alters selected genes, which helps prevent premature convergence by pulling the population into unexplored areas of the solution surface, and adds new gene information into the population.

Fig.4.1. A GA Iteration Cycle


4.2.2. Population Variable

A chromosome consists of the problem variables, which can be arranged in a vector or a matrix. In the gene crossover process, corresponding genes are crossed so that there is no inter-variable crossing, and therefore each chromosome uses the same fixed structure. An initial population that contains a diverse gene pool offers a better picture of the cost surface; each chromosome within the population is initialized independently by the same random process.

In the case of binary genes each bit is generated randomly and the resulting bit-words are decoded into their real-value equivalents. The binary number is used in the genetic search process and the real value is used in the problem evaluation. This type of initialization results in a normally distributed population of variables across a specific range.

A GA population P consists of a set of N chromosomes $\{C_1, \ldots, C_N\}$ and N fitness values $\{f_1, \ldots, f_N\}$, where the fitness is some function of the error matrix:

$P = \{(C_1, f_1), (C_2, f_2), (C_3, f_3), \ldots, (C_N, f_N)\}$   (4.1)

The GA is an iterative update algorithm and each chromosome requires its fitness to be evaluated individually. Therefore N separate solutions need to be assessed on the same training set in each training iteration. This is a large evaluation overhead, since population sizes can range between twenty and a hundred, but the GA is seen to have learning rates that even this overhead out over the training convergence.

4.2.3. Chromosome Selection

The selection process is used to weed out the weaker chromosomes from the population so that the more productive genes may be used in the production of the next generation. The chromosome fitnesses are used to rank the population, with each individual assigned its own fitness value. The solution cost value $E_i$ of the ith chromosome in the population is calculated from a training block of M training signals,

$E_i = \frac{1}{M}\sum_{j=1}^{M} e_j^2(i)$   (4.2)

and from this cost an associated fitness is assigned:

$f_i = \frac{1}{1 + E_i}$   (4.3)
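A minimal sketch of this evaluation (assumed code; the function name fitness is illustrative):

import numpy as np

def fitness(errors):
    # Cost of Eq. (4.2) and fitness of Eq. (4.3) for one chromosome,
    # given its M training-block errors e_j(i)
    E = np.mean(np.asarray(errors) ** 2)  # Eq. (4.2)
    return 1.0 / (1.0 + E)                # Eq. (4.3); bounded in (0, 1] since E >= 0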

The fitness can be considered to be the inverse of the cost, but the fitness function in Equation (4.3) is preferred for stability reasons, i.e. because $E_i \ge 0$. When the fitness of each chromosome in the population has been evaluated, two pools are generated: a survival pool and a mating pool. The chromosomes from the mating pool will be used to create a new set of chromosomes through the evolutionary processes of natural selection, and the survival pool allows a number of chromosomes to pass on to the next generation. The chromosomes are selected randomly for the two pools but biased towards the fittest. Each chromosome may be chosen more than once, and the fitter chromosomes are more likely to be chosen, so that they will have a greater influence in the new generation of solutions.

The selection procedure can be described using a biased roulette wheel with the buckets of the wheel sized according to the individual fitness relative to the population's total fitness [4.17]. Consider an example population of ten chromosomes that have the fitness assessment f = {0.16, 0.16, 0.48, 0.08, 0.16, 0.24, 0.32, 0.08, 0.24, 0.16}; the sum of the fitnesses, $f_{sum} = 2.08$, is used to normalize these values.


Fig.4.2. Biased roulette-wheel that is used in the selection of the mating pool

The commonly used reproduction operator is the proportionate reproductive o perator where a string is selected from the mating pool with a probability proportional to the fitness. Thus i th

string in the population is selected with a probability proportional to f

i

where f

i

is the fitness value of that string. Since the population size is usually kept fixed in a simple GA,

the sum of the probabilities of each string being selected for the mating pool must be one. The probability of the ith string being selected is

$p_i = \frac{f_i}{\sum_{j=1}^{n} f_j}$   (4.4)

where n is the population size.

Using the fitness values $f_i$ of all strings, the probability of selecting a string, $p_i$, can be calculated. Thereafter, the cumulative probability $P_i$ of each string can be calculated by adding the individual probabilities from the top of the list; thus the bottom-most string in the population has a cumulative probability of 1. The roulette wheel concept can be simulated by observing that the ith string represents the cumulative probability range from $P_{i-1}$ to $P_i$; the first string represents the cumulative values from 0 to $P_1$. Hence the cumulative probability of any string lies between 0 and 1. In order to choose n strings, n random numbers between zero and one are created at random, and the string whose cumulative probability range (calculated from the fitness values) contains a chosen random number is copied to the mating pool. In this way a string with a higher fitness value represents a larger range of cumulative probability values and therefore has a higher probability of being copied into the mating pool, while a string with a smaller fitness value represents a smaller range and therefore has a lesser probability of being copied into the mating pool.
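A minimal sketch of this roulette-wheel selection (assumed code, not the thesis's; the function name roulette_select and the fixed seed are illustrative):

import numpy as np

def roulette_select(f, n, rng=np.random.default_rng(0)):
    # Draw n mating-pool indices with probability p_i = f_i / sum(f_j), Eq. (4.4)
    p = np.asarray(f, dtype=float)
    cum = np.cumsum(p / p.sum())    # cumulative probabilities P_1, ..., P_n (last is 1)
    r = rng.random(n)               # n random numbers in [0, 1)
    return np.searchsorted(cum, r)  # index i whose range [P_{i-1}, P_i) holds r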

4.2.4. Gene Crossover

The crossover operator exchanges gene information between two selected chromosomes ($C_q$, $C_r$), where this operation aims to improve the diversity of the solution vectors. The pair of chromosomes, taken from the mating pool, become the parents of two offspring chromosomes for the new generation.

In the case of a binary crossover operation, the least significant bits are exchanged between corresponding genes within the two parents. For each gene crossover a random position along the bit sequence is chosen, and then all of the bits to the right of the crossover point are exchanged. Fig. 4.3(a) shows a single-point crossover in which the fifth crossover position is randomly chosen (the first position corresponds to the left side), so the bits to the right of the fourth bit are exchanged. Fig. 4.3(b) shows a two-point crossover in which two points are randomly chosen and the bits in between them are exchanged. Fig. 4.3 shows a basic genetic crossover with the same crossover point chosen for both offspring genes. At the


start of the learning process the extent of crossing over the whole population can be decided, allowing the evolutionary process to randomly select the individual genes. The probability of a gene crossing, P(crossing), provides a percentage estimate of the genes that will be affected within each parent. P(crossing) = 1 allows all the gene values to be crossed and P(crossing) = 0 leaves the parents unchanged, where a random gene-selection value ω ∈ {1, 0} is governed by this probability of crossing.

Before crossover:  1 0 1 0 0 1 0 1   and   0 0 1 0 1 1 1 0
After crossover:   1 0 1 0 1 1 1 0   and   0 0 1 0 0 1 0 1
(a)

Before crossover:  1 0 1 0 0 1 0 1   and   0 0 1 0 1 1 1 0
After crossover:   1 0 1 0 1 1 0 1   and   0 0 1 0 0 1 1 0
(b)

Fig. 4.3. Gene crossover (a) Single-point crossover (b) Double-point crossover

The crossover does not have to be limited to this simple operation. The crossover operator can be applied to each chromosome independently, taking different random crossing points in each gene. This operation is more like grafting parts of the original genes onto each other to create the new gene pair. Not all of a chromosome's genes are altered within a single crossover: a probability of gene crossover is used to randomly select a percentage of the genes, and those genes that are not crossed remain the same as in one of the parents.

4.2.5 Chromosome Mutation

The last operator within the breeding process is mutation. Each chromosome is considered for mutation with a probability that some of its genes will be mutated after the crossover operation. A random number is generated for each gene; if this value is within the specified mutation selection probability, P(mutation), the gene will be mutated. The probability of mutation occurring tends to be low, with around one percent of the population genes being affected in a single generation. In the case of a binary mutation operator, the state of the randomly selected gene bit is changed from zero to one or vice versa.

Before mutation:  1 0 1 1 0 0 1 0   (fifth bit selected for mutation)
After mutation:   1 0 1 1 1 0 1 0

Fig. 4.4 Mutation operation in GA

A simple genetic algorithm treats mutation as a secondary operator with the role of restoring lost genetic material. Suppose, for example, that all the strings in a population have converged to a zero at a given position while the optimal solution has a one at that position; then crossover cannot regenerate a one at that position, while a mutation can. Mutation helps the search algorithm escape from local minima traps, since the modification is not related to any previous genetic structure of the population. Mutation is also used to maintain diversity in the population. For example, consider the following population of four eight-bit strings:

0110 1011
0011 1101
0001 0110
0111 1100

All four strings have a zero in the left-most bit position. If the true optimum solution requires a one in that position, then neither the reproduction nor the crossover operator will be able to create a one in that position; only the mutation operation can change that zero to a one.
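Minimal sketches of the two binary operators described above (assumed code; the function names and the shared random generator are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def single_point_crossover(a, b):
    # Exchange every bit to the right of a random point, Fig. 4.3(a)
    cut = rng.integers(1, len(a))
    return (np.concatenate([a[:cut], b[cut:]]),
            np.concatenate([b[:cut], a[cut:]]))

def mutate(bits, p_mut=0.01):
    # Flip each bit with probability P(mutation), Fig. 4.4
    flip = rng.random(len(bits)) < p_mut
    return np.where(flip, 1 - bits, bits)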

4.3. PARAMETERS OF GA.

There are some parameter values required for the GA. To get the desired result these parameters should be chosen properly.

(a) Crossover and Mutation Probability

There are two basic parameters of the GA: the crossover probability and the mutation probability.

Crossover probability: This probability controls the frequency at which crossover occurs for every chromosome in the search process. It is a number between (0, 1) which is determined according to the sensitivity of the variables of the search process; the crossover probability is chosen small for systems with sensitive variables. If there is crossover, offspring are made from parts of both parents' chromosomes. Crossover is made in the hope that the new chromosomes will contain the good parts of the old chromosomes and therefore be better; however, it is good to let some part of the old population survive to the next generation.

Mutation probability: This parameter decides how often parts of a chromosome will be mutated. If there is no mutation, offspring are generated immediately after crossover (or copied directly) without any change. If mutation is performed, one or more parts of a chromosome are changed. If the mutation probability is 100%, the whole chromosome is changed; if it is 0%, nothing is changed. Mutation generally prevents the GA from falling into local extremes, but it should not occur very often, because then the GA would in fact change into a random search.

(b) Other Parameters.

There are also some other parameters in the GA. One important parameter is the population size.

Population size: This determines how many chromosomes are in the population in one generation. If there are too few chromosomes, the GA has few possibilities to perform crossover and only a small part of the search space is explored. On the other hand, if there are too many chromosomes, the GA slows down. Research shows that beyond some limit (which depends mainly on the encoding and the problem) it is not useful to use very large populations, because they do not solve the problem faster than moderately sized populations.

4.4. PRUNING USING GA.

In this section a new algorithm for the simultaneous training and pruning of weights using a binary-coded genetic algorithm (BGA) is proposed. Such a choice leads to effective pruning of branches and updating of weights. The pruning strategy is based on the idea of successive elimination of less productive paths (functional expansions) and elimination of weights from the FLANN architecture. As a result, many branches (functional expansions) are pruned and the overall architecture of the FLANN-based model is reduced, which in turn reduces the corresponding computational cost associated with the proposed model without sacrificing performance. The various steps involved in this algorithm are dealt with in this section.

Step 1- Initialization in GA:

A population of M chromosomes is selected in the GA, in which each chromosome constitutes (T×E+1)×L random binary bits, where the first L bits are called pruning bits (P) and the remaining bits represent the weights associated with the various branches (functional expansions) of the FLANN model. Here (T-1) represents the order of the filter and E represents the number of expansions specified for each input to the filter. Each chromosome can thus be schematically represented as shown in Fig. 4.6.

A pruning bit p from the set P indicates the presence or absence of an expansion branch, which ultimately signifies the usefulness of a feature extracted from the time series. In other words, a binary 1 indicates that the corresponding branch contributes and thus establishes a physical connection, whereas a 0 bit indicates that the effect of that path is insignificant and hence can be neglected. The remaining T×E×L bits represent the T×E weight values of the model, each containing L bits.

Step 2 - Generation of input training data:

K (≥ 500) signal samples are generated. In the present case two different types of signals are generated, to identify static and feed-forward dynamic plants:

(i) To identify a feed-forward dynamic plant, a zero-mean signal which is uniformly distributed between ±0.5 is generated.

(ii) To identify a static system, a uniformly distributed signal is generated within ±1.

Each of the input samples is passed through the unknown plant (static or feed-forward dynamic), and K such outputs are obtained. The plant output is then added to measurement noise (white uniform noise) of known strength, thereby producing K desired signals. Thus the training data are produced to train the network.

Step 3- Decoding:

Each chromosome in the GA constitutes random binary bits, so these chromosomes need to be converted to decimal values lying within some range in order to compute the fitness function. The equation that converts the binary-coded chromosome into real numbers is given by

Fig. 4.5. FLANN-based identification model showing weight updating and path pruning

$RV = R_{min} + \frac{(R_{max} - R_{min})}{(2^L - 1)} \times DV$   (4.5)

where $R_{min}$, $R_{max}$, RV and DV represent the minimum range, the maximum range, the decoded real value and the decimal value of an L-bit coding scheme representation. The first L bits are not decoded, since they represent pruning bits.

Fig. 4.6. Bit allocation scheme for pruning and weight updating (the first L bits are the pruning bits (P); the remaining V = T×E×L bits hold the T×E weight values, L bits each)
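A minimal sketch of the decoding of Eq. (4.5) for one L-bit weight field (assumed code; the function name decode is illustrative):

def decode(bits, r_min, r_max):
    # Map one L-bit group to a real value via Eq. (4.5)
    L = len(bits)
    dv = int("".join(str(int(b)) for b in bits), 2)     # decimal value DV
    return r_min + (r_max - r_min) / (2 ** L - 1) * dv  # decoded value RV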

Step 4 - Computation of the estimated output:

At the nth instant the estimated output of the neuron for the mth chromosome can be computed as

$y_m(n) = \sum_{i=1}^{T}\sum_{j=1}^{E} \varphi_{ij}(n)\, w_{ij}^{m}(n)\, p_{ij}^{m}(n) + b_m(n)$   (4.6)

where $\varphi_{ij}(n)$ represents the jth expansion of the ith signal sample at the nth instant, $w_{ij}^{m}(n)$ and $p_{ij}^{m}(n)$ represent the jth expansion weight and the jth pruning bit of the ith signal sample for the mth chromosome, and $b_m(n)$ corresponds to the bias value fed to the neuron for the mth chromosome at the nth instant.

Step 5 - Calculation of the cost function:

Each desired output is compared with the corresponding estimated output and K errors are produced. The mean square error (MSE) corresponding to the mth chromosome is determined using the relation

$MSE(m) = \frac{1}{K}\sum_{k=1}^{K} e_k^2$   (4.7)

This is repeated M times (i.e. for all the possible solutions).


Step 6- Operations of GA:

Here the GA is used to minimize the MSE. The crossover, mutation and selection operators are carried out sequentially to select the best M individuals, which are treated as parents in the next generation.

Step 7 - Stopping Criteria:

The training procedure is stopped when the MSE settles to a desirable level. At this moment all the chromosomes attain the same genes, and each gene in the chromosome represents an estimated weight.

4.5. SIMULATION RESULTS

Extensive simulation studies are carried out with several examples of static as well as feed-forward dynamic systems. For updating the weights of the FLANN we follow the learning algorithm given in Section 3.4. The performance of the proposed pruned FLANN model is compared with that of the basic FLANN structure.

(A) Static Systems

Here different nonlinear static systems are chosen to examine the approximation capabilities of the basic FLANN and the proposed pruned FLANN models. In all the simulation studies reported in this section, a single-layer FLANN structure having one input node and one neuron is considered. Each input pattern is expanded using trigonometric polynomials, i.e. by using $\cos(n\pi u)$ and $\sin(n\pi u)$ for n = 0, 1, 2, …, 6. In addition, a bias is also fed to the output. In the simulation work the data used are K = 500, M = 40, N = 15, L = 30, probability of crossover = 0.7 and probability of mutation = 0.1. Besides this, the $R_{max}$ and $R_{min}$ values are judiciously chosen to attain satisfactory results. The three nonlinear static plants considered for this study are as follows:

Example-1: $f(u) = u^3 + 0.3u^2 - 0.4u$   (4.8)

Example-2: $f(u) = 0.6\sin(\pi u) + 0.3\sin(3\pi u) + 0.1\sin(5\pi u)$   (4.9)

Example-3: $f(u) = \dfrac{4u^3 - 1.2u^2 + 1.2}{0.4u^5 + 0.8u^4 - 1.2u^3 + 0.2u^2 - 3}$   (4.10)


At any nth instant, the output of the ANN model y(n) and the output of the system d(n) are compared to produce the error e(n), which is then utilized to update the weights of the model. The LMS algorithm is used to adapt the weights of the basic FLANN model, whereas the proposed GA-based algorithm is employed for the simultaneous adaptation of weights and pruning of branches. The basic FLANN model is trained for 30000 iterations, whereas the pruned FLANN model is trained for only 60 generations. Finally, the weights of the ANN are stored for testing. The responses of both networks during testing are compared in Figs. 4.7(a), (b) and (c). The comparison of computational complexity between the FLANN and the pruned FLANN is given in Table 4.1.

TABLE 4.1
COMPARISON OF COMPUTATIONAL COMPLEXITIES BETWEEN A BASIC FLANN AND A PRUNED FLANN MODEL

                     Number of operations                         Number of weights
             Additions                Multiplications
Ex. No.   FLANN   Pruned FLANN    FLANN   Pruned FLANN       FLANN   Pruned FLANN
Ex-1        14         3            14          3               15         4
Ex-2        14         2            14          3               15         3
Ex-3        14         5            14          5               15         6

(B) Dynamic Systems

In the following, simulation studies of nonlinear dynamic feed-forward systems are carried out with the help of several examples. In each example, one particular model of the unknown system is considered. In this simulation a single-layer FLANN structure having one input node and one neuron is considered. Each input pattern is expanded using the direct input as well as trigonometric polynomials, i.e. by using u, $\sin(n\pi u)$ and $\cos(n\pi u)$ for n = 1, …, 4. In this case the bias is removed. In the simulation work we have considered K = 500, M = 40, N = 9, L = 20, probability of crossover = 0.7 and probability of mutation = 0.03. Besides this, the $R_{max}$ and $R_{min}$ values are judiciously chosen to attain satisfactory results. The three nonlinear dynamic feed-forward plants considered for this study are as follows:


Example-4:
(a) Parameters of the linear system of the plant: [0.2600, 0.9300, 0.2600]
(b) Nonlinearity associated with the plant: $y_n(k) = y(k) + 0.2\,y^2(k) - 0.1\,y^3(k)$

Example-5:
(a) Parameters of the linear system of the plant: [0.3040, 0.9029, 0.3040]
(b) Nonlinearity associated with the plant: $y_n(k) = \tanh(y(k))$

Example-6:
(a) Parameters of the linear system of the plant: [0.3410, 0.8760, 0.3410]
(b) Nonlinearity associated with the plant: $y_n(k) = y(k) - 0.9\,y^3(k)$

The basic FLANN model is trained for 2000 iterations, whereas the proposed pruned FLANN is trained for only 60 generations. While training, a white uniform noise of strength -30 dB is added to the actual system response to assess the performance of the two models under noisy conditions. The weights of the ANN are then stored for testing. Finally, the testing of the network models is undertaken by presenting a zero-mean white random signal to the identified model. A performance comparison between the FLANN and the pruned FLANN structures in terms of the estimated output of the unknown plant has been carried out. The responses of both networks during testing are compared in Figs. 4.7(d), (e) and (f). The comparison of computational complexity between the FLANN and the pruned FLANN is given in Table 4.2.

Fig. 4.7. Output plots for static and dynamic systems: (a) Example 1, (b) Example 2, (c) Example 3, (d) Example 4, (e) Example 5, (f) Example 6


TABLE 4.2
COMPARISON OF COMPUTATIONAL COMPLEXITIES BETWEEN A BASIC FLANN AND A PRUNED FLANN MODEL

                     Number of operations                         Number of weights
             Additions                Multiplications
Ex. No.   FLANN   Pruned FLANN    FLANN   Pruned FLANN       FLANN   Pruned FLANN
Ex-4         8         3             9          4                9         4
Ex-5         8         2             9          3                9         3
Ex-6         8         2             9          3                9         3

4.6. SUMMARY

Simultaneous weight updating and pruning of FLANN identification models using the GA is presented. The pruning strategy is based on the idea of successive elimination of less productive paths, with a separate pruning bit reserved for each weight in this process. Computer simulation studies on static and dynamic nonlinear plants demonstrate that more than 50% of the active paths are pruned while the response matching remains almost identical to that obtained from conventional FLANN identification models.


Chapter

5

CHANNEL EQUALIZATION

5. CHANNEL EQUALIZATION

5.1. INTRODUCTION

Recently, there has been a substantial increase in the demand for high-speed digital data transmission over communication channels. Communication channels are usually modeled as band-limited linear finite impulse response (FIR) filters with a low-pass frequency response. When the amplitude and envelope-delay responses are not constant within the bandwidth of the channel, the channel introduces intersymbol interference (ISI). Because of this linear distortion, the transmitted symbols are spread and overlapped over successive time intervals. In addition to the linear distortion, the transmitted symbols are subject to other impairments such as thermal noise, impulse noise, and nonlinear distortion arising from interference, the use of amplifiers and converters, and the nature of the channel itself. All the signal processing methods used at the receiver's end to compensate for the introduced channel distortion and recover the transmitted symbols are referred to as channel equalization techniques. High-speed communication channels are often impaired by channel intersymbol interference (ISI) and additive noise, and adaptive equalizers are required in these communication systems to obtain reliable data transmission. In adaptive equalizers the main constraint is training the equalizer; many algorithms have been applied to train the equalizer, each having its own advantages and disadvantages. Moreover, the importance of channel equalization keeps the research going on to introduce new algorithms to train the equalizer.

Adaptive channel equalization was first proposed and analyzed by Lucky in 1965 [5.11]. An adaptive channel equalizer employing a multilayer perceptron (MLP) structure has been reported in [5.2], [5.7]. One of the major drawbacks of the MLP structure is the long training time required for generalization; thus this network has very poor convergence speed, which is primarily due to its multilayer architecture. A single-layer polynomial perceptron network (PPN) has been utilized for the purpose of channel equalization [5.3], in which the original input pattern is expanded using polynomials and cross-product terms of the pattern, and this expanded pattern is then utilized for the equalization problem. Superior performance of this network over a linear equalizer has been reported. An alternative ANN structure, called the functional link ANN (FLANN), originally proposed by Pao [5.8], is a novel single-layer ANN capable of forming arbitrarily complex decision regions. In this network, the initial representation of a pattern is enhanced by the use of nonlinear functions resulting in a higher-dimensional pattern, and hence the separability of the patterns becomes possible. The PPN, which uses polynomials for the expansion of the input pattern, is in fact a subset of the FLANN, and the FLANN has been utilized for functional approximation [5.8] and for channel equalization [5.1], [5.4]. It has been shown [5.9] that, in the case of a 2-ary PAM signal, the BER and MSE performance of the FLANN-based equalizer is superior to that of two other ANN structures, the MLP and the PPN.

5.2. BASEBAND COMMUNICATION SYSTEM

In an ideal communication channel, the received information is identical to that transmitted. However, this is not the case for real communication channels, where signal distortions take place. A channel can interfere with the transmitted data through three types of distorting effects: power degradation and fades, multipath time dispersion and background thermal noise. Equalization is the process of recovering the data sequence from the corrupted channel samples. A typical baseband transmission system is depicted in Fig. 5.1, where an equalizer is incorporated within the receiver.

Fig. 5.1. A baseband communication system

The equalization approaches investigated in this thesis are applied to a BPSK (binary phase shift keying) baseband communication system. Each of the transmitted data symbols belongs to a binary, 180° out-of-phase alphabet {-1, +1}.

5.3. CHANNEL INTERFERENCE

In a communication system, data signals can be transmitted either sequentially or in parallel across a channel medium in a manner that can be recovered at the receiver. To increase the data rate within a fixed bandwidth, data compression in space and/or time is required.

5.3.1. Multipath Propagation.

Within telecommunication channels multiple paths of propagation commonly occur. In practical terms this is equivalent to transmitting the same signal through a number of separate channels, each having a different attenuation and delay [5.13]. Consider an open-air radio transmission channel that has three propagation paths, as illustrated in Fig.5.2 [14].These could be direct, earth bound and sky bound.

Fig.5.2 (b) describes how a receiver picks up the transmitted data. The direct signal is received first whilst the earth and sky bound are delayed. All three of the signals are attenuated with the sky path suffering the most.

Multipath interference between consecutively transmitted signals will take place if one signal is received whilst the previous signal is still being detected [5.13]. In Fig. 5.2 this would occur if the symbol transmission rate were greater than 1/τ. Because bandwidth efficiency leads to high data rates, multipath interference commonly occurs.

[Fig. 5.2(a) sketches the three transmission paths from transmitter to receiver (direct, earth bound and sky bound); Fig. 5.2(b) sketches the received signal strength, with the direct sample arriving first and the delayed, attenuated earth bound and sky bound samples following within an interval τ.]

Fig.5.2. Impulse response of a transmitted signal in a channel which has 3 modes of propagation, (a) The signal transmitted paths, (b) The received samples.

Channel models are used to describe the channel distorting effects and are given as a summation of weighted, time-delayed channel inputs $a(i)$:

$$ H(z) = \sum_{i=0}^{m} a(i)\, z^{-i} = a(0) + a(1) z^{-1} + a(2) z^{-2} + \cdots \qquad (5.1) $$

The transfer function of a multipath channel is given in Equation (5.1). The model coefficients $a(i)$ describe the strength of each multipath signal.
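As a concrete reading of Equation (5.1), the following sketch passes a BPSK sequence through a hypothetical two-path channel; the tap values and all variable names are illustrative assumptions, not the channel used later in this chapter.

    import numpy as np

    rng = np.random.default_rng(0)

    h = np.array([1.0, 0.5])          # assumed model coefficients a(0), a(1)
    x = rng.choice([-1.0, 1.0], 200)  # BPSK input sequence

    # Equation (5.1) realized as a convolution: each received sample is a
    # weighted sum of time-delayed channel inputs, which is what causes ISI.
    a = np.convolve(x, h)[:len(x)]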

5.4. MINIMUM AND NONMINIMUM PHASE CHANNELS

When all the roots of the model z-transform lie within the unit circle, the channel is termed minimum phase [5.15]. The inverse of a minimum phase channel is convergent, as illustrated by Equation (5.2):

$$ \frac{1}{H(z)} = \frac{1}{1.0 + 0.5 z^{-1}} = \sum_{i=0}^{\infty} \left( -\frac{1}{2} \right)^{i} z^{-i} = 1 - 0.5 z^{-1} + 0.25 z^{-2} - 0.125 z^{-3} + \cdots \qquad (5.2) $$

whereas the inverse of a non-minimum phase channel is not convergent as a causal series, as shown in Equation (5.3):

$$ \frac{1}{H(z)} = \frac{1}{0.5 + 1.0 z^{-1}} = z \sum_{i=0}^{\infty} \left( -\frac{1}{2} \right)^{i} z^{i} = z - 0.5 z^{2} + 0.25 z^{3} - 0.125 z^{4} + \cdots \qquad (5.3) $$

Since equalizers are designed to invert the channel distortion process, they in effect model the channel inverse. If this inverse is convergent, a linear equalization solution exists. However, limiting the inverse model to m dimensions will only approximate the solution, and it has been shown that non-linear solutions can provide a superior inverse model in the same dimension.

A linear inverse of a non-minimum phase channel does not exist without incorporating time delays. A time delay creates a convergent series for a non-minimum phase model, where longer delays are necessary to provide a reasonable equalizer. Equation (5.4) describes a non-minimum phase channel with a single delay inverse and a four sample delay inverse. The latter of these is the more suitable form for a linear filter.


$$ \frac{z^{-1}}{0.5 + 1.0 z^{-1}} = 1 - 0.5 z + 0.25 z^{2} - 0.125 z^{3} + \cdots \quad (\text{non-causal}) $$

$$ \frac{z^{-4}}{0.5 + 1.0 z^{-1}} \approx -0.125 + 0.25 z^{-1} - 0.5 z^{-2} + z^{-3} \quad (\text{truncated and causal}) \qquad (5.4) $$
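These properties are easy to verify numerically. A small sketch, assuming the two example channels above (1.0 + 0.5z^{-1} and 0.5 + 1.0z^{-1}): the zero locations decide minimum phase, and expanding each causal inverse reproduces the convergent and divergent series behind Equations (5.2) and (5.3).

    import numpy as np
    from scipy.signal import lfilter

    for taps in ([1.0, 0.5], [0.5, 1.0]):
        # Zero of a0 + a1*z^-1: magnitude < 1 means minimum phase.
        print(taps, "zero at", np.roots(taps))

    imp = np.zeros(8)
    imp[0] = 1.0
    # Causal expansion of 1/H(z) as an IIR filter driven by an impulse.
    print(lfilter([1.0], [1.0, 0.5], imp))  # 1, -0.5, 0.25, ... (converges, Eq. 5.2)
    print(lfilter([1.0], [0.5, 1.0], imp))  # 2, -4, 8, ...      (diverges)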

5.5. INTERSYMBOL INTERFERENCE

Intersymbol interference (ISI) has already been described as the overlapping of the transmitted data. It is difficult to recover the original data from a one-dimensional channel output sample because there is no statistical information about the multipath propagation.

Increasing the dimensionality of the channel output vector helps characterize the multipath propagation. This has the effect of not only increasing the number of channel output states but also of increasing the Euclidean distance between the output classes.

When additive Gaussian noise, η, is present within the channel, the input samples form Gaussian clusters around the symbol centers. These symbol clusters can be characterized by a probability density function (pdf) with noise variance $\sigma_{\eta}^{2}$, and the noise can cause neighboring symbol clusters to interfere. Once this occurs, equalization filtering becomes inadequate to classify all of the input samples. Error control coding schemes can be employed in such cases, but these often require extra bandwidth.
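To make the cluster picture concrete, a sketch with illustrative values: it enumerates the noise-free output states of an assumed two-tap channel driven by BPSK symbols and scatters Gaussian noise around each center.

    import itertools
    import numpy as np

    h = np.array([1.0, 0.5])  # assumed channel taps for illustration

    # Noise-free channel states: every BPSK input pair maps to a symbol center.
    centers = sorted({float(np.dot(h, s))
                      for s in itertools.product([-1.0, 1.0], repeat=len(h))})
    print(centers)  # [-1.5, -0.5, 0.5, 1.5]

    # Additive Gaussian noise of variance sigma^2 turns each center into a cluster.
    rng = np.random.default_rng(1)
    sigma = 0.2
    samples = np.repeat(centers, 500) + rng.normal(0.0, sigma, 4 * 500)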

5.5.1. Symbol Overlap.

The expected number of errors can be calculated by considering the amount of symbol interaction, assuming Gaussian noise. Taking any two neighboring symbols, the cumulative distribution function (CDF) can be used to describe the overlap between the two noise characteristics. The overlap is directly related to the probability of error between the two symbols and if these two symbols belong to opposing classes, a class error will occur.

Fig. 5.3 shows two Gaussian functions that could represent two symbol noise distributions. The Euclidean distance, L, between symbol centers and the noise variance can be used in the cumulative distribution function of Equation (5.5) to calculate the area of overlap between the two symbol noise distributions and therefore the probability of error, as in Equation (5.6).

$$ \mathrm{CDF}(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{x^{2}}{2\sigma^{2}} \right) dx \qquad (5.5) $$


[Fig. 5.3 shows two overlapping Gaussian noise distributions whose centers are a distance L apart; the shaded overlap is the error region.]

Fig.5.3. Interaction between two neighboring symbols.

$$ \text{Area of overlap} = \text{Probability of error} = \mathrm{CDF}\left( -\frac{L}{2} \right) \qquad (5.6) $$

Since each channel symbol is equally likely to occur, the probability of unrecoverable errors occurring in the equalization space can be calculated using the sum of the CDF overlaps between each pair of opposing class symbols. The probability of error is more commonly described as the BER. Equation (5.7) describes the BER based upon the Gaussian noise overlap, where $N_{sp}$ is the number of symbols in the positive class, $N_{sn}$ is the number in the negative class, and $\Delta_{i}$ is the distance between the $i$-th positive symbol and its closest neighboring symbol in the negative class:

$$ \mathrm{BER}(\sigma_{n}) = \frac{1}{N_{sp} + N_{sn}} \sum_{i=1}^{N_{sp}} \mathrm{CDF}\left( -\frac{\Delta_{i}}{2\sigma_{n}} \right) \qquad (5.7) $$
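A direct numerical reading of Equations (5.5)-(5.7) using the standard normal CDF; the function names and the nearest-neighbor pairing are assumptions consistent with the reconstruction above.

    from scipy.stats import norm

    def pairwise_error(L, sigma_n):
        # Overlap of two Gaussian clusters whose centers are L apart (Eqs. 5.5-5.6).
        return norm.cdf(-L / (2.0 * sigma_n))

    def ber_estimate(pos_centers, neg_centers, sigma_n):
        # Eq. (5.7): sum the overlap of each positive-class center with its
        # closest negative-class neighbor, normalized by the total symbol count.
        deltas = [min(abs(p - q) for q in neg_centers) for p in pos_centers]
        return sum(pairwise_error(d, sigma_n) for d in deltas) / (
            len(pos_centers) + len(neg_centers))

    print(ber_estimate([0.5, 1.5], [-0.5, -1.5], sigma_n=0.2))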

5.6. CHANNEL EQUALIZATION

High speed communication channels are often impaired by channel intersymbol interference (ISI) and additive noise. Adaptive equalizers are required in these communication systems to obtain reliable data transmission. In adaptive equalizers the main constraint is training the equalizer. Many algorithms have been applied to train the equalizer, each having its own advantages and disadvantages. Moreover, the importance of channel equalization keeps research going on to introduce new algorithms to train the equalizer.

The optimal BER equalization performance is obtained using a maximum likelihood sequence estimator (MLSE) on the entire transmitted data sequence [5.18]. A more practical MLSE would operate on smaller data sequences, but these can still be computationally expensive; they also have problems tracking time-varying channels and can only produce sequences of outputs with a significant time delay. Another equalization approach implements a symbol-by-symbol detection procedure and is based upon adaptive filters. The symbol-by-symbol approach to equalization applies the channel output samples to a decision classifier that separates the symbols into their respective classes. Two types of symbol-by-symbol equalizers are examined in this thesis, the transversal equalizer (TE) and the decision feedback equalizer; both can be implemented as linear filters, LTE and LDFE, with a simple FIR structure. The ideal equalizer will model the inverse of the channel model, but this does not take into account the effect of noise within the channel.

[Fig. 5.4 shows the arrangement: the transmitted signal X(n) passes through the Channel to give a(n), then through a nonlinearity (N.L.) to give b(n); noise q(n) is added to form r(n), which feeds the Equalizer. The equalizer output is compared with the delayed reference d(n) to form the error e(n) that drives the update algorithm.]

Fig.5.4. Block diagram of channel equalization

A basic block diagram of channel equalization is shown in Fig. 5.4. The transmitted signal X(n) passes through the channel. The block N.L. accounts for the nonlinearity associated with the channel, and q(n) is the Gaussian noise added by the channel. The equalizer is placed at the receiver end. The output of the equalizer is compared with a delayed version of the transmitted signal to calculate the error signal e(n), which is used by the update algorithm to update the equalizer coefficients so that the error is minimized.
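A minimal sketch of this arrangement, assuming a linear FIR equalizer trained by LMS (the baseline of Section 2.6); the channel taps follow Equation (5.9) and the nonlinearity mirrors Equation (5.10) as reconstructed later in this chapter, while the tap count, decision delay, noise level and step size are illustrative choices rather than the thesis's exact settings.

    import numpy as np

    rng = np.random.default_rng(0)

    h = np.array([0.26, 0.93, 0.26])            # channel taps (cf. Equation (5.9))
    nl = lambda a: a + 0.2 * a**2 - 0.1 * a**3  # assumed channel nonlinearity

    N, m, delay, mu = 5000, 5, 2, 0.03
    x = rng.choice([-1.0, 1.0], N)              # BPSK source X(n)
    a = np.convolve(x, h)[:N]                   # linear channel output a(n)
    r = nl(a) + rng.normal(0.0, np.sqrt(1e-3), N)  # b(n) plus 30 dB noise -> r(n)

    w = np.zeros(m)                             # equalizer tap weights
    for n in range(m, N):
        y = r[n - m + 1:n + 1][::-1]            # tap-delay input vector
        e = x[n - delay] - w @ y                # error against delayed X(n), d(n)
        w += mu * e * y                         # LMS weight update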


5.6.1. Transversal Filter

The transversal equalizer uses a time-delay vector, $\mathbf{y}(n)$ (Equation (5.8)), of channel output samples to determine the symbol class. The {m} TE notation used to represent the transversal equalizer specifies m inputs.

$$ \mathbf{y}(n) = [\, y(n),\; y(n-1),\; \ldots,\; y(n-(m-1)) \,] \qquad (5.8) $$

The equalizer filter output is classified through a threshold activation device (Fig. 5.5) so that the equalizer decision belongs to one of the BPSK states, i.e., +1 or -1.

[Fig. 5.5 shows a linear transversal filter: the received samples y(n), y(n-1), ..., y(n-4) are taken from a delay line of z^{-1} elements, weighted by the taps w_1(n), ..., w_5(n), and summed by an adder to give the equalized output d(n).]

Fig.5.5. Linear Transversal Filter

Considering the inverse of the channel that was given in Equation (5.2), this is an infinitely long convergent linear series: $1/H(z) = \sum_{i=0}^{\infty} (-\tfrac{1}{2})^{i} z^{-i}$. Each coefficient of this inverse model can be used in a linear equalizer as an FIR tap weight. Each additional tap dimension will improve the accuracy; however, high input dimensions leave the equalizer susceptible to noisy samples. If a noisy sample is received, it will remain within the filter, affecting the output from each equalizer tap. Rather than designing a linear equalizer, a non-linear filter with a shorter input dimension can be used to provide the desired performance; this reduces the sensitivity to noise.


5.7. SIMULATION RESULTS

Extensive simulation studies have been carried out for the channel equalization problem described in Fig. 5.4, using the two discussed ANN structures (FLANN and CFLANN, as discussed in Sections 3.4 and 3.5) with the BP algorithm, and a linear FIR equalizer with the LMS algorithm as discussed in Section 2.6. The digital message used a binary phase shift keying (BPSK) signal constellation of the form {-1, +1}, in which each symbol was drawn from a random distribution. To obtain meaningful comparisons from the simulations, we assign the same input signals to the two neural-network based equalizer structures considered. To the channel output a zero mean white Gaussian noise of SNR 30 dB was added; that is, the SNR is set equal to the reciprocal of the noise variance at the input of the equalizer.

For the FLANN the equalizer has six branches and each branch is expanded to five branches using trigonometric functions. The output of the FLANN contains a tanh(.) function. The total number of weights used for the FLANN is 31 (including one bias term). In the case of the CFLANN, in the first stage each branch is expanded into five branches, and the output of the first stage is again expanded into three terms. The number of weights used in the first stage is 19 (including one bias term) and the number used in the second stage is 4 (including one bias term), so the total number of weights used for the CFLANN is 23. At each stage the CFLANN contains a tanh(.) function. The learning rate is chosen as 0.03 for both structures. To study the BER performance, each of the equalizer structures was trained with 5000 iterations to obtain the optimal weight solution. After completion of the training, testing of the equalizer was carried out. The BER was calculated over 10^5 data samples.
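The functional expansion itself can be sketched as follows. The particular trigonometric basis (the input plus sin and cos of πx and 2πx) is an assumption based on common FLANN practice, chosen here because it reproduces the 6 × 5 + 1 = 31 weight count quoted above.

    import numpy as np

    def flann_expand(y):
        # Map each of the six input branches to five terms, plus one shared
        # bias, giving the 31-dimensional enhanced pattern mentioned above.
        feats = [np.ones(1)]  # bias term
        for v in y:
            feats.append(np.array([v,
                                   np.sin(np.pi * v), np.cos(np.pi * v),
                                   np.sin(2 * np.pi * v), np.cos(2 * np.pi * v)]))
        return np.concatenate(feats)

    w = np.zeros(31)                 # FLANN weights (one per feature)
    phi = flann_expand(np.zeros(6))  # six received samples -> 31 features
    out = np.tanh(w @ phi)           # tanh(.) output node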

The channel considered here has the normalized transfer function given in z-transform form:

$$ H(z) = 0.26 + 0.93 z^{-1} + 0.26 z^{-2} \qquad (5.9) $$

The following types of nonlinearity were introduced in the channel.

(1) $$ b(n) = a(n) + 0.2 a^{2}(n) - 0.1 a^{3}(n) \qquad (5.10) $$

(2) $$ b(n) = a(n) + 0.2 a^{2}(n) - 0.1 a^{3}(n) + 0.5 \cos(\pi a(n)) \qquad (5.11) $$
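Expressed as code (with the caveat that the polynomial coefficients in Equations (5.10) and (5.11) are reconstructed from the FLANN equalization literature cited in this chapter, not read directly from the source):

    import numpy as np

    # Nonlinear channel distortions of Equations (5.10) and (5.11), applied
    # to the linear channel output a(n) to produce b(n) in Fig. 5.4.
    nl1 = lambda a: a + 0.2 * a**2 - 0.1 * a**3
    nl2 = lambda a: a + 0.2 * a**2 - 0.1 * a**3 + 0.5 * np.cos(np.pi * a)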

The BER performance for both nonlinearities is plotted in Fig. 5.6. From the BER plots it is observed that the performance of the CFLANN equalizer is better than those of the FLANN and LMS based equalizers.


[Fig. 5.6 shows two BER plots on a logarithmic scale (BER from 10^-1 down to 10^-5) against SNR in dB (roughly 10-22 dB in (a) and 10-20 dB in (b)), with curves for the LMS, FLANN and CFLANN equalizers.]

Fig.5.6. BER plot (a) For nonlinearity (5.10) (b) For nonlinearity (5.11)


5.8. SUMMARY

To compensate for the effect of ISI and other noise on the transmitted bits, an equalizer is placed at the receiver end. For the equalization of highly nonlinear systems the number of branches in the FLANN increases, and in some cases it even gives poor performance. To decrease the number of branches and increase the performance, a two-stage FLANN is described in this chapter, in which the output of the first stage again undergoes functional expansion. From the BER plots it can be observed that for nonlinear channels the performance of the CFLANN equalizer is considerably better than those of the FLANN and LMS based equalizers.


Chapter 6

CONCLUSIONS

6.1. CONCLUSIONS

The aim of this thesis is to find a proper artificial neural network (ANN) model for adaptive nonlinear system identification and channel equalization. The prime advantages of using ANN models are their ability to learn through optimization of an appropriate error function and their excellent performance in approximating nonlinear functions.

Three ANN structures, MLP, FLANN and CFLANN, are discussed. Due to the multilayer structure of the MLP, its convergence is slow. On the other hand, FLANN is a single layer structure with functionally mapped inputs. Here, the initial representation of a pattern is enhanced by using nonlinear functions and thus the pattern dimension space is increased. The functional link acts on an element of a pattern or on the entire pattern itself by generating a set of linearly independent functions and then evaluating these functions with the pattern as the argument. Hence separation of the patterns becomes possible in the enhanced space. While constructing an artificial neural network the designer is often faced with the problem of choosing a network of the right size for the task to be carried out. To overcome this problem a GA based pruning strategy and weight update algorithm is used.

Transmission bandwidth is one of the most precious resources in digital communication systems. To make better use of this resource, signals are commonly transmitted through band-limited channels, so the received signals are inevitably affected by intersymbol interference (ISI). A channel equalizer is used to recover the transmitted data from the received signals. If the nonlinearity associated with the system or channel is high, the number of branches in the FLANN increases, and in some cases it even gives poor performance. To decrease the number of branches and increase the performance, the CFLANN is used.

6.2. SCOPE FOR FUTURE WORK

In this thesis the CFLANN is used only for FIR systems and channels. This can be extended to infinite impulse response (IIR) systems and channels.

In the pruning technique a GA is used, which is a very slow process with very high computational complexity. Hence research can be done to find a faster method for pruning.


REFERENCES

Chapter 1

[1.1] Narendra K. S. and Annaswamy A. M., Stable Adaptive Systems, Englewood Cliffs, NJ: Prentice-Hall, 1989.

[1.2] Proakis J. G., Digital Communications, third edition, McGraw Hill, 1995.

[1.3] Chen S., Billings S. A. and Grant P. M., "Nonlinear system identification using neural networks", Int. J. Contr., vol. 51, no. 6, June 1990, pp. 1191-1214.

[1.4] Narendra K. S. and Parthasarathy K., "Identification and control of dynamic systems using neural networks", IEEE Trans. on Neural Networks, vol. 1, Mar. 1990, pp. 4-27.

[1.5] Widrow B. and Stearns S. D., Adaptive Signal Processing, second edition, Pearson Education, 2002.

[1.6] Qureshi S., "Adaptive equalization", IEEE Communications Magazine, Mar. 1982, pp. 9-16.

[1.7] Siller C., "Multipath propagation", IEEE Communications Magazine, vol. 22, no. 2, Feb. 1984, pp. 6-15.

[1.8] Chen S., Mulgrew B. and McLaughlin S., "Adaptive Bayesian Equalizer with Decision Feedback", IEEE Trans. on Signal Processing, vol. 41, no. 9, Sept. 1993, pp. 2918-2927.

[1.9] Patra J. C., Pal R. N., Chatterji B. N. and Panda G., "Identification of nonlinear dynamic systems using functional link artificial neural networks", IEEE Trans. on Systems, Man and Cybernetics-Part B: Cybernetics, vol. 29, no. 2, April 1999, pp. 254-262.

Chapter 2

[2.1] Ljung L., System Identification, Englewood Cliffs, NJ: Prentice-Hall, 1987.

[2.2] Akaike H., "A new look at the statistical model identification", IEEE Trans. on Automatic Control, vol. AC-19, pp. 716-723, 1974.

[2.3] Widrow B. and Stearns S. D., Adaptive Signal Processing, second edition, Pearson Education, 2002.

[2.4] Haykin S., Adaptive Filter Theory, Pearson Education, 2003.

[2.5] Wiener N., Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications, MIT Press, Cambridge, MA, 1949.

[2.6] Levinson N., "The Wiener RMS (root-mean-square) error criterion in filter design and prediction", J. Math. Phys., vol. 25, pp. 261-278, 1947.

[2.7] Fan H. and Jenkins W. K., "A New Adaptive IIR Filter", IEEE Trans. on Circuits and Systems, vol. CAS-33, no. 10, Oct. 1986, pp. 939-947.

[2.8] Ho K. C. and Chan Y. T., "Bias Removal in Equation-Error Adaptive IIR Filters", IEEE Trans. on Signal Processing, vol. 43, no. 1, Jan. 1995.

[2.9] Cousseau J. E. and Diniz P. S. R., "New Adaptive IIR Filtering Algorithms Based on the Steiglitz-McBride Method", IEEE Trans. on Signal Processing, vol. 45, no. 5, May 1997.

[2.10] Blackmore K. L., Williamson R. C., Mareels I. M. Y. and Sethares W. A., "Online Learning via Congregational Gradient Descent", Mathematics of Control, Signals, and Systems, vol. 10, pp. 331-363, 1997.

[2.11] Haykin S., Adaptive Filter Theory, third edition, Prentice-Hall Inc., Upper Saddle River, NJ, 1996.

[2.12] Mars P., Chen J. R. and Nambiar R., Learning Algorithms: Theory and Applications in Signal Processing, Control, and Communications, CRC Press Inc., 1996.

Chapter 3

[3.1] Haykin S., Neural Networks: A Comprehensive Foundation, Pearson Education Asia, 2002.

[3.2] Jagannathan S. and Lewis F. L., "Identification of a class of nonlinear dynamical systems using multilayered neural networks", IEEE International Symposium on Intelligent Control, Columbus, Ohio, USA, 1994, pp. 345-351.

[3.3] Narendra K. S. and Parthasarathy K., "Identification and control of dynamical systems using neural networks", IEEE Trans. on Neural Networks, vol. 1, no. 1, Mar. 1990, pp. 4-26.

[3.4] Pao Y. H., Adaptive Pattern Recognition and Neural Networks, Reading, MA: Addison-Wesley, 1989, Chapter 8, pp. 197-222.

[3.5] Patra J. C., Pal R. N., Chatterji B. N. and Panda G., "Identification of Nonlinear Dynamic Systems Using Functional Link Artificial Neural Networks", IEEE Trans. on Systems, Man and Cybernetics-Part B: Cybernetics, vol. 29, no. 2, April 1999, pp. 254-262.

[3.6] Patra J. C. and Kot A. C., "Nonlinear Dynamic System Identification Using Chebyshev Functional Link Artificial Neural Networks", IEEE Trans. on Systems, Man and Cybernetics-Part B: Cybernetics, vol. 32, no. 4, Aug. 2002, pp. 505-510.

[3.7] Wang J. and Chen Y., "A Fully Automated Recurrent Neural Network for Unknown Dynamic System Identification and Control", IEEE Trans. on Circuits and Systems-I: Regular Papers, vol. 53, no. 6, June 2006, pp. 1363-1372.

[3.8] Hagan M. T. and Demuth H. B., Neural Network Design, Thomson Learning, 2003.

[3.9] Dayhoff E. J., Neural Network Architectures: An Introduction, Van Nostrand Reinhold, New York, 1990.

[3.10] Bose N. K. and Liang P., Neural Network Fundamentals with Graphs, Algorithms, and Applications, TMH Publishing Company Ltd., 1998.

[3.11] Widrow B. and Stearns S. D., Adaptive Signal Processing, Pearson Education.

[3.12] Pao Y. H., Park G. H. and Sobajic D. J., "Learning and Generalization Characteristics of the Random Vector Functional-Link Net", Neurocomputing, vol. 6, pp. 163-180, 1994.

[3.13] Patra J. C. and Pal R. N., "A Functional Link Artificial Neural Network for Adaptive Channel Equalization", Signal Processing, vol. 43, no. 2, pp. 181-195, May 1995.

[3.14] Mishra S. K. and Panda G., "A Novel Method for Designing LVDT and Its Comparison with Conventional Design", SAS 2006 - IEEE Sensors Applications Symposium, Houston, Texas, USA, pp. 129-134, 7-9 Feb. 2006.

Chapter 4

[4.1] Holland J. H., Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.

[4.2] Holland J. H., "Outline for a logical theory of adaptive systems", J. ACM, vol. 3, pp. 297-314, July 1962; also in A. W. Burks, Ed., Essays on Cellular Automata, Univ. Illinois Press, 1970, pp. 297-319.

[4.3] DeJong K. A., "An analysis of the behavior of a class of genetic adaptive systems", Ph.D. dissertation (CCS), Univ. Mich., Ann Arbor, MI, 1975.

[4.4] Fitzpatrick J. M., Grefenstette J. J. and Van Gucht D., "Image registration by genetic search", in Proc. IEEE Southeastcon '84, 1984, pp. 460-464.

[4.5] Goldberg D. E. and Lingle R., "Alleles, loci and the traveling salesman problem", in Proc. Int. Conf. Genetic Algorithms and Their Appl., pp. 154-159, 1985.

[4.6] Grefenstette J. J., Gopal R., Rosmaita B. and Van Gucht D., "Genetic algorithms for the traveling salesman problem", in Proc. Int. Conf. Genetic Algorithms and Their Applications, pp. 160-165, 1985.

[4.7] Das R. and Goldberg D. E., "Discrete-time parameter estimation with genetic algorithms", preprints from the Proc. 19th Annual Pittsburgh Conf. Modeling and Simulation, 1988.

[4.8] Etter D. M., Hicks M. J. and Cho K. H., "Recursive adaptive filter design using an adaptive genetic algorithm", Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 2, pp. 635-638, 1982.

[4.9] Goldberg D. E., System identification via genetic algorithm, unpublished manuscript, Univ. Mich., Ann Arbor, MI, 1981.

[4.10] Kristinsson K. and Dumont G. A., "Genetic algorithms in system identification", Third IEEE Int. Symp. Intelligent Contr., Arlington, VA, pp. 597-602, 1988.

[4.11] Smith T. and DeJong K. A., "Genetic algorithms applied to the calibration of information driven models of US migration patterns", in Proc. 12th Annu. Pittsburgh Conf. Modeling and Simulation, pp. 955-959, 1981.

[4.12] Giles C. L. and Omlin C. W., "Pruning Recurrent Neural Networks for Improved Generalization Performance", IEEE Trans. on Neural Networks, vol. 5, no. 5, Sept. 1994, pp. 848-851.

[4.13] Markel J. D., "FFT pruning", IEEE Trans. on Audio and Electroacoustics, vol. AU-19, no. 4, Dec. 1971, pp. 305-311.

[4.14] Jearanaitanakij K. and Pinngern O., "Hidden Unit Reduction of Artificial Neural Network on English Capital Letter Reduction", IEEE conference, pp. 1-5, June 2006.

[4.15] Chung T. and Leung H., "A genetic algorithm approach in optimal capacitor selection with harmonic distortion considerations", International Journal of Electrical Power & Energy Systems, vol. 21, no. 8, Nov. 1999, pp. 561-569.

[4.16] Fogel D., "What is evolutionary computing", IEEE Spectrum, Feb. 2000, pp. 26-32.

[4.17] Goldberg D. E., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.

Chapter 5

[5.1] Arcens S., Sueiro J. C. and Vidal A. R. F., "Pao networks for data transmission equalization", Proc. Int. Jt. Conf. on Neural Networks, Baltimore, Maryland, pp. 11.963-11.968, June 1992.

[5.2] Chen S., Gibson G. J., Cowan C. F. N. and Grant P. M., "Adaptive equalization of finite nonlinear channels using multilayer perceptrons", EURASIP Signal Processing Jnl., vol. 20, 1990, pp. 107-119.

[5.3] Chen S., Gibson G. J. and Cowan C. F. N., "Adaptive channel equalization using a polynomial perceptron structure", Proc. IEE, Pt. I, vol. 137, pp. 257-264, Oct. 1990.

[5.4] Gan W. S., Soraghan J. J. and Durrani T. S., "New functional-link based equalizer", Electronics Letters, vol. 28, no. 17, 13 Aug. 1992, pp. 1643-1645.

[5.5] Haykin S., Adaptive Filter Theory, Prentice Hall, Englewood Cliffs, NJ, 1986, Chapter 5, pp. 194-268.

[5.6] Hornik K., Stinchcombe M. and White H., "Multilayer feedforward networks are universal approximators", Neural Networks, vol. 2, 1989, pp. 359-366.

[5.7] Meyer M. and Pfeiffer G., "Multilayer perceptron based decision feedback equalizers for channels with intersymbol interference", Proc. IEE, Pt. I, vol. 140, no. 6, pp. 420-424, June 1992.

[5.8] Pao Y. H., Adaptive Pattern Recognition and Neural Networks, Reading, MA: Addison-Wesley, 1989, Chapter 8, pp. 197-222.

[5.9] Patra J. C. and Pal R. N., "A functional link artificial neural network for adaptive channel equalization", EURASIP Signal Processing Jnl., vol. 43, no. 2, 1995, pp. 181-195.

[5.10] Widrow B. and Lehr M. A., "30 years of adaptive neural networks: perceptron, madaline and back propagation", Proc. IEEE, vol. 78, no. 9, Sept. 1990, pp. 1415-1442.

[5.11] Lucky R. W., "Automatic equalization for digital communications", Bell Syst. Tech. Journal, vol. 44, April 1965, pp. 547-588.

[5.12] Proakis J. G., Digital Communications, third edition, McGraw Hill, 1995.

[5.13] Qureshi S., "Adaptive equalization", Proceedings of the IEEE, vol. 73, no. 9, pp. 1349-1387, Sept. 1985.

[5.14] Widrow B. and Stearns S. D., Adaptive Signal Processing, Chap. 9, Prentice-Hall Signal Processing Series, New Jersey, 1985.

[5.15] Macchi O., Adaptive Processing: The Least Mean Squares Approach with Applications in Transmission, John Wiley and Sons, West Sussex, England, 1995.

[5.16] Siu S., "Non-linear Adaptive Equalization Based on Multi-layer Perceptron Architecture", Ph.D. Thesis, Faculty of Science, University of Edinburgh, 1990.
