

9 LEARNING TASKS

In the previous section, we discussed different learning paradigms. In this section, we describe some basic learning tasks. The choice of a particular learning rule is, of course, influenced by the learning task, the diverse nature of which is testimony to the universality of neural networks.

Pattern Association

An associative memory is a brainlike distributed memory that learns by association. Association has been known to be a prominent feature of human memory since the time of Aristotle, and all models of cognition use association in one form or another as the basic operation (Anderson, 1995).

Association takes one of two forms: autoassociation and heteroassociation. In autoassociation, a neural network is required to store a set of patterns (vectors) by repeatedly presenting them to the network. The network is subsequently presented with a partial description or distorted (noisy) version of an original pattern stored in it, and the task is to retrieve (recall) that particular pattern. Heteroassociation differs from autoassociation in that an arbitrary set of input patterns is paired with another arbitrary set of output patterns. Autoassociation involves the use of unsupervised learning, whereas the type of learning involved in heteroassociation is supervised.

Let x_k denote a key pattern (vector) applied to an associative memory and y_k denote a memorized pattern (vector). The pattern association performed by the network is described by

x_k → y_k,    k = 1, 2, ..., q    (32)

where q is the number of patterns stored in the network. The key pattern x_k acts as a stimulus that not only determines the storage location of memorized pattern y_k, but also holds the key for its retrieval.

In an autoassociative memory, y_k = x_k, so the input and output (data) spaces of the network have the same dimensionality. In a heteroassociative memory, y_k ≠ x_k; hence, the dimensionality of the output space in this second case may or may not equal the dimensionality of the input space.

There are two phases involved in the operation of an associative memory:

• storage phase, which refers to the training of the network in accordance with Eq. (32);

• recall phase, which involves the retrieval of a memorized pattern in response to the presentation of a noisy or distorted version of a key pattern to the network.

Let the stimulus (input) x represent a noisy or distorted version of a key pattern x_j. This stimulus produces a response (output) y, as indicated in Fig. 27. For perfect recall, we should find that y = y_j, where y_j is the memorized pattern associated with the key pattern x_j. When y ≠ y_j for x = x_j, the associative memory is said to have made an error in recall.


The number of patterns q stored in an associative memory provides a direct measure of the storage capacity of the network. In designing an associative memory, the challenge is to make the storage capacity q (expressed as a percentage of the total number N of neurons used to construct the network) as large as possible, yet insist that a large fraction of the memorized patterns is recalled correctly.
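To make the storage and recall phases concrete, the following sketch implements one simple realization of an autoassociative memory, namely a correlation-matrix (outer-product) memory; the bipolar patterns, the dimensions N and q, and the 15 percent noise level are illustrative assumptions rather than values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: q bipolar (+1/-1) key patterns of dimension N.
N, q = 64, 4
keys = rng.choice([-1.0, 1.0], size=(q, N))            # x_k, k = 1, ..., q
memorized = keys.copy()                                # autoassociation: y_k = x_k

# Storage phase: superpose the outer products y_k x_k^T (Hebbian-style storage).
M = np.zeros((N, N))
for x_k, y_k in zip(keys, memorized):
    M += np.outer(y_k, x_k) / N

# Recall phase: present a noisy version of key pattern x_j and threshold the response.
j = 2
flip = rng.random(N) < 0.15                            # flip roughly 15% of the components
x_noisy = keys[j] * np.where(flip, -1.0, 1.0)
y = np.sign(M @ x_noisy)                               # response of the memory

print("recall errors:", int(np.sum(y != memorized[j])))
```

With only a few stored patterns relative to N, the crosstalk between patterns is small and the noisy key is recalled with few or no component errors; as q grows toward the storage capacity, recall errors become more frequent.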

Pattern Recognition

Humans are good at pattern recognition. We receive data from the world around us via our senses and are able to recognize the source of the data. We are often able to do so almost immediately and with practically no effort. For example, we can recognize the familiar face of a person even though that person has aged since our last encounter, identify a familiar person by his or her voice on the telephone despite a bad connection, and distinguish a boiled egg that is good from a bad one by smelling it.

Humans perform pattern recognition through a learning process; so it is with neural networks.

Pattern recognition is formally defined as the process whereby a received pattern/signal is assigned to one of a prescribed number of classes. A neural network performs pattern recognition by first undergoing a training session during which the network is repeatedly presented with a set of input patterns along with the category to which each particular pattern belongs. Later, the network is presented with a new pattern that has not been seen before, but which belongs to the same population of patterns used to train the network. The network is able to identify the class of that particular pattern because of the information it has extracted from the training data.

Pattern recognition performed by a neural network is statistical in nature, with the patterns being represented by points in a multidimensional decision space. The decision space is divided into regions, each one of which is associated with a class. The decision boundaries are determined by the training process. The construction of these boundaries is made statistical by the inherent variability that exists within and between classes.

In generic terms, pattern-recognition machines using neural networks may take one of two forms:

• The machine is split into two parts, an unsupervised network for feature extraction and a supervised network for classification, as shown in the hybridized system of Fig. 28a. Such a method follows the traditional approach to statistical pattern recognition (Fukunaga, 1990; Duda et al., 2001; Theodoridis and Koutroumbas, 2003). In conceptual terms, a pattern is represented by a set of m observables, which may be viewed as a point x in an m-dimensional observation (data) space.

FIGURE 27 Input–output relation of pattern associator.

Feature extraction is described by a transformation that maps the point x into an intermediate point y in a q-dimensional feature space with q < m, as indicated in Fig. 28b. This transformation may be viewed as one of dimensionality reduction (i.e., data compression), the use of which is justified on the grounds that it simplifies the task of classification. The classification is itself described as a transformation that maps the intermediate point y into one of the classes in an r-dimensional decision space, where r is the number of classes to be distinguished.

• The machine is designed as a feedforward network using a supervised learning algorithm. In this second approach, the task of feature extraction is performed by the computational units in the hidden layer(s) of the network.
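As a minimal sketch of the first, two-stage form, the code below uses principal-component projection for the unsupervised feature-extraction stage and a nearest-centroid rule for the supervised classification stage; both are stand-ins chosen for brevity, and the dimensions m, q, r and the synthetic Gaussian data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: r = 2 classes of m = 20-dimensional observation vectors.
m, r, n_per_class = 20, 2, 100
means = rng.normal(size=(r, m)) * 3.0
X = np.vstack([rng.normal(loc=mu, size=(n_per_class, m)) for mu in means])
labels = np.repeat(np.arange(r), n_per_class)

# Stage 1 (unsupervised): map x in R^m to a feature vector y in R^q, q < m,
# here via projection onto the top-q principal components of the data.
q = 3
X_mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - X_mean, full_matrices=False)
Y = (X - X_mean) @ Vt[:q].T                            # q-dimensional feature space

# Stage 2 (supervised): assign y to one of r classes; a nearest-centroid rule
# stands in here for the supervised network of Fig. 28a.
centroids = np.vstack([Y[labels == c].mean(axis=0) for c in range(r)])

def classify(x_new):
    y_new = (x_new - X_mean) @ Vt[:q].T
    return int(np.argmin(np.linalg.norm(centroids - y_new, axis=1)))

test = rng.normal(loc=means[1], size=m)                # unseen pattern drawn from class 1
print("predicted class:", classify(test))
```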

Function Approximation

The third learning task of interest is that of function approximation. Consider a nonlinear input–output mapping described by the functional relationship

d = f(x)    (33)

where the vector x is the input and the vector d is the output. The vector-valued function f(·) is assumed to be unknown. To make up for the lack of knowledge about the function f(·), we are given the set of labeled examples:

𝒯 = {(x_i, d_i)},    i = 1, 2, ..., N    (34)

FIGURE 28 Illustration of the classical approach to pattern classification. (a) An unsupervised network for feature extraction feeding a supervised network for classification. (b) The transformation from the m-dimensional observation space to the q-dimensional feature space to the r-dimensional decision space.

The requirement is to design a neural network that approximates the unknown function f(·) such that the function F(·) describing the input–output mapping actually realized by the network is close enough to f(·) in a Euclidean sense over all inputs, as shown by

‖F(x) − f(x)‖ < ε    for all x    (35)

where ε is a small positive number. Provided that the size N of the training sample is large enough and the network is equipped with an adequate number of free parameters, then the approximation error ε can be made small enough for the task.

The approximation problem described here is a perfect candidate for supervised learning, with x_i playing the role of input vector and d_i serving the role of desired response. We may turn this issue around and view supervised learning as an approximation problem.
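The following sketch illustrates Eqs. (33) through (35) by fitting a one-hidden-layer network F(·) to labeled examples of an assumed scalar function f; the choice f(x) = sin(3x), the number of hidden units, and the learning rate are illustrative assumptions, and the training loop is ordinary batch gradient descent rather than any particular algorithm prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Labeled examples {(x_i, d_i)} of an unknown mapping d = f(x); here f is assumed scalar.
f = lambda x: np.sin(3.0 * x)
N = 200
x = rng.uniform(-1.0, 1.0, size=(N, 1))
d = f(x)

# One-hidden-layer network F(x) = W2 tanh(W1 x + b1) + b2, trained by gradient
# descent on the mean squared approximation error (supervised learning).
H, lr = 20, 0.05
W1, b1 = rng.normal(scale=0.5, size=(H, 1)), np.zeros((H, 1))
W2, b2 = rng.normal(scale=0.5, size=(1, H)), np.zeros((1, 1))

for _ in range(5000):
    z = np.tanh(W1 @ x.T + b1)            # (H, N) hidden activations
    y = W2 @ z + b2                       # (1, N) network outputs F(x_i)
    e = y - d.T                           # approximation error on the sample
    dW2 = e @ z.T / N
    db2 = e.mean(axis=1, keepdims=True)
    dz = (W2.T @ e) * (1.0 - z ** 2)      # backpropagated error through tanh
    dW1 = dz @ x / N
    db1 = dz.mean(axis=1, keepdims=True)
    W2 -= lr * dW2; b2 -= lr * db2; W1 -= lr * dW1; b1 -= lr * db1

# Check the requirement ||F(x) - f(x)|| < eps on the training inputs.
print("max |F(x) - f(x)| on the sample:", float(np.max(np.abs(y - d.T))))
```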

The ability of a neural network to approximate an unknown input–output mapping may be exploited in two important ways:

(i) System identification. Let Eq. (33) describe the input–output relation of an unknown memoryless multiple input–multiple output (MIMO) system; by a "memoryless" system, we mean a system that is time invariant. We may then use the set of labeled examples in Eq. (34) to train a neural network as a model of the system.

Let the vector y_i denote the actual output of the neural network produced in response to an input vector x_i. The difference between d_i (associated with x_i) and the network output y_i provides the error signal vector e_i, as depicted in Fig. 29. This error signal is, in turn, used to adjust the free parameters of the network to minimize the squared difference between the outputs of the unknown system and the neural network in a statistical sense, computed over the entire training sample 𝒯. (A minimal code sketch of this scheme is given after item (ii) below.)

(ii) Inverse modeling. Suppose next we are given a known memoryless MIMO system whose input–output relation is described by Eq. (33). The requirement in this case is to construct an inverse model that produces the vector x in response to the vector d. The inverse system may thus be described by

x = f⁻¹(d)    (36)

FIGURE 29 Block diagram of system identification: The neural network, doing the identification, is part of the feedback loop.

where the vector-valued function f⁻¹(·) denotes the inverse of f(·). Note, however, that f⁻¹(·) is not the reciprocal of f(·); rather, the use of the superscript −1 is merely a flag to indicate an inverse. In many situations encountered in practice, the vector-valued function f(·) is much too complex to permit a straightforward formulation of the inverse function f⁻¹(·). Given the set of labeled examples in Eq. (34), we may construct a neural network approximation of f⁻¹(·) by using the scheme shown in Fig. 30. In the situation described here, the roles of x_i and d_i are interchanged: The vector d_i is used as the input, and x_i is treated as the desired response. Let the error signal vector e_i denote the difference between x_i and the actual output y_i of the neural network produced in response to d_i. As with the system-identification problem, this error signal vector is used to adjust the free parameters of the neural network to minimize the squared difference between the outputs of the unknown inverse system and the neural network in a statistical sense, computed over the complete training set 𝒯. Typically, inverse modeling is a more difficult learning task than system identification, as there may not be a unique solution for it.
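Below is a minimal sketch of the system-identification scheme of Fig. 29, with a model that is linear in its parameters and adjusted by the least-mean-square (error-correction) rule standing in for the neural network; the assumed plant f, the feature expansion phi, and the step size eta are illustrative. The inverse-modeling scheme of Fig. 30 is obtained from the same loop simply by exchanging the roles of x_i and d_i.

```python
import numpy as np

rng = np.random.default_rng(3)

# Unknown memoryless system d = f(x); here an assumed two-input, one-output map.
f = lambda x: np.array([1.5 * x[0] - 0.7 * x[1] + 0.2 * x[0] * x[1]])

# Model output y_i = w . phi(x_i): a fixed nonlinear expansion with adjustable
# linear weights stands in for the neural network model.
phi = lambda x: np.array([x[0], x[1], x[0] * x[1], 1.0])
w = np.zeros(4)
eta = 0.1                                  # step size for error-correction learning

for _ in range(2000):
    x_i = rng.uniform(-1.0, 1.0, size=2)   # input applied to both system and model
    d_i = f(x_i)                           # desired response from the unknown system
    y_i = np.array([w @ phi(x_i)])         # model response
    e_i = d_i - y_i                        # error signal of Fig. 29
    w += eta * e_i[0] * phi(x_i)           # adjust free parameters to reduce |e_i|^2

print("learned parameters:", np.round(w, 3))
```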

Control

The control of a plant is another learning task that is well suited for neural networks; by a "plant" we mean a process or critical part of a system that is to be maintained in a controlled condition. The relevance of learning to control should not be surprising because, after all, the human brain is a computer (i.e., information processor), the outputs of which as a whole system are actions. In the context of control, the brain is living proof that it is possible to build a generalized controller that takes full advantage of parallel distributed hardware, can control many thousands of actuators (muscle fibers) in parallel, can handle nonlinearity and noise, and can optimize over a long-range planning horizon (Werbos, 1992).

Consider the feedback control system shown in Fig. 31. The system involves the use of unity feedback around a plant to be controlled; that is, the plant output is fed back directly to the input. Thus, the plant output y is subtracted from a reference signal d supplied from an external source. The error signal e so produced is applied to a neural controller for the purpose of adjusting its free parameters. The primary objective of the controller is to supply appropriate inputs to the plant to make its output y track the reference signal d.


FIGURE 30 Block diagram of inverse system modeling. The neural network, acting as the inverse model, is part of the feedback loop.


In other words, the controller has to invert the plant’s input–output behavior.

We note that in Fig. 31, the error signal e has to propagate through the neural controller before reaching the plant. Consequently, to perform adjustments on the free parameters of the neural controller in accordance with an error-correction learning algorithm, we need to know the Jacobian, made up of a matrix of partial derivatives as shown by

J = {∂y_k/∂u_j}_{j, k}    (37)

where y_k is an element of the plant output y and u_j is an element of the plant input u.

Unfortunately, the partial derivatives ∂y_k/∂u_j for the various k and j depend on the operating point of the plant and are therefore not known. We may use one of two approaches to account for them:

(i) Indirect learning. Using actual input–output measurements on the plant, we first construct a neural model to produce a copy of it. This model is, in turn, used to provide an estimate of the Jacobian J. The partial derivatives constituting this Jacobian are subsequently used in the error-correction learning algorithm for computing the adjustments to the free parameters of the neural controller (Nguyen and Widrow, 1989; Suykens et al., 1996; Widrow and Walach, 1996).

(ii) Direct learning. The signs of the partial derivatives are generally known and usually remain constant over the dynamic range of the plant. This suggests that we may approximate these partial derivatives by their individual signs. Their absolute values are given a distributed representation in the free parameters of the neural controller (Saerens and Soquet, 1991; Schiffman and Geffers, 1993). The neural controller is thereby enabled to learn the adjustments to its free parameters directly from the plant.
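The direct-learning idea can be sketched for an assumed scalar plant that is monotonically increasing, so that the sign of ∂y/∂u is +1 everywhere; the plant model, the one-parameter controller u = w·d, and the step size are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed unknown plant: scalar and monotonically increasing, so sign(dy/du) = +1.
plant = lambda u: 2.0 * u + 0.5 * u ** 3

# One-parameter "controller": u = w * d, with w adjusted by error-correction learning.
w, eta = 0.0, 0.02

for _ in range(3000):
    d = rng.uniform(-1.0, 1.0)         # reference signal
    u = w * d                          # plant input produced by the controller
    y = plant(u)                       # plant output measured through unity feedback
    e = d - y                          # error signal applied to the controller
    # Error-correction update: the unknown dy/du is replaced by its sign (+1),
    # which is the essence of direct learning; du/dw = d.
    w += eta * e * 1.0 * d

print("controller gain w:", round(w, 3))
```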

Beamforming

Beamforming is used to distinguish between the spatial properties of a target signal and background noise. The device used to do the beamforming is called a beamformer.

The task of beamforming is compatible, for example, with feature mapping in the cortical layers of auditory systems of echolocating bats (Suga, 1990a; Simmons et al., 1992).


FIGURE 31 Block diagram of feedback control system.

The echolocating bat illuminates the surrounding environment by broadcasting short-duration frequency-modulated (FM) sonar signals and then uses its auditory system (including a pair of ears) to focus attention on its prey (e.g., a flying insect). The ears provide the bat with a beamforming capability that is exploited by the auditory system to produce attentional selectivity.

Beamforming is commonly used in radar and sonar systems where the primary task is to detect and track a target of interest in the combined presence of receiver noise and interfering signals (e.g., jammers). This task is complicated by two factors:

• the target signal originates from an unknown direction, and

• there is no prior information available on the interfering signals.

One way of coping with situations of this kind is to use a generalized sidelobe canceller (GSLC), the block diagram of which is shown in Fig. 32. The system consists of the following components (Griffiths and Jim, 1982; Haykin, 2002):

• An array of antenna elements, which provides a means of sampling the observation-space signal at discrete points in observation-space.

• A linear combiner defined by a set of fixed weights, the output of which performs the role of a desired response. This linear combiner acts like a “spatial filter,” characterized by a radiation pattern (i.e., a polar plot of the amplitude of the antenna output versus the incidence angle of an incoming signal). The mainlobe of this radiation pattern is pointed along a prescribed direction, for which the GSLC is constrained to produce a distortionless response. The output of the linear combiner, denoted by d(n), provides a desired response for the beamformer.

• A signal-blocking matrix C_a, the function of which is to cancel interference that leaks through the sidelobes of the radiation pattern of the spatial filter representing the linear combiner.

FIGURE 32 Block diagram of generalized sidelobe canceller.


• A neural network with adjustable parameters, which is designed to accommodate statistical variations in the interfering signals.

The adjustments to the free parameters of the neural network are performed by an error-correction learning algorithm that operates on the error signal e(n), defined as the difference between the linear combiner output d(n) and the actual output y(n) of the neural network. Thus the GSLC operates under the supervision of the linear combiner, which assumes the role of a “teacher.” As with ordinary supervised learning, notice that the linear combiner is outside the feedback loop acting on the neural network. A beamformer that uses a neural network for learning is called a neuro-beamformer. This class of learning machines comes under the general heading of attentional neurocomputers (Hecht-Nielsen, 1990).
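The following sketch traces the GSLC signal flow of Fig. 32, with a complex LMS adaptive filter standing in for the neural network in the adaptive branch; the four-element array, the assumed target and interferer directions, and the step size mu are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative narrowband setup: a 4-element uniform linear array (half-wavelength
# spacing), target at broadside (0 deg) and an interferer at 40 deg.
n_elem, n_snapshots = 4, 4000
steer = lambda theta: np.exp(-1j * np.pi * np.arange(n_elem) * np.sin(theta))
a_target, a_jam = steer(0.0), steer(np.deg2rad(40.0))

s = rng.normal(size=n_snapshots)                         # target waveform
jam = 3.0 * (rng.normal(size=n_snapshots)                # interfering waveform
             + 1j * rng.normal(size=n_snapshots))
noise = 0.1 * (rng.normal(size=(n_elem, n_snapshots))
               + 1j * rng.normal(size=(n_elem, n_snapshots)))
X = np.outer(a_target, s) + np.outer(a_jam, jam) + noise  # array snapshots x(n)

# Fixed linear combiner: distortionless toward broadside; its output d(n)
# plays the role of the desired response.
w_q = np.ones(n_elem) / n_elem
d = w_q.conj() @ X

# Signal-blocking matrix C_a: rows orthogonal to the broadside steering vector,
# so the target is blocked and only sidelobe interference reaches the adaptive branch.
C_a = np.array([[1, -1, 0, 0],
                [0, 1, -1, 0],
                [0, 0, 1, -1]], dtype=complex)
Z = C_a @ X

# Adaptive branch: LMS weights stand in here for the neural network of the text,
# adjusted by the error signal e(n) = d(n) - y(n); e(n) is also the GSLC output.
w_a = np.zeros(3, dtype=complex)
mu = 1e-3
out = np.empty(n_snapshots, dtype=complex)
for n in range(n_snapshots):
    y = w_a.conj() @ Z[:, n]
    e = d[n] - y
    out[n] = e
    w_a += mu * Z[:, n] * np.conj(e)                     # LMS error-correction update

print("output power before adaptation:", float(np.mean(np.abs(d) ** 2)))
print("output power after adaptation: ", float(np.mean(np.abs(out) ** 2)))
```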
