7 KNOWLEDGE REPRESENTATION
1. restricting the network architecture, which is achieved through the use of local connections known as receptive fields;
2. constraining the choice of synaptic weights, which is implemented through the use of weight sharing.
These two techniques, particularly the latter one, have a profitable side benefit: The number of free parameters in the network can be reduced significantly.
To be specific, consider the partially connected feedforward network of Fig. 20. This network has a restricted architecture by construction. The top six source nodes constitute the receptive field for hidden neuron 1, and so on for the other hidden neurons in the network. The receptive field of a neuron is defined as that region of the input field over which the incoming stimuli can influence the output signal produced by the neuron. The mapping of the receptive field is a powerful and shorthand description of the neuron’s behavior, and therefore its output.
FIGURE 20 Combined use of a receptive field and weight sharing. All four hidden neurons share the same set of weights exactly for their six synaptic connections.
To satisfy the weight-sharing constraint, we merely have to use the same set of synaptic weights for each one of the neurons in the hidden layer of the network. Then, for the example shown in Fig. 20 with six local connections per hidden neuron and a total of four hidden neurons, we may express the induced local field of hidden neuron j as

$$v_j = \sum_{i=1}^{6} w_i x_{i+j-1}, \qquad j = 1, 2, 3, 4 \tag{29}$$

where $\{w_i\}_{i=1}^{6}$ constitutes the same set of weights shared by all four hidden neurons, and $x_k$ is the signal picked up from source node $k = i + j - 1$. Equation (29) is in the form of a convolution sum. It is for this reason that a feedforward network using local connections and weight sharing in the manner described herein is referred to as a convolutional network (LeCun and Bengio, 2003).
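The convolution sum of Eq. (29) can be sketched in a few lines of Python. This is only an illustration: the function name, the weight values, and the input values are hypothetical, and the input length of nine is an assumption taken from the index range $i + j - 1$ (at most 6 + 4 - 1 = 9), not a number stated in the text.

```python
def induced_local_fields(weights, inputs):
    """Return v_j = sum_i w_i * x_{i+j-1} for every hidden neuron j (Eq. 29)."""
    num_weights = len(weights)
    num_hidden = len(inputs) - num_weights + 1
    if num_hidden < 1:
        raise ValueError("need at least as many inputs as shared weights")
    fields = []
    for j in range(num_hidden):  # 0-based here; corresponds to j + 1 in Eq. (29)
        receptive_field = inputs[j:j + num_weights]  # six consecutive source nodes
        fields.append(sum(w * x for w, x in zip(weights, receptive_field)))
    return fields


# Hypothetical values, chosen only for illustration.
shared_weights = [0.5, -1.0, 0.25, 0.0, 2.0, 1.0]                 # w_1 .. w_6
source_signals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]    # x_1 .. x_9 (assumed count)

v = induced_local_fields(shared_weights, source_signals)

# Free parameters with and without weight sharing.
free_params_shared = len(shared_weights)
free_params_unshared = len(shared_weights) * len(v)
print(v, free_params_shared, free_params_unshared)
```

With weight sharing the four hidden neurons use 6 free parameters in total; without it, the same connectivity would need 6 × 4 = 24.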
The issue of building prior information into the design of a neural network pertains to one part of Rule 4; the remaining part of the rule involves the issue of invariances, which is discussed next.
How to Build Invariances into Neural Network Design
Consider the following physical phenomena:
• When an object of interest rotates, the image of the object as perceived by an observer usually changes in a corresponding way.
• In a coherent radar that provides amplitude as well as phase information about its surrounding environment, the echo from a moving target is shifted in frequency, due to the Doppler effect that arises from the radial motion of the target in relation to the radar.
• The utterance from a person may be spoken in a soft or loud voice, and in a slow or quick manner.
In order to build an object-recognition system, a radar target-recognition system, and a speech-recognition system for dealing with these phenomena, respectively, the system must be capable of coping with a range of transformations of the observed signal.
Accordingly, a primary requirement of pattern recognition is to design a classifier that is invariant to such transformations. In other words, a class estimate represented by an output of the classifier must not be affected by transformations of the observed signal applied to the classifier input.
There are at least three techniques for rendering classifier-type neural networks invariant to transformations (Barnard and Casasent, 1991):
1. Invariance by Structure. Invariance may be imposed on a neural network by structuring its design appropriately. Specifically, synaptic connections between the neurons of the network are created so that transformed versions of the same input are forced to produce the same output. Consider, for example, the classification of an input image by a neural network that is required to be independent of in-plane rotations of the image about its center. We may impose rotational invariance on the network structure as follows: Let $w_{ji}$ be the synaptic weight of neuron j connected to pixel i in the input image. If the condition $w_{ji} = w_{jk}$ is enforced for all pixels i and k that lie at equal distances from the center of the image, then the neural network is invariant to in-plane rotations. However, in order to maintain rotational invariance, the synaptic weight $w_{ji}$ has to be duplicated for every pixel of the input image at the same radial distance from the origin. This points to a shortcoming of invariance by structure: The number of synaptic connections in the neural network becomes prohibitively large even for images of moderate size.
2. Invariance by Training. A neural network has a natural ability for pattern classification. This ability may be exploited directly to obtain transformation invariance as follows: The network is trained by presenting it with a number of different examples of the same object, with the examples being chosen to correspond to different transformations (i.e., different aspect views) of the object. Provided that the number of examples is sufficiently large, and if the network is trained to learn to discriminate between the different aspect views of the object, we may then expect the network to generalize correctly to transformations other than those shown to it. However, from an engineering perspective, invariance by training has two disadvantages. First, when a neural network has been trained to recognize an object in an invariant fashion with respect to known transformations, it is not obvious that this training will also enable the network to recognize other objects of different classes invariantly. Second, the computational demand imposed on the network may be too severe to cope with, especially if the dimensionality of the feature space is high.
3. Invariant Feature Space. The third technique of creating an invariant classifier-type neural network is illustrated in Fig. 21. It rests on the premise that it may be possible to extract features that characterize the essential information content of an input data set and that are invariant to transformations of the input. If such features are used, then the network as a classifier is relieved of the burden of having to delineate the range of transformations of an object with complicated decision boundaries. Indeed, the only differences that may arise between different instances of the same object are due to unavoidable factors such as noise and occlusion. The use of an invariant-feature space offers three distinct advantages. First, the number of invariant features applied to the network may be reduced to realistic levels. Second, the requirements imposed on network design are relaxed. Third, invariance for all objects with respect to known transformations is assured.
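As a rough illustration of the first technique, invariance by structure, the sketch below ties the weight of every pixel to its distance from the image center, in the spirit of the condition $w_{ji} = w_{jk}$. All names and numbers are hypothetical. On a discrete pixel grid this arrangement gives exact invariance only for rotations that map the grid onto itself (multiples of 90 degrees), which is the case checked here.

```python
def neuron_output(image, radial_weights):
    """Weighted sum in which each pixel's weight depends only on its
    squared distance from the image center (shared radial weights)."""
    size = len(image)
    center = size // 2
    total = 0.0
    for r in range(size):
        for c in range(size):
            d2 = (r - center) ** 2 + (c - center) ** 2
            total += radial_weights[d2] * image[r][c]
    return total


def rotate_90(image):
    """Rotate a square image by 90 degrees clockwise about its center."""
    return [list(row) for row in zip(*image[::-1])]


# Hypothetical 5-by-5 image and one weight per distinct squared radius.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
radial_weights = {0: 1.0, 1: 0.5, 2: -0.25, 4: 2.0, 5: -1.0, 8: 0.75}

original = neuron_output(image, radial_weights)
rotated = neuron_output(rotate_90(image), radial_weights)
print(original, rotated)
```

The 25 synaptic connections share only 6 distinct weight values here; the connections themselves still have to be replicated for every pixel, which is the cost noted in item 1.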
FIGURE 21 Block diagram of an invariant-feature-space type of system. (Blocks, in order: input; invariant feature extractor; classifier-type neural network; class estimate.)
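To make the idea of an invariant feature extractor concrete, here is a minimal sketch that uses one well-known invariant feature: the magnitudes of the discrete Fourier transform, which do not change when the input is circularly shifted. This choice of feature and of transformation is an assumption made for illustration; the text does not prescribe a particular extractor, and the function names are hypothetical.

```python
import cmath


def dft_magnitudes(signal):
    """Invariant features: magnitudes of the DFT coefficients of the signal."""
    n = len(signal)
    features = []
    for k in range(n):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        features.append(abs(coeff))
    return features


def circular_shift(signal, shift):
    """Transformation of the input: rotate the samples to the right by `shift`."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]


# Hypothetical input pattern and a shifted version of the same pattern.
pattern = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]
shifted = circular_shift(pattern, 3)

features_original = dft_magnitudes(pattern)
features_shifted = dft_magnitudes(shifted)
max_difference = max(abs(a - b)
                     for a, b in zip(features_original, features_shifted))
print(max_difference)
```

A classifier fed with these magnitudes sees the same feature vector (up to rounding error) for every circular shift of the pattern, so it does not have to learn the shift itself.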
EXAMPLE 1: Autoregressive Models
To illustrate the idea of invariant-feature space, consider the example of a coherent radar system used for air surveillance, where the targets of interest include aircraft, weather systems, flocks of migrating birds, and ground objects. The radar echoes from these targets possess different spectral characteristics. Moreover, experimental studies have shown that such radar signals can be modeled fairly closely as an autoregressive (AR) process of moderate order (Haykin and Deng, 1991). An AR model is a special form of regressive model defined for complex-valued data by

$$x(n) = \sum_{i=1}^{M} a_i^* x(n-i) + e(n) \tag{30}$$

where $\{a_i\}_{i=1}^{M}$ are the AR coefficients, $M$ is the model order, $x(n)$ is the input, and $e(n)$ is the error described as white noise. Basically, the AR model of Eq. (30) is represented by a tapped-delay-line filter as illustrated in Fig. 22a for $M = 2$. Equivalently, it may be represented by a lattice filter as shown in Fig. 22b, the coefficients of which are called reflection coefficients. There is a one-to-one correspondence between the AR coefficients of the model in Fig. 22a and the reflection coefficients of the model in Fig. 22b. The two models depicted here assume that the input $x(n)$ is complex valued, as in the case of a coherent radar, in which case the AR coefficients and the reflection coefficients are all complex valued. The asterisk in Eq. (30) and Fig. 22 signifies complex conjugation.
FIGURE 22 Autoregressive model of order 2: (a) tapped-delay-line model; (b) lattice-filter model. (The asterisk denotes complex conjugation.)
For now, it suffices to say that the coherent radar data may be described by a set of autoregressive coefficients, or by a corresponding set of reflection coefficients. The latter set of coefficients has a computational advantage in that efficient algorithms exist for their computation directly from the input data. The feature extraction problem, however, is complicated by the fact that moving objects produce varying Doppler frequencies that depend on their radial velocities measured with respect to the radar, and that tend to obscure the spectral content of the reflection coefficients as feature discriminants. To overcome this difficulty, we must build Doppler invariance into the computation of the reflection coefficients. The phase angle of the first reflection coefficient turns out to be equal to the Doppler frequency of the radar signal. Accordingly, Doppler frequency normalization is applied to all coefficients so as to remove the mean Doppler shift. This is done by defining a new set of reflection coefficients $\{\kappa'_m\}$ related to the set of ordinary reflection coefficients $\{\kappa_m\}$ computed from the input data as

$$\kappa'_m = \kappa_m e^{-jm\theta} \qquad \text{for } m = 1, 2, \ldots, M \tag{31}$$

where $\theta$ is the phase angle of the first reflection coefficient. The operation described in Eq. (31) is referred to as heterodyning. A set of Doppler-invariant radar features is thus represented by the normalized reflection coefficients $\kappa'_1, \kappa'_2, \ldots, \kappa'_M$, with $\kappa'_1$ being the only real-valued coefficient in the set.
As mentioned previously, the major categories of radar targets of interest in air surveillance are weather, birds, aircraft, and ground. The first three targets are moving, whereas the last one is not.
The heterodyned spectral parameters of radar echoes from ground are similar in characteristic to those from aircraft. A ground echo can be discriminated from an aircraft echo because of its small Doppler shift. Accordingly, the radar classifier includes a postprocessor as shown in Fig. 23, which operates on the classified results (encoded labels) for the purpose of identifying the ground class (Haykin and Deng, 1991). Thus, the preprocessor in Fig. 23 takes care of Doppler-shift-invariant feature extraction at the classifier input, whereas the postprocessor uses the stored Doppler signature to distinguish between aircraft and ground returns. ■
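The heterodyning step of Example 1 can be sketched as follows. The function names and coefficient values are hypothetical. The helper `apply_phase_ramp` encodes an assumption made only for this illustration: that a Doppler shift rotates the m-th reflection coefficient by m times a common phase, which is the form of shift that the heterodyning operation removes.

```python
import cmath


def heterodyne(reflection_coeffs):
    """Normalize kappa_m to kappa_m * exp(-j * m * theta), where theta is
    the phase angle of the first reflection coefficient."""
    theta = cmath.phase(reflection_coeffs[0])
    return [kappa * cmath.exp(-1j * m * theta)
            for m, kappa in enumerate(reflection_coeffs, start=1)]


def apply_phase_ramp(reflection_coeffs, phi):
    """Assumed Doppler model: rotate the m-th coefficient by m * phi radians."""
    return [kappa * cmath.exp(1j * m * phi)
            for m, kappa in enumerate(reflection_coeffs, start=1)]


# Hypothetical complex reflection coefficients for a model of order M = 3.
kappas = [0.6 * cmath.exp(0.3j), 0.3 * cmath.exp(-1.1j), 0.1 * cmath.exp(2.0j)]

normalized = heterodyne(kappas)
normalized_after_shift = heterodyne(apply_phase_ramp(kappas, 0.7))

max_change = max(abs(a - b)
                 for a, b in zip(normalized, normalized_after_shift))
print(normalized[0], max_change)
```

After normalization the first coefficient is real valued, and the whole normalized set is unchanged (up to rounding error) by the assumed shift, which is what makes it usable as a Doppler-invariant feature vector.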
FIGURE 23 Doppler-shift-invariant classifier of radar signals. (Block labels: radar data; feature extractor (preprocessor); neural network classifier; labeled classes; postprocessor; Doppler information; output classes aircraft, birds, weather, ground.)
EXAMPLE 2: Echolocating Bat
A much more fascinating example of knowledge representation in a neural network is found in the biological sonar system of echolocating bats. Most bats use frequency-modulated (FM, or “chirp”) signals for the purpose of acoustic imaging; in an FM signal, the instantaneous frequency of the signal varies with time. Specifically, the bat uses its mouth to broadcast short-duration FM sonar signals and uses its auditory system as the sonar receiver. Echoes from targets of interest are represented in the auditory system by the activity of neurons that are selective to different combinations of acoustic parameters. There are three principal neural dimensions of the bat’s auditory representation (Simmons et al., 1992):
• Echo frequency, which is encoded by “place” originating in the frequency map of the cochlea; it is preserved throughout the entire auditory pathway as an orderly arrangement across certain neurons tuned to different frequencies.
• Echo amplitude, which is encoded by other neurons with different dynamic ranges; it is manifested both as amplitude tuning and as the number of discharges per stimulus.
• Echo delay, which is encoded through neural computations (based on cross-correlation) that produce delay-selective responses; it is manifested as target-range tuning.
The two principal characteristics of a target echo for image-forming purposes are spectrum for target shape and delay for target range. The bat perceives “shape” in terms of the arrival time of echoes from different reflecting surfaces (glints) within the target. For this to occur, frequency information in the echo spectrum is converted into estimates of the time structure of the target.
Experiments conducted by Simmons and coworkers on the big brown bat, Eptesicus fuscus, critically identify this conversion process as consisting of parallel time-domain and frequency-to-time-domain transforms whose converging outputs create the common delay of range axis of a perceived image of the target. It appears that the unity of the bat’s perception is due to certain properties of the transforms themselves, despite the separate ways in which the auditory time representation of the echo delay and frequency representation of the echo spectrum are initially performed. Moreover, feature invariances are built into the sonar image-forming process so as to make it essentially independent of the target’s motion and the bat’s own motion. ■
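The role of cross-correlation in estimating echo delay can be illustrated with an engineering analogy. The sketch below is not a model of the bat's auditory processing; it only shows how correlating an emitted FM chirp with a received signal recovers the delay of an echo. The signal parameters, the noise-free echo, and the function names are all assumptions made for illustration.

```python
import math


def linear_chirp(num_samples, f_start, f_end):
    """FM pulse whose instantaneous frequency (in cycles per sample)
    moves linearly from f_start toward f_end over the pulse."""
    samples = []
    for n in range(num_samples):
        # Phase is the running integral of the instantaneous frequency.
        phase = 2 * math.pi * (f_start * n
                               + 0.5 * (f_end - f_start) * n * n / num_samples)
        samples.append(math.sin(phase))
    return samples


def estimate_delay(emitted, received):
    """Return the lag at which the cross-correlation of the emitted pulse
    with the received signal is largest."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(emitted) + 1):
        score = sum(e * received[lag + i] for i, e in enumerate(emitted))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical scenario: one attenuated, noise-free echo at a known delay.
pulse = linear_chirp(64, 0.05, 0.4)
true_delay = 37
received = [0.0] * 200
for i, sample in enumerate(pulse):
    received[true_delay + i] += 0.5 * sample

estimated_delay = estimate_delay(pulse, received)
print(estimated_delay)
```

Because the chirp sweeps across frequencies, its correlation with a shifted copy of itself falls off quickly away from the true lag, which is why FM pulses are well suited to ranging.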
Some Final Remarks
The issue of knowledge representation in a neural network is directly related to that of network architecture. Unfortunately, there is no well-developed theory for optimizing the architecture of a neural network required to interact with an environment of interest, or for evaluating the way in which changes in the network architecture affect the representation of knowledge inside the network. Indeed, satisfactory answers to these issues are usually found through an exhaustive experimental study for a specific application of interest, with the designer of the neural network becoming an essential part of the structural learning loop.