3 MODELS OF A NEURON


A neuron is an information-processing unit that is fundamental to the operation of a neural network. The block diagram of Fig. 5 shows the model of a neuron, which forms the basis for designing a large family of neural networks studied in later chapters. Here, we identify three basic elements of the neural model:

1. A set of synapses, or connecting links, each of which is characterized by a weight or strength of its own. Specifically, a signal $x_j$ at the input of synapse $j$ connected to neuron $k$ is multiplied by the synaptic weight $w_{kj}$. It is important to make a note of the manner in which the subscripts of the synaptic weight $w_{kj}$ are written. The first subscript in $w_{kj}$ refers to the neuron in question, and the second subscript refers to the input end of the synapse to which the weight refers. Unlike the weight of a synapse in the brain, the synaptic weight of an artificial neuron may lie in a range that includes negative as well as positive values.

2. An adder for summing the input signals, weighted by the respective synaptic strengths of the neuron; the operations described here constitute a linear combiner.

3. An activation function for limiting the amplitude of the output of a neuron. The activation function is also referred to as a squashing function, in that it squashes (limits) the permissible amplitude range of the output signal to some finite value.

FIGURE 4 Cytoarchitectural map of the cerebral cortex. The different areas are identified by the thickness of their layers and types of cells within them. Some of the key sensory areas are as follows: Motor cortex: motor strip, area 4; premotor area, area 6; frontal eye fields, area 8. Somatosensory cortex: areas 3, 1, and 2. Visual cortex: areas 17, 18, and 19. Auditory cortex: areas 41 and 42. (From A. Brodal, 1981; with permission of Oxford University Press.)

Typically, the normalized amplitude range of the output of a neuron is written as the closed unit interval [0,1], or, alternatively, [-1,1].
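To make the subscript convention for $w_{kj}$ concrete, the following minimal Python sketch stores the weights of a small group of neurons as a list of rows, one row per neuron $k$ and one entry per input $j$, and computes each neuron's linear combiner output. The function name `linear_combiner`, the two-neuron layout, and the weight values are hypothetical and chosen only for illustration. Python indices start at 0, so `W[k-1][j-1]` holds $w_{kj}$.

```python
def linear_combiner(weights_k, x):
    """Return u_k = sum_j w_kj * x_j for one neuron k; weights_k is row k of W."""
    if len(weights_k) != len(x):
        raise ValueError("need exactly one synaptic weight per input signal")
    return sum(w_kj * x_j for w_kj, x_j in zip(weights_k, x))

# W[k-1][j-1] holds w_kj: the row selects the neuron, the column selects the input.
# Hypothetical values; note that negative weights are allowed.
W = [
    [0.5, -1.0, 2.0],   # neuron 1: w_11, w_12, w_13
    [1.5, 0.25, -0.5],  # neuron 2: w_21, w_22, w_23
]
x = [1.0, 2.0, 3.0]     # input signals x_1, x_2, x_3

u = [linear_combiner(row, x) for row in W]
print(u)
```

Keeping one row per neuron mirrors the order of the subscripts: reading $w_{kj}$ left to right is the same as indexing `W` first by neuron, then by input.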

The neural model of Fig. 5 also includes an externally applied bias, denoted by $b_k$. The bias $b_k$ has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative, respectively.

In mathematical terms, we may describe the neuron $k$ depicted in Fig. 5 by writing the pair of equations:

$$u_k = \sum_{j=1}^{m} w_{kj} x_j \tag{1}$$

and

$$y_k = \varphi(u_k + b_k) \tag{2}$$

where $x_1, x_2, \ldots, x_m$ are the input signals; $w_{k1}, w_{k2}, \ldots, w_{km}$ are the respective synaptic weights of neuron $k$; $u_k$ (not shown in Fig. 5) is the linear combiner output due to the input signals; $b_k$ is the bias; $\varphi(\cdot)$ is the activation function; and $y_k$ is the output signal of the neuron. The use of bias $b_k$ has the effect of applying an affine transformation to the output $u_k$ of the linear combiner in the model of Fig. 5, as shown by

$$v_k = u_k + b_k \tag{3}$$

In particular, depending on whether the bias $b_k$ is positive or negative, the relationship between the induced local field, or activation potential, $v_k$ of neuron $k$ (hereafter, these two terms are used interchangeably) and the linear combiner output $u_k$ is modified in the manner illustrated in Fig. 6. Note that as a result of this affine transformation, the graph of $v_k$ versus $u_k$ no longer passes through the origin (for $b_k \neq 0$).

The bias $b_k$ is an external parameter of neuron $k$. We may account for its presence as in Eq. (2). Equivalently, we may formulate the combination of Eqs. (1) to (3) as follows:

$$v_k = \sum_{j=0}^{m} w_{kj} x_j \tag{4}$$

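FIGURE 5 Nonlinear model of a neuron, labeled $k$. (Input signals $x_1, x_2, \ldots, x_m$ are weighted by the synaptic weights $w_{k1}, w_{k2}, \ldots, w_{km}$ and combined with the bias $b_k$ at a summing junction $\Sigma$; the result passes through the activation function $\varphi(\cdot)$ to give the output $y_k$.)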

and

$$y_k = \varphi(v_k) \tag{5}$$

In Eq. (4), we have added a new synapse. Its input is

$$x_0 = +1 \tag{6}$$

and its weight is

$$w_{k0} = b_k \tag{7}$$

We may therefore reformulate the model of neuron $k$ as shown in Fig. 7. In this figure, the effect of the bias is accounted for by doing two things: (1) adding a new input signal fixed at $+1$, and (2) adding a new synaptic weight equal to the bias $b_k$. Substituting Eqs. (6) and (7) into Eq. (4) gives $v_k = w_{k0} x_0 + \sum_{j=1}^{m} w_{kj} x_j = b_k + u_k$, which is Eq. (3). Hence, although the models of Figs. 5 and 7 are different in appearance, they are mathematically equivalent.

FIGURE 6 Affine transformation produced by the presence of a bias; note that $v_k = b_k$ at $u_k = 0$. (Induced local field $v_k$ plotted against linear combiner output $u_k$ for $b_k > 0$, $b_k = 0$, and $b_k < 0$.)

FIGURE 7 Another nonlinear model of a neuron; $w_{k0}$ accounts for the bias $b_k$. (A fixed input $x_0 = +1$ with weight $w_{k0} = b_k$ joins the inputs $x_1, \ldots, x_m$ and weights $w_{k1}, \ldots, w_{km}$ at the summing junction; the resulting induced local field $v_k$ drives $\varphi(\cdot)$ to give the output $y_k$.)
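The equivalence of the models of Figs. 5 and 7 can also be checked numerically. The minimal Python sketch below implements both forms: `fig5_neuron` applies Eqs. (1) to (3) and (5), while `fig7_neuron` prepends the fixed input $x_0 = +1$ with weight $w_{k0} = b_k$ and applies Eq. (4). The function names, the choice of `math.tanh` as an example $\varphi(\cdot)$, and all numeric values are hypothetical, chosen only for illustration.

```python
import math

def fig5_neuron(w, x, b, phi):
    """Fig. 5 form: u_k = sum_{j=1}^{m} w_kj x_j, v_k = u_k + b_k, y_k = phi(v_k)."""
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return phi(u + b)

def fig7_neuron(w, x, b, phi):
    """Fig. 7 form: fixed input x_0 = +1 with weight w_k0 = b_k, then Eq. (4)."""
    w_ext = [b] + list(w)    # w_k0, w_k1, ..., w_km
    x_ext = [1.0] + list(x)  # x_0, x_1, ..., x_m
    v = sum(w_j * x_j for w_j, x_j in zip(w_ext, x_ext))
    return phi(v)

def identity(v):
    """Return v unchanged, so a neuron function reports its induced local field v_k."""
    return v

w = [0.4, -0.7, 0.2]  # hypothetical synaptic weights w_k1, w_k2, w_k3
x = [1.0, 0.5, -2.0]  # hypothetical input signals x_1, x_2, x_3
b = 0.3               # hypothetical bias b_k

# Both formulations agree, up to floating-point rounding in the summation order.
print(fig5_neuron(w, x, b, math.tanh), fig7_neuron(w, x, b, math.tanh))

# With all inputs zero, u_k = 0 and the induced local field equals the bias (Fig. 6).
print(fig5_neuron(w, [0.0, 0.0, 0.0], b, identity))
```

Because the two functions sum the terms in a different order, their results may differ in the last few bits; compare them with a tolerance rather than with `==`.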

Types of Activation Function

The activation function, denoted by $\varphi(v)$, defines the output of a neuron in terms of the induced local field $v$. In what follows, we identify two basic types of activation functions:

1. Threshold Function. For this type of activation function, described in Fig. 8a, we have

$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases} \tag{8}$$

In engineering, this form of a threshold function is commonly referred to as a Heaviside function. Correspondingly, the output of neuron $k$ employing such a threshold function is expressed as

$$y_k = \begin{cases} 1 & \text{if } v_k \ge 0 \\ 0 & \text{if } v_k < 0 \end{cases} \tag{9}$$

where $v_k$ is the induced local field of the neuron; that is,

$$v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k \tag{10}$$

FIGURE 8 (a) Threshold function. (b) Sigmoid function for varying slope parameter $a$. (Both panels plot $\varphi(v)$ against $v$; in panel (b), the curves become steeper with increasing $a$.)

In neural computation, such a neuron is referred to as the McCulloch–Pitts model, in recognition of the pioneering work done by McCulloch and Pitts (1943). In this model, the output of a neuron takes on the value of 1 if the induced local field of that neuron is nonnegative, and 0 otherwise. This statement describes the all-or-none property of the McCulloch–Pitts model.
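The all-or-none behavior of Eqs. (8) to (10) is easy to reproduce. In the minimal Python sketch below, `threshold` implements the Heaviside function of Eq. (8) and `mcculloch_pitts` computes the neuron output of Eqs. (9) and (10). The function names and the hand-picked parameters (weights $1, 1$ and bias $-1.5$, which make the neuron fire only when both binary inputs equal 1, i.e., a logical AND) are hypothetical and serve only as an illustration.

```python
def threshold(v):
    """Heaviside threshold of Eq. (8): 1 if v >= 0, otherwise 0."""
    return 1 if v >= 0 else 0

def mcculloch_pitts(w, x, b):
    """Eqs. (9)-(10): y_k = threshold(sum_{j=1}^{m} w_kj x_j + b_k)."""
    v = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return threshold(v)

# Hypothetical parameters w = (1, 1), b = -1.5: v_k is nonnegative only for x = (1, 1).
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcculloch_pitts([1, 1], [x1, x2], -1.5))
```

Note that `threshold(0)` returns 1, matching the convention of Eq. (8) that the neuron fires when its induced local field is nonnegative.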

2. Sigmoid Function.⁴ The sigmoid function, whose graph is “S”-shaped, is by far the most common form of activation function used in the construction of neural networks. It is defined as a strictly increasing function that exhibits a graceful balance between linear and nonlinear behavior. An example of the sigmoid function is the logistic function,⁵ defined by

$$\varphi(v) = \frac{1}{1 + \exp(-av)} \tag{11}$$

where $a$ is the slope parameter of the sigmoid function. By varying the parameter $a$, we obtain sigmoid functions of different slopes, as illustrated in Fig. 8b. In fact, the slope at the origin equals $a/4$, since $\varphi'(v) = a\,\varphi(v)\,[1 - \varphi(v)]$ and $\varphi(0) = 1/2$. In the limit, as the slope parameter approaches infinity, the sigmoid function becomes simply a threshold function. Whereas a threshold function assumes the value of 0 or 1, a sigmoid function assumes a continuous range of values from 0 to 1. Note also that the sigmoid function is differentiable, whereas the threshold function is not. (Differentiability is an important feature of neural network theory, as described in Chapter 4.)
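The two properties just stated, a slope of $a/4$ at the origin and convergence to a threshold function as $a$ grows, can be checked with a short Python sketch. It assumes a central-difference estimate of the derivative (step `h`, a hypothetical choice) and evaluates Eq. (11) in an algebraically equivalent two-branch form so that `math.exp` does not overflow for large negative $av$; the function names are illustrative only.

```python
import math

def logistic(v, a=1.0):
    """Logistic function of Eq. (11), phi(v) = 1 / (1 + exp(-a v)).

    The two branches are algebraically identical; picking one by the sign of
    a*v keeps math.exp from overflowing when a*v is large and negative.
    """
    z = a * v
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def slope_at_origin(a, h=1e-6):
    """Central-difference estimate of d(phi)/dv at v = 0."""
    return (logistic(h, a) - logistic(-h, a)) / (2.0 * h)

# The numerical slope at the origin should match a/4 for each slope parameter.
for a in (0.5, 1.0, 4.0):
    print(f"a={a}: numerical slope={slope_at_origin(a):.6f}, a/4={a / 4:.6f}")

# For a large slope parameter, the logistic function approaches the threshold function.
print([round(logistic(v, a=1000.0), 6) for v in (-0.1, -0.01, 0.01, 0.1)])
```

Unlike the threshold function of Eq. (8), the logistic function takes the value $1/2$ at $v = 0$ for every $a$, so the limit is reached everywhere except at the origin itself.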

The activation functions defined in Eqs. (8) and (11) range from 0 to 1. It is sometimes desirable to have the activation function range from −1 to 1, in which case the activation function is taken to be an odd function of the induced local field. Specifically, the threshold function of Eq. (8) is now defined as

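$$\varphi(v) = \begin{cases} 1 & \text{if } v > 0 \\ 0 & \text{if } v = 0 \\ -1 & \text{if } v < 0 \end{cases} \tag{12}$$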

which is commonly referred to as the signum function. For the corresponding form of a sigmoid function, we may use the hyperbolic tangent function, defined by

$$\varphi(v) = \tanh(v) \tag{13}$$

Allowing an activation function of the sigmoid type to assume negative values as prescribed by Eq. (13) may yield practical benefits over the logistic function of Eq. (11).
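The odd-symmetric pair of Eqs. (12) and (13) can be sketched in a few lines of Python. Besides implementing the signum function, the sketch checks a standard identity that is not stated in the text above: $\tanh(v) = 2\,\varphi(v) - 1$, where $\varphi$ is the logistic function of Eq. (11) with slope parameter $a = 2$, so the hyperbolic tangent is a rescaled and shifted logistic function. The function names are illustrative only, and the plain form of the logistic function used here is adequate only for moderate values of $|av|$.

```python
import math

def signum(v):
    """Signum function of Eq. (12): +1 if v > 0, 0 if v = 0, -1 if v < 0."""
    if v > 0:
        return 1
    if v < 0:
        return -1
    return 0

def logistic(v, a=1.0):
    """Logistic function of Eq. (11) in its plain form (fine for moderate |a v|)."""
    return 1.0 / (1.0 + math.exp(-a * v))

for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # tanh(v) of Eq. (13) coincides with 2 * logistic(v, a=2) - 1.
    print(f"v={v:+.1f}  signum={signum(v):+d}  tanh={math.tanh(v):+.4f}  "
          f"2*logistic(v, a=2)-1={2 * logistic(v, a=2.0) - 1:+.4f}")
```

Both functions are odd, $\varphi(-v) = -\varphi(v)$, which is what lets them span the range from −1 to 1 symmetrically about the origin.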

Stochastic Model of a Neuron

The neural model described in Fig. 7 is deterministic in that its input–output behavior is precisely defined for all inputs. For some applications of neural networks, it is desirable to base the analysis on a stochastic neural model. In an analytically tractable approach, the activation function of the McCulloch–Pitts model is given a probabilistic interpretation. Specifically, a neuron is permitted to reside in only one of two states: $+1$ or $-1$, say. The decision for a neuron to fire (i.e., switch its state from “off” to “on”) is probabilistic. Let $x$ denote the state of the neuron and $P(v)$ denote the probability of firing, where $v$ is the induced local field of the neuron. We may then write

$$x = \begin{cases} +1 & \text{with probability } P(v) \\ -1 & \text{with probability } 1 - P(v) \end{cases} \tag{14}$$

A standard choice for $P(v)$ is the sigmoid-shaped function

$$P(v) = \frac{1}{1 + \exp(-v/T)} \tag{15}$$

where $T$ is a pseudotemperature used to control the noise level and therefore the uncertainty in firing (Little, 1974). It is important to realize, however, that $T$ is not the physical temperature of a neural network, be it a biological or an artificial neural network. Rather, as already stated, we should think of $T$ merely as a parameter that controls the thermal fluctuations representing the effects of synaptic noise. Note that when $T \to 0$, the stochastic neuron described by Eqs. (14) and (15) reduces to a noiseless (i.e., deterministic) form, namely, the McCulloch–Pitts model.
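A stochastic neuron of this kind can be simulated by drawing a uniform random number and comparing it with $P(v)$. The minimal Python sketch below implements Eqs. (14) and (15) with the standard `random` module and estimates the firing rate at a few pseudotemperatures. The function names, the seed, the sample size, and the specific values of $v$ and $T$ are hypothetical choices for illustration. Since Eq. (15) is undefined at $T = 0$ exactly, the limit $T \to 0$ is approximated by a small positive $T$.

```python
import math
import random

def firing_probability(v, T):
    """Eq. (15): P(v) = 1 / (1 + exp(-v / T)) for pseudotemperature T > 0.

    Evaluated in two algebraically equivalent branches to avoid overflow.
    """
    z = v / T
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def stochastic_neuron(v, T, rng):
    """Eq. (14): return state +1 with probability P(v), otherwise -1."""
    return 1 if rng.random() < firing_probability(v, T) else -1

rng = random.Random(0)  # fixed seed so the run is reproducible
v = 0.5                 # hypothetical induced local field
for T in (2.0, 0.5, 0.01):
    samples = [stochastic_neuron(v, T, rng) for _ in range(10_000)]
    rate = samples.count(1) / len(samples)
    print(f"T={T}: P(v)={firing_probability(v, T):.4f}, observed firing rate={rate:.4f}")
```

As $T$ shrinks, $P(v)$ approaches 1 for $v > 0$ and 0 for $v < 0$, so the sampled state becomes the deterministic $\pm 1$ output of a threshold neuron. (At exactly $v = 0$, Eq. (15) gives $P = 1/2$ for every $T$.)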

In the document Neural Networks and Learning Machines (pages 41-46)