Physical Phenomenon from the Viewpoint of Information (Introduction to Quantum Information Theory)

Cheng-Yuan Liou and Jyh-Ying Peng

March 28, 2004

1 Introduction

We will introduce three theories that formulate the static description and dynamic evolution of quantum physics from the viewpoint of information.

The first two theories study the possible information structures of quantum mechanics; the last one is the basis of quantum information theory. The informational viewpoint is also the viewpoint of measurement, since we gain information only by the act of measurement or observation. All refutable theories about physical phenomena depend on measurement results. We can even state that all physical phenomena arise from the act of measurement or observation; this is the central concept of this work.

Classical physics holds the assumption that physical phenomena have a reality independent of observation, and that any observer can perform measurements with arbitrary accuracy. Regarding the first point, we note that all useful physical theories (theories that can predict future physical states) are facts about measurement results, hence they can be verified or refuted by measurement. This means that physical theories need not have an existence independent of conscious measurement or observation, and that such an assumption is neither provable nor useful, and hence is unnecessary.

Once we accept this, we realize that all theories are just models constructed by our minds to account for what we perceive in nature. That is, there is no "correct" physical theory, only theories that are more useful, or that make better predictions about measurement results; the "reality" of a physical theory is meaningless.

To discuss the second point stated earlier, we have to construct a "model" for the act of observation. We take the currently most "accurate" physical theory, quantum physics, as the basis for the discussion. When we observe the position of an object, that object is no longer where we observed it, because the photon our eyes received gave it momentum. This means that there is no passive observation; all physical acts are interactions of some kind, and as soon as we observe something, we change its physical state. So we say that measurement to infinite accuracy is not merely impossible; rather, the concept has no meaning (in our model of reality). In the physical world there is no observation, only interaction. This is the viewpoint of quantum physics, and also the viewpoint adopted here. We will take the uncertainty of the measurement process as a starting point, and demonstrate how to derive physical theories from it.

2 Information in the Measurement Process - Fisher Information

2.1 Measurement

We will now start with a description of the classical measurement process.

Measuring a physical system means estimating the value of some of its physical parameters from data obtained from the system. Let the ideal value of the parameter we are trying to estimate be θ; we obtain N data values y ≡ {y_1, y_2, . . . , y_N}, whose values are determined by the conditional probability p(y|θ). This conditional probability represents the intrinsic physical properties of the measured parameter.

Let the relationship between ideal value and measured data be

y = θ + x, (1)

where x ≡ {x_1, x_2, . . . , x_N} represents the measurement uncertainty, or data fluctuations. We can define an estimator θ̂(y) for θ based on the obtained data y. A possible estimator is the sample mean θ̂(y) = (1/N) Σ_n y_n.

Here we step beyond classical physics, and interpret x as a physical quantity intrinsic to the parameter θ, and independent of the measurement process. This means that x represents the uncertainty of the parameter when it is observed, and its values are not dependent on the measurement methods used or measurement errors encountered; it is an intrinsic physical property of the measured parameter. So x is the measurement uncertainty, not measurement error; it is a physical quantity independent of measurement, but manifested only by the measurement process (it has an "experimental reality", though it may not be "real").


If we accept that the uncertainty x of a parameter θ has physical meaning, then y, θ, and x form a closed, physically isolated system.

2.2 The Cramer-Rao Inequality

We assume that the estimator θ̂(y) is unbiased, that is,

⟨θ̂(y) − θ⟩ ≡ ∫ dy [θ̂(y) − θ] p(y|θ) = 0,   (2)

where dy ≡ dy_1 dy_2 . . . dy_N. Here p(y|θ) is the conditional probability distribution of the parameter's fluctuation, or uncertainty, given that the value of the parameter is θ. Operating with ∂/∂θ on both sides of equation (2), we get

∫ dy [θ̂(y) − θ] ∂p(y|θ)/∂θ − ∫ dy p(y|θ) = 0.   (3)

The second term on the left is 1; using ∂p/∂θ = p ∂ln p/∂θ, we have

∫ dy [θ̂(y) − θ] (∂ln p(y|θ)/∂θ) p(y|θ) = 1.   (4)

Separate the integrand,

∫ dy [ (∂ln p(y|θ)/∂θ) √p(y|θ) ] [ (θ̂(y) − θ) √p(y|θ) ] = 1,   (5)

and square both sides; from the Schwarz inequality we get

[ ∫ dy (∂ln p(y|θ)/∂θ)² p(y|θ) ] [ ∫ dy (θ̂(y) − θ)² p(y|θ) ] ≥ 1.   (6)

The first term on the left of eq. (6) is defined as the Fisher information I for (the measurement of) the parameter θ,

I(θ) ≡ ∫ dy (∂ln p(y|θ)/∂θ)² p(y|θ) = ∫ dy (1/p(y|θ)) (∂p(y|θ)/∂θ)²,   (7)

and the second term is the mean-squared error for the estimator θ̂(y),

e² ≡ ∫ dy (θ̂(y) − θ)² p(y|θ) = ⟨(θ̂(y) − θ)²⟩.   (8)

Thus we have the Cramer-Rao inequality

e²I ≥ 1.   (9)


This inequality holds for any measurement using an unbiased estimate of the measured parameter; it establishes the relationship between the Fisher information and the mean-squared error of any measurement. As the estimation error increases, the Fisher information decreases, so the Fisher information can be seen as a measure of information. Moreover, because the estimation errors come only from x, which is an intrinsic property of the parameter, the Fisher information represents the quality of measurement attainable when there are no measurement errors or human mistakes; its value depends only on the parameter and the measured system.
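As a quick numerical illustration (not part of the original text), the sketch below estimates both sides of eq. (9) for a simple assumed model, Gaussian fluctuations x with standard deviation σ and the sample-mean estimator, for which I = N/σ² and the bound is saturated:

    # Minimal Monte Carlo sketch of the Cramer-Rao inequality (9), assuming
    # Gaussian fluctuations x ~ N(0, sigma^2) and the sample-mean estimator.
    import numpy as np

    rng = np.random.default_rng(0)
    theta, sigma, N, trials = 3.0, 0.5, 10, 200_000

    y = theta + sigma * rng.standard_normal((trials, N))   # data y = theta + x
    theta_hat = y.mean(axis=1)                             # sample-mean estimator
    e2 = np.mean((theta_hat - theta) ** 2)                 # mean-squared error, eq. (8)
    I = N / sigma**2                                       # Fisher information of N Gaussian samples

    print(e2 * I)   # ~ 1.0: the bound e^2 I >= 1 is saturated by this estimator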

2.3 The Special Case of Shift-Invariance

Assume that we take only one data value, N = 1, so that y = y_1 and p(y|θ) = p(y_1|θ). If

p(y|θ) = p(y − θ) = p(x), (10)

which means that the fluctuation of the data value y relative to the ideal value θ is independent of the value of θ, we call this property shift-invariance.

Under this condition the fluctuation (or uncertainty) x is independent of the value of θ, hence the Fisher information is also independent of θ (in three dimensions we call this Galilean invariance, invariance of physical laws to changes of reference point). Since ∂/∂θ = −∂/∂(y − θ), the Fisher information can be written as

I = ∫ dy (1/p(y − θ)) [∂p(y − θ)/∂(y − θ)]² = ∫ dx (1/p(x)) [dp(x)/dx]².   (11)

So we can calculate the uncertainty and Fisher information of parameter θ without knowing its ideal value. When a parameter satisfies shift-invariance, no matter what its ideal value, the fluctuations observed in measurements are the same. To simplify the discussion, we will assume all parameters satisfy this property.
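As a concrete example (not in the original), for Gaussian fluctuations p(x) = (2πσ²)^{−1/2} e^{−x²/(2σ²)}, eq. (11) gives I = 1/σ², independent of θ as shift-invariance requires; with N = 1 the sample mean has e² = σ², so the Cramer-Rao bound (9) is attained with equality.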

2.4 Probability Amplitude Functions

In eq. (11) the term 1/p(x) would diverge when p(x) → 0; we can define real probability amplitude functions q(x) to avoid this problem:

p(x) = q²(x).   (12)

Using this in eq. (11) we have

I = 4 ∫ dx (dq(x)/dx)².   (13)


2.5 Extension to Vectors and Multi-Dimensions

Under the framework of relativity, physical quantities are four-vectors such as x ≡ (x, y, z, t); we will define a scalar Fisher information for the measurement of multiple four-vectors.

Suppose we are measuring N four-vector parameters of a physical system; these four-vectors can represent any physical attribute of the system. We obtain the data

y_n = θ_n + x_n, n = 1, . . . , N,   (14)

where y_n ≡ (y_n1, y_n2, y_n3, y_n4), θ_n ≡ (θ_n1, θ_n2, θ_n3, θ_n4), and x_n ≡ (x_n1, x_n2, x_n3, x_n4) are all four-vectors; they are the obtained data, ideal value, and fluctuation of the parameters respectively. For notational simplicity, we also define the grand vectors

θ ≡ (θ_1, θ_2, . . . , θ_N),
y ≡ (y_1, y_2, . . . , y_N),   (15)
dy ≡ dy_1 dy_2 . . . dy_N,

where dy_n ≡ dy_n1 dy_n2 dy_n3 dy_n4. If the measurement consists of the same physical four-vector parameter measured N times, or of N particles a sufficient distance apart measured for the same physical quantity, then the measurements of the individual four-vectors are independent, and y_n represents the data of the n-th measurement, or of the n-th particle.

We assume that all estimators θ̂_n(y) of θ_n are unbiased, so that

⟨θ̂_n(y)⟩ ≡ ∫ dy θ̂_n(y) p(y|θ) = θ_n, n = 1, . . . , N.   (16)

Using the mean-squared errors of the four components of each four-vector, we can derive the Cramer-Rao inequality for each of them. Since the Fisher information is additive [1], we get the scalar Fisher information for the measurement

I ≡ Σ_n ∫ dy p(y|θ) Σ_ν (∂ln p(y|θ)/∂θ_nν)².   (17)

If we assume independence between measurements of all four-vectors, then

p(y|θ) = Π_n p_n(y_n|θ) = Π_n p_n(y_n|θ_n),   (18)

so that

∂ln p(y|θ)/∂θ_nν = Σ_m ∂ln p_m(y_m|θ_m)/∂θ_nν = (1/p_n(y_n|θ_n)) ∂p_n(y_n|θ_n)/∂θ_nν.   (19)


Using eq. (18) and eq. (19) to simplify the Fisher information eq. (17), we have

I = Σ_n ∫ dy_n (1/p_n(y_n|θ_n)) Σ_ν (∂p_n(y_n|θ_n)/∂θ_nν)².   (20)

For shift-invariant four-vectors,

p_n(y_n|θ_n) = p_n(y_n − θ_n) = p_n(x_n),   (21)

the Fisher information can be further simplified to

I = Σ_n ∫ dx_n (1/p_n(x_n)) Σ_ν (∂p_n(x_n)/∂x_nν)².   (22)

Using probability amplitudes q_n(x_n), we have

I = 4 Σ_n ∫ dx_n Σ_ν (∂q_n(x_n)/∂x_nν)², p_n(x_n) = q_n²(x_n).   (23)

Finally, if all the parameters θ_n represent the same physical quantity, then all fluctuations x_n are equivalent; we can then drop the subscript n on x_n and get

I = 4 Σ_n ∫ dx Σ_ν (∂q_n(x)/∂x_ν)²,   (24)

where x ≡ (x_1, x_2, x_3, x_4) is the uncertainty of each four-vector.

2.6 Total Probability Function

If the measurement in the previous section is N measurements of the same four-vector parameter of the same particle, then each x_n represents the n-th measurement result for the same physical quantity. Since there is only one parameter being measured, we can set

p_n(y_n|θ_n) = p_n(y|θ_n) = p_{x_n}(x|θ_n) = q_n²(x).   (25)

The probability of measuring any of the θ_n's is equal, so the total probability (or net probability) function of the fluctuation four-vector x is

p(x) = Σ_{n=1}^{N} p_{x_n}(x|θ_n) P(θ_n) = (1/N) Σ_{n=1}^{N} q_n²(x).   (26)


3 Derivation of Physical Laws by EPI

3.1 The Basics

The principle of EPI (extreme physical information) by B. Roy Frieden [1] rests on the assumption that the act of measurement (or observation) produces the physical laws.

The hierarchy of physical knowledge according to EPI consists of four layers. The topmost laws are:

1. The Fisher I-theorem. The Fisher information monotonically decreases with time. Like entropy, I can be transferred from one system to another.

2. Bound information J . There is an information bound J intrinsic to each physical phenomenon. The information J represents an upper limit to the information gained in measurement.

3. Invariance principle. There is an invariance, or symmetry principle governing the time evolution of each physical phenomenon.

These laws exist prior to and independent of any explicit measurements; that is, they govern all physical phenomena. At the next level we have the three axioms describing the measurement process:

Axiom 1. The conservation of information perturbation, δI = δJ , during a measurement.

Axiom 2. The existence of information densities i_n(x) and j_n(x) defined as

I ≡ ∫ dx Σ_n i_n(x) and J ≡ ∫ dx Σ_n j_n(x),   (27)

where i_n(x) = 4 ∇q_n · ∇q_n = 4 Σ_ν (∂q_n/∂x_ν)².

Axiom 3. The efficiency of information transition from phenomenon to intrinsic data on the microlevel,

i_n(x) − κ j_n(x) = 0, ∀x, n.   (28)

The third level of knowledge consists of the EPI principle, which includes the variational principle and the zero-condition. The variational principle states the extremization of K[q] ≡ I[q] − J[q], that is

δK = δ(I − J) = 0.   (29)


The zero-condition is the efficiency of information transition on the macrolevel,

I = κJ.   (30)

These follow either from the axioms or from the existence of a physically meaningful unitary transformation space.

The fourth and lowest level is the physical laws found by the EPI calculation, that is, the q_n(x_n)'s found to extremize K, where the form of J is determined by the invariance principles for the particular phenomenon in question.

3.2 The Schrödinger Wave Equation

This derivation runs parallel to the fully covariant EPI derivation of the Klein-Gordon equation in [1], chapter 4. However, the Schrödinger equation treats space and time coordinates differently: time coordinates are assumed to have no fluctuations and can always be determined precisely, whereas space coordinates cannot be determined with arbitrary accuracy. Since the EPI approach is covariant while the Schrödinger equation is not, some approximations have to be made in the derivation, the end result of which is also an approximation of nature.

We will derive the one-dimensional time-independent Schrödinger equation. The position θ of a particle of mass m is measured as data y = θ + x, where x is a random excursion governed by probability amplitudes q_n(x), which are to be found. We ignore time t in the derivation, hence we will get a stationary solution to this problem. The particle is assumed to be in a conservative field of scalar potential V(x), with total energy W conserved.

The information associated with the measurement of position is

I = 4 Σ_{n=1}^{N} ∫ dx (dq_n(x)/dx)²   (31)

in the one-dimensional case. We define the complex wave functions ψ_n(x) as

ψ_n(x) ≡ (1/√N)(q_{2n−1}(x) + i q_{2n}(x)),   (32)

where there are N/2 of them. The information expressed with the ψ_n(x)'s becomes

I = 4N Σ_{n=1}^{N/2} ∫ dx (dψ_n(x)/dx)(dψ_n*(x)/dx) = 4N Σ_{n=1}^{N/2} ∫ dx |dψ_n(x)/dx|².   (33)


We next define a Fourier transform space consisting of wave functions φ_n(μ) of momentum μ,

ψ_n(x) = (1/√(2πħ)) ∫ dμ φ_n(μ) e^{−iμx/ħ};   (34)

operating on both sides with d/dx we get

dψ_n(x)/dx = (1/√(2πħ)) ∫ dμ [ −(iμ/ħ) φ_n(μ) ] e^{−iμx/ħ}.   (35)

So we have the Fourier pairs

ψ_n(x) ↔ φ_n(μ)   (36)

and

dψ_n(x)/dx ↔ −(iμ/ħ) φ_n(μ);   (37)

then by Parseval's theorem we have

∫ dx |ψ_n(x)|² = ∫ dμ |φ_n(μ)|²   (38)

and

∫ dx |dψ_n(x)/dx|² = (1/ħ²) ∫ dμ μ² |φ_n(μ)|².   (39)

Using equation (39) in the information expression we have

I = (4N/ħ²) ∫ dμ μ² Σ_{n=1}^{N/2} |φ_n(μ)|² ≡ J,   (40)

that is, the unitary nature of the Fourier transform inherent in the nature of the measurement device gives rise to the invariance principle (40), where J is the bound information and I is constrained to be equal to J. The bound information represents the maximum information obtainable from the measurement; with I = J this means that information is transferred with maximum efficiency. Yet only the workings of the input side of the measurement device are described by the Fourier transform; we have not considered the measurement device's output yet, hence the hallmark of quantum phenomena, uncertainty in the precise state of physical systems, is not included in this derivation. In other words, we are describing the situation where the Schrödinger cat experiment is completed, yet no one has opened the box and looked at the cat yet. For more detail see [1], 3.8 and chapter 10.

The total probability distribution for the variable x is (by eq. (26))

p(x) = (1/N) Σ_{n=1}^{N} q_n²(x) = Σ_{n=1}^{N/2} |ψ_n(x)|²,   (41)


so by equation (38)

∫ dμ Σ_n |φ_n(μ)|² = ∫ dx Σ_n |ψ_n(x)|² = ∫ dx p(x) = 1,   (42)

so that

P(μ) = Σ_n |φ_n(μ)|²   (43)

is a probability density on μ. Thus we have

J = (4N/ħ²) ∫ dμ μ² Σ_{n=1}^{N/2} |φ_n(μ)|² = (4N/ħ²) ∫ dμ μ² P(μ) = (4N/ħ²) ⟨μ²⟩.   (44)

We use the non-relativistic approximation that the kinetic energy of the particle is μ²/2m, so W = V(x) + μ²/2m, and we have

J = (4N/ħ²) ⟨μ²⟩
  = (8Nm/ħ²) ⟨W − V(x)⟩
  = (8Nm/ħ²) ∫ dx (W − V(x)) p(x)
  = (8Nm/ħ²) ∫ dx (W − V(x)) Σ_{n=1}^{N/2} |ψ_n(x)|².   (45)

Thus we have successfully expressed J as a functional of the ψ_n's; J[ψ] is the bound information functional for this problem, and I[ψ] = J[ψ].

According to the principle of extreme physical information, K = I − J is extremized, that is

K = I − J = 4N Σ_{n=1}^{N/2} ∫ dx { |dψ_n(x)/dx|² − (2m/ħ²)[W − V(x)] |ψ_n(x)|² } = extrem.   (46)

The Euler-Lagrange equation for the variational problem is

d/dx (∂L/∂ψ_nx) = ∂L/∂ψ_n, n = 1, . . . , N/2, ψ_nx ≡ ∂ψ_n/∂x;   (47)

using the integrand in equation (46) as the Lagrangian L, the solution to this variational problem is

d²ψ_n(x)/dx² + (2m/ħ²)[W − V(x)] ψ_n(x) = 0, n = 1, . . . , N/2,   (48)


the time-independent Schrödinger wave equation.

Since the solution (SWE) is the same for each index n, an N = 2 solution is permitted; that is, the SWE defines a single complex wave function ψ(x) = (1/√2)(q_1(x) + i q_2(x)) with

d²ψ(x)/dx² + (2m/ħ²)[W − V(x)] ψ(x) = 0.   (49)

3.3 Uncertainty Principles

According to the Heisenberg uncertainty principle, a particle's position and momentum intrinsically fluctuate by amounts x and μ from the ideal (classical) values θ_x and θ_μ, with variances σ_x² and σ_μ² obeying

σ_x² σ_μ² ≥ (ħ/2)².   (50)

This relation is conventionally derived from the Fourier transform relation eq. (36) between position and momentum spaces.

This result may also be proved using the Cramer-Rao inequality of Fisher information. The mean-squared error for position (θ_x) measurements is defined as

e_x² ≡ ⟨(θ̂_x(y) − θ_x)²⟩,   (51)

where θ̂_x(y) is a general estimator for the ideal position θ_x based on measured data y. Suppose the probability distribution for x is p(x); then the Fisher information for the variable x is

I_x = ∫ dx (1/p(x)) (dp(x)/dx)².   (52)

The Cramer-Rao inequality states that

e_x² I_x ≥ 1.   (53)

The one-dimensional wave function for a quantum particle was derived in the previous section; suppose the solution is attained with N = 2, so there is only one ψ_1(x) = ψ(x). The Fisher information (in the one-dimensional variable) for the quantum particle is

I = 8 ∫ dx |dψ(x)/dx|².   (54)


Since |ψ(x)|² = p(x) we have |ψ(x)| = √p(x), so that

I = 8 ∫ dx |d/dx ( |ψ(x)| e^{iS(x)} )|²
  = 8 ∫ dx |d/dx ( √p(x) e^{iS(x)} )|²
  = 8 ∫ dx | (1/(2√p(x))) (dp(x)/dx) e^{iS(x)} + √p(x) i (dS(x)/dx) e^{iS(x)} |²
  = 8 ∫ dx | (1/(2√p(x))) (dp(x)/dx) + √p(x) i (dS(x)/dx) |²
  = 8 ∫ dx [ (1/(4p(x))) (dp(x)/dx)² + p(x) (dS(x)/dx)² ]
  = 2 ∫ dx (1/p(x)) (dp(x)/dx)² + 8 ∫ dx p(x) (dS(x)/dx)²
  = 2 I_x + 8 ⟨(dS(x)/dx)²⟩,

where S(x) ∈ ℝ is the phase of ψ(x). So we have 2I_x ≤ I, and from eq. (44) (with N = 2) we have

I = J = (8/ħ²) ⟨μ²⟩,   (55)

so

I_x ≤ (4/ħ²) ⟨μ²⟩ = (4/ħ²) σ_μ²,   (56)

since µ is the fluctuation in momentum. Using the Cramer-Rao inequality we have

e_x² (4/ħ²) σ_μ² ≥ e_x² I_x ≥ 1,   (57)

and finally

e_x² σ_μ² ≥ (ħ/2)².   (58)

There is a subtle difference between the σ_x² in the Heisenberg uncertainty relation and the e_x² used above: the former represents the variance of the position fluctuation distribution of a particle, and is independent of any measurements, while the latter is a measure of the quality of a position measurement and the subsequent position estimate, which depends on the intrinsic properties of a particle but is only manifested by actual measurements. This difference results in different interpretations of the meaning of the Heisenberg uncertainty relation and its Fisher version: the former treats the fluctuations as intrinsic and existing independent of any observation, while the latter inequality arises when a measurement of position is actually made; that is, when a position measurement is made on a particle, its momentum exhibits a fluctuation governed by the uncertainty principle. The latter interpretation is consistent with the EPI principle in that the uncertainty is intrinsic to the phenomenon, but only by an actual observation can its effects be felt.
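As a concrete check (not part of the original text), take a real Gaussian wave function ψ(x) = (2πσ²)^{−1/4} e^{−x²/(4σ²)} with constant phase S(x). Then I_x = 1/σ² and I = 2I_x, so eq. (55) gives σ_μ² = ⟨μ²⟩ = (ħ²/8) I = ħ²/(4σ²); an efficient position estimator attains e_x² = 1/I_x = σ², and then e_x² σ_μ² = (ħ/2)², so the bound (58) is saturated.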

3.4 Boltzmann Energy Distribution

We will derive the Boltzmann energy distribution law for a perfect gas in equilibrium. The gas is composed of M identical molecules within a container; all collisions with other molecules and with the container walls are assumed to be elastic. The gas has temperature T.

We again start the derivation on a covariant basis and choose the Fisher coordinates to be

x_0 ≡ ix_E, x_μ ≡ cμ ≡ (cμ_1, cμ_2, cμ_3).   (59)

We take the non-relativistic approximation and treat the energy fluctuation x_E and the momentum fluctuations x_μ separately. We are only interested in deriving the law for the energy, hence the subscript on x_E is dropped and the measured value of the energy is

E = θ_E + x, E_0 ≤ E ≤ ∞,   (60)

where θ_E is the ideal value of the energy.

Thus we have

I(E) = −4 ∫ dx Σ_n (dq_n(x)/dx)²,   (61)

where the probability amplitudes q_n(x) relate to the probability distribution function by

p(x) = (1/N) Σ_n q_n²(x).   (62)

The negativity of I(E) is due to the use of an imaginary coordinate for energy, which is justified later. The goal of this analysis is then to solve the two EPI principles,

I(E) − J(E) = extrem. and I(E) = κJ(E),   (63)

the extremization of physical information and the zero-condition respectively.


We find the bound information functional J[q] by assuming that both EPI principles yield the same solutions q_n(x). The more general form J[q, x] is not needed, since the extra explicit dependence on the energy x only yields non-equilibrium solutions, and would be discarded after further derivations.

According to axiom 2, the existence of information densities, we can represent J[q] as

J[q] = 4 ∫ dx Σ_n J_n(q_n(x)),   (64)

hence we have

K ≡ I − J = −4 ∫ dx Σ_n ( q_n′²(x) + J_n(q_n(x)) ), q_n′(x) = dq_n(x)/dx,   (65)

and the extremum principle results in the Euler-Lagrange equation for the integrand L,

d/dx ( ∂L/∂q_n′(x) ) = ∂L/∂q_n(x),   (66)

the solution of which is

d²q_n/dx² = (1/2) dJ_n/dq_n, n = 1, . . . , N.   (67)

For the information efficiency we first change the form of I by noting that

∫ dx (dq_n(x)/dx)² = [ q_n(x) dq_n(x)/dx ]_{E_0−θ_E}^{∞} − ∫ dx q_n(x) d²q_n(x)/dx²;   (68)

the first term of the result is zero, since we assume that the probability of the energy being infinite or equal to E_0 is zero. So we have

I = 4 ∫ dx Σ_n q_n(x) d²q_n(x)/dx²,   (69)

and by the efficiency zero-condition we have

I − κJ = 4 ∫ dx Σ_n ( q_n(x) d²q_n(x)/dx² − κ J_n(q_n(x)) ) = 0.   (70)

By axiom 3, the efficiency on the microlevel, we have

q_n(x) d²q_n(x)/dx² − κ J_n(q_n(x)) = 0, n = 1, . . . , N.   (71)


Combining eqs. (67) and (71) we find that the q_n's obey

κ J_n / q_n = (1/2) dJ_n/dq_n,   (72)

or

2κ / q_n = d(ln J_n)/dq_n.   (73)

Integrating both sides with respect to q_n we have

J_n(q_n) = A_n q_n^{2κ}, A_n ≥ 0.   (74)

Using this equation in either eq. (67) or eq. (71) yields

d²q_n(x)/dx² = α_n² q_n^{2κ−1}(x), α_n² ≡ κA_n ≥ 0.   (75)

For the invariance principle we use the normalization of p(E), which is weak in the sense that any p.d.f. is normalized. Hence for this phenomenon we have minimum prior information, that is, maximum ignorance about the independent variable in question. By classical assumptions, we set κ = 1, which represents maximum information transfer from the bound information of the phenomenon to the Fisher information we gain as a result of observation.

So we have

d²q_n(x)/dx² = α_n² q_n(x),   (76)

the general solution of which is

q_n(x) = B_n e^{−α_n x} + C_n e^{α_n x}, α_n ≥ 0.   (77)

Since x is bounded below (by E_0 − θ_E) yet unbounded above, the C_n's must vanish for p(E) to be normalizable. The solution now becomes

q_n(x) = B_n e^{−α_n x}.   (78)

In retrospect, if the coordinate of energy x_0 ≡ ix_E had been taken to be real, the right-hand side of eq. (75) would be negative, and we would have obtained the general solution

q_n(x) = B_n e^{−iα_n x} + C_n e^{iα_n x},   (79)

which is sinusoidal. This would make p(E) sinusoidal too, and it could not be normalized.


From the form of the solution, the N for this problem need only be one, so we have

q(x) = B e^{−αx},   (80)

and

p(x) = B² e^{−2αx}.   (81)

Using the change of variable x = E − θ_E we have

p(E) = C e^{−2αE}, C ≡ B² e^{2αθ_E}.   (82)

We find the values of the constants C and α by normalization and in terms of the expectation value of E:

∫_{E_0}^{∞} p(E) dE = 1,   (83)

⟨E⟩ = ∫_{E_0}^{∞} E p(E) dE.   (84)

We have

p(E) = (1/(⟨E⟩ − E_0)) e^{−(E−E_0)/(⟨E⟩−E_0)}, E ≥ E_0,   (85)

and p(E) = 0 for other values of E. Shifting the origin of E by a constant does not change the physical law, so we subtract E_0 from E and get

p(E) = ⟨E⟩^{−1} e^{−E/⟨E⟩}, E ≥ 0.   (86)

The energy of a perfect gas in equilibrium with three degrees of freedom per molecule is

⟨E_t⟩ = 3MkT/2,   (87)

so

⟨E⟩ = ⟨E_t⟩/M = 3kT/2,   (88)

and the energy distribution is

p(E) = (2/(3kT)) e^{−2E/(3kT)}.   (89)
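As a quick numerical sketch (not part of the original text), the derived distribution (89) can be checked for normalization and for the mean energy 3kT/2; kT = 1 is an assumed unit choice.

    # Check normalization (83) and mean energy (88) of the derived p(E), eq. (89),
    # in assumed units kT = 1.
    import numpy as np

    kT = 1.0
    E = np.linspace(0.0, 200.0, 200_001)
    p = 2.0 / (3 * kT) * np.exp(-2 * E / (3 * kT))

    print(np.trapz(p, E))      # ~ 1.0, normalization
    print(np.trapz(E * p, E))  # ~ 1.5 = 3kT/2, mean energy per molecule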

3.5 Newton’s Law of Motion

Lastly we present a mock derivation of Newton's law of motion using a pseudo-EPI procedure. We assume that the energy of a particle has two forms, kinetic and conservative potential, and that the total energy is constant. We define the position perturbation q(t) as a function of time; the constant-energy requirement then becomes

E = (1/2) m (dq(t)/dt)² − V(q(t)) = const.   (90)

The Fisher information for this phenomenon is

I = 4 ∫ dt (dq(t)/dt)² = (8/m) ∫ dt (E + V(q(t))) ≡ J,   (91)

and κ = 1 here. We have perfect efficiency because time can be measured to infinite accuracy in the non-relativistic picture.

We extremize

K = I − J = 4 ∫ dt [ (dq(t)/dt)² − 2E/m − (2/m) V(q(t)) ]   (92)

using the Euler-Lagrange equation, and get the solution

m d²q(t)/dt² = −dV(q(t))/dq(t),   (93)

which is Newton's law of motion.
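As a small symbolic check (not part of the original text), sympy can apply the Euler-Lagrange equation to the integrand of eq. (92) for an assumed example potential V(q) = kq²/2; the names q, V, k below are chosen only for illustration.

    # Symbolic Euler-Lagrange check of eqs. (92)-(93) for V(q) = k q^2 / 2.
    import sympy as sp
    from sympy.calculus.euler import euler_equations

    t, m, E, k = sp.symbols('t m E k', positive=True)
    q = sp.Function('q')
    V = k * q(t)**2 / 2                          # assumed example potential
    L = sp.diff(q(t), t)**2 - 2*E/m - 2*V/m      # integrand of K in eq. (92), up to the factor 4

    eq = euler_equations(L, q(t), t)[0]
    print(sp.simplify(eq))                       # equivalent to m q'' = -dV/dq = -k q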

4 The Geometrical Representation of Physical Phenomenon

According to the view of the Italian physicist E. R. Caianiello, uncertainty is inherent in all branches of science. He obtained a geometrical representation of physics, especially quantum physics, using the theories and methods of information geometry. In his formulation [2], the quantum physical uncertainty appears as a "curvature" in relativistic phase space. He also tried to connect such representations (quantum geometry) to theories of entropy and information, so as to find a theoretical foundation for them. Like the principle of EPI, his goal is to describe physical phenomena from the viewpoint of information. But the theories of information geometry, or the geometrization of information theory used for its foundation, are more general than the Fisher information, and much more difficult to comprehend.


4.1 Information Geometry

Information geometry is a specialization of differential geometry that deals with the geometrical structure of probability distributions. Under this formulation, probability distributions are treated as points on a manifold, and the Fisher information defines the distance between nearby probability distributions on these manifolds.

According to Caianiello [7], the geometrical representation of a model (or theory) essentially requires the choice of a metric G and a connection Γ, and the identification of a reference frame; these depend upon the "universe" one wishes to model. Below we give a short introduction to these concepts of differential geometry and apply them to the special case of information geometry (see [8] for an introduction).

4.1.1 Manifolds

An N-dimensional manifold M^N is a "curved" space embedded in an M-dimensional affine space E^M, where M > N. For instance, a curved surface or the surface of a sphere in 3 dimensions are both 2-dimensional manifolds. A point on an N-dimensional manifold can be specified with N parameters (or coordinates) x ≡ {x^1, . . . , x^N}, so there is a mapping from each point of the manifold to a point of an affine space E^N.

Because probability distributions (probability amplitudes and wave functions) are used in quantum physics, Caianiello used manifolds with parametric distributions as points, and extended the definitions so obtained to descriptions of quantum physics. The most general parametric distribution has the form ρ_0(x|z), where z ≡ (z_1, z_2, . . . , z_m) ∈ R^m is the random output and x ≡ (x^1, x^2, . . . , x^n) ∈ R^n represents the parameters (the subscripts and superscripts are all indices). So x is a point in E^n, but it is also a point in the manifold M^n formed by the probability distributions ρ_0(x|z). The parameters x provide a coordinate system for M^n, each point of which represents a different probability distribution.

We will use Gaussian distributions to illustrate these concepts. The mean μ and standard deviation σ determine a Gaussian distribution

ρ(x|z) = (1/(√(2π) σ)) exp[ −(z − μ)²/(2σ²) ],   (94)

where z is the single random output, and the parameter x ≡ (x^1 = f^1(μ, σ), x^2 = f^2(μ, σ)) is determined by the mean and standard deviation. Hence Gaussian distributions form a 2-dimensional manifold M².


4.1.2 Metric

The metric of an N-dimensional manifold M^N is an N × N symmetric matrix G(x), with real elements g_ij(x) defined at every point of the manifold, so the metric is a tensor field of sorts. The metric G(x) serves as a standard of length, or distance measure, on a manifold, hence its name.

We define the infinitesimal distance between two points x and x + dx ≡ (x^1 + dx^1, . . . , x^N + dx^N) by the metric G(x) as

ds²(x) = g_ij(x) dx^i dx^j   (95)

(note the use of Einstein's summation convention). In affine spaces the distance is defined as

ds² = Σ_i (dx^i)²,   (96)

hence their metric is the identity matrix I at all points of the space.

The manifold M^N is thus a generalized space that has a "metric" G defined for the specification of distances between its points. From eq. (95) we can see that the coordinate axes may not be orthogonal, and that different coordinates may not have the same "weight" in the value of the "distance". But more importantly, because the metrics defined at different points are not in general equal, the space of a manifold is "deformed" compared to affine spaces, so the affine spaces spanned by the coordinate axes of two different points on the manifold may have no intersection.

We will now use the entropy to define a metric for the manifolds formed by probability distributions. The Shannon entropy of a p.d.f. is defined as

H(ρ(x|z)) = −∫ ρ(x|z) ln ρ(x|z) dz;   (97)

for continuous p.d.f.'s this may diverge, that is, it may not be meaningful in certain contexts, hence we use the cross entropy instead. The cross entropy (or Kullback-Leibler information) of two Gaussian distributions ρ(x_1|z) and ρ(x_2|z) is

H_c(x_1, x_2) = ∫ ρ(x_1|z) ln[ ρ(x_1|z) / ρ(x_2|z) ] dz.   (98)

The J-divergence of the two distributions ρ(x_1|z) and ρ(x_2|z) is defined as

J(x_1|x_2) = H_c(x_1, x_2) + H_c(x_2, x_1).   (99)

We interpret the J-divergence as a sort of "distance" between these two distributions. If we set x_1 = x, x_2 = x + dx ≡ (x^1 + dx^1, x^2 + dx^2), then the infinitesimal distance can be defined as the J-divergence

J(x|x + dx) = ds² = g_hk(x) dx^h dx^k,   (100)


where, with the notation ∂_h = ∂/∂x^h, we have the metric

g_hk(x) = g_kh(x) = ∫ ρ(x|z) ∂_h ln ρ(x|z) ∂_k ln ρ(x|z) dz,   (101)

which forms a symmetric matrix G. Because the elements g_hk(x) are in the form of the Fisher information (and yield the familiar form when h = k), the metric G is a Fisher information matrix; we refer to it as the Fisher metric in this context.

If we express the two parameters of the Gaussian distribution as

x^1 = μ/σ², x^2 = 1/(2σ²),   (102)

then the Gaussian distribution becomes

ρ(x|z) = exp[ z x^1 − z² x^2 − (x^1)²/(4x^2) + (1/2) ln x^2 − (1/2) ln π ],   (103)

and the Fisher metric is

G ≡ {g_hk} = ( σ²       −2μσ²
               −2μσ²    4μ²σ² + 2σ⁴ ).   (104)
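A small numerical sketch (not part of the original text) can check the metric (104) directly from eq. (101), by quadrature over z and central differences in the parameters, at the assumed point μ = 1, σ = 2.

    # Numerical evaluation of the Fisher metric (101) for the Gaussian family
    # in the coordinates of eq. (102), at the assumed point mu = 1, sigma = 2.
    import numpy as np

    def log_rho(x1, x2, z):
        # logarithm of eq. (103)
        return z*x1 - z**2*x2 - x1**2/(4*x2) + 0.5*np.log(x2) - 0.5*np.log(np.pi)

    mu, sigma = 1.0, 2.0
    x = np.array([mu/sigma**2, 1.0/(2*sigma**2)])
    z = np.linspace(mu - 12*sigma, mu + 12*sigma, 20001)
    rho = np.exp(log_rho(x[0], x[1], z))
    eps = 1e-5

    def dlog(h):
        # central difference of ln rho with respect to parameter x^h
        dx = np.zeros(2); dx[h] = eps
        return (log_rho(*(x + dx), z) - log_rho(*(x - dx), z)) / (2*eps)

    G = np.array([[np.trapz(rho*dlog(h)*dlog(k), z) for k in range(2)] for h in range(2)])
    print(G)   # ~ [[sigma^2, -2 mu sigma^2], [-2 mu sigma^2, 4 mu^2 sigma^2 + 2 sigma^4]]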

4.1.3 Coordinate Transformations

A vector field A(x) in the affine space E^N can be specified by its N components A^i(x), which are its components in some coordinate system x = {x^1, . . . , x^N}. If there is another coordinate system x' = {x'^1, . . . , x'^N} for E^N, then the components of A(x) in the new coordinate system can be expressed as

A'^j(x') = (∂x'^j/∂x^i) A^i(x).   (105)

We will now extend these concepts to fields defined on manifolds.

In the discussion below, we will be using local coordinate systems that are limited to a certain neighborhood of the manifold. Since manifolds are "deformed", there may not be a global coordinate system, but a small enough locality of the manifold may be considered "flat". Hence we will define two coordinate systems in the neighborhood of a point M ∈ M^N, and let the coordinates of M in the two coordinate systems be x = {x^1, . . . , x^N} and x' = {x'^1, . . . , x'^N} respectively.

A scalar field F(x) on a manifold maps its points to real numbers. Its behavior under coordinate transformations is

F'(x') = F(x),   (106)

which is called invariance, since the value of a scalar field at the same point does not change under coordinate transformations.

A contravariant vector field A(x) on a manifold is the extension of the ordinary vector field; its components A^i(x) transform under coordinate changes as

A'^j(x') = (∂x'^j/∂x^i) A^i(x),   (107)

which is the same as for an ordinary vector field and is called contravariance.

A covariant vector field A(x) on a manifold with components A_i(x) is defined by

A_i(x) = g_ij(x) A^j(x)   (108)

with respect to a contravariant field and the metric. We can also consider A(x) as a vector field with both contravariant and covariant components. Under coordinate transformations, the covariant components change as

A'_j(x') = (∂x^i/∂x'^j) A_i(x),   (109)

which is called covariance.

We can further generalize these concepts to a tensor field T(x), defined as the direct product of p contravariant fields and q covariant fields. Its N^p × N^q components T^{i_1 i_2 ... i_p}_{j_1 j_2 ... j_q}(x) change under coordinate transformations as

T'^{i_1 ... i_p}_{j_1 ... j_q}(x') = (∂x'^{i_1}/∂x^{k_1}) ··· (∂x'^{i_p}/∂x^{k_p}) (∂x^{l_1}/∂x'^{j_1}) ··· (∂x^{l_q}/∂x'^{j_q}) T^{k_1 ... k_p}_{l_1 ... l_q}(x),   (110)

and we say that T(x) is p times contravariant and q times covariant. A scalar field F(x) can then be considered a tensor field 0 times contravariant and 0 times covariant.

If the metric of a manifold is 2 times covariant, that is,

g'_rs(x') = (∂x^i/∂x'^r)(∂x^j/∂x'^s) g_ij(x),   (111)

then we call the manifold a Riemannian manifold.

4.1.4 Connection

In the affine space E^N the derivative is ∇ ≡ (∂_1, ∂_2, . . . , ∂_N), with i-th component ∇_i = ∂_i = ∂/∂x^i. On manifolds, we define the i-th component of the covariant derivative ∇_i on scalar fields and on the components of contravariant and covariant vector fields as

∇_i F(x) = ∂_i F(x),   (112)

∇_i A^k(x) = ∂_i A^k(x) + Γ^k_{ji}(x) A^j(x),   (113)

and

∇_i A_k(x) = ∂_i A_k(x) − Γ^j_{ki}(x) A_j(x)   (114)

respectively. The Γ^i_{jk}(x)'s are the components of the affine connection field, and are called affine connection coefficients, Christoffel symbols, or Γ-symbols. They "connect", or provide the relationship between, the coordinate systems at different points of the manifold, and hence are needed to describe derivatives on manifolds. The name covariant derivative comes from the fact that the fields after the operation are increased in covariance by 1, so that the ∇_i F(x) are components of a covariant field, the ∇_i A^k(x) components of a tensor field 1 time contravariant and 1 time covariant, and the ∇_i A_k(x) components of a 2 times covariant tensor field. Note that the connection Γ(x) is not a tensor field in general; only when the coordinate transformation is linear do the components of Γ(x) change like the components of a tensor field 1 time contravariant and 2 times covariant.

The covariant derivatives of general tensor fields can be found by extending eqs. (113) and (114). For example, the k-th covariant derivative of the Riemannian metric component g_ij(x) is

∇_k g_ij(x) = ∂_k g_ij(x) − Γ^l_{ik}(x) g_lj(x) − Γ^l_{jk}(x) g_il(x).   (115)

We define Γ_{jik} = g_jl Γ^l_{ik} (this is needed since Γ is not a tensor in general); the equation above becomes

∇_k g_ij(x) = ∂_k g_ij(x) − Γ_{jik}(x) − Γ_{ijk}(x).   (116)

If Γ_{ijk} = Γ_{ikj}, then the connection is called a Riemannian connection. Using eq. (116), we can generate two more equations by permutation of the indices i, j, k; adding two of these equations and subtracting the third, with the assumptions of a Riemannian connection and a vanishing covariant derivative of the metric, we can express the connection coefficients in terms of the metric components:

Γ_{ijk}(x) = (1/2)( ∂_k g_ji(x) + ∂_j g_ik(x) − ∂_i g_kj(x) ).   (117)

We now define the connection for probability distribution manifolds. The most general connection compatible with information geometry is

Γ^(α)_{ij,k}(x) = Γ_{kij}(x) − (α/2) ∫ ρ(x|z) ∂_i ln ρ(x|z) ∂_j ln ρ(x|z) ∂_k ln ρ(x|z) dz.   (118)

With the metric eq. (101) we have

Γ_{kij} = ∫ ρ [ ∂²_{ij} ln ρ ∂_k ln ρ + (1/2) ∂_i ln ρ ∂_j ln ρ ∂_k ln ρ ] dz.   (119)


4.1.5 Curvature Tensor

The curvature tensor (or Riemannian tensor) R(x) of a manifold is a tensor field 1 time contravariant and 3 times covariant; its components are defined as

R^j_{ikl}(x) = ∂_k Γ^j_{il}(x) − ∂_l Γ^j_{ik}(x) + Γ^j_{hk}(x) Γ^h_{il}(x) − Γ^j_{hl}(x) Γ^h_{ik}(x).   (120)

On a Riemannian manifold, because g_ij = g_ji and Γ_{ijk} = Γ_{ikj}, we can define

R_{iklm} = g_in R^n_{klm} = (1/2)( ∂²_{kl} g_im + ∂²_{im} g_kl − ∂²_{km} g_il − ∂²_{il} g_km ) + g_np ( Γ^n_{kl} Γ^p_{im} − Γ^n_{km} Γ^p_{il} ),   (121)

which is 4 times covariant, the fully covariant form of the Riemannian tensor.

By the Riemannian conditions we can further state the relations

R_{iklm} = −R_{kilm},   (122)
R_{iklm} = −R_{ikml},   (123)
R_{iklm} = R_{lmik},   (124)
R_{iilm} = R_{ikll} = 0.   (125)

We use Γ^(α)_{ij,k} as the connection for the Gaussian manifold; using eqs. (104), (118) and (119) for the metric and connection, we can obtain the curvature tensor component for the Gaussian manifold,

R^(α)_{1212} = (1 − α²) σ⁶.   (126)

When α = 0, the connection becomes

Γ^(0)_{ij,k} = Γ_{kij}   (127)

and the curvature tensor becomes

R^(0)_{1212} = σ⁶.   (128)

The previous equation suggests that the curvature tensor expresses our lack of information, since it vanishes only when σ = 0.


4.2 Quantum Geometry

4.2.1 The Metric and Connection

Because wave functions (complex probability amplitudes) are used instead of probability functions in quantum mechanics, we need to generalize the metric defined in information geometry to use wave functions.

We begin by changing the expression of the metric to

g_hk(x) = 4 ∫ ∂_h √ρ(x|z) ∂_k √ρ(x|z) dz,   (129)

which is in terms of the probability amplitude √ρ(x|z). Similarly, the infinitesimal distance (J-divergence) can be expressed as

ds² = 4 ∫ (∂_h √ρ dx^h)(∂_k √ρ dx^k) dz = 4 ∫ (d√ρ)² dz.   (130)

If we change the variable z to a discrete variable, with φ_α(x) = √ρ(x|z = α), we can then use the inner-product form to express the metric

g_hk(x) = 4 Σ_α ∂_h φ_α(x) ∂_k φ_α(x).   (131)

The desired generalization of the information metric is now obvious: it is (neglecting the irrelevant numerical factor)

g_hk(x) = g_kh(x) = ∫ ψ_h*(x|z) ψ_k(x|z) dz.   (132)

If ψ_h(x|z) = ∂_h φ(x|z), then eq. (132) yields the general holonomic case; if also ∂_h φ* = ∂_h φ, we return to the standard information metric (101).

In quantum geometry we wish to use Riemannian manifolds; hence we must have α = 0 in the connection Γ^(α)_{ij,k}, since only then does the covariant derivative of the metric vanish. The Riemannian property of the metric can be considered to correspond to the Hermitian property of density operators in quantum physics.

So we set α = 0 in eq. (118), and by eq. (119) and the following relation (which is easily proven),

Π_{i=1}^{k} ∂_i ln ρ = (2^k / ρ^{k/2}) Π_{i=1}^{k} ∂_i √ρ,   (133)

we have

Γ^(0)_{ij,k} = 4 ∫ ∂²_{ij} √ρ ∂_k √ρ dz,   (134)

which is also expressed with probability amplitudes.


4.2.2 The Uncertainty Relation

Since the metric is in the form of the Fisher information, we can use the Cramer-Rao inequality to derive the uncertainty relation.

In the one-dimensional case (with manifold M¹), let x̂ be an unbiased estimator for the single parameter x. With variance (Δx)² ≡ ⟨(x̂ − x)²⟩, the Cramer-Rao inequality is

(Δx)² · g_11 ≥ 1,   (135)

where the only component of the metric, g_11, is the Fisher information. Using eq. (129) with x ≡ x^1 we have

(Δx)² · 4 ∫ dz (∂√ρ/∂x)² ≥ 1,   (136)

and with the quantum physical identities √ρ = φ, p_x = −iħ ∂_x we get the uncertainty relation

Δx · Δp_x ≥ ħ/2.   (137)

Thus the uncertainty relation can be derived from the Cramer-Rao inequality, and hence is not exclusive to quantum physics.

4.2.3 The Sign of the Infinitesimal Distance ds²

From the preceding derivation, the "infinitesimal distance" ds² and the "infinitesimal cross-entropy" dH_c can be considered equivalent (to within an appropriate identification, see eqs. (99) and (100)),

ds² ≡ dH_c = g_hk dx^h dx^k = ∫ ψ_h* dx^h ψ_k dx^k dz.   (138)

Under special relativity a particle in space-time is described by the four parameters (four-vector) x = {x^1 = ct, x^2 = ix, x^3 = iy, x^4 = iz}, and the metric G is the identity; we have

ds² = c²dt² − dx² − dy² − dz² ≥ 0,   (139)

which is the requirement that the speed of particles not exceed c. From this inequality we also have

dH_c ≥ 0.   (140)

Since the previous equation is derived from a basic assumption of relativity, it can be considered as a basic physical principle, similar to other basic principles such as the second law of thermodynamics or the I-theorem of EPI.
