## Physical Phenomenon from the Viewpoint of Information (Introduction to Quantum Information Theory)

### Cheng-Yuan Liou and Jyh-Ying Peng

March 28, 2004

### 1 Introduction

We will introduce three theories that formulate the static description and dynamic evolution of quantum physics from the viewpoint of information.

The first two theories study the possible information structures of quantum mechanics; the last is the basis of quantum information theory. The informational viewpoint is also the viewpoint of measurement, since we gain information only by the act of measurement or observation. All refutable theories about physical phenomena depend on measurement results. We can even state that all physical phenomena arise from the act of measurement or observation; this is the central concept of this work.

Classical physics holds the assumption that physical phenomena have a reality independent of observation, and that any observer can perform measurements with arbitrary accuracy. Regarding the first point, we note that all useful physical theories (theories that can predict future physical states) are statements about measurement results, hence they can be verified or refuted by measurement. This means that physical theories need not have an existence independent of conscious measurement or observation; such an assumption is neither provable nor useful, and hence is unnecessary.

Once we accept this, we realize that all theories are just models constructed by our minds to account for what we perceive in nature. That is, there is no “correct” physical theory, only theories that are more useful, or make better predictions about measurement results; the “reality” of a physical theory is meaningless.

To discuss the second point stated earlier, we have to construct a “model” for the act of observation. We take the currently most “accurate” physical theory, quantum physics, as the basis for the discussion. When we observe the position of an object, that object is no longer where we observed it, because the photon our eyes received gave it momentum. This means that there is no passive observation; all physical acts are interactions of some kind, and as soon as we observe something, we change its physical state. So we do not say that measurement to infinite accuracy is impossible, but rather that the concept has no meaning (in our model of reality); in the physical world, there is no observation, only interaction. This is the viewpoint of quantum physics, and also the viewpoint adopted here. We will take the uncertainty of the measurement process as a starting point, and demonstrate how to derive physical theories from it.

### 2 Information in the Measurement Process - Fisher Information

### 2.1 Measurement

We will now start with a description of the classical measurement process.

Measuring a physical system means estimating the value of some of its physical parameters from data obtained from the system. Let the ideal value of the parameter we are trying to estimate be θ. We obtain N data values y ≡ {y_1, y_2, . . . , y_N}, whose values are determined by the conditional probability p(y|θ). This conditional probability represents the intrinsic physical properties of the measured parameter.

Let the relationship between ideal value and measured data be

$$y = \theta + x, \tag{1}$$

where x ≡ {x_1, x_2, . . . , x_N} represents the measurement uncertainty, or data fluctuations. We can define an estimator θ̂(y) for θ based on the obtained data y. A possible estimator is the sample mean θ̂(y) = (1/N) Σ_n y_n.

Here we step beyond classical physics, and interpret x as a physical quantity intrinsic to the parameter θ, independent of the measurement process. This means that x represents the uncertainty of the parameter when it is observed, and its values do not depend on the measurement methods used or measurement errors encountered; it is an intrinsic physical property of the measured parameter. So x is the measurement uncertainty, not measurement error; it is a physical quantity independent of measurement, but manifested only by the measurement process (it has an “experimental reality”, though it may not be “real”).

If we accept that the uncertainty x of a parameter θ has physical meaning, then y, θ, and x form a closed, physically isolated system.

### 2.2 The Cramer-Rao Inequality

We assume that the estimator θ̂(y) is unbiased, that is,

$$\left\langle \hat\theta(y) - \theta \right\rangle \equiv \int dy\,\left[\hat\theta(y) - \theta\right] p(y|\theta) = 0, \tag{2}$$

where dy ≡ dy_1 dy_2 · · · dy_N. Here p(y|θ) is the conditional probability distribution of the parameter’s fluctuation, or uncertainty, given that the value of the parameter is θ. Operating with ∂/∂θ on both sides of equation (2), we get

$$\int dy\,\left[\hat\theta(y) - \theta\right]\frac{\partial p(y|\theta)}{\partial\theta} - \int dy\, p(y|\theta) = 0. \tag{3}$$

The second term on the left is 1; using ∂p/∂θ = p ∂ln p/∂θ, we have

$$\int dy\,\left[\hat\theta(y) - \theta\right]\frac{\partial \ln p(y|\theta)}{\partial\theta}\, p(y|\theta) = 1. \tag{4}$$

Separate the integrand

$$\int dy\,\left[\frac{\partial \ln p(y|\theta)}{\partial\theta}\sqrt{p(y|\theta)}\right]\left[\hat\theta(y) - \theta\right]\sqrt{p(y|\theta)} = 1 \tag{5}$$

and square both sides; from the Schwarz inequality we get

$$\left[\int dy \left(\frac{\partial \ln p(y|\theta)}{\partial\theta}\right)^2 p(y|\theta)\right]\left[\int dy\,\left[\hat\theta(y) - \theta\right]^2 p(y|\theta)\right] \geq 1. \tag{6}$$

The first factor on the left of eq. (6) is defined as the Fisher information I for (the measurement of) the parameter θ,

$$I(\theta) \equiv \int dy \left(\frac{\partial \ln p(y|\theta)}{\partial\theta}\right)^2 p(y|\theta) = \int dy\,\frac{1}{p(y|\theta)}\left(\frac{\partial p(y|\theta)}{\partial\theta}\right)^2, \tag{7}$$

and the second factor is the mean-squared error of the estimator θ̂(y),

$$e^2 \equiv \int dy\,\left[\hat\theta(y) - \theta\right]^2 p(y|\theta) = \left\langle\left[\hat\theta(y) - \theta\right]^2\right\rangle. \tag{8}$$

Thus we have the Cramer-Rao inequality

$$e^2 I \geq 1. \tag{9}$$

This inequality holds for any measurement using an unbiased estimate of the measured parameter; it establishes the relationship between the Fisher information and the mean-squared error of any measurement. As the estimation error increases, the Fisher information decreases, so the Fisher information can be seen as a measure of information. Moreover, because the estimation errors come only from x, which is an intrinsic property of the parameter, the Fisher information represents the quality of measurement attainable when there are no measurement errors or human mistakes; its value depends only on the parameter and the measured system.
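As a concrete check of eq. (9), consider Gaussian data fluctuations, for which the sample mean is an efficient estimator that saturates the bound. A minimal numerical sketch (the Gaussian model and all parameter values are illustrative assumptions, not part of the derivation above):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, N, trials = 3.0, 2.0, 10, 200_000   # illustrative values

y = theta + rng.normal(0.0, sigma, size=(trials, N))  # data y = theta + x
theta_hat = y.mean(axis=1)               # unbiased sample-mean estimator
e2 = np.mean((theta_hat - theta) ** 2)   # mean-squared error, eq. (8)
I = N / sigma**2                         # Fisher information of N Gaussian samples
print(e2 * I)                            # ~ 1: the bound (9) is saturated
```

The product e²I approaches 1 as the number of trials grows, which is the equality case of eq. (9).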

### 2.3 The Special Case of Shift-Invariance

Assume that we take only one data value, N = 1, so that y and x are scalars. If

$$p(y|\theta) = p(y - \theta) = p(x), \tag{10}$$

which means that the fluctuation of the data value y relative to the ideal value θ is independent of the value of θ, we call this property shift-invariance. Under this condition the fluctuation (or uncertainty) x is independent of the value of θ, hence the Fisher information is also independent of θ (in three dimensions we call this Galilean invariance, invariance of physical laws to changes of reference point). Since ∂/∂θ = −∂/∂(y − θ), the Fisher information can be written as
be written as

$$I = \int dy\,\frac{1}{p(y-\theta)}\left[\frac{\partial p(y-\theta)}{\partial(y-\theta)}\right]^2 = \int dx\,\frac{1}{p(x)}\left[\frac{dp(x)}{dx}\right]^2. \tag{11}$$

So we can calculate the uncertainty and Fisher information of parameter θ without knowing its ideal value. When a parameter satisfies shift-invariance, no matter what its ideal value, the fluctuations observed in measurements are the same. To simplify the discussion, we will assume all parameters satisfy this property.

### 2.4 Probability Amplitude Functions

In eq. (11) the term 1/p(x) would diverge as p(x) → 0; we can define a real probability amplitude function q(x) to avoid this problem:

$$p(x) = q^2(x). \tag{12}$$

Using this in eq. (11) we have

$$I = 4 \int dx \left(\frac{dq(x)}{dx}\right)^2. \tag{13}$$
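A quick numerical illustration of eq. (13): for a Gaussian fluctuation p(x) of standard deviation σ, the Fisher information of the location parameter is known to be 1/σ², and the amplitude form reproduces it (the Gaussian choice and grid are ours, for illustration only):

```python
import numpy as np

sigma = 1.5                                 # illustrative width
x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
p = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
q = np.sqrt(p)                              # real amplitude, p = q^2, eq. (12)
dq = np.gradient(q, x)                      # numerical dq/dx
I = 4 * np.sum(dq**2) * dx                  # eq. (13)
print(I, 1 / sigma**2)                      # both ~ 0.444
```

Note that the amplitude form needs no division by p(x), so the integrand stays finite in the tails where p(x) → 0.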

### 2.5 Extension to Vectors and Multi-Dimensions

Under the framework of relativity, physical quantities are four-vectors such as x ≡ (x, y, z, t); we will define a scalar Fisher information for the measurement of multiple four-vectors.

Suppose we are measuring N four-vector parameters of a physical system; these four-vectors can represent any physical attribute of the system. We obtain the data

$$y_n = \theta_n + x_n, \quad n = 1, \dots, N, \tag{14}$$

where y_n ≡ (y_n1, y_n2, y_n3, y_n4), θ_n ≡ (θ_n1, θ_n2, θ_n3, θ_n4), and x_n ≡ (x_n1, x_n2, x_n3, x_n4) are all four-vectors; they are the obtained data, ideal value, and fluctuation of the parameters respectively. For notational simplicity, we also define grand vectors

$$\theta \equiv (\theta_1, \theta_2, \dots, \theta_N), \quad y \equiv (y_1, y_2, \dots, y_N), \quad dy \equiv dy_1 dy_2 \cdots dy_N, \tag{15}$$

where dy_n ≡ dy_n1 dy_n2 dy_n3 dy_n4. If the measurement consists of the same physical four-vector parameter measured N times, or of N particles a sufficient distance apart measured for the same physical quantity, then measurements of the individual four-vectors are independent, and y_n represents the data of the n-th measurement, or the n-th particle.

We assume that all estimators θ̂_n(y) of θ_n are unbiased, so that

$$\left\langle \hat\theta_n(y) \right\rangle \equiv \int dy\,\hat\theta_n(y)\, p(y|\theta) = \theta_n, \quad n = 1, \dots, N. \tag{16}$$

Using the mean-squared errors of the four components of each four-vector, we can derive the Cramer-Rao inequality for each of them. Since the Fisher information is additive [1], we get the scalar Fisher information for the measurement

$$I \equiv \sum_n \int dy\, p(y|\theta) \sum_\nu \left(\frac{\partial \ln p(y|\theta)}{\partial\theta_{n\nu}}\right)^2. \tag{17}$$

If we assume independence between measurements of all four-vectors, then

$$p(y|\theta) = \prod_n p_n(y_n|\theta) = \prod_n p_n(y_n|\theta_n), \tag{18}$$

so that

$$\frac{\partial \ln p(y|\theta)}{\partial\theta_{n\nu}} = \sum_m \frac{\partial \ln p_m(y_m|\theta_m)}{\partial\theta_{n\nu}} = \frac{1}{p_n(y_n|\theta_n)}\frac{\partial p_n(y_n|\theta_n)}{\partial\theta_{n\nu}}. \tag{19}$$

Using eq. (18) and eq. (19) to simplify the Fisher information eq. (17), we have

$$I = \sum_n \int dy_n\,\frac{1}{p_n(y_n|\theta_n)} \sum_\nu \left(\frac{\partial p_n(y_n|\theta_n)}{\partial\theta_{n\nu}}\right)^2. \tag{20}$$

For shift-invariant four-vectors,

$$p_n(y_n|\theta_n) = p_n(y_n - \theta_n) = p_n(x_n), \tag{21}$$

the Fisher information can be further simplified to

$$I = \sum_n \int dx_n\,\frac{1}{p_n(x_n)} \sum_\nu \left(\frac{\partial p_n(x_n)}{\partial x_{n\nu}}\right)^2. \tag{22}$$

Using probability amplitudes q_n(x_n), we have

$$I = 4 \sum_n \int dx_n \sum_\nu \left(\frac{\partial q_n(x_n)}{\partial x_{n\nu}}\right)^2, \quad p_n(x_n) = q_n^2(x_n). \tag{23}$$

Finally, if all the parameters θ_n represent the same physical quantity, then all fluctuations x_n are equivalent; we can then drop the subscript n on x_n and get

$$I = 4 \sum_n \int dx \sum_\nu \left(\frac{\partial q_n(x)}{\partial x_\nu}\right)^2, \tag{24}$$

where x ≡ (x_1, x_2, x_3, x_4) is the uncertainty of each four-vector.

### 2.6 Total Probability Function

If the measurement in the previous section is N measurements of the same four-vector parameter of the same particle, then each x_n represents the n-th measurement result of the same physical quantity. Since there is only one parameter being measured, we can set

$$p_n(y_n|\theta_n) = p_n(y|\theta_n) = p_{x_n}(x|\theta_n) = q_n^2(x). \tag{25}$$

The probability of measuring any of the θ_n is equal, so the total probability (or net probability) function of the fluctuation four-vector x is

$$p(x) = \sum_{n=1}^{N} p_{x_n}(x|\theta_n)\, P(\theta_n) = \frac{1}{N}\sum_{n=1}^{N} q_n^2(x). \tag{26}$$

### 3 Derivation of Physical Laws by EPI

### 3.1 The Basics

The principle of EPI (extreme physical information) by B. Roy Frieden [1] rests on the assumption that the act of measurement (or observation) produces the physical laws.

The hierarchy of physical knowledge according to EPI consists of four layers. The topmost laws are:

1. The Fisher I-theorem. The Fisher information monotonically decreases with time. Like entropy, I can be transferred from one system to another.

2. Bound information J. There is an information bound J intrinsic to each physical phenomenon. The information J represents an upper limit to the information gained in measurement.

3. Invariance principle. There is an invariance, or symmetry, principle governing the time evolution of each physical phenomenon.

These laws exist prior to and independent of any explicit measurements; that is, they govern all physical phenomena. At the next level we have the three axioms describing the measurement process:

Axiom 1. The conservation of information perturbation, δI = δJ, during a measurement.

Axiom 2. The existence of information densities i_n(x) and j_n(x) defined by

$$I \equiv \int dx \sum_n i_n(x) \quad \text{and} \quad J \equiv \int dx \sum_n j_n(x), \tag{27}$$

where i_n(x) = 4∇q_n · ∇q_n = 4 Σ_ν (∂q_n/∂x_nν)².

Axiom 3. The efficiency of information transition from phenomenon to intrinsic data on the microlevel,

$$i_n(x) - \kappa j_n(x) = 0, \quad \forall x, n. \tag{28}$$
The third level of knowledge consists of the EPI principle, which includes the variational principle and the zero-condition. The variational principle states the extremization of K[q] ≡ I[q] − J[q], that is,

$$\delta K = \delta(I - J) = 0. \tag{29}$$

The zero-condition is the efficiency of information transition on the macrolevel,

$$I = \kappa J. \tag{30}$$

These follow either from the axioms or from the existence of a physically meaningful unitary transformation space.

The fourth and lowest level is the physical laws found by the EPI calculation, that is, the q_n(x_n) found to extremize K, where the form of J is determined by the invariance principles for the particular phenomenon in question.

### 3.2 The Schrödinger Wave Equation

This derivation runs parallel to the fully covariant EPI derivation of the Klein-Gordon equation in [1], chapter 4. However, the Schrödinger equation treats space and time coordinates differently: time coordinates are assumed to have no fluctuations and can always be determined precisely, whereas space coordinates cannot be determined with arbitrary accuracy. Since the EPI approach is covariant while the Schrödinger equation is not, some approximations have to be made in the derivation, the end result of which is also an approximation of nature.

We will derive the one-dimensional time-independent Schrödinger equation. The position θ of a particle of mass m is measured as data y = θ + x, where x is a random excursion governed by probability amplitudes q_n(x), which are to be found. We ignore time t in the derivation, hence we will get a stationary solution to this problem. The particle is assumed to be in a conservative field of scalar potential V(x), with total energy W conserved.

The information associated with the measurement of position is

$$I = 4 \sum_{n=1}^{N} \int dx \left(\frac{dq_n(x)}{dx}\right)^2 \tag{31}$$

in the one-dimensional case. We define the complex wave functions ψ_n(x) as

$$\psi_n(x) \equiv \frac{1}{\sqrt{N}}\left(q_{2n-1}(x) + i\, q_{2n}(x)\right), \tag{32}$$
where there are N/2 of them. The information expressed in terms of the ψ_n(x) becomes

$$I = 4N \sum_{n=1}^{N/2} \int dx \left(\frac{d\psi_n(x)}{dx}\right)^{\!*}\left(\frac{d\psi_n(x)}{dx}\right) = 4N \sum_{n=1}^{N/2} \int dx \left|\frac{d\psi_n(x)}{dx}\right|^2. \tag{33}$$

We next define a Fourier transform space consisting of wave functions φ_n(µ) of momentum µ,

$$\psi_n(x) = \frac{1}{\sqrt{2\pi\hbar}} \int d\mu\,\phi_n(\mu)\, e^{-i\mu x/\hbar}; \tag{34}$$

operating on both sides with d/dx we get

$$\frac{d\psi_n(x)}{dx} = \frac{1}{\sqrt{2\pi\hbar}} \int d\mu \left(\frac{-i\mu}{\hbar}\phi_n(\mu)\right) e^{-i\mu x/\hbar}. \tag{35}$$

So we have the Fourier pairs

$$\psi_n(x) \overset{F}{\longleftrightarrow} \phi_n(\mu) \tag{36}$$

and

$$\frac{d\psi_n(x)}{dx} \overset{F}{\longleftrightarrow} \frac{-i\mu}{\hbar}\phi_n(\mu), \tag{37}$$

then by Parseval’s theorem we have

$$\int dx\,|\psi_n(x)|^2 = \int d\mu\,|\phi_n(\mu)|^2 \tag{38}$$

and

$$\int dx \left|\frac{d\psi_n(x)}{dx}\right|^2 = \frac{1}{\hbar^2} \int d\mu\,\mu^2\,|\phi_n(\mu)|^2. \tag{39}$$
Using equation (39) in the information expression we have

$$I = \frac{4N}{\hbar^2} \int d\mu\,\mu^2 \sum_{n=1}^{N/2} |\phi_n(\mu)|^2 \equiv J; \tag{40}$$

that is, the unitary nature of the Fourier transform inherent in the nature of the measurement device gives rise to the invariance principle (40), where J is the bound information, and I is constrained to be equal to J. The bound information represents the maximum information obtainable from the measurement; with I = J, information is transferred with maximum efficiency. Yet only the workings of the input side of the measurement device are described by the Fourier transform; we have not considered the measurement device’s output yet, hence the hallmark of quantum phenomena, uncertainty in the precise state of physical systems, is not included in this derivation. In other words, we are describing the situation where the Schrödinger cat experiment is completed, yet no one has opened the box and looked at the cat yet. For more detail see [1], 3.8 and chapter 10.
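The Parseval relations (38) and (39) can be checked numerically with a discretized Fourier transform; here a single Gaussian wave function and an FFT grid stand in for ψ_n and the continuous transform (the grid size, ħ = 1, and σ are illustrative choices, not part of the derivation):

```python
import numpy as np

hbar, sigma = 1.0, 1.0
n, L = 1024, 40.0
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
dx = x[1] - x[0]
# normalized Gaussian wave function, integral |psi|^2 dx = 1
psi = (2 * np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (4 * sigma**2))

mu = 2 * np.pi * hbar * np.fft.fftfreq(n, d=dx)   # momentum grid
dmu = 2 * np.pi * hbar / (n * dx)                 # momentum grid spacing
phi = np.fft.fft(psi) * dx / np.sqrt(2 * np.pi * hbar)

norm_x = np.sum(np.abs(psi) ** 2) * dx            # left side of eq. (38)
norm_mu = np.sum(np.abs(phi) ** 2) * dmu          # right side of eq. (38)
# <mu^2> per eq. (39); for this packet it is hbar^2/(4 sigma^2) = 0.25
mu2 = np.sum(mu**2 * np.abs(phi) ** 2) * dmu
print(norm_x, norm_mu, mu2)
```

The equality of the two norms is the discrete Parseval theorem; the ⟨µ²⟩ value reappears below in the bound information J.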

The total probability distribution for the variable x is (by eq. (26))

$$p(x) = \frac{1}{N}\sum_{n=1}^{N} q_n^2(x) = \sum_{n=1}^{N/2} |\psi_n(x)|^2, \tag{41}$$

so by equation (38)

$$\int d\mu \sum_n |\phi_n(\mu)|^2 = \int dx \sum_n |\psi_n(x)|^2 = \int dx\, p(x) = 1, \tag{42}$$

so that

$$P(\mu) = \sum_n |\phi_n(\mu)|^2 \tag{43}$$
is a probability density on µ. Thus we have

$$J = \frac{4N}{\hbar^2} \int d\mu\,\mu^2 \sum_{n=1}^{N/2} |\phi_n(\mu)|^2 = \frac{4N}{\hbar^2} \int d\mu\,\mu^2 P(\mu) = \frac{4N}{\hbar^2}\left\langle\mu^2\right\rangle. \tag{44}$$

We use the non-relativistic approximation that the kinetic energy of the particle is µ²/2m, so W = V(x) + µ²/2m, and we have

$$J = \frac{4N}{\hbar^2}\left\langle\mu^2\right\rangle = \frac{8Nm}{\hbar^2}\left\langle W - V(x)\right\rangle = \frac{8Nm}{\hbar^2}\int dx\,(W - V(x))\,p(x) = \frac{8Nm}{\hbar^2}\int dx\,(W - V(x))\sum_{n=1}^{N/2}|\psi_n(x)|^2. \tag{45}$$

Thus we have successfully expressed J as a functional of the ψ_n; J[ψ] is the bound information functional for this problem, and I[ψ] = J[ψ].

According to the principle of extreme physical information, K = I − J is extremized, that is,

$$K = I - J = 4N \sum_{n=1}^{N/2} \int dx \left[\left|\frac{d\psi_n(x)}{dx}\right|^2 - \frac{2m}{\hbar^2}\,[W - V(x)]\,|\psi_n(x)|^2\right] = \text{extrem}. \tag{46}$$

The Euler-Lagrange equation for the variational problem is

$$\frac{d}{dx}\left(\frac{\partial\mathcal{L}}{\partial\psi_{nx}^*}\right) = \frac{\partial\mathcal{L}}{\partial\psi_n^*}, \quad n = 1, \dots, N/2, \quad \psi_{nx}^* \equiv \frac{\partial\psi_n^*}{\partial x}; \tag{47}$$

using the integrand in equation (46) as the Lagrangian 𝓛, the solution to this variational problem is

$$\frac{d^2\psi_n(x)}{dx^2} + \frac{2m}{\hbar^2}\,[W - V(x)]\,\psi_n(x) = 0, \quad n = 1, \dots, N/2, \tag{48}$$

the time-independent Schrödinger wave equation.

Since the solution (SWE) is the same for each index n, an N = 2 solution is permitted; that is, the SWE defines a single complex wave function ψ(x) = (1/√2)(q_1(x) + i q_2(x)) and

$$\frac{d^2\psi(x)}{dx^2} + \frac{2m}{\hbar^2}\,[W - V(x)]\,\psi(x) = 0. \tag{49}$$
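Eq. (49) can be verified numerically on a case with a known closed-form solution; here we use the harmonic potential V(x) = mω²x²/2, whose ground state ψ(x) ∝ exp(−mωx²/2ħ) has W = ħω/2 (this potential is our illustrative choice, not part of the derivation above):

```python
import numpy as np

hbar, m, omega = 1.0, 1.0, 1.0
x = np.linspace(-5.0, 5.0, 2001)
psi = np.exp(-m * omega * x**2 / (2 * hbar))   # ground-state shape
V = 0.5 * m * omega**2 * x**2                  # harmonic potential (illustrative)
W = 0.5 * hbar * omega                         # ground-state energy

d2psi = np.gradient(np.gradient(psi, x), x)    # numerical second derivative
residual = d2psi + (2 * m / hbar**2) * (W - V) * psi   # LHS of eq. (49)
print(np.max(np.abs(residual[50:-50])))        # ~ 0 away from the grid edges
```

The residual is at the level of the finite-difference truncation error, confirming that the EPI solution coincides with the standard stationary Schrödinger equation for this potential.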

### 3.3 Uncertainty Principles

According to the Heisenberg uncertainty principle, a particle’s position and momentum intrinsically fluctuate by amounts x and µ from the ideal (classical) values θ_x and θ_µ, with variances σ_x² and σ_µ² obeying

$$\sigma_x^2\,\sigma_\mu^2 \geq \left(\frac{\hbar}{2}\right)^2. \tag{50}$$

This relation is conventionally derived from the Fourier transform relation eq. (36) between position and momentum spaces.

This result may also be proved using the Cramer-Rao inequality of Fisher information. The mean-squared error for position (θ_x) measurements is defined as

$$e_x^2 \equiv \left\langle (\hat\theta_x(y) - \theta_x)^2 \right\rangle, \tag{51}$$

where θ̂_x(y) is a general estimator for the ideal position θ_x based on measured data y. Suppose the probability distribution for x is p(x); then the Fisher information for the variable x is

$$I_x = \int dx\,\frac{1}{p(x)}\left(\frac{dp(x)}{dx}\right)^2. \tag{52}$$

The Cramer-Rao inequality states that

$$e_x^2 I_x \geq 1. \tag{53}$$

The one-dimensional wave function for a quantum particle was derived in the previous section; suppose the solution is attained with N = 2, so there is only one ψ_1(x) = ψ(x). The Fisher information (for the one-dimensional variable) of the quantum particle is

$$I = 8 \int dx \left|\frac{d\psi(x)}{dx}\right|^2. \tag{54}$$

Since |ψ(x)|² = p(x) we have |ψ(x)| = √p(x), so that

$$\begin{aligned}
I &= 8 \int dx \left|\frac{d}{dx}\,|\psi(x)|\,e^{iS(x)}\right|^2 = 8 \int dx \left|\frac{d}{dx}\sqrt{p(x)}\,e^{iS(x)}\right|^2 \\
&= 8 \int dx \left|\frac{1}{2\sqrt{p(x)}}\frac{dp(x)}{dx}\,e^{iS(x)} + \sqrt{p(x)}\,i\frac{dS(x)}{dx}\,e^{iS(x)}\right|^2 \\
&= 8 \int dx \left|\frac{1}{2\sqrt{p(x)}}\frac{dp(x)}{dx} + \sqrt{p(x)}\,i\frac{dS(x)}{dx}\right|^2 \\
&= 8 \int dx \left[\frac{1}{4p(x)}\left(\frac{dp(x)}{dx}\right)^2 + p(x)\left(\frac{dS(x)}{dx}\right)^2\right] \\
&= 2 \int dx\,\frac{1}{p(x)}\left(\frac{dp(x)}{dx}\right)^2 + 8 \int dx\, p(x)\left(\frac{dS(x)}{dx}\right)^2 \\
&= 2 I_x + 8 \left\langle\left(\frac{dS(x)}{dx}\right)^2\right\rangle,
\end{aligned}$$

where S(x) ∈ ℝ is the phase of ψ(x). So we have 2I_x ≤ I; from eq. (40) (with N = 2) we have

$$I = J = \frac{8}{\hbar^2}\left\langle\mu^2\right\rangle, \tag{55}$$

so

$$I_x \leq \frac{4}{\hbar^2}\left\langle\mu^2\right\rangle = \frac{4}{\hbar^2}\,\sigma_\mu^2, \tag{56}$$

since µ is the fluctuation in momentum. Using the Cramer-Rao inequality we have

$$e_x^2\,\frac{4}{\hbar^2}\,\sigma_\mu^2 \geq e_x^2 I_x \geq 1, \tag{57}$$

and finally

$$e_x^2\,\sigma_\mu^2 \geq \left(\frac{\hbar}{2}\right)^2. \tag{58}$$

There is a subtle difference between the σ_x² in the Heisenberg uncertainty principle and the e_x² used here: the former represents the variance of the position fluctuation distribution of a particle, and is independent of any measurements, while the latter is a measure of the quality of a position measurement and the subsequent position estimate, which depends on the intrinsic properties of a particle but is manifested only by actual measurements. This difference results in different interpretations of the Heisenberg uncertainty principle and its Fisher version: the former treats the fluctuations as intrinsic, existing independent of any observation, while the latter inequality arises when a measurement of position is actually made; that is, when a position measurement is made on a particle, its momentum exhibits a fluctuation governed by the uncertainty principle. The latter interpretation is consistent with the EPI principle in that the uncertainty is intrinsic to the phenomenon, but only through an actual observation can its effects be felt.
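For a real Gaussian wave packet (the minimum-uncertainty case, with phase S(x) = 0, so I = 2I_x), the chain leading to eq. (58) can be checked numerically: the bound is saturated when the Cramer-Rao floor e_x² = 1/I_x is attained. A sketch with ħ = 1 and an illustrative width σ:

```python
import numpy as np

hbar, sigma = 1.0, 1.0
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
p = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
psi = np.sqrt(p)                                   # real psi, S(x) = 0

dp = np.gradient(p, x)
I_x = np.sum(dp**2 / p) * dx                       # eq. (52), ~ 1/sigma^2
ex2_floor = 1.0 / I_x                              # Cramer-Rao floor, eq. (53)
# sigma_mu^2 = hbar^2 * integral (dpsi/dx)^2 dx, via eq. (39) for a
# real, zero-mean packet
sigma_mu2 = hbar**2 * np.sum(np.gradient(psi, x) ** 2) * dx
print(ex2_floor * sigma_mu2)                       # ~ (hbar/2)^2 = 0.25
```

The product lands on (ħ/2)² because the Gaussian saturates both inequalities in eq. (57) at once.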

### 3.4 Boltzmann Energy Distribution

We will derive the Boltzmann energy distribution law for a perfect gas in equilibrium. The gas is composed of M identical molecules within a container; all collisions with other molecules and container walls are assumed to be elastic. The gas has temperature T.

We again start the derivation on a covariant basis and choose the Fisher coordinates to be

$$x_0 \equiv i x_E, \quad x_\mu \equiv c\mu \equiv (c\mu_1, c\mu_2, c\mu_3). \tag{59}$$

We take the non-relativistic approximation and treat the energy fluctuation x_E and the momentum fluctuations x_µ separately. We are only interested in deriving the law for energy, hence the subscript on x_E is dropped, and the measured value of the energy is

$$E = \theta_E + x, \quad E_0 \leq E \leq \infty, \tag{60}$$

where θ_E is the ideal value of the energy.

Thus we have

$$I(E) = -4 \int dx \sum_n \left(\frac{dq_n(x)}{dx}\right)^2, \tag{61}$$

where the probability amplitudes q_n(x) relate to the probability distribution function by

$$p(x) = \frac{1}{N}\sum_n q_n^2(x). \tag{62}$$

The negativity of I(E) is due to the use of an imaginary coordinate for energy, which is justified later. The goal of this analysis is then to solve the two EPI principles

$$I(E) - J(E) = \text{extrem.} \quad \text{and} \quad I(E) = \kappa J(E), \tag{63}$$

the extremization of physical information and the zero-condition respectively.

We find the bound information functional J[q] by assuming that both EPI principles yield the same solutions q_n(x). The more general form J[q, x] is not needed, since the extra explicit dependence on the energy x only yields non-equilibrium solutions, and would be discarded after further derivations.

According to axiom 2, the existence of information densities, we can represent J[q] as

$$J[q] = 4 \int dx \sum_n J_n(q_n(x)); \tag{64}$$

hence we have

$$K \equiv I - J = -4 \int dx \sum_n \left(q_n'^2(x) + J_n(q_n(x))\right), \quad q_n'(x) = \frac{dq_n(x)}{dx}, \tag{65}$$

and the extremum principle results in the Euler-Lagrange equation for the integrand L,

$$\frac{d}{dx}\left(\frac{\partial L}{\partial q_n'(x)}\right) = \frac{\partial L}{\partial q_n(x)}, \tag{66}$$

the solution of which is

$$\frac{d^2 q_n}{dx^2} = \frac{1}{2}\frac{dJ_n}{dq_n}, \quad n = 1, \dots, N. \tag{67}$$
For the information efficiency we first change the form of I by noting that

$$\int dx \left(\frac{dq_n(x)}{dx}\right)^2 = \left[\frac{dq_n(x)}{dx}\,q_n(x)\right]_{E_0-\theta_E}^{\infty} - \int dx\, q_n(x)\frac{d^2 q_n(x)}{dx^2}; \tag{68}$$

the first term of the result is zero, since we assume that the probability of the energy being infinite or equal to E_0 is zero. So we have

$$I = 4 \int dx \sum_n q_n(x)\frac{d^2 q_n(x)}{dx^2}, \tag{69}$$

and by the information efficiency zero-condition we have

$$I - \kappa J = 4 \int dx \sum_n \left(q_n(x)\frac{d^2 q_n(x)}{dx^2} - \kappa J_n(q_n(x))\right) = 0. \tag{70}$$

By axiom 3, the efficiency on the microlevel, we have

$$q_n(x)\frac{d^2 q_n(x)}{dx^2} - \kappa J_n(q_n(x)) = 0, \quad n = 1, \dots, N. \tag{71}$$

Combining eqs. (67) and (71) we find that the q_n obey

$$\frac{\kappa J_n}{q_n} = \frac{1}{2}\frac{dJ_n}{dq_n}, \tag{72}$$

or

$$\frac{2\kappa}{q_n} = \frac{d}{dq_n}\ln J_n. \tag{73}$$

Integrating both sides over q_n we have

$$J_n(q_n) = A_n q_n^{2\kappa}, \quad A_n \geq 0. \tag{74}$$

Using this equation in either eq. (67) or eq. (71) yields

$$\frac{d^2 q_n(x)}{dx^2} = \alpha_n^2\, q_n^{2\kappa-1}(x), \quad \alpha_n^2 \equiv \kappa A_n \geq 0. \tag{75}$$
For the invariance principle we use the normalization of p(E), which is weak in the sense that any p.d.f. is normalized. Hence for this phenomenon we have minimum prior information, that is, maximum ignorance about the independent variable in question. By classical assumptions, we set κ = 1, which represents maximum information transfer from the bound information of the phenomenon to the Fisher information we gain as a result of observation.

So we have

$$\frac{d^2 q_n(x)}{dx^2} = \alpha_n^2\, q_n(x), \tag{76}$$

the general solution of which is

$$q_n(x) = B_n e^{-\alpha_n x} + C_n e^{\alpha_n x}, \quad \alpha_n \geq 0. \tag{77}$$

Since x is bounded below (by E_0 − θ_E) yet unbounded above, the C_n must vanish for p(E) to be normalizable. The solution now becomes

$$q_n(x) = B_n e^{-\alpha_n x}. \tag{78}$$

In retrospect, if the energy coordinate x_0 ≡ ix_E were taken to be real, then the right-hand side of eq. (75) would be negative, and we would have obtained the general solution

$$q_n(x) = B_n e^{-i\alpha_n x} + C_n e^{i\alpha_n x}, \tag{79}$$

which is sinusoidal. This would cause p(E) to be sinusoidal too, and it could not be normalized.

From the form of the solution, the N for this problem need only be one, so we have

$$q(x) = B e^{-\alpha x} \tag{80}$$

and

$$p(x) = B^2 e^{-2\alpha x}. \tag{81}$$

Using the change of variable x = E − θ_E we have

$$p(E) = C e^{-2\alpha E}, \quad C \equiv B^2 e^{2\alpha\theta_E}. \tag{82}$$

We find the values of the constants C and α by normalization and in terms of the expectation value of E:

$$\int_{E_0}^{\infty} p(E)\, dE = 1, \tag{83}$$

$$\langle E \rangle = \int_{E_0}^{\infty} E\, p(E)\, dE. \tag{84}$$

We have

$$p(E) = \frac{1}{\langle E \rangle - E_0}\, e^{-\frac{E - E_0}{\langle E \rangle - E_0}}, \quad E \geq E_0, \tag{85}$$

and p(E) = 0 for other values of E. Shifting the origin of E by a constant does not change the physical law, so we subtract E_0 from E and get

$$p(E) = \langle E \rangle^{-1}\, e^{-E/\langle E \rangle}, \quad E \geq 0. \tag{86}$$
The energy of a perfect gas in equilibrium with three degrees of freedom per molecule is

$$\langle E_t \rangle = \frac{3MkT}{2}, \tag{87}$$

so

$$\langle E \rangle = \frac{\langle E_t \rangle}{M} = \frac{3kT}{2}, \tag{88}$$

and the energy distribution is

$$p(E) = \frac{2}{3kT}\, e^{-\frac{2E}{3kT}}. \tag{89}$$
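Eq. (89) is an exponential density with mean ⟨E⟩ = 3kT/2, which is easy to confirm numerically (the value of kT below is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
kT = 0.5                                  # k*T in arbitrary units (illustrative)
mean_E = 1.5 * kT                         # <E> = 3kT/2, eq. (88)

# Monte Carlo: exponential samples with scale <E> reproduce the mean
E = rng.exponential(scale=mean_E, size=500_000)
print(E.mean())                           # ~ 3kT/2 = 0.75

# normalization of p(E) from eq. (89) on a grid (trapezoidal rule)
Eg = np.linspace(0.0, 40 * kT, 20001)
dE = Eg[1] - Eg[0]
p = (2 / (3 * kT)) * np.exp(-2 * Eg / (3 * kT))
norm = (np.sum(p) - 0.5 * (p[0] + p[-1])) * dE
print(norm)                               # ~ 1
```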

### 3.5 Newton’s Law of Motion

Lastly we present a mock derivation of Newton’s law of motion using a pseudo-EPI procedure. We assume that the energy of a particle has two forms, kinetic and conservative potential, and that the total energy is constant. We define the position perturbation q(t) as a function of time; the constant-energy requirement then becomes

$$E = \frac{1}{2} m \left(\frac{dq(t)}{dt}\right)^2 - V(q(t)) = \text{const}. \tag{90}$$

The Fisher information for this phenomenon is

$$I = 4 \int dt \left(\frac{dq(t)}{dt}\right)^2 = \frac{8}{m} \int dt\,\big(E + V(q(t))\big) \equiv J, \tag{91}$$

and κ = 1 here. We have perfect efficiency because time can be measured to infinite accuracy in the non-relativistic picture.

We extremize

$$K = I - J = 4 \int dt \left[\left(\frac{dq(t)}{dt}\right)^2 - \frac{2E}{m} - \frac{2}{m} V(q(t))\right] \tag{92}$$

using the Euler-Lagrange equation and get the solution

$$m \frac{d^2 q(t)}{dt^2} = -\frac{dV(q(t))}{dq(t)}, \tag{93}$$

which is Newton’s law of motion.
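Since the constant 2E/m term does not affect the variation, the extremization of K in eq. (92) can be checked numerically: the gradient of a discretized K with respect to an interior path point vanishes on a trajectory obeying eq. (93) and not on a perturbed one. A sketch with the illustrative potential V(q) = kq²/2, for which eq. (93) gives q(t) = A cos(ωt) with ω² = k/m:

```python
import numpy as np

m, k, A = 1.0, 1.0, 1.0                  # illustrative values
omega = np.sqrt(k / m)
t = np.linspace(0.0, 2 * np.pi, 2001)
dt = t[1] - t[0]
q_true = A * np.cos(omega * t)           # solves m q'' = -dV/dq, eq. (93)

def K(q):
    # discretized eq. (92) without the constant 2E/m term (it does not
    # affect which path is stationary)
    kin = np.sum(((q[1:] - q[:-1]) / dt) ** 2) * dt
    pot = np.sum(0.5 * k * q**2) * dt
    return 4.0 * (kin - (2.0 / m) * pot)

def grad_at(q, j, eps=1e-6):
    # dK/dq_j by central differences, endpoints held fixed
    e = np.zeros_like(q)
    e[j] = eps
    return (K(q + e) - K(q - e)) / (2 * eps)

j = 500                                  # an interior grid point (t = pi/2)
g_true = grad_at(q_true, j)
q_bad = q_true + 0.1 * np.sin(3 * t)     # perturbed path, same endpoints
g_bad = grad_at(q_bad, j)
print(g_true, g_bad)                     # ~ 0 versus clearly nonzero
```

Setting the discrete gradient to zero at every interior point reproduces the finite-difference form of eq. (93), which is why only the true trajectory is stationary.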

### 4 The Geometrical Representation of Physical Phenomenon

According to the view of the Italian physicist E. R. Caianiello, uncertainty is inherent in all branches of science. He obtained a geometrical representation of physics, especially quantum physics, using the theories and methods of information geometry. In his formulation [2], the quantum physical uncertainty appears as a “curvature” in relativistic phase space. He also tries to connect such representations (quantum geometry) with theories of entropy and information, so as to find a theoretical foundation for them. Like the principle of EPI, his goal is to describe physical phenomena from the viewpoint of information. But the theories of information geometry, or the geometrization of information theory, used for its foundation are more general than the Fisher information, and much more difficult to comprehend.

### 4.1 Information Geometry

Information geometry is a specialization of differential geometry that deals with the geometrical structure of probability distributions. Under this formulation, probability distributions are treated as points on a manifold, and the Fisher information provides the distance between different probability distributions on these manifolds.

According to Caianiello [7], the geometrical representation of a model (or theory) essentially requires the choice of a metric G and a connection Γ_µ and the identification of a reference frame; these depend upon the “universe” one wishes to model. Below we give a short introduction to these concepts of differential geometry and apply them to the special case of information geometry (see [8] for an introduction).

4.1.1 Manifolds

An N-dimensional manifold M_N is a “curved” space embedded in an M-dimensional affine space E_M, where M > N. For instance, a curved surface or the surface of a sphere in three dimensions are both 2-dimensional manifolds. A point on an N-dimensional manifold can be specified with N parameters (or coordinates) x ≡ {x¹, . . . , x^N}, so there is a mapping from each point of the manifold to a point in an affine space E_N.

Because probability distributions (probability amplitudes and wave functions) are used in quantum physics, Caianiello used manifolds with parametric distributions as points, and extended the definitions so obtained to descriptions of quantum physics. The most general parametric distribution has the form ρ₀(x|z), where z ≡ (z₁, z₂, . . . , z_m) ∈ ℝ^m is the random output and x ≡ (x¹, x², . . . , x^n) ∈ ℝ^n represents the parameters (the subscripts and superscripts are all indices). So x is a point in E_N, but it is also a point in the manifold M_N formed by the probability distributions ρ₀(x|z). The parameters x provide a coordinate system for M_N, each point of which represents a different probability distribution.

We will use Gaussian distributions to illustrate these concepts. The mean µ and standard deviation σ determine a Gaussian distribution

$$\rho(x|z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(z-\mu)^2}{2\sigma^2}\right], \tag{94}$$

where z is the single random output, and the parameter x ≡ (x¹ = f¹(µ, σ), x² = f²(µ, σ)) is determined by the mean and standard deviation. Hence Gaussian distributions form a 2-dimensional manifold M₂.

4.1.2 Metric

The metric of an N-dimensional manifold M_N is an N × N symmetric matrix G(x), with real elements g_ij(x) defined at every point of the manifold, so the metric is a tensor field. The metric G(x) serves as a standard of length, or distance measure, on a manifold, hence its name. We define the infinitesimal distance between two points x and x + dx ≡ (x¹ + dx¹, . . . , x^N + dx^N) by the metric G(x) as

$$ds^2(x) = g_{ij}(x)\, dx^i dx^j \tag{95}$$

(note the use of Einstein’s summation convention). In affine spaces the distance is defined as

$$ds^2 = \sum_i (dx^i)^2, \tag{96}$$

hence the metric is the identity matrix I at all points of the space.

The manifold M_N is thus a generalized space that has a “metric” G defined for the specification of distance between its points. From eq. (95) we can see that the coordinate axes need not be orthogonal, and that different coordinates may not have the same “weight” in the value of “distance”. More importantly, because the metrics defined at different points are not in general equal, the space of a manifold is “deformed” compared to affine spaces, so the affine spaces spanned by the coordinate axes of two different points on the manifold may have no intersection.

We will now use entropy to define a metric for the manifolds formed by probability distributions. The Shannon entropy of a p.d.f. is defined as

$$H(\rho(x|z)) = -\int \rho(x|z) \ln \rho(x|z)\, dz; \tag{97}$$

for continuous p.d.f.s this may diverge, that is, it may not have meaning in certain contexts, hence we use the cross entropy instead. The cross entropy (or Kullback-Leibler information) of two Gaussian distributions ρ(x₁|z) and ρ(x₂|z) is

$$H_c(x_1, x_2) = \int \rho(x_1|z) \ln \frac{\rho(x_1|z)}{\rho(x_2|z)}\, dz. \tag{98}$$

The J-divergence of the two distributions ρ(x_{1}|z) and ρ(x_{2}|z) is defined as

J(x_{1}|x_{2}) = H_{c}(x_{1}, x_{2}) + H_{c}(x_{2}, x_{1}). (99)
We interpret the J -divergence as a sort of “distance” between these two
distributions. If we set x_{1} = x, x_{2} = x + dx ≡ (x^{1}+ dx^{1}, x^{2}+ dx^{2}), then the
infinitesimal distance can be defined as the J -divergence

J (x|x + dx) = ds^{2} = g_{hk}(x)dx^{h}dx^{k}, (100)

where, with the notation ∂_{h} = ∂/∂x^{h}, we have the metric

g_{hk}(x) = g_{kh}(x) = ∫ ρ(x|z) ∂_{h} ln ρ(x|z) ∂_{k} ln ρ(x|z) dz, (101)
which forms a symmetric matrix G. Because the elements g_{hk}(x) are in the
form of the Fisher information (and yield the familiar form when h = k), the
metric G is a Fisher information matrix; we refer to it as the Fisher metric
in this context.
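As a numerical illustration (our addition, not part of the original derivation), the following Python sketch checks the reading of eq. (100) for the Gaussian family: for two nearby distributions, the J-divergence approaches the quadratic form g_{hk}dx^{h}dx^{k}. We work in the (µ, σ) coordinates, where the Fisher metric is the standard closed form diag(1/σ^{2}, 2/σ^{2}); since ds^{2} is coordinate-independent, the choice of chart is immaterial. Function names are ours.

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """Cross entropy H_c (Kullback-Leibler divergence) of two 1-d Gaussians, closed form."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def j_divergence(mu1, s1, mu2, s2):
    """Symmetrized divergence J = H_c(x1, x2) + H_c(x2, x1), eq. (99)."""
    return kl_gauss(mu1, s1, mu2, s2) + kl_gauss(mu2, s2, mu1, s1)

# Nearby Gaussians: J(x | x + dx) should approach g_hk dx^h dx^k.
mu, sigma = 0.0, 1.5
dmu, dsigma = 1e-3, 1e-3
# Fisher metric of the Gaussian family in (mu, sigma) coordinates: diag(1/s^2, 2/s^2)
ds2 = dmu**2 / sigma**2 + 2 * dsigma**2 / sigma**2
J = j_divergence(mu, sigma, mu + dmu, sigma + dsigma)
print(J, ds2)  # agree to leading order in dx
```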

If we express the two parameters of the Gaussian distribution as

x^{1} = µ/σ^{2}, x^{2} = 1/(2σ^{2}), (102)

then the Gaussian distribution becomes

ρ(x|z) = exp[ zx^{1} − z^{2}x^{2} − (x^{1})^{2}/(4x^{2}) + (1/2) ln x^{2} − (1/2) ln π ], (103)

and the Fisher metric is

G ≡ {g_{hk}} =
( σ^{2}        −2µσ^{2}
  −2µσ^{2}     4µ^{2}σ^{2} + 2σ^{4} ). (104)
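Since eq. (103) is in exponential-family form ρ = exp[zx^{1} − z^{2}x^{2} − ψ(x)], the Fisher metric equals the Hessian of the log-partition function ψ. The sketch below (our consistency check, not in the original) evaluates that Hessian by central differences in the (x^{1}, x^{2}) coordinates and compares it against the analytically computed entries.

```python
import math

def psi(x1, x2):
    """Log-partition of the Gaussian in natural coordinates, from eq. (103):
    rho = exp(z*x1 - z^2*x2 - psi(x))."""
    return x1**2 / (4 * x2) - 0.5 * math.log(x2) + 0.5 * math.log(math.pi)

def hessian(f, x1, x2, h=1e-4):
    """Central-difference Hessian; for an exponential family this is the Fisher metric."""
    def d2(i, j):
        e = [(h, 0.0), (0.0, h)]
        (a1, a2), (b1, b2) = e[i], e[j]
        return (f(x1+a1+b1, x2+a2+b2) - f(x1+a1-b1, x2+a2-b2)
                - f(x1-a1+b1, x2-a2+b2) + f(x1-a1-b1, x2-a2-b2)) / (4*h*h)
    return [[d2(0, 0), d2(0, 1)], [d2(1, 0), d2(1, 1)]]

mu, sigma = 1.0, 2.0
x1, x2 = mu / sigma**2, 1 / (2 * sigma**2)
G = hessian(psi, x1, x2)
# Entries computed analytically from the Hessian of psi:
closed = [[sigma**2, -2*mu*sigma**2],
          [-2*mu*sigma**2, 4*mu**2*sigma**2 + 2*sigma**4]]
print(G, closed)
```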

4.1.3 Coordinate Transformations

A vector field A(x) in the affine space E_{N} can be specified by its N com-
ponents A^{i}(x), which are the components in some coordinate system x =
{x^{1}, . . . , x^{N}}. If there is another coordinate system x′ = {x′^{1}, . . . , x′^{N}} for
E_{N}, then the components of A(x) in the new coordinate system can be ex-
pressed as

A′^{j}(x′) = (∂x′^{j}/∂x^{i}) A^{i}(x). (105)
We will now extend these concepts to fields defined on manifolds.

In the discussion below, we will be using local coordinate systems that are limited to a certain neighborhood of the manifold. Since manifolds are
“deformed”, there may not be a global coordinate system, but a small enough
locality of the manifold may be considered “flat”. Hence we will define two co-
ordinate systems in the neighborhood of a point M ∈ M_{N}, and let the coordinates of
M in the two coordinate systems be x = {x^{1}, . . . , x^{N}} and x′ = {x′^{1}, . . . , x′^{N}}
respectively.

A scalar field F(x) on a manifold maps each of its points to a real number. Its variance under coordinate transformation is

F′(x′) = F(x), (106)

which is called invariance, since the value of a scalar field at the same point does not change under coordinate transformations.

A contravariant vector field A(x) on a manifold is the extension of normal
vector fields, the variance of its components A^{i}(x) under coordinate trans-
formation is

A′^{j}(x′) = (∂x′^{j}/∂x^{i}) A^{i}(x), (107)

which is the same as a normal vector field and is called contravariance.

A covariant vector field A(x) on a manifold with components A_{i}(x) is
defined by

A_{i}(x) = g_{ij}(x)A^{j}(x) (108)
with respect to a contravariant field and the metric. We can also consider
A(x) as a vector field with both contravariant and covariant components.

Under coordinate transformations, the covariant components change as
A′_{j}(x′) = (∂x^{i}/∂x′^{j}) A_{i}(x), (109)

which is called covariance.

We can further generalize these concepts to a tensor field T(x) defined as
the direct product of p contravariant fields and q covariant fields. Its N^{p} × N^{q}
components T^{i_{1}i_{2}...i_{p}}_{j_{1}j_{2}...j_{q}}(x) change under coordinate transformation as

T′^{i_{1}...i_{p}}_{j_{1}...j_{q}}(x′) = (∂x′^{i_{1}}/∂x^{k_{1}}) · · · (∂x′^{i_{p}}/∂x^{k_{p}}) (∂x^{l_{1}}/∂x′^{j_{1}}) · · · (∂x^{l_{q}}/∂x′^{j_{q}}) T^{k_{1}...k_{p}}_{l_{1}...l_{q}}(x), (110)
and we say that T (x) is p times contravariant and q times covariant. A scalar
field F (x) can then be considered as a tensor field 0 times contravariant and
0 times covariant.

If the metric of a manifold is 2 times covariant, that is,

g′_{rs}(x′) = (∂x^{i}/∂x′^{r})(∂x^{j}/∂x′^{s}) g_{ij}(x), (111)

then we call this manifold a Riemannian manifold.
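The transformation rule (111) can be exercised on a concrete two-chart example (ours, not in the original). The sketch below transforms the identity metric of E_{2} from Cartesian (x, y) to polar (r, θ) coordinates via eq. (111), and confirms that ds^{2} of eq. (95) is a scalar, i.e. the same displacement has the same length in either chart.

```python
import math

def jacobian(r, theta):
    """J[i][s] = d x^i / d x'^s for x = r cos(theta), y = r sin(theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def transform_metric(g, J):
    """g'_rs = (d x^i / d x'^r)(d x^j / d x'^s) g_ij, eq. (111), summed over i, j."""
    n = len(g)
    return [[sum(J[i][r] * J[j][s] * g[i][j] for i in range(n) for j in range(n))
             for s in range(n)] for r in range(n)]

r, theta = 2.0, 0.6
g_cart = [[1.0, 0.0], [0.0, 1.0]]                      # identity metric, Cartesian chart
g_polar = transform_metric(g_cart, jacobian(r, theta))  # expected: diag(1, r^2)

# Same infinitesimal displacement, measured in both charts:
dr, dth = 1e-3, 2e-3
dx = math.cos(theta) * dr - r * math.sin(theta) * dth
dy = math.sin(theta) * dr + r * math.cos(theta) * dth
ds2_cart = dx * dx + dy * dy
ds2_polar = sum(g_polar[h][k] * [dr, dth][h] * [dr, dth][k]
                for h in range(2) for k in range(2))
print(g_polar, ds2_cart, ds2_polar)
```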

4.1.4 Connection

In affine space E_{N} the derivative is ∇ ≡ (∂_{1}, ∂_{2}, . . . , ∂_{N}), with i-th component
∇_{i} = ∂_{i} = ∂/∂x^{i}. On manifolds, we define the i-th component of the covari-
ant derivative ∇_{i} on scalar fields and on the components of contravariant and
covariant vector fields as

∇_{i}F (x) = ∂_{i}F (x), (112)

∇_{i}A^{k}(x) = ∂_{i}A^{k}(x) + Γ^{k}_{ji}(x)A^{j}(x), (113)
and

∇_{i}A_{k}(x) = ∂_{i}A_{k}(x) − Γ^{j}_{ki}(x)A_{j}(x) (114)
respectively. The Γ^{i}_{jk}(x)’s are the components of the affine connection field,
and are called affine connection coefficients, Christoffel symbols, or Γ-symbols.

They “connect”, or provide the relationship between, the coordinate systems of
different points on a manifold, and hence are needed to describe derivatives on
manifolds. The name covariant derivative comes from the fact that the fields after
the operation are increased in covariance by 1, so that ∇_{i}F(x) are components
of a covariant field, ∇_{i}A^{k}(x) components of a tensor field 1 time contravari-
ant and 1 time covariant, and ∇_{i}A_{k}(x) components of a 2 times covariant
tensor field. Note that the connection Γ(x) is not a tensor field in general;
only when the coordinate transformation is linear do the components of Γ(x)
change like the components of a tensor field 1 time contravariant and 2 times
covariant.

The covariant derivatives of general tensor fields can be found by ex-
tending eqs. (113) and (114). For example, the k-th covariant derivative of
the Riemannian metric component g_{ij}(x) is

∇_{k}g_{ij}(x) = ∂_{k}g_{ij}(x) − Γ^{l}_{ik}(x)g_{lj}(x) − Γ^{l}_{jk}(x)g_{il}(x). (115)

If we define Γ_{jik} = g_{jl}Γ^{l}_{ik} (this is needed since Γ is not a tensor in general), the
equation above becomes

∇_{k}g_{ij}(x) = ∂_{k}g_{ij}(x) − Γ_{jik}(x) − Γ_{ijk}(x). (116)
If Γ_{ijk} = Γ_{ikj}, then the connection is called a Riemannian connection. Using
eq. (116), we can generate two more equations by permutation of the indices
i, j, k; adding two of these equations and subtracting the third, under the assump-
tion of a Riemannian connection, we can express the connection coefficients in terms of
the metric components:

Γ_{ijk}(x) = (1/2)(∂_{k}g_{ji}(x) + ∂_{j}g_{ik}(x) − ∂_{i}g_{kj}(x)). (117)
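Eq. (117) can be checked on a familiar case. The sketch below (our illustration) applies eq. (117), with metric derivatives taken by central differences, to the polar-coordinate metric diag(1, r^{2}) of the Euclidean plane; the nonzero coefficients Γ_{122} = −r and Γ_{212} = Γ_{221} = r agree with the standard Christoffel symbols of that chart. Indices are 0-based in the code (0 = r, 1 = θ).

```python
import math

def g_polar(x):
    """Polar-coordinate metric of the Euclidean plane: g = diag(1, r^2), x = (r, theta)."""
    r, theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def dg(metric, x, k, h=1e-6):
    """Partial derivative d g_ij / d x^k by central differences."""
    xp = list(x); xm = list(x)
    xp[k] += h; xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def gamma(metric, x, i, j, k):
    """Gamma_ijk = (1/2)(d_k g_ji + d_j g_ik - d_i g_kj), eq. (117)."""
    return 0.5 * (dg(metric, x, k)[j][i] + dg(metric, x, j)[i][k] - dg(metric, x, i)[k][j])

x = (2.0, 0.6)
# Nonzero coefficients of the plane in polar coordinates at r = 2:
print(gamma(g_polar, x, 0, 1, 1),   # Gamma_122 = -r
      gamma(g_polar, x, 1, 0, 1),   # Gamma_212 =  r
      gamma(g_polar, x, 1, 1, 0))   # Gamma_221 =  r
```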
We now define the connection for probability distribution manifolds. The
most general connection compatible with information geometry is

Γ^{(α)}_{ij,k}(x) = Γ_{kij}(x) − (α/2) ∫ ∂_{i} ln ρ(x|z) ∂_{j} ln ρ(x|z) ∂_{k} ln ρ(x|z) dz. (118)
With the metric eq. (101) we have

Γ_{kij} = ∫ ρ (∂^{2}_{ij} ln ρ ∂_{k} ln ρ + (1/2) ∂_{i} ln ρ ∂_{j} ln ρ ∂_{k} ln ρ) dz. (119)

4.1.5 Curvature Tensor

The curvature tensor (or Riemannian tensor) R(x) of a manifold is a tensor field 1 time contravariant and 3 times covariant; its components are defined as

R^{j}_{ikl}(x) = ∂_{k}Γ^{j}_{il}(x) − ∂_{l}Γ^{j}_{ik}(x) + Γ^{j}_{hk}(x)Γ^{h}_{il}(x) − Γ^{j}_{hl}(x)Γ^{h}_{ik}(x). (120)
On a Riemannian manifold, because g_{ij} = g_{ji} and Γ^{i}_{jk} = Γ^{i}_{kj}, we can define

R_{iklm} = g_{in}R^{n}_{klm} = (1/2)(∂_{k}∂_{l}g_{im} + ∂_{i}∂_{m}g_{kl} − ∂_{k}∂_{m}g_{il} − ∂_{i}∂_{l}g_{km}) + g_{np}(Γ^{n}_{kl}Γ^{p}_{im} − Γ^{n}_{km}Γ^{p}_{il}), (121)
which is 4 times covariant, the fully covariant form of the Riemannian tensor.

By the Riemannian conditions we can further state the relations

R_{iklm} = −R_{kilm} (122)

R_{iklm} = −R_{ikml} (123)

R_{iklm} = R_{lmik} (124)

R_{iilm}= R_{ikll} = 0 (125)
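As a concrete check of eq. (120) (our example, not in the original), the sketch below evaluates R^{1}_{212} for the unit sphere with metric diag(1, sin^{2}θ) in coordinates (θ, φ), whose Christoffel symbols are standard results; the expected value is sin^{2}θ, which gives the constant Gaussian curvature R_{1212}/det G = 1. Code indices are 0-based (0 = θ, 1 = φ).

```python
import math

def Gamma(x):
    """Nonzero Christoffel symbols of the unit sphere (standard results).
    Layout: G[j][i][k] = Gamma^j_ik, coordinates x = (theta, phi)."""
    theta, phi = x
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)              # Gamma^theta_phiphi
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)  # Gamma^phi_thetaphi
    return G

def dGamma(x, k, j, i, l, h=1e-6):
    """d Gamma^j_il / d x^k by central differences."""
    xp = list(x); xm = list(x)
    xp[k] += h; xm[k] -= h
    return (Gamma(xp)[j][i][l] - Gamma(xm)[j][i][l]) / (2 * h)

def R(x, j, i, k, l):
    """R^j_ikl = d_k Gamma^j_il - d_l Gamma^j_ik
               + Gamma^j_hk Gamma^h_il - Gamma^j_hl Gamma^h_ik, eq. (120)."""
    G = Gamma(x)
    return (dGamma(x, k, j, i, l) - dGamma(x, l, j, i, k)
            + sum(G[j][hh][k] * G[hh][i][l] - G[j][hh][l] * G[hh][i][k]
                  for hh in range(2)))

theta = 0.7
val = R((theta, 0.3), 0, 1, 0, 1)      # R^theta_phi,theta,phi
print(val, math.sin(theta)**2)          # should agree
```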

We use Γ^{(α)}_{ij,k} as the connection for the Gaussian manifold. Using eqs.

(104), (118), and (119) for the metric and connection, we obtain the curvature tensor component of the Gaussian manifold

R^{(α)}_{1212} = (1 − α^{2})σ^{6}. (126)
When α = 0, the connection becomes

Γ^{(0)}_{ijk} = Γ_{kij} (127)

and the curvature tensor becomes

R^{(0)}_{1212} = σ^{6}. (128)

The previous equation suggests that the curvature tensor expresses our lack of information since it vanishes only when σ = 0.

### 4.2 Quantum Geometry

4.2.1 The Metric and Connection

Because wave functions (complex probability amplitudes) are used instead of probability functions in quantum mechanics, we need to generalize the metric defined in information geometry to use wave functions.

We begin by changing the expression of the metric to

g_{hk}(x) = 4 ∫ ∂_{h}√ρ(x|z) ∂_{k}√ρ(x|z) dz, (129)

which is in terms of the probability amplitude √ρ(x|z). Similarly, the infinitesi-
mal distance (J-divergence) can be expressed as

ds^{2} = 4 ∫ (∂_{h}√ρ dx^{h})(∂_{k}√ρ dx^{k}) dz = 4 ∫ (d√ρ)^{2} dz. (130)
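The amplitude form of the metric can be verified numerically (our sketch, with function names of our choosing): for a Gaussian family parametrized by its mean, 4∫(∂_{µ}√ρ)^{2} dz should reproduce the Fisher information 1/σ^{2}.

```python
import math

def sqrt_rho(z, mu, sigma):
    """Probability amplitude sqrt(rho) of a Gaussian N(mu, sigma^2)."""
    rho = math.exp(-(z - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
    return math.sqrt(rho)

def g_mumu(mu, sigma, h=1e-5, n=4001, half_width=10.0):
    """4 * integral of (d sqrt(rho)/d mu)^2 dz by the trapezoid rule."""
    lo, hi = mu - half_width * sigma, mu + half_width * sigma
    dz = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * dz
        d = (sqrt_rho(z, mu + h, sigma) - sqrt_rho(z, mu - h, sigma)) / (2 * h)
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * 4 * d * d * dz
    return total

mu, sigma = 0.5, 1.3
print(g_mumu(mu, sigma), 1 / sigma**2)  # Fisher information of the mean: 1/sigma^2
```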
If we change the variable z to a discrete variable, with φ_{α}(x) = √ρ(x|z = α),
we can then use the inner product form to express the metric

g_{hk}(x) = 4 Σ_{α} ∂_{h}φ_{α}(x) ∂_{k}φ_{α}(x). (131)
The desired generalization of the information metric is now obvious: it is
(neglecting the irrelevant numerical factor)

g_{hk}(x) = g_{kh}(x) = ∫ ψ_{h}(x|z)ψ_{k}(x|z) dz. (132)

If ψ_{h}(x|z) = ∂_{h}φ(x|z), then eq. (132) yields the general holonomic case; if
in addition φ̄ = φ, so that ∂_{h}φ̄ = ∂_{h}φ, we return to the standard information metric (101).

In quantum geometry we wish to use Riemannian manifolds, hence we
must have α = 0 in the connection Γ^{(α)}_{ij,k}, since only then would the covariant
derivative of the metric vanish. The Riemannian property of the metric can
be considered to correspond to the Hermitian property of density operators
in quantum physics.

So we set α = 0 in eq. (118), and by eq. (119) and the following relation (which is easily proven)

∏_{i=1}^{k} ∂_{i} ln ρ = 2^{k} ρ^{−k/2} ∏_{i=1}^{k} ∂_{i}√ρ, (133)
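Relation (133) is easy to confirm pointwise; the sketch below (ours) checks the k = 1 case, ∂ ln ρ = 2ρ^{−1/2} ∂√ρ, for a Gaussian by central differences.

```python
import math

def rho(z, mu, sigma):
    """Gaussian p.d.f. N(mu, sigma^2)."""
    return math.exp(-(z - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Check eq. (133) for k = 1 at a single point:
#   d ln(rho)/d mu  ==  2 * rho^(-1/2) * d sqrt(rho)/d mu
z, mu, sigma, h = 0.8, 0.2, 1.1, 1e-6
dlog = (math.log(rho(z, mu + h, sigma)) - math.log(rho(z, mu - h, sigma))) / (2 * h)
dsqrt = (math.sqrt(rho(z, mu + h, sigma)) - math.sqrt(rho(z, mu - h, sigma))) / (2 * h)
rhs = 2 * rho(z, mu, sigma)**-0.5 * dsqrt
print(dlog, rhs)
```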

we have

Γ^{(0)}_{ij,k} = 4 ∫ ∂^{2}_{ij}√ρ ∂_{k}√ρ dz, (134)

which is also expressed with probability amplitudes.

4.2.2 The Uncertainty Relation

Since the metric is in the form of the Fisher information, we can use the Cramer-Rao inequality to derive the uncertainty relation.

In the one-dimensional case (with manifold M_{1}), let x̂ be an unbiased
estimator for the single parameter x. With variance (∆x)^{2} ≡ ⟨(x̂ − x)^{2}⟩, the
Cramer-Rao inequality is

(∆x)^{2} · g_{11} ≥ 1, (135)

where the only component of the metric, g_{11}, is the Fisher information. Using
eq. (129) with x ≡ x^{1} we have

(∆x)^{2} · 4 ∫ (∂√ρ/∂x)^{2} dz ≥ 1, (136)

and with the quantum physical identities √ρ = φ, p_{x} = −iħ ∂/∂x we get the uncer-
tainty relation

∆x · ∆p_{x} ≥ ħ/2. (137)

Thus the uncertainty relation can be derived from the Cramer-Rao inequality, and hence is not exclusive to quantum physics.
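The Cramer-Rao reading of eq. (136) can be illustrated numerically (our sketch; the distributions are chosen for illustration). For the Gaussian location family with estimator ẑ = z the bound is saturated, (∆x)^{2} g_{11} = 1, while for the Laplace location family the same estimator gives (∆x)^{2} g_{11} = 2 > 1: the inequality holds but is not tight.

```python
import math

def fisher_amplitude(sqrt_rho, x, zs, dz, h=1e-5):
    """g_11 of eq. (136): 4 * integral (d sqrt(rho)/dx)^2 dz (trapezoid rule)."""
    total = 0.0
    for idx, z in enumerate(zs):
        d = (sqrt_rho(z, x + h) - sqrt_rho(z, x - h)) / (2 * h)
        w = 0.5 if idx in (0, len(zs) - 1) else 1.0
        total += w * 4 * d * d * dz
    return total

n = 4001
zs = [-12 + 24 * i / (n - 1) for i in range(n)]
dz = 24 / (n - 1)

# Gaussian location family: estimator z has variance sigma^2; bound saturated.
sigma = 1.4
gauss = lambda z, x: (math.exp(-(z - x)**2 / (2 * sigma**2))
                      / (sigma * math.sqrt(2 * math.pi)))**0.5
product_gauss = sigma**2 * fisher_amplitude(gauss, 0.0, zs, dz)

# Laplace location family: var(z) = 2 b^2, Fisher information 1/b^2, product 2 > 1.
b = 1.0
laplace = lambda z, x: (math.exp(-abs(z - x) / b) / (2 * b))**0.5
product_laplace = 2 * b**2 * fisher_amplitude(laplace, 0.0, zs, dz)
print(product_gauss, product_laplace)  # approximately 1 and 2
```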

4.2.3 The Sign of Infinitesimal Distance ds^{2}

From the preceding derivation, the “infinitesimal distance” ds^{2} and the “infinites-
imal cross-entropy” dH_{c} can be considered equivalent (to within appropriate
identification, see eqs. (99) and (100)):

ds^{2} ≡ dH_{c} = g_{hk}dx^{h}dx^{k} = ∫ ψ_{h}dx^{h} ψ_{k}dx^{k} dz. (138)
Under special relativity a particle in space-time is described by the four
parameters (four-vector) x = {x^{1} = ct, x^{2} = ix, x^{3} = iy, x^{4} = iz}; with the
metric G the identity, we have

ds^{2} = c^{2}dt^{2} − dx^{2} − dy^{2} − dz^{2} ≥ 0, (139)
which is the requirement that the speed of particles not exceed c. From this
inequality we also have

dH_{c} ≥ 0. (140)

Since the previous equation is derived from a basic assumption of relativity, it can be considered as a basic physical principle, similar to other basic principles such as the second law of thermodynamics or the I-theorem of EPI.