
Chapter 5

III, Sampling, and Interpolation

5.1 X-Ray Diffraction: Through a Glass Darkly¹

Diffraction is not only an interesting phenomenon to look at, it is an important experimental tool, the tool being diffraction gratings. A diffraction grating is an aperture plane with a large number of parallel slits, closely spaced. See, for example

http://hyperphysics.phy-astr.gsu.edu/hbase/phyopt/grating.html.

Diffraction gratings are used to separate light of different wavelengths, and to measure wavelengths. I want to look briefly at this latter application.

X-rays were discovered by Wilhelm Roentgen in 1895. It was not known whether they were particles or waves, but the wave hypothesis put their wavelength at about 10⁻⁸ cm. Using diffraction gratings was out of the question for experiments on X-rays because diffraction effects are only seen if the width of the slits is comparable to the wavelength. It was possible to build such gratings for experiments on visible light, where the wavelengths are between 400 and 700 nanometers (a nanometer is 10⁻⁷ cm), but the extra orders of magnitude needed to get down to X-ray wavelengths couldn't be achieved.

A related set of mysteries had to do with the structure of crystals. It was thought that the macroscopic structure of crystals could be explained by a periodic arrangement of atoms, but there was no way to test this. In 1912 Max von Laue proposed that the purported periodic structure of crystals could be used to diffract X-rays, just as gratings diffracted visible light. He thus had three hypotheses:

1. X-rays are waves.

2. Crystals are periodic.

3. The spacing between atoms is of the order of 10⁻⁸ cm.

Friedrich and Knipping carried out experiments that confirmed von Laue's hypotheses, and the subject of X-ray crystallography was born.

But you need to know some math.

¹ 1 Corinthians 13: When I was a child, I spake as a child, I understood as a child, I thought as a child: but when I became a man, I put away childish things. For now we see through a glass, darkly; but then face to face: now I know in part; but then shall I know even as also I am known.


Electron density distribution An important quantity to consider in crystallography is how the electrons are distributed among the atoms in the crystal. This is usually referred to as the electron density distribution of the crystal. We want to see how we might represent this as a function, and consider what happens to the function in the course of an X-ray diffraction experiment.

Let’s take the one-dimensional case as an illustration; we’ll look at the (more realistic) higher dimensional case later in the course. We view a one-dimensional crystal as an evenly spaced collection of atoms along a line. In fact, for purposes of approximation, we suppose that an infinite number of them are strung out along a line. If we describe the electron density distribution of a single atom by a function ρ(x) then the electron density distribution of the crystal with spacing p is the periodic function

ρ_p(x) = \sum_{k=-\infty}^{\infty} ρ(x − kp).

As our discussion of diffraction might indicate, the Fourier transform of ρp(x) is proportional to the "scattered amplitude" of X-rays diffracted by the crystal. Thus we want to write ρp(x) in a form that's amenable to taking the Fourier transform. (Incidentally, it's not unreasonable to suppose that ρ is rapidly decreasing — the electron density of a single atom dies off as we move away from the atom.)

As we’ll see, it’s convenient to write the periodized density as a convolution with a sum of shifted δ’s:

ρ_p(x) = \sum_{k=-\infty}^{\infty} ρ(x − pk) = \sum_{k=-\infty}^{\infty} δ(x − kp) ∗ ρ(x) = \left( \sum_{k=-\infty}^{\infty} δ(x − kp) \right) ∗ ρ(x).

Now introduce

III_p(x) = \sum_{k=-\infty}^{\infty} δ(x − kp),

so that, simply,

ρ_p = III_p ∗ ρ.

IIIp is the star of the show. Bracewell calls it the “shah function”, after the Cyrillic letter, and this has caught on. It’s also referred to as the Dirac comb (with spacing p).

Using the convolution theorem, we have

F ρp= F ρ · F IIIp. What is F IIIp? That’s a really interesting question.

5.2 The III Distribution

We want to develop the properties of IIIp, particularly its Fourier transform. In fact, we met this distribution earlier, in Chapter 1. Rather, we met its Fourier transform — it’s the continuous buzz signal, as we’ll discuss further, below.

As a "standard" we take the spacing p to be 1, so we sum over the integer points and define

III(x) = \sum_{k=-\infty}^{\infty} δ(x − k)   or   III = \sum_{k=-\infty}^{\infty} δ_k.


As above, for a series of δ's spaced p apart we write

III_p(x) = \sum_{k=-\infty}^{\infty} δ(x − kp)   or   III_p = \sum_{k=-\infty}^{\infty} δ_{kp}.

I’ll mostly write δ’s “at points” in this section. It seems the more natural thing to do.

To see that the series for IIIp makes sense as a distribution, let ϕ be a test function; then

⟨III_p, ϕ⟩ = ⟨ \sum_{k=-\infty}^{\infty} δ_{kp}, ϕ ⟩ = \sum_{k=-\infty}^{\infty} ⟨δ_{kp}, ϕ⟩ = \sum_{k=-\infty}^{\infty} ϕ(kp).

This sum converges because of the rapid decrease of ϕ at ±∞.

There are two facets to the III’s versatility: periodizing and sampling. We’ll consider each in turn.

5.2.1 Periodizing with III

Our first application of III was as above, to write the periodization of the electron density function ρ of a single atom in a crystal as a convolution. The purpose was to periodize ρ to reflect the physical structure of the crystal. This is a general procedure. The III function furnishes a handy way of generating and working with periodic functions and distributions. Take that as an aphorism.

If f is a function or distribution for which convolution with III makes sense, then

(f ∗ III_p)(t) = \sum_{k=-\infty}^{\infty} f(t − pk)

is periodic with period p. Note that

f(at + b) ∗ III_p(t) = \sum_{k=-\infty}^{\infty} f(at + b − apk)

also has period p, and this can just as well be written in terms of a shifted III:

\sum_{k=-\infty}^{\infty} f(at + b − apk) = f(at) ∗ III_p(t + b/a).

Convolving with IIIp now emerges as the basic, familiar way to produce a periodic function. However, some care must be taken; convolving with III to periodize doesn’t shift the graph and link them up, it shifts the graph and adds them up.

In many cases the series

\sum_{k=-\infty}^{\infty} f(t − pk)

will converge in some reasonable sense, often at least to define a periodic distribution (see Section 5.4). A common application is to form f ∗ IIIp when f is zero for |t| ≥ p/2. In this case the convolution exists and we naturally say that f ∗ IIIp is the p-periodic extension of f .


I want to look at this operation in a little more detail, one reason being that it will make the discussion of sampling and aliasing, soon to come, much cleaner and easier. Recall the scaled rect function

Π_p(x) = 1 if |x| < p/2, and Π_p(x) = 0 if |x| ≥ p/2.

If f is zero when |t| ≥ p/2 (note the ≥, not >) then

Π_p f = f   and   f = Π_p(f ∗ III_p).

In fact these two conditions are equivalent. That should be clear if you have the geometric picture in mind.

For example, shown below are the graphs of a function f(x) that is zero outside of |x| < p/2 and of three cycles of its periodization; that's f(x + p) + f(x) + f(x − p) = f(x) ∗ \sum_{k=-1}^{1} δ(x − kp).


Here are the algebraic details that go from the picture to the formulas. If Π_p f = f then

Π_p(t)(f ∗ III_p)(t) = Π_p(t)((Π_p f) ∗ III_p)(t)
= Π_p(t) \sum_{k=-\infty}^{\infty} Π_p(t − kp) f(t − kp)
= \sum_{k=-\infty}^{\infty} Π_p(t) Π_p(t − kp) f(t − kp) = Π_p(t) f(t) = f(t)

since

Π_p(t) Π_p(t − kp) = Π_p(t) if k = 0, and 0 if k ≠ 0.

On the other hand, if f = Π_p(f ∗ III_p) then

Π_p f = Π_p(Π_p(f ∗ III_p)) = Π_p^2(f ∗ III_p) = Π_p(f ∗ III_p) = f.

If we had defined Πp differently at ±p/2 (in other cultures either Πp(±p/2) = 1 or Πp(±p/2) = 1/2 are typical) then the calculations and results above would hold except for the translates of ±p/2, a discrete set of points. Such an exceptional set generally poses no problems in applications.

This all seems pretty innocent, but cutting off a distribution by Π_p (a discontinuous function) is not part of the theory. We only defined the product of a distribution and a smooth function. In general we'll proceed as though all is well, though careful justifications can take some work (which we won't do). Be not afraid.

5.2.2 Sampling with III

The flip side of periodizing with III is sampling with III. Here’s what this means. Suppose we multiply III by a function f . Then as a distribution

f(x) III(x) = \sum_{k=-\infty}^{\infty} f(x) δ(x − k) = \sum_{k=-\infty}^{\infty} f(k) δ(x − k).

Multiplying III by f “samples” f at the integer points, in the sense that it “records” the values of f at those points in the sum.

There’s nothing sacred about sampling at the integers of course. Sampling using IIIp means

f(x) III_p(x) = \sum_{k=-\infty}^{\infty} f(kp) δ(x − kp),

so f is sampled at the points kp. Scaled or not, the thing to keep in mind about the shah function is that it takes evenly spaced samples of a function f .

To summarize:

• Convolving a function with III (with IIIp) produces a periodic function with period 1 (with period p).

• Multiplying a function by III (by IIIp) samples the function at the integer points (at the points pk).
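Here is a minimal numerical sketch of the two operations just summarized (my own illustration, not part of the notes; the Gaussian "atom" and the spacing p = 2 are arbitrary choices). Periodizing is done by summing shifted copies — truncating the sum is harmless because the Gaussian decays so fast — and sampling just records the values at the points kp.

```python
import numpy as np

p = 2.0                                   # spacing of the Dirac comb
x = np.linspace(-6.0, 6.0, 1201)          # grid with spacing 0.01
rho = lambda u: np.exp(-np.pi * u**2)     # a rapidly decreasing "atom"

# Periodizing: (rho * III_p)(x) = sum_k rho(x - kp), truncated to |k| <= 10.
rho_p = sum(rho(x - k * p) for k in range(-10, 11))

# Check p-periodicity: shifting by p (200 grid points) changes essentially nothing.
shift = int(round(p / (x[1] - x[0])))
print(np.max(np.abs(rho_p[shift:] - rho_p[:-shift])))   # ~ 1e-16

# Sampling: multiplying by III_p keeps only the values rho(kp) at the points kp.
k = np.arange(-3, 4)
print(k * p)          # locations of the delta spikes
print(rho(k * p))     # the recorded sample values
```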


5.2.3 Scaling identity for IIIp

There’s a simple scaling identity for IIIp that comes up often enough in formulas and derivations to make it worth pointing out. We’ve defined

III_p(x) = \sum_{k=-\infty}^{\infty} δ(x − kp),

scaling the spacing of the impulses by p, but it's also natural to consider

III(px) = \sum_{k=-\infty}^{\infty} δ(px − k).

Now recall the scaling property of δ; for p > 0,

δ(px) = \frac{1}{p} δ(x).

Plugging this into the formula for III(px) gives

III(px) = \sum_{k=-\infty}^{\infty} δ(px − k)
= \sum_{k=-\infty}^{\infty} δ\big(p(x − k/p)\big)
= \sum_{k=-\infty}^{\infty} \frac{1}{p} δ(x − k/p)
= \frac{1}{p} III_{1/p}(x).

To give it its own display:

III(px) = \frac{1}{p} III_{1/p}(x)

(It would be a good exercise to derive this in a variable-free environment, using the delay operator τp and the scaling operator σp.) By the same token,

III_p(x) = \frac{1}{p} III(x/p).

5.3 The Fourier Transform of III, or, The deepest fact about the integers is well known to every electrical engineer and spectroscopist

The most interesting thing about III is what happens when we take its Fourier transform. If we start with the definition

III(x) = \sum_{k=-\infty}^{\infty} δ(x − k)

and apply what we know about the Fourier transform of δ (it's 1) plus the shift theorem, we obtain

F III(s) = \sum_{k=-\infty}^{\infty} e^{−2πiks}.


Since we're summing over all positive and negative k we can write this as

F III(s) = \sum_{k=-\infty}^{\infty} e^{2πiks},

which looks more like a Fourier series. We did see this when we introduced the buzz signal. It sounds like a signal with every harmonic present in equal amounts. It sounds terrible.

The expression

\sum_{k=-\infty}^{\infty} e^{2πiks}

actually does make sense as a distribution, as we’ll see, but it’s not yet a helpful expression. Instead, to find the Fourier transform of III we go back to the definition in terms of tempered distributions. If ϕ is a Schwartz function then

⟨F III, ϕ⟩ = ⟨III, F ϕ⟩.

On the right hand side,

⟨III, F ϕ⟩ = ⟨ \sum_{k=-\infty}^{\infty} δ_k, F ϕ ⟩ = \sum_{k=-\infty}^{\infty} ⟨δ_k, F ϕ⟩ = \sum_{k=-\infty}^{\infty} F ϕ(k).

And now we have something absolutely remarkable.

The Poisson summation formula: Let ϕ be a Schwartz function. Then

\sum_{k=-\infty}^{\infty} F ϕ(k) = \sum_{k=-\infty}^{\infty} ϕ(k)

This result actually holds for other classes of functions (the Schwartz class was certainly not known to Poisson!) but that’s not important for us.

The Poisson summation formula is the deepest fact known about the integers. It's known to every electrical engineer and every spectroscopist because of what it says about the Fourier transform F III. We'll settle that now and come back to the derivation of the formula afterward.

We pick up our calculation of F III where we left off:

⟨F III, ϕ⟩ = \sum_{k=-\infty}^{\infty} F ϕ(k)
= \sum_{k=-\infty}^{\infty} ϕ(k)   (because of the Poisson summation formula)
= \sum_{k=-\infty}^{\infty} ⟨δ_k, ϕ⟩   (definition of δ_k)
= ⟨ \sum_{k=-\infty}^{\infty} δ_k, ϕ ⟩
= ⟨III, ϕ⟩.


Comparing where we started to where we ended up, we conclude that F III = III .

Outstanding. The III distribution is its own Fourier transform. (See also Section 5.10.)

Proof of the Poisson Summation Formula The proof of the Poisson summation formula is an excellent example of the power of having two different representations of the same thing, an idea certainly at the heart of Fourier analysis. Remember the maxim: If you can evaluate an expression in two different ways it’s likely you’ve done something significant.

Given a test function ϕ(t) we periodize to Φ(t) of period 1:

Φ(t) = (ϕ ∗ III)(t) = \sum_{k=-\infty}^{\infty} ϕ(t − k).

As a periodic function, Φ has a Fourier series:

Φ(t) = \sum_{m=-\infty}^{\infty} \hat{Φ}(m) e^{2πimt}.

Let’s find the Fourier coefficients of Φ(t).

\hat{Φ}(m) = \int_0^1 e^{−2πimt} Φ(t)\,dt
= \int_0^1 \sum_{k=-\infty}^{\infty} e^{−2πimt} ϕ(t − k)\,dt = \sum_{k=-\infty}^{\infty} \int_0^1 e^{−2πimt} ϕ(t − k)\,dt
= \sum_{k=-\infty}^{\infty} \int_{−k}^{−k+1} e^{−2πim(t+k)} ϕ(t)\,dt
= \sum_{k=-\infty}^{\infty} \int_{−k}^{−k+1} e^{−2πimt} e^{−2πimk} ϕ(t)\,dt   (using e^{−2πimk} = 1)
= \int_{-\infty}^{\infty} e^{−2πimt} ϕ(t)\,dt
= F ϕ(m).

Therefore

Φ(t) = \sum_{m=-\infty}^{\infty} F ϕ(m) e^{2πimt}.

(We’ve actually seen this calculation before, in a disguised form; look back to Section 3.5 on the relationship between the solutions of the heat equation on the line and on the circle.)

Since Φ is a smooth function, the Fourier series converges. Now compute Φ(0) two ways, one way from plugging into its definition and the other from plugging into its Fourier series:

Φ(0) = \sum_{k=-\infty}^{\infty} ϕ(−k) = \sum_{k=-\infty}^{\infty} ϕ(k)

Φ(0) = \sum_{k=-\infty}^{\infty} F ϕ(k) e^{2πik·0} = \sum_{k=-\infty}^{\infty} F ϕ(k)

Done.
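As a quick sanity check of the formula (my own, not in the notes), both sides can be summed numerically for the Gaussian ϕ(t) = e^{−πat²}, whose Fourier transform under the convention F ϕ(s) = ∫ e^{−2πist} ϕ(t) dt is a^{−1/2} e^{−πs²/a}. Both sums converge so fast that truncating at |k| ≤ 50 is more than enough.

```python
import numpy as np

a = 0.7                                  # any a > 0; a = 1 would make phi self-dual
k = np.arange(-50, 51)

lhs = np.sum(np.exp(-np.pi * a * k**2))                   # sum of phi(k)
rhs = np.sum(np.exp(-np.pi * k**2 / a)) / np.sqrt(a)      # sum of F phi(k)
print(lhs, rhs, abs(lhs - rhs))          # the two sums agree to machine precision
```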


The Fourier transform of IIIp From F III = III we can easily deduce the formula for F III_p. Using the identities

III_p(x) = \frac{1}{p} III(x/p)   and   III(px) = \frac{1}{p} III_{1/p}(x),

we have

F III_p(s) = \frac{1}{p} F\big[III(x/p)\big](s)
= \frac{1}{p}\, p\, F III(ps)   (stretch theorem)
= III(ps)
= \frac{1}{p} III_{1/p}(s)

5.3.1 Crystal gazing

Let’s return now to the setup for X-ray diffraction for a one-dimensional crystal. We described the electron density distribution of a single atom by a function ρ(x) and the electron density distribution of the crystal with spacing p as

ρ_p(x) = \sum_{k=-\infty}^{\infty} ρ(x − kp) = (ρ ∗ III_p)(x).

Then

F ρ_p(s) = F(ρ ∗ III_p)(s)
= (F ρ · F III_p)(s)
= F ρ(s)\, \frac{1}{p} III_{1/p}(s)
= \sum_{k=-\infty}^{\infty} \frac{1}{p} F ρ(k/p)\, δ(s − k/p)

Here’s the significance of this. In an X-ray diffraction experiment what you see on the X-ray film is a bunch of spots, corresponding to F ρp. The intensity of each spot is proportional to the magnitude of the Fourier transform of the electron density ρ and the spots are spaced a distance 1/p apart, not p apart. If you were an X-ray crystallographer and didn’t know your Fourier transforms, you might assume that there is a relation of direct proportion between the spacing of the dots on the film and the atoms in the crystal, but it’s a reciprocal relation — kiss your Nobel prize goodbye. Every spectroscopist knows this.
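The reciprocal relation is easy to see numerically. The sketch below is my own illustration (a synthetic one-dimensional "crystal", not real diffraction data): periodize a narrow Gaussian with spacing p, take an FFT as a stand-in for the scattered amplitude, and measure how far apart the strong spots are. The spacing comes out as 1/p, not p.

```python
import numpy as np

p = 2.5                                          # atom spacing in the "crystal"
L, n = 100.0, 2**14                              # window length (a multiple of p) and grid size
t = np.arange(n) * L / n - L / 2
rho = lambda u: np.exp(-np.pi * (u / 0.1)**2)    # a narrow atom density

# rho * III_p, truncated to the atoms that fit inside the window
crystal = sum(rho(t - c) for c in np.arange(-L / 2 + p / 2, L / 2, p))

spectrum = np.abs(np.fft.fftshift(np.fft.fft(crystal)))
freqs = np.fft.fftshift(np.fft.fftfreq(n, d=L / n))

spots = freqs[spectrum > 0.5 * spectrum.max()]   # the bright diffraction spots
print(np.diff(spots))                            # spacing ~ 1/p = 0.4, not p = 2.5
```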

We’ll see a similar relation when we consider higher dimensional Fourier transforms and higher dimensional III-functions. A III-function will be associated with a lattice and the Fourier transform will be a III-function associated with the reciprocal or dual lattice. This phenomenon has turned out to be important in image processing; see, for example, Digital Video Processing by A. M. Tekalp.

5.4 Periodic Distributions and Fourier series

I want to collect a few facts about periodic distributions and Fourier series, and show how we can use III as a convenient tool for “classical” Fourier series.


Periodicity The notion of periodicity for distributions is invariance under the delay operator τp, i.e., a distribution (or a function, for that matter) is periodic with period p if

τpS = S .

This is the “variable free” definition, since we’re not supposed to write S(x − p) = S(x) or S(x + p) = S(x)

which is the usual way of expressing periodicity. It’s a pleasure to report that IIIp is periodic with period p.

You can see that most easily by doing what we’re not supposed to do:

III_p(x + p) = \sum_{k=-\infty}^{\infty} δ(x + p − kp) = \sum_{k=-\infty}^{\infty} δ(x − (k − 1)p) = \sum_{k=-\infty}^{\infty} δ(x − kp) = III_p(x).

It’s also easy to give a variable-free demonstration, which amounts to the same thing:

τ_p III_p = \sum_{k=-\infty}^{\infty} τ_p δ_{kp} = \sum_{k=-\infty}^{\infty} δ_{kp+p} = \sum_{k=-\infty}^{\infty} δ_{p(k+1)} = \sum_{k=-\infty}^{\infty} δ_{kp} = III_p.

When we periodize a test function ϕ by forming the convolution, Φ(x) = (ϕ ∗ IIIp)(x) ,

it’s natural to view the periodicity of Φ as a consequence of the periodicity of IIIp. By this I mean we can appeal to:

• If S or T is periodic of period p then S ∗ T (when it is defined) is periodic of period p.

Let me show this for functions (something we could have done way back) and I’ll let you establish the general result. Suppose f is periodic of period p. Consider (f ∗ g)(x + p). We have

(f ∗ g)(x + p) = \int_{-\infty}^{\infty} f(x + p − y) g(y)\,dy = \int_{-\infty}^{\infty} f(x − y) g(y)\,dy = (f ∗ g)(x).

The same argument works if instead g is periodic.

So, on the one hand, convolving with IIIp produces a periodic function. On the other hand, suppose Φ is periodic of period p and we cut out one period of it by forming ΠpΦ. We get Φ back, in toto, by forming the convolution with IIIp; that is,

Φ = ϕ ∗ III_p = (Π_p Φ) ∗ III_p

(Well, this is almost right. The cut-off Π_pΦ is zero at ±p/2 while Φ(±p/2) certainly may not be zero. These "exceptions" at the endpoints won't affect the discussion here in any substantive way.²)

The upshot of this is that something is periodic if and only if it is a convolution with IIIp. This is a nice point of view. I’ll take this up further in Section 5.10.

² We can either: (a) ignore this problem; (b) jigger the definition of Πp to make it really true, which has other problems; or (c) say that the statement is true as an equality between distributions, and tell ourselves that modifying the functions at a discrete set of points will not affect that equality.


Fourier series for III Taking the Fourier transform of III term by term we arrived at

F III = \sum_{k=-\infty}^{\infty} e^{2πikt},

and if we next use F III = III we would then have

III = \sum_{k=-\infty}^{\infty} e^{2πikt}.

The series

\sum_{k=-\infty}^{\infty} e^{2πikt}

does define a distribution, for

⟨ \sum_{k=-\infty}^{\infty} e^{2πikt}, ϕ ⟩ = \int_{-\infty}^{\infty} \sum_{k=-\infty}^{\infty} e^{2πikt} ϕ(t)\,dt

exists for any test function ϕ because ϕ is rapidly decreasing. There's a pretty straightforward development of Fourier series for tempered distributions, and while we won't enter into it, suffice it to say we do indeed have

III = \sum_{k=-\infty}^{\infty} e^{2πikt}.

The right hand side really is the Fourier series for III. But, by the way, you can’t prove this without proving the Poisson summation formula and that F III = III, so Fourier series isn’t a shortcut to the latter in this case.

Remember that we saw the finite version of the Fourier series for III back in the section on Fourier series:

D_N(t) = \sum_{n=-N}^{N} e^{2πint} = \frac{\sin(π(2N + 1)t)}{\sin πt}.

Here's the graph for N = 20:


It’s now really true that

DN → III

as N → ∞, where the convergence is in the sense of distributions.

Fourier transform of a Fourier series When we first started to work with tempered distributions, I said that we would be able to take the Fourier transform of functions that didn’t have one, i.e., functions for which the integral defining the (classical) Fourier transform does not exist. We’ve made good on that promise, including complex exponentials, for which

F e^{2πikt/p} = δ(s − k/p).

With this we can now find the Fourier transform of a Fourier series. If

ϕ(t) = \sum_{k=-\infty}^{\infty} c_k e^{2πikt/p}

then

F ϕ(s) = \sum_{k=-\infty}^{\infty} c_k F e^{2πikt/p} = \sum_{k=-\infty}^{\infty} c_k δ(s − k/p)

It may well be that the series \sum_{k=-\infty}^{\infty} c_k e^{2πikt/p} converges to define a tempered distribution — that's not asking too much³ — even if it doesn't converge pointwise to ϕ(t). Then it still makes sense to consider its Fourier transform and the formula, above, is OK.

Rederiving Fourier series for a periodic function We can turn this around and rederive the formula for Fourier series as a consequence of our work on Fourier transforms. Suppose Φ is periodic of period p and write, as we know we can,

Φ = ϕ ∗ IIIp

where ϕ is one period of Φ, say ϕ = ΠpΦ. Take the Fourier transform of both sides and boldly invoke the convolution theorem:

F Φ = F(ϕ ∗ III_p) = F ϕ · F III_p = F ϕ · \frac{1}{p} III_{1/p},

or, at points,

F Φ(s) = F ϕ(s)\, \frac{1}{p} \sum_{k=-\infty}^{\infty} δ(s − k/p) = \frac{1}{p} \sum_{k=-\infty}^{\infty} F ϕ(k/p)\, δ(s − k/p).

Now boldly take the inverse Fourier transform:

Φ(t) = \sum_{k=-\infty}^{\infty} \frac{1}{p} F ϕ(k/p)\, e^{2πikt/p}   (the F ϕ(k/p) are constants).

But

\frac{1}{p} F ϕ(k/p) = \frac{1}{p} \int_{-\infty}^{\infty} e^{−2πi(k/p)t} ϕ(t)\,dt
= \frac{1}{p} \int_{-\infty}^{\infty} e^{−2πi(k/p)t} Π_p(t) Φ(t)\,dt = \frac{1}{p} \int_{−p/2}^{p/2} e^{−2πi(k/p)t} Φ(t)\,dt,

³ For example, if ϕ is integrable so that the coefficients ck tend to zero. Or even less than that will do, just as long as the coefficients don't grow too rapidly.


and this is the k-th Fourier coefficient c_k of Φ. We've rederived

Φ(t) = \sum_{k=-\infty}^{\infty} c_k e^{2πikt/p},   where   c_k = \frac{1}{p} \int_{−p/2}^{p/2} e^{−2πi(k/p)t} Φ(t)\,dt.
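The same computation (the one in the proof of the Poisson summation formula, rescaled to period p) shows that for any rapidly decreasing ϕ the periodization Φ = ϕ ∗ III_p has Fourier coefficients c_k = (1/p) F ϕ(k/p). Here is a small numerical check of that relation — my own sketch, not from the notes — using a Gaussian ϕ, whose transform F ϕ(s) = e^{−πs²} is known exactly.

```python
import numpy as np

p = 4.0
phi = lambda u: np.exp(-np.pi * u**2)

dt = p / 4000
t = np.arange(-p / 2, p / 2, dt)                     # one period, endpoint excluded
Phi = sum(phi(t - m * p) for m in range(-10, 11))    # Phi = phi * III_p (truncated)

for k in range(4):
    ck = np.sum(np.exp(-2j * np.pi * (k / p) * t) * Phi) * dt / p   # Fourier coefficient
    print(k, ck.real, np.exp(-np.pi * (k / p) ** 2) / p)            # columns agree closely
```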

5.5 Sampling Signals

In the previous lecture we studied three properties of III that make it so useful in many applications. They are:

• Periodizing

◦ Convolving with III periodizes a function.

• Sampling

◦ Multiplying by III samples a function.

• The Fourier transform of III is III.

◦ Convolving and multiplying are themselves flip sides of the same coin via the convolution theorem for Fourier transforms.

We are now about to combine all of these ideas in a spectacular way to treat the problem of “sampling and interpolation”. Let me state the problem this way:

• Given a signal f (t) and a collection of samples of the signal, i.e., values of the signal at a set of points f (t0), f (t1), f (t2), . . . , to what extent can one interpolate the values f (t) at other points from the sample values?

This is an old question, and a broad one, and it would appear on the surface to have nothing to do with III’s or Fourier transforms, or any of that. But we’ve already seen some clues, and the full solution is set to unfold.

5.5.1 Sampling sines and bandlimited signals

Why should we expect to be able to do interpolation at all? Imagine putting down a bunch of dots — maybe even infinitely many — and asking someone to pass a curve through them that agrees everywhere exactly with a predetermined mystery function passing through those dots. Ridiculous. But it’s not ridiculous. If a relatively simple hypothesis is satisfied then interpolation can be done! Here’s one way of getting some intuitive sense of the problem and what that hypothesis should be.

Suppose we know a signal is a single sinusoid. A sinusoid repeats, so if we have enough information to pin it down over one period, or cycle, then we know the whole thing. How many samples — how many values of the function — within one period do we need to know to know which sinusoid we have? We need three samples strictly within one cycle. You can think of the graph, or you can think of the equation: A general sinusoid is of the form A sin(2πνt + φ). There are three unknowns, the amplitude A, the frequency ν and the phase φ. We would expect to need three equations to find the unknowns, hence we need values of the function at three points, three samples.


What if the signal is a sum of sinusoids, say

\sum_{n=1}^{N} A_n \sin(2πnνt + φ_n).

Sample points for the sum are “morally” sample points for the individual harmonics, though not explicitly.

We need to take enough samples to get sufficient information to determine all of the unknowns for all of the harmonics. Now, in the time it takes for the combined signal to go through one cycle, the individual harmonics will have gone through several cycles, the lowest frequency harmonic through one cycle, the lower frequency harmonics through a few cycles, say, and the higher frequency harmonics through many.

We have to take enough samples of the combined signal so that as the individual harmonics go rolling along we’ll be sure to have at least three samples in some cycle of every harmonic.

To simplify and standardize we assume that we take evenly spaced samples (in t). Since we’ve phrased things in terms of cycles per second, to understand how many samples are enough it’s then also better to think in terms of “sampling rate”, i.e., samples/sec instead of “number of samples”. If we are to have at least three samples strictly within a cycle then the sample points must be strictly less than a half-cycle apart. A sinusoid of frequency ν goes through a half-cycle in 1/2ν seconds so we want

spacing between samples = (number of seconds)/(number of samples) < 1/(2ν).

The more usual way of putting this is

sampling rate = samples/sec > 2ν.

This is the rate at which we should sample a given sinusoid of frequency ν to guarantee that a single cycle will contain at least three sample points. Furthermore, if we sample at this rate for a given frequency, we will certainly have more than three sample points in some cycle of any harmonic at a lower frequency.

Note that the sampling rate has units 1/seconds and that sample points are 1/(sampling rate) seconds apart.

For the combined signal — a sum of harmonics — the higher frequencies are driving up the sampling rate; specifically, the highest frequency is driving up the rate. To think of the interpolation problem geometrically, high frequencies cause more rapid oscillations, i.e., rapid changes in the function over small intervals, so to hope to interpolate such fluctuations accurately we'll need a lot of sample points and thus a high sampling rate. For example, here's a picture of the sum of two sinusoids, one of low frequency and one of high frequency.


If we sample at too low a rate we might miss the wiggles entirely. We might mistakenly think we had only the low frequency sinusoid, and, moreover, if all we had to go on were the samples we wouldn't even know we'd made a mistake! We'll come back to just this problem a little later.
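To make the danger concrete, here is a tiny numerical illustration of my own (the particular frequencies are arbitrary): sampled at 10 samples per second, a 9 Hz cosine produces exactly the same samples as a 1 Hz cosine, so from the samples alone the two signals are indistinguishable.

```python
import numpy as np

fs = 10.0                        # sampling rate -- too low for a 9 Hz component
t = np.arange(0, 2, 1 / fs)      # two seconds of sample times

high = np.cos(2 * np.pi * 9 * t)     # what was actually sampled
low = np.cos(2 * np.pi * 1 * t)      # what the samples alone suggest

print(np.allclose(high, low))    # True: at this rate, 9 Hz masquerades as 1 Hz
```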

If we sample at a rate greater than twice the highest frequency, our sense is that we will be sampling often enough for all the lower harmonics as well, and we should be able to determine everything. The problem here is if the spectrum is unbounded. If, as for a square wave, we have a full Fourier series and not just a finite sum of sinusoids, then we have no hope of sampling frequently enough to determine the combined signal from the samples. For a square wave, for example, there is no “highest frequency”. That’s trouble.

It’s time to define ourselves out of this trouble.

Bandlimited signals From the point of view of the preceding discussion, the problem for interpolation is high frequencies, and the best thing a signal can be is a finite Fourier series. The latter is much too restrictive for applications, of course, so what's the "next best" thing a signal can be? It's one for which there is a highest frequency. These are the bandlimited signals — signals whose Fourier transforms are identically zero outside of a finite interval. Such a signal has a bounded spectrum; there is a "highest frequency".

More formally:

• A signal f (t) is bandlimited if there is a finite number p such that F f (s) = 0 for all |s| ≥ p/2. The smallest number p for which this is true is called the bandwidth of f (t).

There's a question about having F f be zero at the endpoints ±p/2 as part of the definition. For the following discussion on sampling and interpolation, it's easiest to assume this is the case, and treat separately some special cases when it isn't. For those who want to know more, read the next paragraph.

Some technical remarks If f (t) is an integrable function then F f (s) is continuous, so if F f (s) = 0 for all |s| > p/2 then F f (±p/2) = 0 as well. On the other hand, it’s also common first to define the support of a function (integrable or not) as the complement of the largest open set on which the function is identically zero. (This definition can also be given for distributions.) This makes the support closed, being the complement of an open set. For example, if F f (s) is identically zero for |s| > p/2, and on no larger open set, then the support of F f is the closed interval [−p/2, +p/2]. Thus, with this definition, even if F f (±p/2) = 0 the endpoints ±p/2 are included in the support of F f .

One then says, as an alternate definition, that f is bandlimited if the support of F f is closed and bounded.

In mathematical terms, a closed, bounded set (in Rn) is said to be compact, and so the shorthand definition of bandlimited is that F f has compact support. A typical compact set is a closed interval, like [−p/2, +p/2], but we could also take finite unions of closed intervals. This definition is probably the one more often given, but it’s a little more involved to set up, as you’ve just witnessed. Whichever definition of bandlimited one adopts there are always questions about what happens at the endpoints anyway, as we’ll see.

5.6 Sampling and Interpolation for Bandlimited Signals

We’re about to solve the interpolation problem for bandlimited signals. We’ll show that interpolation is possible by finding an explicit formula that does the job. Before going through the solution, however, I want to make a general observation that’s independent of the interpolation problem but is important to it.

It is unphysical to consider a signal as lasting forever in time. A physical signal f(t) is naturally "timelimited", meaning that f(t) is identically zero on |t| ≥ q/2 for some q — there just isn't any signal beyond a point. On the other hand, it is very physical to consider a bandlimited signal, one with no frequencies beyond a certain point, or at least no frequencies that our instruments can register. Well, we can't have both, at least not in the ideal world of mathematics. Here is where mathematical description meets physical expectation — and they disagree. The fact is:

• A signal cannot be both timelimited and bandlimited.

What this means in practice is that there must be inaccuracies in a mathematical model of a phenomenon that assumes a signal is both timelimited and bandlimited. Such a model can be at best an approximation, and one has to be prepared to estimate the errors as they may affect measurements and conclusions.

Here’s one argument why the statement is true; I’ll give a more complete proof of a more general statement in Appendix 1. Suppose f is bandlimited, say F f (s) is zero for |s| ≥ p/2. Then

F f = Πp· F f . Take the inverse Fourier transform of both sides to obtain

f (t) = p sinc pt ∗ f (t) .

Now sinc pt “goes on forever”; it decays but it has nonzero values all the way out to ±∞. Hence the convolution with f also goes on forever; it is not timelimited.

sinc as a “convolution identity” There’s an interesting observation that goes along with the argument we just gave. We’re familiar with δ acting as an “identity element” for convolution, meaning

f ∗ δ = f .

This important property of δ holds for all signals for which the convolution is defined. We’ve just seen for the more restricted class of bandlimited functions, with spectrum from −p/2 to +p/2, that the sinc function also has this property:

p sinc pt ∗ f (t) = f (t) .
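A rough numerical sketch of this identity (my own, under the assumption f(t) = sinc² t, which is bandlimited with spectrum in [−1, 1], and p = 3): approximating the convolution integral by a Riemann sum over a long window reproduces f at the chosen points.

```python
import numpy as np

f = lambda u: np.sinc(u) ** 2       # bandlimited: F f is the triangle function on [-1, 1]
p = 3.0                             # the spectrum of f sits inside [-p/2, p/2]

dy = 0.1
y = np.arange(-300.0, 300.0, dy)    # long integration window; f decays like 1/y^2
for x in (0.0, 0.37, 1.5):
    conv = dy * np.sum(p * np.sinc(p * (x - y)) * f(y))   # (p sinc pt) * f evaluated at x
    print(x, conv, f(x))            # the two values agree closely
```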

The Sampling Theorem Ready to solve the interpolation problem? It uses all the important properties of III, but it goes so fast that you might miss the fun entirely if you read too quickly.

Suppose f (t) is bandlimited with F f (s) identically zero for |s| ≥ p/2. We periodize F f using IIIp and then cut off to get F f back again:

F f = Πp(F f ∗ IIIp) . This is the crucial equation.


Now take the inverse Fourier transform:

f(t) = F^{−1} F f(t) = F^{−1}\big(Π_p(F f ∗ III_p)\big)(t)
= F^{−1}Π_p(t) ∗ F^{−1}(F f ∗ III_p)(t)   (taking F^{−1} turns multiplication into convolution)
= F^{−1}Π_p(t) ∗ \big(F^{−1}F f(t) · F^{−1}III_p(t)\big)   (ditto, except it's convolution turning into multiplication)
= p\, sinc\,pt ∗ \Big(f(t) · \frac{1}{p} III_{1/p}(t)\Big)
= sinc\,pt ∗ \sum_{k=-\infty}^{\infty} f(k/p)\, δ(t − k/p)   (the sampling property of III_p)
= \sum_{k=-\infty}^{\infty} f(k/p)\, sinc\,pt ∗ δ(t − k/p)
= \sum_{k=-\infty}^{\infty} f(k/p)\, sinc\,p(t − k/p)   (the sifting property of δ)

We've just established the classic "Sampling Theorem", though it might be better to call it the "interpolation theorem". Here it is as a single statement:

• If f(t) is a signal with F f(s) identically zero for |s| ≥ p/2 then

f(t) = \sum_{k=-\infty}^{\infty} f(k/p)\, sinc\,p(t − k/p).

Some people write the formula as

f(t) = \sum_{k=-\infty}^{\infty} f(k/p)\, sinc(pt − k),

but I generally prefer to emphasize the sample points t_k = k/p and then to write the formula as

f(t) = \sum_{k=-\infty}^{\infty} f(t_k)\, sinc\,p(t − t_k).

What does the formula do, once again? It computes any value of f in terms of sample values. Here are a few general comments to keep in mind:

• The sample points are spaced 1/p apart — the reciprocal of the bandwidth.⁴

⁴ That sort of reciprocal phenomenon is present again in higher dimensional versions of the sampling formula. This will be a later topic for us.


• The formula involves infinitely many sample points — k/p for k = 0, ±1, ±2, . . . .

So don’t think you’re getting away too cheaply, and realize that any practical implementation can only involve a finite number of terms in the sum, so will necessarily be an approximation.

◦ Since a bandlimited signal cannot be timelimited we should expect to have to take samples all the way out to ±∞. However, sampling a bandlimited periodic signal, i.e., a finite Fourier series, requires only a finite number of samples. We’ll cover this, below.
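Here is a minimal numerical sketch of the interpolation formula (my own example, under the assumption f(t) = sinc² t, whose spectrum sits inside [−1, 1]); the only error comes from truncating the infinite set of samples, exactly as the remarks above warn.

```python
import numpy as np

f = lambda t: np.sinc(t) ** 2            # np.sinc(t) = sin(pi t)/(pi t)
fs = 2.5                                 # sampling rate, comfortably above 2

k = np.arange(-2000, 2001)               # truncation of the infinite sample set
tk = k / fs                              # sample points, spaced 1/fs apart
t = np.linspace(-3, 3, 501)              # points at which to interpolate

# f(t) ~ sum_k f(t_k) sinc(fs (t - t_k)): build the sinc matrix, sum against the samples.
recon = np.sinc(fs * (t[:, None] - tk[None, :])) @ f(tk)
print(np.max(np.abs(recon - f(t))))      # tiny; the samples determine the signal
```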

Put the outline of the argument for the sampling theorem into your head — it’s important. Starting with a bandlimited signal, there are three parts:

• Periodize the Fourier transform.

• Cut off this periodic function to get back where you started.

• Take the inverse Fourier transform.

Cutting off in the second step, a multiplication, exactly undoes periodizing in the first step, a convolution, provided that F f = Πp(F f ∗ IIIp). But taking the inverse Fourier transform swaps multiplication with convolution and this is why something nontrivial happens. It’s almost obscene the way this works.

Sampling rates and the Nyquist frequency The bandwidth determines the minimal sampling rate we can use to reconstruct the signal from its samples. I'd almost say that the bandwidth is the minimal sampling rate except for the slight ambiguity about where the spectrum starts being identically zero (the "endpoint problem"). Here's the way the situation is usually expressed: If the (nonzero) spectrum runs from −ν_max to ν_max then we need

sampling rate > 2ν_max

to reconstruct the signal from its samples.

The number 2νmax is often called the Nyquist frequency, after Harry Nyquist, God of Sampling, who was the first engineer to consider these problems for the purpose of communications. There are other names associated with this circle of ideas, most notably E. Whittaker, a mathematician, and C. Shannon, an all around genius and founder of Information Theory. The formula as we’ve given it is often referred to as the Shannon Sampling Theorem.

The derivation of the formula gives us some one-sided freedom, or rather the opportunity to do more work than we have to. We cannot take p smaller than the length of the interval where F f is supported, the bandwidth, but we can take it larger. That is, if p is the bandwidth and q > p we can periodize F f to have period q by convolving with IIIq and we still have the fundamental equation

F f = Πq(F f ∗ IIIq) .

(Draw a picture.) The derivation can then proceed exactly as above and we get

f(t) = \sum_{k=-\infty}^{\infty} f(τ_k)\, sinc\,q(t − τ_k)   where the sample points are   τ_k = k/q.

These sample points are spaced closer together than the sample points tk = k/p. The sampling rate is higher than we need. We’re doing more work than we have to.


5.7 Interpolation a Little More Generally

Effective approximation and interpolation of signals raises a lot of interesting and general questions. One approach that provides a good framework for many such questions is to bring in orthogonality. It’s very much analogous to the way we looked at Fourier series.

Interpolation and orthogonality We begin with still another amazing property of sinc functions

— they form an orthonormal collection. Specifically, the family of sinc functions {sinc(t − n) : n = 0, ±1, ±2, . . .} is orthonormal with respect to the usual inner product on L2(R). Recall that the inner product is

(f, g) = \int_{-\infty}^{\infty} f(t)\overline{g(t)}\,dt.

The calculation to establish the orthonormality property of the sinc functions uses the general Parseval identity,

\int_{-\infty}^{\infty} f(t)\overline{g(t)}\,dt = \int_{-\infty}^{\infty} F f(s)\overline{F g(s)}\,ds.

We then have

\int_{-\infty}^{\infty} sinc(t − n)\,sinc(t − m)\,dt = \int_{-\infty}^{\infty} \big(e^{−2πisn}Π(s)\big)\overline{\big(e^{−2πism}Π(s)\big)}\,ds
= \int_{-\infty}^{\infty} e^{2πis(m−n)} Π(s)Π(s)\,ds = \int_{−1/2}^{1/2} e^{2πis(m−n)}\,ds

From here direct integration will give you that this is 1 when n = m and 0 when n ≠ m.

In case you’re fretting over it, the sinc function is in L2(R) and the product of two sinc functions is integrable. Parseval’s identity holds for functions in L2(R), though we did not establish this.
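A crude numerical check of the orthonormality (my own; a Riemann sum over a long window, so the answers are only good to roughly the size of the neglected 1/t² tails):

```python
import numpy as np

dt = 0.25
t = np.arange(-500.0, 500.0, dt)         # long window, grid fine enough for sinc products

def inner(n, m):
    # approximate inner product of sinc(t - n) and sinc(t - m)
    return dt * np.sum(np.sinc(t - n) * np.sinc(t - m))

print(inner(0, 0), inner(0, 1), inner(2, 5))   # ~1, ~0, ~0 (up to ~1e-3 truncation error)
```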

Now let's consider bandlimited signals g(t), and to be definite let's suppose the spectrum is contained in −1/2 ≤ s ≤ 1/2. Then the Nyquist sampling rate is 1, i.e., we sample at the integer points, and the interpolation formula takes the form

g(t) = \sum_{n=-\infty}^{\infty} g(n)\,sinc(t − n).

Coupled with the result on orthogonality, this formula suggests that the family of sinc functions forms an orthonormal basis for the space of bandlimited signals with spectrum in [−1/2, 1/2], and that we're expressing g(t) in terms of this basis. To see that this really is the case, we interpret the coefficients (the sample values g(n)) as the inner product of g(t) with sinc(t − n). We have, again using Parseval,

(g(t), sinc(t − n)) = \int_{-\infty}^{\infty} g(t)\,sinc(t − n)\,dt
= \int_{-\infty}^{\infty} F g(s)\,\overline{F[sinc(t − n)](s)}\,ds   (by Parseval)
= \int_{-\infty}^{\infty} F g(s)\,\overline{e^{−2πisn}Π(s)}\,ds
= \int_{−1/2}^{1/2} F g(s)\,e^{2πins}\,ds
= \int_{-\infty}^{\infty} F g(s)\,e^{2πins}\,ds   (because g is bandlimited)
= g(n)   (by Fourier inversion)

It’s perfect! The interpolation formula says that g(t) is written in terms of an orthonormal basis, and the coefficient g(n), the n-th sampled value of g(t), is exactly the projection of g(t) onto the n-th basis element:

g(t) = \sum_{n=-\infty}^{\infty} g(n)\,sinc(t − n) = \sum_{n=-\infty}^{\infty} \big(g(t), sinc(t − n)\big)\,sinc(t − n).

Lagrange interpolation Certainly for computational questions, going way back, it is desirable to find reasonably simple approximations of complicated functions, particularly those arising from solutions to differential equations.⁵ The classic way to approximate is to interpolate. That is, to find a simple function that, at least, assumes the same values as the complicated function at a given finite set of points. Curve fitting, in other words. The classic way to do this is via polynomials. One method, presented here just for your general background and know-how, is due to Lagrange.

Suppose we have n points t1, t2, . . . , tn. We want a polynomial of degree n − 1 that assumes given values at the n sample points. (Why degree n − 1?)

For this, we start with an n-th degree polynomial that vanishes exactly at those points. This is given by p(t) = (t − t1)(t − t2) · · · (t − tn) .

Next put

p_k(t) = \frac{p(t)}{t − t_k}.

Then p_k(t) is a polynomial of degree n − 1; we divide out the factor (t − t_k) and so p_k(t) vanishes at the same points as p(t) except at t_k. Next consider the quotient

\frac{p_k(t)}{p_k(t_k)}.

This is again a polynomial of degree n − 1. The key property is that p_k(t)/p_k(t_k) vanishes at the sample points t_j except at the point t_k where the value is 1; i.e.,

\frac{p_k(t_j)}{p_k(t_k)} = 1 if j = k, and 0 if j ≠ k.

⁵ The sinc function may not really qualify as an "easy approximation". How is it computed, really?


To interpolate a function by a polynomial (to fit a curve through a given set of points) we just scale and add. That is, suppose we have a function g(t) and we want a polynomial that has values g(t1), g(t2), . . . , g(tn) at the points t1, t2, . . . , tn. We get this by forming the sum

\sum_{k=1}^{n} g(t_k)\,\frac{p_k(t)}{p_k(t_k)}.

This does the trick. It is known as the Lagrange Interpolation Polynomial. Remember, unlike the sampling formula we’re not reconstructing all the values of g(t) from a set of sample values. We’re approximating g(t) by a polynomial that has the same values as g(t) at a prescribed set of points.
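A short sketch of the construction in code (my own illustration; the nodes and the choice of cos as the "complicated" function are arbitrary): each basis polynomial p_k(t)/p_k(t_k) is the product of the factors (t − t_j)/(t_k − t_j) over j ≠ k, and the interpolant is the sum of these scaled by the values g(t_k).

```python
import numpy as np

def lagrange_interpolate(t_nodes, g_values, t):
    """Evaluate the Lagrange interpolation polynomial through (t_k, g(t_k)) at the points t."""
    total = np.zeros_like(np.asarray(t, dtype=float))
    for k, (tk, gk) in enumerate(zip(t_nodes, g_values)):
        others = np.delete(t_nodes, k)
        basis = np.prod([(t - tj) / (tk - tj) for tj in others], axis=0)  # p_k(t)/p_k(t_k)
        total += gk * basis
    return total

nodes = np.array([0.0, 1.0, 2.0, 3.0])
g = np.cos(nodes)                            # sample values of the function to fit
t = np.linspace(0, 3, 7)
print(lagrange_interpolate(nodes, g, t))     # matches cos exactly at the nodes,
print(np.cos(t))                             # and only approximates it in between
```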

The sinc function is an analog of the p_k(t)/p_k(t_k) for "Fourier interpolation", if we can call it that. With

sinc\,t = \frac{\sin πt}{πt},

we recall some properties, analogous to the polynomials we built above:

• sinc t = 1 when t = 0

• sinc t = 0 at nonzero integer points t = ±1, ±2, . . . .

Now shift this and consider

sinc(t − k) = \frac{\sin π(t − k)}{π(t − k)}.

This has the value 1 at t = k and is zero at the other integers.

Suppose we have our signal g(t) and the sample points . . . , g(−2), g(−1), g(0), g(1), g(2), . . . . So, again, we’re sampling at evenly spaced points, and we’ve taken the sampling rate to be 1 just to simplify. To interpolate these values we would then form the sum

\sum_{k=-\infty}^{\infty} g(k)\,sinc(t − k).

There it is again — the general interpolation formula. In the case that g(t) is bandlimited (bandwidth 1 in this example) we know we recover all values of g(t) from the sample values.

5.8 Finite Sampling for a Bandlimited Periodic Signal

We started this whole discussion of sampling and interpolation by arguing that one ought to be able to interpolate the values of a finite sum of sinusoids from knowledge of a finite number of samples. Let’s see how this works out, but rather than starting from scratch let’s use what we’ve learned about sampling for general bandlimited signals.

As always, it’s best to work with the complex form of a sum of sinusoids, so we consider a real signal given by

f(t) = \sum_{k=−N}^{N} c_k e^{2πikt/q},   c_{−k} = \overline{c_k}.

f(t) is periodic of period q. Recall that c_{−k} = \overline{c_k}. Some of the coefficients may be zero, but we assume that c_N ≠ 0.

There are 2N + 1 terms in the sum (don’t forget k = 0) and it should take 2N + 1 sampled values over one period to determine f (t) completely. You might think it would take twice this many sampled values


because the values of f(t) are real and we have to determine complex coefficients. But remember that c_{−k} = \overline{c_k}, so if we know c_k we know c_{−k}. Think of the 2N + 1 sample values as enough information to determine the real number c_0 and the N complex numbers c_1, c_2, . . . , c_N.

The Fourier transform of f is

F f(s) = \sum_{k=−N}^{N} c_k δ(s − k/q)

and the spectrum goes from −N/q to N/q. The sampling formula applies to f(t), and we can write an equation of the form

f(t) = \sum_{k=-\infty}^{\infty} f(t_k)\,sinc\,p(t − t_k),

but it’s a question of what to take for the sampling rate, and hence how to space the sample points.

We want to make use of the known periodicity of f (t). If the sample points tk are a fraction of a period apart, say q/M for an M to be determined, then the values f (tk) with tk = kq/M , k = 0, ±1, ±2, . . . will repeat after M samples. We’ll see how this collapses the interpolation formula.

To find the right sampling rate, p, think about the derivation of the sampling formula, the first step being: “periodize F f ”. The Fourier transform F f is a bunch of δ’s spaced 1/q apart (and scaled by the coefficients ck). The natural periodization of F f is to keep the spacing 1/q in the periodized version, essentially making the periodized F f a scaled version of III1/q. We do this by convolving F f with IIIp where p/2 is the midpoint between N/q, the last point in the spectrum of F f , and the point (N + 1)/q, which is the next point 1/q away. Here’s a picture.

[Figure: the spikes of F f at 0, ±1/q, . . . , ±N/q, with the cutoff points ±p/2 lying halfway between ±N/q and ±(N + 1)/q.]

Thus we find p from

\frac{p}{2} = \frac{1}{2}\left(\frac{N}{q} + \frac{N + 1}{q}\right) = \frac{2N + 1}{2q},   or   p = \frac{2N + 1}{q}.

We periodize F f by IIIp (draw yourself a picture of this!), cut off by Πp, then take the inverse Fourier transform. The sampling formula back in the time domain is

f(t) = \sum_{k=-\infty}^{\infty} f(t_k)\,sinc\,p(t − t_k)   with   t_k = k/p.


With our particular choice of p let’s now see how the q-periodicity of f (t) comes into play. Write M = 2N + 1 ,

so that

t_k = \frac{k}{p} = \frac{kq}{M}.

Then, to repeat what we said earlier, the sample points are spaced a fraction of a period apart, q/M , and after f (t0), f (t1), . . . , f (tM −1) the sample values repeat, e.g., f (tM) = f (t0), f (tM +1) = f (t1) and so on.

More succinctly,

t_{k + k'M} = t_k + k'q,

and so

f(t_{k + k'M}) = f(t_k + k'q) = f(t_k),

for any k and k'. Using this periodicity of the coefficients in the sampling formula, the single sampling sum splits into M sums as:

\sum_{k=-\infty}^{\infty} f(t_k)\,sinc\,p(t − t_k)
= f(t_0)\sum_{m=-\infty}^{\infty} sinc(pt − mM) + f(t_1)\sum_{m=-\infty}^{\infty} sinc(pt − (1 + mM)) +
f(t_2)\sum_{m=-\infty}^{\infty} sinc(pt − (2 + mM)) + · · · + f(t_{M−1})\sum_{m=-\infty}^{\infty} sinc(pt − (M − 1 + mM))

Those sums of sincs on the right are periodizations of sinc pt and, remarkably, they have a simple closed form expression. The k-th sum is

\sum_{m=-\infty}^{\infty} sinc(pt − k − mM) = sinc(pt − k) ∗ III_{M/p}(t) = \frac{sinc(pt − k)}{sinc\big(\frac{1}{M}(pt − k)\big)} = \frac{sinc\,p(t − t_k)}{sinc\big(\frac{1}{q}(t − t_k)\big)}.

(I'll give a derivation of this at the end of this section.) Using these identities, we find that the sampling formula to interpolate

f(t) = \sum_{k=−N}^{N} c_k e^{2πikt/q}

from 2N + 1 = M sampled values is

f(t) = \sum_{k=0}^{2N} f(t_k)\,\frac{sinc\,p(t − t_k)}{sinc\big(\frac{1}{q}(t − t_k)\big)},   where   p = \frac{2N + 1}{q},   t_k = \frac{k}{p} = \frac{kq}{2N + 1}.

This is the "finite sampling theorem" for periodic functions.

It might also be helpful to write the sampling formula in terms of frequencies. Thus, if the lowest frequency is νmin = 1/q and the highest frequency is νmax= N νmin then

f(t) = \sum_{k=0}^{2N} f(t_k)\,\frac{sinc\big((2ν_{max} + ν_{min})(t − t_k)\big)}{sinc\big(ν_{min}(t − t_k)\big)},   where   t_k = \frac{kq}{2N + 1}.


The sampling rate is

sampling rate = 2ν_max + ν_min.

Compare this to

sampling rate > 2ν_max

for a general bandlimited function.

Here’s a simple example of the formula. Take f (t) = cos 2πt. There’s only one frequency, and νmin = νmax= 1. Then N = 1, the sampling rate is 3 and the sample points are t0 = 0, t1 = 1/3, and t2 = 2/3.

The formula says

cos 2πt = \frac{sinc\,3t}{sinc\,t} + \cos\left(\frac{2π}{3}\right)\frac{sinc\,3(t − 1/3)}{sinc(t − 1/3)} + \cos\left(\frac{4π}{3}\right)\frac{sinc\,3(t − 2/3)}{sinc(t − 2/3)}.

Does this really work? I’m certainly not going to plow through the trig identities needed to check it!

However, here’s a plot of the right hand side.

Any questions? Ever thought you’d see such a complicated way of writing cos 2πt?
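In place of the plot, here is a quick numerical check of my own that the three-term expression above really does equal cos 2πt on a grid.

```python
import numpy as np

def kernel(t, tk, p=3.0, q=1.0):
    # sinc(p(t - tk)) / sinc((t - tk)/q), the periodized-sinc kernel from the formula
    return np.sinc(p * (t - tk)) / np.sinc((t - tk) / q)

t = np.linspace(0, 1, 1000, endpoint=False)
tk = np.array([0.0, 1/3, 2/3])
rhs = sum(np.cos(2 * np.pi * s) * kernel(t, s) for s in tk)

print(np.max(np.abs(rhs - np.cos(2 * np.pi * t))))   # essentially machine precision
```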

Periodizing sinc Functions In applying the general sampling theorem to the special case of a periodic signal, we wound up with sums of sinc functions which we recognized (sharp-eyed observers that we are) to be periodizations. Then, out of nowhere, came a closed form expression for such periodizations as a ratio of sinc functions. Here’s where this comes from, and here’s a fairly general result that covers it.

Lemma Let p, q > 0 and let N be the largest integer strictly less than pq/2. Then

\sum_{k=-\infty}^{\infty} sinc(pt − kpq) = sinc(pt) ∗ III_q(t) = \frac{1}{pq}\,\frac{\sin\big((2N + 1)πt/q\big)}{\sin(πt/q)}.

There's a version of this lemma with N ≤ pq/2, too, but that's not important for us. In terms of sinc functions the formula is

sinc(pt) ∗ III_q(t) = \frac{2N + 1}{pq}\,\frac{sinc\big((2N + 1)t/q\big)}{sinc(t/q)}.
