
## Distributions and Their Fourier Transforms

### 4.1 The Day of Reckoning

We’ve been playing a little fast and loose with the Fourier transform — applying Fourier inversion, appealing to duality, and all that. “Fast and loose” is an understatement if ever there was one, but it’s also true that we haven’t done anything “wrong”. All of our formulas and all of our applications have been correct, if not fully justified. Nevertheless, we have to come to terms with some fundamental questions. It will take us some time, but in the end we will have settled on a very wide class of signals with these properties:

• The allowed signals include δ’s, unit steps, ramps, sines, cosines, and all other standard signals that the world’s economy depends on.

• The Fourier transform and its inverse are defined for all of these signals.

• Fourier inversion works.

These are the three most important features of the development to come, but we’ll also reestablish some of our specific results and as an added benefit we’ll even finish off differential calculus!

#### 4.1.1 A too simple criterion and an example

It’s not hard to write down an assumption on a function that guarantees the existence of its Fourier transform and even implies a little more than existence.

• If
$$\int_{-\infty}^{\infty} |f(t)|\,dt < \infty$$
then $\mathcal{F}f$ and $\mathcal{F}^{-1}f$ exist and are continuous.

Existence follows from
$$|\mathcal{F}f(s)| = \left|\int_{-\infty}^{\infty} e^{-2\pi i s t} f(t)\,dt\right| \le \int_{-\infty}^{\infty} |e^{-2\pi i s t}|\,|f(t)|\,dt = \int_{-\infty}^{\infty} |f(t)|\,dt < \infty\,.$$


Here we’ve used that the magnitude of the integral is less than the integral of the magnitude.1 There’s actually something to say here, but while it’s not complicated, I’d just as soon defer this and other comments on “general facts on integrals” to Section 4.3; read it if only lightly — it provides some additional orientation.

Continuity is the little extra information we get beyond existence. Here’s the argument: for any $s$ and $s_0$ we have
$$|\mathcal{F}f(s) - \mathcal{F}f(s_0)| = \left|\int_{-\infty}^{\infty} e^{-2\pi i s t} f(t)\,dt - \int_{-\infty}^{\infty} e^{-2\pi i s_0 t} f(t)\,dt\right| = \left|\int_{-\infty}^{\infty} \big(e^{-2\pi i s t} - e^{-2\pi i s_0 t}\big) f(t)\,dt\right| \le \int_{-\infty}^{\infty} \big|e^{-2\pi i s t} - e^{-2\pi i s_0 t}\big|\,|f(t)|\,dt\,.$$
As a consequence of $\int_{-\infty}^{\infty} |f(t)|\,dt < \infty$ we can take the limit as $s_0 \to s$ inside the integral. If we do that then $|e^{-2\pi i s t} - e^{-2\pi i s_0 t}| \to 0$, that is,
$$|\mathcal{F}f(s) - \mathcal{F}f(s_0)| \to 0 \quad \text{as } s_0 \to s\,,$$
which says that $\mathcal{F}f(s)$ is continuous. The same argument works to show that $\mathcal{F}^{-1}f$ is continuous.2

We haven’t said anything here about Fourier inversion — no such statement appears in the criterion. Let’s look right away at an example.

The very first example we computed, and still an important one, is the Fourier transform of $\Pi$. We found directly that
$$\mathcal{F}\Pi(s) = \int_{-\infty}^{\infty} e^{-2\pi i s t}\,\Pi(t)\,dt = \int_{-1/2}^{1/2} e^{-2\pi i s t}\,dt = \operatorname{sinc} s\,.$$
No problem there, no problem whatsoever. The criterion even applies; $\Pi$ is in $L^1(\mathbb{R})$ since
$$\int_{-\infty}^{\infty} |\Pi(t)|\,dt = \int_{-1/2}^{1/2} 1\,dt = 1\,.$$

Furthermore, the transform $\mathcal{F}\Pi(s) = \operatorname{sinc} s$ is continuous. That’s worth remarking on: although the signal jumps ($\Pi$ has a discontinuity) the Fourier transform does not, just as guaranteed by the preceding result — make this part of your intuition on the Fourier transform vis à vis the signal.
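Here’s a quick numerical check of this computation, a sketch assuming NumPy and SciPy are available. Conveniently, `np.sinc` uses the same normalized convention $\sin \pi s / \pi s$ as these notes.

```python
import numpy as np
from scipy.integrate import quad

def ft_rect(s):
    """Numerically evaluate the transform of Pi: the integral over [-1/2, 1/2].
    Pi is even, so the sine (imaginary) part of e^{-2 pi i s t} drops out."""
    return quad(lambda t: np.cos(2 * np.pi * s * t), -0.5, 0.5)[0]

for s in (0.0, 0.5, 1.3, 4.0):
    # agrees with sinc, including the zeros at the nonzero integers
    assert abs(ft_rect(s) - np.sinc(s)) < 1e-10
```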

Appealing to the Fourier inversion theorem and what we called duality, we then said
$$\mathcal{F}\operatorname{sinc}(s) = \int_{-\infty}^{\infty} e^{-2\pi i s t}\operatorname{sinc} t\,dt = \Pi(s)\,.$$

Here we have a problem. The sinc function does not satisfy the integrability criterion. It is my sad duty to inform you that
$$\int_{-\infty}^{\infty} |\operatorname{sinc} t|\,dt = \infty\,.$$

1 Magnitude, not absolute value, because the integral is a complex number.

2 So another general fact we’ve used here is that we can take the limit inside the integral. Save yourself for other things and let some of these “general facts” ride without insisting on complete justifications — they’re everywhere once you let the rigor police back on the beat.

I’ll give you two ways of seeing the failure of $|\operatorname{sinc} t|$ to be integrable. First, if sinc did satisfy the criterion $\int_{-\infty}^{\infty} |\operatorname{sinc} t|\,dt < \infty$ then its Fourier transform would be continuous. But its Fourier transform, which has

to come out to be $\Pi$, is not continuous. Or, if you don’t like that, here’s a direct argument. We can find infinitely many intervals where $|\sin \pi t| \ge 1/2$; this happens when $t$ is between $1/6$ and $5/6$, and that repeats for infinitely many intervals, for example on $I_n = [1/6 + 2n,\, 5/6 + 2n]$, $n = 0, 1, 2, \dots$, because $\sin \pi t$ is periodic of period 2. The $I_n$ all have length $2/3$. On $I_n$ we have $|t| \le 5/6 + 2n$, so
$$\frac{1}{|t|} \ge \frac{1}{5/6 + 2n}$$
and
$$\int_{I_n} \frac{|\sin \pi t|}{\pi |t|}\,dt \ge \frac{1}{2\pi}\,\frac{1}{5/6 + 2n}\int_{I_n} dt = \frac{1}{2\pi}\cdot\frac{2}{3}\cdot\frac{1}{5/6 + 2n} = \frac{1}{3\pi}\,\frac{1}{5/6 + 2n}\,.$$
Then
$$\int_{-\infty}^{\infty} \frac{|\sin \pi t|}{\pi |t|}\,dt \ge \sum_{n=0}^{\infty} \int_{I_n} \frac{|\sin \pi t|}{\pi |t|}\,dt \ge \frac{1}{3\pi}\sum_{n=0}^{\infty} \frac{1}{5/6 + 2n} = \infty\,.$$

It’s true that $|\operatorname{sinc} t| = |\sin \pi t / \pi t|$ tends to 0 as $t \to \pm\infty$ — the $1/t$ factor makes that happen — but not “fast enough” to make the integral of $|\operatorname{sinc} t|$ converge.
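The divergence is slow but real, and easy to see numerically. The following sketch (NumPy assumed) estimates $\int_0^T |\operatorname{sinc} t|\,dt$ for growing $T$; each tenfold increase in $T$ adds roughly the same amount, the signature of logarithmic growth:

```python
import numpy as np

def abs_sinc_integral(T, pts_per_unit=200):
    """Midpoint-rule estimate of the integral of |sinc t| over [0, T]."""
    dt = 1.0 / pts_per_unit
    t = np.arange(0.0, T, dt) + dt / 2
    return np.sum(np.abs(np.sinc(t))) * dt

vals = {T: abs_sinc_integral(T) for T in (10, 100, 1000)}
# Each tenfold increase in T adds about (2/pi^2) ln 10 ~ 0.47 to the total:
# logarithmic growth, so the integral diverges, just slowly.
```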

This is the most basic example in the theory! It’s not clear that the integral defining the Fourier transform of sinc exists, at least it doesn’t follow from the criterion. Doesn’t this bother you? Isn’t it a little embarrassing that multibillion dollar industries seem to depend on integrals that don’t converge?

In fact, there isn’t so much of a problem with either $\Pi$ or sinc. It is true that
$$\int_{-\infty}^{\infty} e^{-2\pi i s t}\operatorname{sinc} s\,ds = \begin{cases} 1 & |t| < \tfrac{1}{2} \\[2pt] 0 & |t| > \tfrac{1}{2} \end{cases}$$

However showing this — evaluating the improper integral that defines the Fourier transform — requires special arguments and techniques. The sinc function oscillates, as do the real and imaginary parts of the complex exponential, and integrating $e^{-2\pi i s t}\operatorname{sinc} s$ involves enough cancellation for the limit
$$\lim_{\substack{a \to -\infty \\ b \to \infty}} \int_a^b e^{-2\pi i s t}\operatorname{sinc} s\,ds$$
to exist.
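You can watch this cancellation at work numerically. The sketch below (NumPy assumed; `truncated_inversion` is just an illustrative name) approximates the symmetric truncated integral and shows it settling toward 1 inside the jump and 0 outside:

```python
import numpy as np

def truncated_inversion(t, b, ds=1e-3):
    """Midpoint-rule estimate of the integral of e^{-2 pi i s t} sinc(s)
    over [-b, b].  The imaginary part vanishes by symmetry, so only the
    cosine part is kept."""
    s = np.arange(-b, b, ds) + ds / 2
    return np.sum(np.cos(2 * np.pi * s * t) * np.sinc(s)) * ds

# Inside the jump (|t| < 1/2) the symmetric limit creeps toward 1; outside, 0.
inside  = [truncated_inversion(0.25, b) for b in (10, 100, 1000)]
outside = [truncated_inversion(0.75, b) for b in (10, 100, 1000)]
```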

Thus Fourier inversion, and duality, can be pushed through in this case. At least almost. You’ll notice that I didn’t say anything about the points $t = \pm 1/2$, where there’s a jump in $\Pi$ in the time domain. In those cases the improper integral does not exist, but with some additional interpretations one might be able to convince a sympathetic friend that
$$\int_{-\infty}^{\infty} e^{-2\pi i (\pm 1/2) s}\operatorname{sinc} s\,ds = \frac{1}{2}$$
in the appropriate sense (invoking “principal value integrals” — more on this in a later lecture). At best this is post hoc and needs some fast talking.3

The truth is that cancellations that occur in the sinc integral or in its Fourier transform are a very subtle and dicey thing. Such risky encounters are to be avoided. We’d like a more robust, trustworthy theory.

3One might also then argue that defining Π(±1/2) = 1/2 is the best choice. I don’t want to get into it.


**The news so far** Here’s a quick summary of the situation. The Fourier transform of $f(t)$ is defined when
$$\int_{-\infty}^{\infty} |f(t)|\,dt < \infty\,.$$
We allow $f$ to be complex valued in this definition. The collection of all functions on $\mathbb{R}$ satisfying this condition is denoted by $L^1(\mathbb{R})$, the superscript 1 indicating that we integrate $|f(t)|$ to the first power.4 The $L^1$-norm of $f$ is defined by
$$\|f\|_1 = \int_{-\infty}^{\infty} |f(t)|\,dt\,.$$

Many of the examples we worked with are L1-functions — the rect function, the triangle function, the exponential decay (one or two-sided), Gaussians — so our computations of the Fourier transforms in those cases were perfectly justifiable (and correct). Note that L1-functions can have discontinuities, as in the rect function.

The criterion says that if $f \in L^1(\mathbb{R})$ then $\mathcal{F}f$ exists. We can also say
$$|\mathcal{F}f(s)| = \left|\int_{-\infty}^{\infty} e^{-2\pi i s t} f(t)\,dt\right| \le \int_{-\infty}^{\infty} |f(t)|\,dt = \|f\|_1\,.$$
That is:

• The magnitude of the Fourier transform is bounded by the L1-norm of the function.

This is a handy estimate to be able to write down — we’ll use it shortly. A warning, though: Fourier transforms of $L^1(\mathbb{R})$ functions may themselves fail to be in $L^1$ (the sinc function is an example), so we don’t know without further work what more can be done, if anything.
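As a concrete check of the estimate, here’s a sketch (NumPy/SciPy assumed) using the two-sided exponential decay $f(t) = e^{-|t|}$, whose $L^1$-norm is 2 and whose transform is known to be $2/(1 + 4\pi^2 s^2)$:

```python
import numpy as np
from scipy.integrate import quad

f = lambda t: np.exp(-abs(t))                 # two-sided exponential decay
norm1 = quad(f, -np.inf, np.inf)[0]           # ||f||_1 = 2

def ft(s):
    # f is even, so the sine (imaginary) part of the transform vanishes
    return quad(lambda t: np.cos(2 * np.pi * s * t) * f(t),
                -np.inf, np.inf, limit=200)[0]

for s in np.linspace(-3.0, 3.0, 13):
    assert abs(ft(s)) <= norm1 + 1e-9         # |Ff(s)| <= ||f||_1, at every s
```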

The conclusion is that L1-integrability of a signal is just too simple a criterion on which to build a really helpful theory. This is a serious issue for us to understand. Its resolution will greatly extend the usefulness of the methods we have come to rely on.

There are other problems, too. Take, for example, the signal $f(t) = \cos 2\pi t$. As it stands now, this signal does not even have a Fourier transform — does not have a spectrum! — for the integral
$$\int_{-\infty}^{\infty} e^{-2\pi i s t}\cos 2\pi t\,dt$$
does not converge, no way, no how. This is no good.

Before we bury L1(R) as too restrictive for our needs, here’s one more good thing about it. There’s actually a stronger consequence for F f than just continuity.

• If
$$\int_{-\infty}^{\infty} |f(t)|\,dt < \infty$$
then $\mathcal{F}f(s) \to 0$ as $s \to \pm\infty$.

4And the letter “L” indicating that it’s really the Lebesgue integral that should be employed.


This is called the Riemann–Lebesgue lemma, and it’s more difficult to prove than the simple continuity of $\mathcal{F}f$. I’ll comment on it later; see Section 4.19. One might view the result as saying that $\mathcal{F}f(s)$ is at least trying to be integrable. It’s continuous and it tends to zero as $s \to \pm\infty$. Unfortunately, the fact that $\mathcal{F}f(s) \to 0$ does not imply that it’s integrable (think of sinc, again).5 If we knew something, or could insist on something, about the rate at which a signal or its transform tends to zero at $\pm\infty$ then perhaps we could push on further.

#### 4.1.2 The path, the way

To repeat, we want our theory to encompass the following three points:

• The allowed signals include δ’s, unit steps, ramps, sines, cosines, and all other standard signals that the world’s economy depends on.

• The Fourier transform and its inverse are defined for all of these signals.

• Fourier inversion works.

Fiddling around with L1(R) or substitutes, putting extra conditions on jumps — all have been used. The path to success lies elsewhere. It is well marked and firmly established, but it involves a break with the classical point of view. The outline of how all this is settled goes like this:

1. We single out a collection of functions S for which convergence of the Fourier integrals is assured, for which a function and its Fourier transform are both in S, and for which Fourier inversion works. Furthermore, Parseval’s identity holds:
$$\int_{-\infty}^{\infty} |f(x)|^2\,dx = \int_{-\infty}^{\infty} |\mathcal{F}f(s)|^2\,ds\,.$$

This much is classical; new ideas with new intentions, yes, but not new objects. Perhaps surprisingly it’s not so hard to find a suitable collection S, at least if one knows what one is looking for. But what comes next is definitely not “classical”. It had been first anticipated and used effectively in an early form by O. Heaviside, developed, somewhat, and dismissed, mostly, soon after by less talented people, then cultivated by and often associated with the work of P. Dirac, and finally refined by L. Schwartz.

2. S forms a class of test functions which, in turn, serve to define a larger class of generalized functions or distributions, called, for this class of test functions, the tempered distributions, T. Precisely because S was chosen to be the ideal Fourier friendly space of classical signals, the tempered distributions are likewise well suited for Fourier methods. The collection of tempered distributions includes, for example, $L^1$ and $L^2$-functions (which can be wildly discontinuous), the sinc function, and complex exponentials (hence periodic functions). But it includes much more, like the delta functions and related objects.

3. The Fourier transform and its inverse will be defined so as to operate on these tempered distributions, and they operate to produce distributions of the same type. Thus the inverse Fourier transform can be applied, and the Fourier inversion theorem holds in this setting.

4. In the case when a tempered distribution “comes from a function” — in a way we’ll make precise — the Fourier transform reduces to the usual definition as an integral, when the integral makes sense. However, tempered distributions are more general than functions, so we really will have done something new, and we won’t have lost anything in the process.

5For that matter, a function in L1(R) need not tend to zero at ±∞; that’s also discussed in Appendix 1.


Our goal is to hit the relatively few main ideas in the outline above, suppressing the considerable mass of details. In practical terms this will enable us to introduce delta functions and the like as tools for computation, and to feel a greater measure of confidence in the range of applicability of the formulas.

We’re taking this path because it works, it’s very interesting, and it’s easy to compute with. I especially want you to believe the last point.

We’ll touch on some other approaches to defining distributions and generalized Fourier transforms, but as far as I’m concerned they are the equivalent of vacuum tube technology. You can do distributions in other ways, and some people really love building things with vacuum tubes, but wouldn’t you rather learn something a little more up to date?

### 4.2 The Right Functions for Fourier Transforms: Rapidly Decreasing Functions

Mathematics progresses more by making intelligent definitions than by proving theorems. The hardest work is often in formulating the fundamental concepts in the right way, a way that will then make the deductions from those definitions (relatively) easy and natural. This can take a while to sort out, and a subject might be reworked several times as it matures; when new discoveries are made and one sees where things end up, there’s a tendency to go back and change the starting point so that the trip becomes easier. Mathematicians may be more self-conscious about this process, but there are certainly examples in engineering where close attention to the basic definitions has shaped a field — think of Shannon’s work on Information Theory, for a particularly striking example.

Nevertheless, engineers, in particular, often find this tiresome, wanting to do something and not “just talk about it”: “Devices don’t have hypotheses”, as one of my colleagues put it. One can also have too much of a good thing — too many trips back to the starting point to rewrite the rules can make it hard to follow the game, especially if one has already played by the earlier rules. I’m sympathetic to both of these criticisms, and for our present work on the Fourier transform I’ll try to steer a course that makes the definitions reasonable and lets us make steady forward progress.

#### 4.2.1 Smoothness and decay

To ask “how fast” F f (s) might tend to zero, depending on what additional assumptions we might make about the function f (x) beyond integrability, will lead to our defining “rapidly decreasing functions”, and this is the key. Integrability is too weak a condition on the signal f , but it does imply that F f (s) is continuous and tends to 0 at ±∞. What we’re going to do is study the relationship between the smoothness of a function — not just continuity, but how many times it can be differentiated — and the rate at which its Fourier transform decays at infinity.

We’ll always assume that $f(x)$ is absolutely integrable, and so has a Fourier transform. Let’s suppose, more stringently, that

• $x f(x)$ is integrable, i.e.,
$$\int_{-\infty}^{\infty} |x f(x)|\,dx < \infty\,.$$


Then $x f(x)$ has a Fourier transform, and so does $-2\pi i x f(x)$, and its Fourier transform is
$$\mathcal{F}(-2\pi i x f(x)) = \int_{-\infty}^{\infty} (-2\pi i x)\,e^{-2\pi i s x} f(x)\,dx = \int_{-\infty}^{\infty} \left(\frac{d}{ds} e^{-2\pi i s x}\right) f(x)\,dx = \frac{d}{ds}\int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\,dx$$
(switching $d/ds$ and the integral is justified by the integrability of $|x f(x)|$)
$$= \frac{d}{ds}(\mathcal{F}f)(s)\,.$$
This says that the Fourier transform $\mathcal{F}f(s)$ is differentiable and that its derivative is $\mathcal{F}(-2\pi i x f(x))$. When $f(x)$ is merely integrable we know that $\mathcal{F}f(s)$ is merely continuous, but with the extra assumption on the integrability of $x f(x)$ we conclude that $\mathcal{F}f(s)$ is actually differentiable. (And its derivative is continuous. Why?)
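This identity is easy to test numerically for a signal whose transform we know in closed form. The sketch below (NumPy/SciPy assumed) uses the standard pair $f(x) = e^{-\pi x^2}$, $\mathcal{F}f(s) = e^{-\pi s^2}$:

```python
import numpy as np
from scipy.integrate import quad

# f(x) = e^{-pi x^2} has the known transform e^{-pi s^2}, so the claim
# d/ds Ff(s) = F(-2 pi i x f(x))(s) can be spot-checked directly.
f = lambda x: np.exp(-np.pi * x**2)

def ft_of_multiplied(s):
    # Real part of the transform of -2 pi i x f(x); the imaginary part has
    # an odd integrand and so vanishes.
    return quad(lambda x: -2 * np.pi * x * np.sin(2 * np.pi * s * x) * f(x),
                -np.inf, np.inf)[0]

for s in (0.0, 0.4, 1.1):
    d_ff = -2 * np.pi * s * np.exp(-np.pi * s**2)   # derivative of e^{-pi s^2}
    assert abs(ft_of_multiplied(s) - d_ff) < 1e-8
```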

For one more go-round in this direction, what if $x^2 f(x)$ is integrable? Then, by the same argument,
$$\mathcal{F}\big((-2\pi i x)^2 f(x)\big) = \int_{-\infty}^{\infty} (-2\pi i x)^2 e^{-2\pi i s x} f(x)\,dx = \int_{-\infty}^{\infty} \left(\frac{d^2}{ds^2} e^{-2\pi i s x}\right) f(x)\,dx = \frac{d^2}{ds^2}\int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\,dx = \frac{d^2}{ds^2}(\mathcal{F}f)(s)\,,$$
and we see that $\mathcal{F}f$ is twice differentiable. (And its second derivative is continuous.)

Clearly we can proceed like this, and as a somewhat imprecise headline we might then announce:

• Faster decay of f (x) at infinity leads to a greater smoothness of the Fourier transform.

Now let’s take this in another direction, with an assumption on the smoothness of the signal. Suppose $f(x)$ is differentiable, that its derivative is integrable, and that $f(x) \to 0$ as $x \to \pm\infty$. I’ve thrown in all the assumptions I need to justify the following calculation:
$$\mathcal{F}f(s) = \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\,dx = \left[\frac{f(x)\,e^{-2\pi i s x}}{-2\pi i s}\right]_{x=-\infty}^{x=\infty} - \int_{-\infty}^{\infty} \frac{e^{-2\pi i s x}}{-2\pi i s}\,f'(x)\,dx$$
(integration by parts with $u = f(x)$, $dv = e^{-2\pi i s x}\,dx$)
$$= \frac{1}{2\pi i s}\int_{-\infty}^{\infty} e^{-2\pi i s x} f'(x)\,dx \quad (\text{using } f(x) \to 0 \text{ as } x \to \pm\infty)$$
$$= \frac{1}{2\pi i s}(\mathcal{F}f')(s)\,.$$
We then have
$$|\mathcal{F}f(s)| = \frac{1}{2\pi |s|}\,|(\mathcal{F}f')(s)| \le \frac{1}{2\pi |s|}\,\|f'\|_1\,.$$
The last inequality follows from the result: “The magnitude of the Fourier transform is bounded by the $L^1$-norm of the function.” This says that $\mathcal{F}f(s)$ tends to 0 at $\pm\infty$ like $1/s$. (Remember that $\|f'\|_1$ is some fixed number here, independent of $s$.) Earlier we commented (without proof) that if $f$ is integrable then $\mathcal{F}f$ tends to 0 at $\pm\infty$, but here with the stronger assumptions we get a stronger conclusion, that $\mathcal{F}f$ tends to zero at a certain rate.


Let’s go one step further in this direction. Suppose $f(x)$ is twice differentiable, that its first and second derivatives are integrable, and that $f(x)$ and $f'(x)$ tend to 0 as $x \to \pm\infty$. The same argument gives
$$\mathcal{F}f(s) = \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\,dx = \frac{1}{2\pi i s}\int_{-\infty}^{\infty} e^{-2\pi i s x} f'(x)\,dx \quad (\text{picking up on where we were before})$$
$$= \frac{1}{2\pi i s}\left(\left[\frac{f'(x)\,e^{-2\pi i s x}}{-2\pi i s}\right]_{x=-\infty}^{x=\infty} - \int_{-\infty}^{\infty} \frac{e^{-2\pi i s x}}{-2\pi i s}\,f''(x)\,dx\right)$$
(integration by parts with $u = f'(x)$, $dv = e^{-2\pi i s x}\,dx$)
$$= \frac{1}{(2\pi i s)^2}\int_{-\infty}^{\infty} e^{-2\pi i s x} f''(x)\,dx \quad (\text{using } f'(x) \to 0 \text{ as } x \to \pm\infty)$$
$$= \frac{1}{(2\pi i s)^2}(\mathcal{F}f'')(s)\,.$$
Thus
$$|\mathcal{F}f(s)| \le \frac{1}{|2\pi s|^2}\,\|f''\|_1$$
and we see that $\mathcal{F}f(s)$ tends to 0 like $1/s^2$.

• Greater smoothness of $f(x)$, plus integrability, leads to faster decay of the Fourier transform at $\infty$.
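Both decay estimates can be checked numerically. A sketch (NumPy/SciPy assumed), again with the Gaussian pair $f(x) = e^{-\pi x^2}$, $\mathcal{F}f(s) = e^{-\pi s^2}$:

```python
import numpy as np
from scipy.integrate import quad

# Gaussian pair: f(x) = e^{-pi x^2}, Ff(s) = e^{-pi s^2}.
fp  = lambda x: -2 * np.pi * x * np.exp(-np.pi * x**2)                    # f'
fpp = lambda x: (4 * np.pi**2 * x**2 - 2 * np.pi) * np.exp(-np.pi * x**2)  # f''

n1_fp = 2 * quad(lambda x: abs(fp(x)), 0, np.inf)[0]        # ||f'||_1 (= 2)
x0 = 1 / np.sqrt(2 * np.pi)                                 # f'' changes sign here
n1_fpp = 2 * (quad(lambda x: abs(fpp(x)), 0, x0)[0]
              + quad(lambda x: abs(fpp(x)), x0, np.inf)[0])  # ||f''||_1

for s in (0.5, 1.0, 2.0, 5.0):
    Ff = np.exp(-np.pi * s**2)
    assert Ff <= n1_fp / (2 * np.pi * s)          # decay at least like 1/s
    assert Ff <= n1_fpp / (2 * np.pi * s)**2      # decay at least like 1/s^2
```

Of course the Gaussian decays far faster than either bound requires; the bounds are general-purpose, not sharp.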

**Remark on the derivative formula for the Fourier transform** The astute reader will have noticed that in the course of our work we rederived the derivative formula
$$\mathcal{F}f'(s) = 2\pi i s\,\mathcal{F}f(s)\,,$$
which we’ve used before, but here we needed the assumption that $f(x) \to 0$, which we didn’t mention before. What’s up? With the technology we have available to us now, the derivation we gave, above, is the correct derivation. That is, it proceeds via integration by parts, and requires some assumption like $f(x) \to 0$ as $x \to \pm\infty$. In homework (and in the solutions to the homework) you may have given a derivation that used duality. That only works if Fourier inversion is known to hold. This was OK when the rigor police were off duty, but not now, on this day of reckoning. Later, when we develop a generalization of the Fourier transform, we’ll see that the derivative formula again holds without what seem now to be extraneous conditions.

We could go on as we did above, comparing the consequences of higher differentiability, integrability, smoothness and decay, bouncing back and forth between the function and its Fourier transform. The great insight in making use of these observations is that the simplest and most useful way to coordinate all these phenomena is to allow for arbitrarily great smoothness and arbitrarily fast decay. We would like to have both phenomena in play. Here is the crucial definition.

**Rapidly decreasing functions** A function $f(x)$ is said to be rapidly decreasing at $\pm\infty$ if

1. It is infinitely differentiable.

2. For all positive integers $m$ and $n$,
$$\left| x^m\,\frac{d^n}{dx^n} f(x) \right| \to 0 \quad \text{as } x \to \pm\infty\,.$$

In words, any positive power of $x$ times any order derivative of $f$ tends to zero at infinity.

Note that $m$ and $n$ are independent in this definition. That is, we insist that, say, the 5th power of $x$ times the 17th derivative of $f(x)$ tends to zero, and that the 100th power of $x$ times the first derivative of $f(x)$ tends to zero; and whatever you want.

Are there any such functions? Any infinitely differentiable function that is identically zero outside some finite interval is one example, and I’ll even write down a formula for one of these later. Another example is $f(x) = e^{-x^2}$. You may already be familiar with the phrase “the exponential grows faster than any power of $x$”, and likewise with the phrase “$e^{-x^2}$ decays faster than any power of $x$.”6 In fact, any derivative of $e^{-x^2}$ decays faster than any power of $x$ as $x \to \pm\infty$, as you can check with L’Hôpital’s rule, for example. We can express this exactly as in the definition:
$$\left| x^m\,\frac{d^n}{dx^n} e^{-x^2} \right| \to 0 \quad \text{as } x \to \pm\infty\,.$$

There are plenty of other rapidly decreasing functions. We also remark that if f (x) is rapidly decreasing then it is in L1(R) and in L2(R); check that yourself.
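If you’d rather not run L’Hôpital by hand, a computer algebra system can confirm a few instances of the definition. A sketch assuming SymPy is available:

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.exp(-x**2)

# Check a few (m, n) pairs from the definition, including the lopsided ones
# mentioned in the text: x^5 times the 17th derivative, x^100 times the first.
checks = {}
for m, n in [(5, 17), (100, 1), (3, 4)]:
    expr = x**m * sp.diff(f, x, n)
    checks[(m, n)] = (sp.limit(expr, x, sp.oo), sp.limit(expr, x, -sp.oo))
# every limit comes out 0: the exponential beats any polynomial factor
```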

**An alternative definition** An equivalent definition for a function to be rapidly decreasing is to assume that for any positive integers $m$ and $n$ there is a constant $C_{mn}$ such that
$$\left| x^m\,\frac{d^n}{dx^n} f(x) \right| \le C_{mn} \quad \text{for all } x\,.$$
In words, the $m$th power of $x$ times the $n$th derivative of $f$ remains bounded for all $m$ and $n$, though the constant will depend on which $m$ and $n$ we take. This condition implies the “tends to zero” condition, above. Convince yourself of that; the key is that $m$ and $n$ are arbitrary and independent. We’ll use this second, equivalent condition often, and it’s a matter of taste which one takes as a definition.

**Let us now praise famous men** It was the French mathematician Laurent Schwartz who singled out this relatively simple condition to use in the service of the Fourier transform. In his honor the set of rapidly decreasing functions is usually denoted by S (a script S) and called the Schwartz class of functions.

Let’s start to see why this was such a good idea.

**1. The Fourier transform of a rapidly decreasing function is rapidly decreasing.** Let $f(x)$ be a function in S. We want to show that $\mathcal{F}f(s)$ is also in S. The condition involves derivatives of $\mathcal{F}f$, so what comes in is the derivative formula for the Fourier transform and the version of that formula for higher derivatives. As we’ve already seen,
$$2\pi i s\,\mathcal{F}f(s) = \mathcal{F}\!\left(\frac{d}{dx}f\right)(s)\,.$$

6 I used $e^{-x^2}$ as an example instead of $e^{-x}$ (for which the statement is true as $x \to \infty$) because I wanted to include $x \to \pm\infty$, and I used $e^{-x^2}$ instead of $e^{-|x|}$ because I wanted the example to be smooth. $e^{-|x|}$ has a corner at $x = 0$.


As we also noted,
$$\frac{d}{ds}\mathcal{F}f(s) = \mathcal{F}(-2\pi i x f(x))\,.$$
Because $f(x)$ is rapidly decreasing, the higher order versions of these formulas are valid; the derivations require either integration by parts or differentiating under the integral sign, both of which are justified. That is,
$$(2\pi i s)^n\,\mathcal{F}f(s) = \mathcal{F}\!\left(\frac{d^n}{dx^n}f\right)(s)\,, \qquad \frac{d^n}{ds^n}\mathcal{F}f(s) = \mathcal{F}\big((-2\pi i x)^n f(x)\big)\,.$$
(We follow the convention that the zeroth order derivative leaves the function alone.)

Combining these formulas one can show, inductively, that for all nonnegative integers $m$ and $n$,
$$\mathcal{F}\!\left(\frac{d^n}{dx^n}\big((-2\pi i x)^m f(x)\big)\right) = (2\pi i s)^n\,\frac{d^m}{ds^m}\mathcal{F}f(s)\,.$$
Note how $m$ and $n$ enter on the two sides of the equation.

We use this last identity together with the estimate for the Fourier transform in terms of the $L^1$-norm of the function. Namely,
$$|s|^n \left|\frac{d^m}{ds^m}\mathcal{F}f(s)\right| = (2\pi)^{m-n} \left|\mathcal{F}\!\left(\frac{d^n}{dx^n}\big(x^m f(x)\big)\right)\right| \le (2\pi)^{m-n} \left\|\frac{d^n}{dx^n}\big(x^m f(x)\big)\right\|_1\,.$$
The $L^1$-norm on the right hand side is finite because $f$ is rapidly decreasing. Since the right hand side depends on $m$ and $n$, we have shown that there is a constant $C_{mn}$ with
$$|s|^n \left|\frac{d^m}{ds^m}\mathcal{F}f(s)\right| \le C_{mn}\,.$$
This implies that $\mathcal{F}f$ is rapidly decreasing. Done.
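Here’s a spot check of the combined identity in the simplest nontrivial case $m = n = 1$, using the Gaussian pair $f(x) = e^{-\pi x^2}$, $\mathcal{F}f(s) = e^{-\pi s^2}$ (NumPy/SciPy assumed):

```python
import numpy as np
from scipy.integrate import quad

# For m = n = 1: F(d/dx((-2 pi i x) f))(s) should equal (2 pi i s) d/ds Ff(s).
# Working out d/dx((-2 pi i x) e^{-pi x^2}) = -2 pi i (1 - 2 pi x^2) e^{-pi x^2}
# and cancelling the common factor -2 pi i, it suffices to check that the
# transform of h(x) = (1 - 2 pi x^2) e^{-pi x^2} equals 2 pi s^2 e^{-pi s^2}.
h = lambda x: (1 - 2 * np.pi * x**2) * np.exp(-np.pi * x**2)

def ft_h(s):
    # h is real and even, so only the cosine part of the transform survives
    return quad(lambda x: np.cos(2 * np.pi * s * x) * h(x), -np.inf, np.inf)[0]

for s in (0.0, 0.6, 1.5):
    assert abs(ft_h(s) - 2 * np.pi * s**2 * np.exp(-np.pi * s**2)) < 1e-8
```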

**2. Fourier inversion works on S.** We first establish the inversion theorem for a timelimited function in S. Suppose that $f(t)$ is smooth and for some $T$ is identically zero for $|t| \ge T/2$, rather than just tending to zero at $\pm\infty$. In this case we can periodize $f(t)$ to get a smooth, periodic function of period $T$. Expand the periodic function as a converging Fourier series. Then for $-T/2 \le t \le T/2$,
$$\begin{aligned}
f(t) &= \sum_{n=-\infty}^{\infty} c_n e^{2\pi i n t/T}
= \sum_{n=-\infty}^{\infty} e^{2\pi i n t/T}\,\frac{1}{T}\int_{-T/2}^{T/2} e^{-2\pi i n x/T} f(x)\,dx \\
&= \sum_{n=-\infty}^{\infty} e^{2\pi i n t/T}\,\frac{1}{T}\int_{-\infty}^{\infty} e^{-2\pi i n x/T} f(x)\,dx
\quad (\text{legitimate because } f \text{ vanishes outside } [-T/2, T/2]) \\
&= \sum_{n=-\infty}^{\infty} \mathcal{F}f\!\left(\frac{n}{T}\right) e^{2\pi i n t/T}\,\frac{1}{T}\,.
\end{aligned}$$
Our intention is to let $T$ get larger and larger. What we see is a Riemann sum for the integral
$$\int_{-\infty}^{\infty} e^{2\pi i s t}\,\mathcal{F}f(s)\,ds = \mathcal{F}^{-1}\mathcal{F}f(t)\,,$$
and the Riemann sum converges to the integral because of the smoothness of $f$. (I have not slipped anything past you here, but I don’t want to quote the precise results that make all this legitimate.) Thus
$$f(t) = \mathcal{F}^{-1}\mathcal{F}f(t)\,,$$
and the Fourier inversion theorem is established for timelimited functions in S.
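The Riemann-sum mechanism is easy to watch numerically. A sketch (NumPy assumed); strictly speaking the Gaussian used here is not timelimited, but it decays so fast that the periodization error is negligible once $T$ is moderately large:

```python
import numpy as np

# f(x) = e^{-pi x^2}, Ff(s) = e^{-pi s^2} (a standard pair).  Approximate the
# inversion integral by the Riemann sum  sum_n Ff(n/T) e^{2 pi i n t/T} (1/T)
# with sample spacing 1/T, and watch the error shrink as T grows.
def riemann_inversion(t, T, N=4000):
    s = np.arange(-N, N + 1) / T
    return np.real(np.sum(np.exp(-np.pi * s**2) * np.exp(2j * np.pi * s * t)) / T)

t = 0.3
errors = [abs(riemann_inversion(t, T) - np.exp(-np.pi * t**2))
          for T in (1.0, 2.0, 5.0)]
```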

When f is not timelimited we use “windowing”. The idea is to cut f (t) off smoothly.7 The interesting thing in the present context — for theoretical rather than practical use — is to make the window so smooth that the “windowed” function is still in S. Some of the details are in Section 4.20, but here’s the setup.

We take a function c(t) that is identically 1 for −1/2 ≤ t ≤ 1/2, that goes smoothly (infinitely differentiable) down to zero as t goes from 1/2 to 1 and from −1/2 to −1, and is then identically 0 for t ≥ 1 and t ≤ −1.

This is a smoothed version of the rectangle function $\Pi(t)$; instead of cutting off sharply at $\pm 1/2$ we bring the function smoothly down to zero. You can certainly imagine drawing such a function; in Section 4.20 I’ll give an explicit formula for one.

Now scale $c(t)$ to $c_n(t) = c(t/n)$. That is, $c_n(t)$ is 1 for $t$ between $-n/2$ and $n/2$, goes smoothly down to 0 between $\pm n/2$ and $\pm n$, and is then identically 0 for $|t| \ge n$. Next, the function $f_n(t) = c_n(t)\,f(t)$ is a timelimited function in S. Hence the earlier reasoning shows that the Fourier inversion theorem holds for $f_n$ and $\mathcal{F}f_n$. The window eventually moves past every $t$, that is, $f_n(t) \to f(t)$ as $n \to \infty$. Some estimates based on the properties of the cut-off function — which I won’t go through — show that the Fourier inversion theorem also holds in the limit.
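For concreteness, here is one standard way to build such a window out of the bump function $e^{-1/x}$. This is an assumption on my part: the formula promised in Section 4.20 may well be different, but any construction along these lines works (NumPy assumed):

```python
import numpy as np

def psi(x):
    # e^{-1/x} for x > 0, and 0 otherwise; infinitely differentiable at 0
    return np.where(x > 0, np.exp(-1.0 / np.maximum(x, 1e-300)), 0.0)

def smooth_step(x):
    # infinitely differentiable, identically 0 for x <= 0 and 1 for x >= 1
    return psi(x) / (psi(x) + psi(1.0 - x))

def c(t, n=1):
    # c_n(t): 1 on [-n/2, n/2], smooth descent on [n/2, n], 0 for |t| >= n
    a = np.abs(t) / n                    # rescale so the transition is [1/2, 1]
    return 1.0 - smooth_step(2.0 * a - 1.0)
```

By symmetry of the construction, the window passes through exactly 1/2 at the midpoint of the transition.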

**3. Parseval holds in S.** We’ll actually derive a more general result than Parseval’s identity, namely: if $f(x)$ and $g(x)$ are complex valued functions in S then
$$\int_{-\infty}^{\infty} f(x)\,\overline{g(x)}\,dx = \int_{-\infty}^{\infty} \mathcal{F}f(s)\,\overline{\mathcal{F}g(s)}\,ds\,.$$
As a special case, if we take $f = g$ then $f(x)\overline{f(x)} = |f(x)|^2$ and the identity becomes
$$\int_{-\infty}^{\infty} |f(x)|^2\,dx = \int_{-\infty}^{\infty} |\mathcal{F}f(s)|^2\,ds\,.$$

7The design of windows, like the design of filters, is as much an art as a science.


To get the first result we’ll use the fact that we can recover $g$ from its Fourier transform via the inversion theorem. That is,
$$g(x) = \int_{-\infty}^{\infty} \mathcal{F}g(s)\,e^{2\pi i s x}\,ds\,.$$
The complex conjugate of the integral is the integral of the complex conjugate, hence
$$\overline{g(x)} = \int_{-\infty}^{\infty} \overline{\mathcal{F}g(s)}\,e^{-2\pi i s x}\,ds\,.$$

The derivation is straightforward, using one of our favorite tricks of interchanging the order of integration:
$$\begin{aligned}
\int_{-\infty}^{\infty} f(x)\,\overline{g(x)}\,dx &= \int_{-\infty}^{\infty} f(x)\left(\int_{-\infty}^{\infty} \overline{\mathcal{F}g(s)}\,e^{-2\pi i s x}\,ds\right) dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x)\,\overline{\mathcal{F}g(s)}\,e^{-2\pi i s x}\,dx\,ds \\
&= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} f(x)\,e^{-2\pi i s x}\,dx\right) \overline{\mathcal{F}g(s)}\,ds \\
&= \int_{-\infty}^{\infty} \mathcal{F}f(s)\,\overline{\mathcal{F}g(s)}\,ds\,.
\end{aligned}$$
All of this works perfectly — the initial appeal to the Fourier inversion theorem, switching the order of integration — if $f$ and $g$ are rapidly decreasing.
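Two numerical spot checks of Parseval (NumPy/SciPy assumed). The first uses the Gaussian pair $f(x) = e^{-\pi x^2}$, $\mathcal{F}f(s) = e^{-\pi s^2}$, where both sides equal $1/\sqrt{2}$; the second pairs $\Pi$ with sinc, which is not in S, but the identity extends to $L^2$ and the numbers agree:

```python
import numpy as np
from scipy.integrate import quad

# Gaussian: both sides of Parseval equal the integral of e^{-2 pi u^2},
# which is 1/sqrt(2).
gauss_energy = quad(lambda x: np.exp(-2 * np.pi * x**2), -np.inf, np.inf)[0]

# Rect/sinc: the energy of Pi is exactly 1, and a midpoint-rule estimate of
# the energy of sinc out to |s| = 2000 comes out the same (the neglected
# tail is of size about 1/(pi^2 * 2000)).
ds = 1e-3
s = np.arange(-2000.0, 2000.0, ds) + ds / 2
sinc_energy = np.sum(np.sinc(s)**2) * ds
```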

### 4.3 A Very Little on Integrals

This section on integrals, more of a mid-chapter appendix, is not a short course on integration. It’s here to provide a little, but only a little, background explanation for some of the statements made earlier. The star of this section is you. Here you go.

**Integrals are first defined for positive functions** In the general approach to integration (of real-valued functions) you first set out to define the integral for nonnegative functions. Why? Because however general a theory you’re constructing, an integral is going to be some kind of limit of sums and you’ll want to know when that kind of limit exists. If you work with positive (or at least nonnegative) functions then the issues for limits will be about how big the function gets, or about how big the sets are where the function is or isn’t big. You feel better able to analyze accumulations than to control conspiratorial cancellations.

So you first define your integral for functions $f(x)$ with $f(x) \ge 0$. This works fine. However, you know full well that your definition won’t be too useful if you can’t extend it to functions which are both positive and negative. Here’s how you do this. For any function $f(x)$ you let $f_+(x)$ be its positive part:
$$f_+(x) = \max\{f(x), 0\}\,.$$
Likewise, you let
$$f_-(x) = \max\{-f(x), 0\}$$
be its negative part.8 (Tricky: the “negative part” as you’ve defined it is actually a positive function; taking $-f(x)$ flips over the places where $f(x)$ is negative to be positive. You like that kind of thing.) Then
$$f = f_+ - f_-$$
while
$$|f| = f_+ + f_-\,.$$

8 A different use of the notation $f_-$ than we had before, but we’ll never use this one again.

You now say that $f$ is integrable if both $f_+$ and $f_-$ are integrable — a condition which makes sense since $f_+$ and $f_-$ are both nonnegative functions — and by definition you set
$$\int f = \int f_+ - \int f_-\,.$$
(For complex-valued functions you apply this to the real and imaginary parts.) You follow this approach for integrating functions on a finite interval or on the whole real line. Moreover, according to this definition $|f|$ is integrable if $f$ is, because then
$$\int |f| = \int (f_+ + f_-) = \int f_+ + \int f_-$$
and $f_+$ and $f_-$ are each integrable.9 It’s also true, conversely, that if $|f|$ is integrable then so is $f$. You show this by observing that
$$f_+ \le |f| \quad\text{and}\quad f_- \le |f|$$
and this implies that both $f_+$ and $f_-$ are integrable.

• You now know where the implication
$$\int_{-\infty}^{\infty} |f(t)|\,dt < \infty \;\Longrightarrow\; \mathcal{F}f \text{ exists}$$
comes from.

You get an easy inequality out of this development:
$$\left|\int f\right| \le \int |f|\,.$$
In words, “the absolute value of the integral is at most the integral of the absolute value”. And sure that’s true, because $\int f$ may involve cancellations of the positive and negative values of $f$ while $\int |f|$ won’t have such cancellations. You don’t shirk from a more formal argument:
$$\left|\int f\right| = \left|\int (f_+ - f_-)\right| = \left|\int f_+ - \int f_-\right| \le \left|\int f_+\right| + \left|\int f_-\right| = \int f_+ + \int f_- = \int (f_+ + f_-) = \int |f|\,,$$
where the next-to-last equality uses that $f_+$ and $f_-$ are both nonnegative.
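The whole decomposition is a one-liner on a sample grid. A sketch (NumPy assumed) with an arbitrary sign-changing function:

```python
import numpy as np

# Positive/negative-part decomposition, checked on a sample grid.
x = np.linspace(-5, 5, 2001)
f = np.sin(x) * np.exp(-0.1 * x**2)      # some function taking both signs

f_plus  = np.maximum(f, 0)               # f_+ = max(f, 0)
f_minus = np.maximum(-f, 0)              # f_- = max(-f, 0), itself nonnegative

assert np.allclose(f, f_plus - f_minus)          # f = f_+ - f_-
assert np.allclose(np.abs(f), f_plus + f_minus)  # |f| = f_+ + f_-

# |integral of f| <= integral of |f|: cancellation only ever shrinks it
dx = x[1] - x[0]
assert abs(np.sum(f) * dx) <= np.sum(np.abs(f)) * dx
```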

• You now know where the second inequality in
$$|\mathcal{F}f(s) - \mathcal{F}f(s_0)| = \left|\int_{-\infty}^{\infty} \big(e^{-2\pi i s t} - e^{-2\pi i s_0 t}\big) f(t)\,dt\right| \le \int_{-\infty}^{\infty} \big|e^{-2\pi i s t} - e^{-2\pi i s_0 t}\big|\,|f(t)|\,dt$$
comes from; this came up in showing that $\mathcal{F}f$ is continuous.

9 Some authors reserve the term “summable” for the case when $\int |f| < \infty$, i.e., for when both $\int f_+$ and $\int f_-$ are finite. They still define $\int f = \int f_+ - \int f_-$ but they allow the possibility that one of the integrals on the right may be $\infty$, in which case $\int f$ is $\infty$ or $-\infty$ and they don’t refer to $f$ as summable.


**sinc stinks** What about the sinc function and trying to make sense of the following equation?
$$\mathcal{F}\operatorname{sinc}(s) = \int_{-\infty}^{\infty} e^{-2\pi i s t}\operatorname{sinc} t\,dt$$
According to the definitions you just gave, the sinc function is not integrable. In fact, the argument I gave to show that
$$\int_{-\infty}^{\infty} |\operatorname{sinc} t|\,dt = \infty$$
(the second argument) can be easily modified to show that both
$$\int_{-\infty}^{\infty} \operatorname{sinc}_+ t\,dt = \infty \quad\text{and}\quad \int_{-\infty}^{\infty} \operatorname{sinc}_- t\,dt = \infty\,.$$
So if you wanted to write
$$\int_{-\infty}^{\infty} \operatorname{sinc} t\,dt = \int_{-\infty}^{\infty} \operatorname{sinc}_+ t\,dt - \int_{-\infty}^{\infty} \operatorname{sinc}_- t\,dt$$
you’d be faced with $\infty - \infty$. Bad. The integral of sinc (and also the integral of $\mathcal{F}\operatorname{sinc}$) has to be understood as a limit,
$$\lim_{a \to -\infty,\; b \to \infty} \int_a^b e^{-2\pi i s t}\operatorname{sinc} t\,dt\,.$$

Evaluating this is a classic of contour integration and the residue theorem, which you may have seen in a class on “Functions of a Complex Variable”. I won’t do it. You won’t do it. Ahlfors did it: See Complex Analysis, third edition, by Lars Ahlfors, pp. 156–159.

You can relax now. I’ll take it from here.

Subtlety vs. cleverness. For the full mathematical theory of Fourier series and Fourier integrals one needs the Lebesgue integral, as I’ve mentioned before. Lebesgue’s approach to defining the integral allows a wider class of functions to be integrated and it allows one to establish very general, very helpful results of the type “the limit of the integral is the integral of the limit”, as in

$$f_n \to f \;\Longrightarrow\; \lim_{n\to\infty}\int_{-\infty}^{\infty} f_n(t)\,dt = \int_{-\infty}^{\infty} \lim_{n\to\infty} f_n(t)\,dt = \int_{-\infty}^{\infty} f(t)\,dt\,.$$

You probably do things like this routinely, and so do mathematicians, but it takes them a year or so of graduate school before they feel good about it. More on this in just a moment.

The definition of the Lebesgue integral is based on a study of the size, or measure, of the sets where a function is big or small, and you don’t wind up writing down the same kinds of “Riemann sums” you used in calculus to define the integral. Interestingly, the constructions and definitions of measure theory, as Lebesgue and others developed it, were later used in reworking the foundations of probability. But now take note of the following quote of the mathematician T. Körner from his book *Fourier Analysis*:

Mathematicians find it easier to understand and enjoy ideas which are clever rather than subtle.

Measure theory is subtle rather than clever and so requires hard work to master.

More work than we’re willing, or need, to do. But here’s one more thing:


The general result allowing one to pull a limit inside the integral sign is the Lebesgue dominated convergence theorem. It says: If f_n is a sequence of integrable functions that converges pointwise to a function f except possibly on a set of measure 0, and if there is an integrable function g with |f_n| ≤ g for all n (the “dominated” hypothesis), then f is integrable and

lim_{n→∞} ∫_{−∞}^{∞} f_n(t) dt = ∫_{−∞}^{∞} f(t) dt .

There’s a variant of this that applies when the integrand depends on a parameter. It goes: If f(x, t₀) = lim_{t→t₀} f(x, t) for all x, and if there is an integrable function g such that |f(x, t)| ≤ g(x) for all x, then

lim_{t→t₀} ∫_{−∞}^{∞} f(x, t) dx = ∫_{−∞}^{∞} f(x, t₀) dx .

The situation described in this result comes up in many applications, and it’s good to know that it holds in great generality.
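To see the theorem in action numerically, here’s a small sketch (the helper name and the example family are my own choices): f_n(x) = e^{−|x|} cos(x/n) converges pointwise to e^{−|x|} and satisfies |f_n(x)| ≤ e^{−|x|}, an integrable dominating function, so the integrals must converge to the integral of the limit, namely 2.

```python
import math

def integrate(f, a, b, n=100_000):
    # midpoint rule; the tails beyond |x| = 30 are negligible for e^{-|x|}
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# f_n(x) = e^{-|x|} cos(x/n) -> e^{-|x|} pointwise, and |f_n(x)| <= e^{-|x|},
# an integrable dominating function: the hypotheses of the theorem hold.
for n in (1, 10, 100):
    fn = lambda x, n=n: math.exp(-abs(x)) * math.cos(x / n)
    print(n, integrate(fn, -30, 30))   # exact value: 2 / (1 + 1/n^2)

# integral of the limit function e^{-|x|}: exactly 2
print(integrate(lambda x: math.exp(-abs(x)), -30, 30))
```

The exact value 2/(1 + 1/n²) for each n makes the convergence of the integrals to 2 transparent.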

Integrals are not always just like sums. Here’s one way they’re different, and it’s important to realize this for our work on Fourier transforms. For sums we have the result that

∑_n a_n converges  implies  a_n → 0 .

We used this fact together with Parseval’s identity for Fourier series to conclude that the Fourier coefficients tend to zero. You also all know the classic counterexample to the converse of the statement:

1/n → 0  but  ∑_{n=1}^{∞} 1/n diverges .

For integrals, however, it is possible that

∫_{−∞}^{∞} f(x) dx

exists but f (x) does not tend to zero at ±∞. Make f (x) nonzero (make it equal to 1, if you want) on thinner and thinner intervals going out toward infinity. Then f (x) doesn’t decay to zero, but you can make the intervals thin enough so that the integral converges. I’ll leave an exact construction up to you.

One such construction is f(x) = ∑_{n=1}^{∞} n Π(n³(x − n)): a box of height n and width 1/n³ centered at each integer n, so the total area is ∑_{n=1}^{∞} 1/n² < ∞ even though f(n) = n grows without bound.
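Here’s a numerical version of one such construction (the particular choice of boxes of height n and width 1/n³ centered at each integer n is mine; any sufficiently thin boxes work): the function keeps reaching height n near x = n, yet the total area is a convergent sum.

```python
import math

def f(x, nmax=1000):
    # box of height n and width 1/n^3 centered at each integer n = 1..nmax
    n = round(x)
    if 1 <= n <= nmax and abs(x - n) < 0.5 / n ** 3:
        return float(n)
    return 0.0

# the exact area of the n-th box is n * (1/n^3) = 1/n^2, so the total area
# converges (to pi^2/6, up to truncation at nmax) even though f never decays
area = sum(1.0 / n ** 2 for n in range(1, 1001))
print(area)              # close to pi^2 / 6, about 1.645
print(f(10.0), f(10.4))  # f is as big as 10 at x = 10, but 0 just next to it
```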

How shall we test for convergence of integrals? The answer depends on the context, and different choices are possible. Since the convergence of Fourier integrals is at stake, the important thing to measure is the size of a function “at infinity” — does it decay fast enough for the integrals to converge?10 Any kind of measuring requires a “standard”, and for judging the decay (or growth) of a function the easiest and most common standard is to measure using powers of x. The “ruler” based on powers of x reads:

∫_a^∞ dx/x^p  is  infinite if 0 < p ≤ 1,  finite if p > 1 .

10For now, at least, let’s assume that the only cause for concern in convergence of integrals is decay of the function at infinity, not some singularity at a finite point.


You can check this by direct integration. We take the lower limit a to be positive, but a particular value is irrelevant since the convergence or divergence of the integral depends on the decay near infinity. You can formulate the analogous statements for integrals from −∞ to −a.

To measure the decay of a function f(x) at ±∞ we look at

lim_{x→±∞} |x|^p |f(x)| .

If, for some p > 1, this is bounded then f(x) is integrable. If there is a 0 < p ≤ 1 for which the limit is unbounded, i.e., equals ∞, then f(x) is not integrable.

Standards are good only if they’re easy to use, and powers of x, together with the conditions on their integrals, are easy to use. You can use these tests to show that every rapidly decreasing function is in both L1(R) and L2(R).
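The “ruler” is easy to automate. Here’s a rough sketch (the helper `power_test` and its blow-up threshold are my own crude choices): multiply |f(x)| by |x|^p at a few large x and see whether the products stay bounded.

```python
import math

def power_test(f, p, xs=(1e2, 1e4, 1e6)):
    # crude check of lim |x|^p |f(x)|: sample at a few large x and
    # see whether the values blow up (threshold chosen loosely)
    vals = [abs(x) ** p * abs(f(x)) for x in xs]
    return max(vals) < 10 * vals[0] + 1

f1 = lambda x: 1.0 / (1 + x * x)   # decays like 1/x^2
f2 = lambda x: 1.0 / (1 + abs(x))  # decays only like 1/x

print(power_test(f1, 2))   # True: bounded with p = 2 > 1, so integrable
print(power_test(f2, 2))   # False: |x|^2 |f2(x)| grows without bound

# a rapidly decreasing function beats every power of x:
gauss = lambda x: math.exp(-x * x)
print(all(power_test(gauss, p) for p in (2, 5, 10)))   # True
```

This is only a heuristic sampling of the limit, of course; the honest statement is the one in the text about lim_{x→±∞} |x|^p |f(x)|.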

### 4.4 Distributions

Our program to extend the applicability of the Fourier transform has several steps. We took the first step last time:

We defined S, the collection of rapidly decreasing functions. In words, these are the infinitely differentiable functions whose derivatives decrease faster than any power of x at infinity. These functions have the properties that:

1. If f (x) is in S then F f (s) is in S.

2. If f (x) is in S then F−1F f = f .

We’ll sometimes refer to the functions in S simply as Schwartz functions.

The next step is to use the functions in S to define a broad class of “generalized functions”, or as we’ll say, tempered distributions T , which will include S as well as some nonintegrable functions, sine and cosine, δ functions, and much more, and for which the two properties, above, continue to hold.

I want to give a straightforward, no frills treatment of how to do this. There are two possible approaches.

1. Tempered distributions defined as limits of functions in S.

This is the “classical” (vacuum tube) way of defining generalized functions, and it pretty much applies only to the delta function, and constructions based on the delta function. This is an important enough example, however, to make the approach worth our while.

The other approach, the one we’ll develop more fully, is:

2. Tempered distributions defined via operating on functions in S.

We also use a different terminology and say that tempered distributions are paired with functions in S, returning a number for the pairing of a distribution with a Schwartz function.

In both cases it’s fair to say that “distributions are what distributions do”, in that fundamentally they are defined by how they act on “genuine” functions, those in S. In the case of “distributions as limits”, the nature of the action will be clear but the kind of objects that result from the limiting process is sort of hazy. (That’s the problem with this approach.) In the case of “distributions as operators” the nature of the objects is clear, but just how they are supposed to act is sort of hazy. (And that’s the problem with this approach, but it’s less of a problem.) You may find the second approach conceptually more difficult, but removing the “take a limit” aspect from center stage really does result in a clearer and computationally easier setup. The second approach is actually present in the first, but there it’s cluttered up by framing the discussion in terms of approximations and limits. Take your pick which point of view you prefer, but it’s best if you’re comfortable with both.

4.4.1 Distributions as limits

The first approach is to view generalized functions as some kind of limit of ordinary functions. Here we’ll work with functions in S, but other functions can be used; see Appendix 3.

Let’s consider the delta function as a typical and important example. You probably met δ as a mathematical, idealized impulse. You learned: “It’s concentrated at the point zero, actually infinite at the point zero, and it vanishes elsewhere.” You probably learned to represent this graphically as a spike:

Don’t worry, I don’t want to disabuse you of these ideas, or of the picture. I just want to refine things somewhat.

As an approximation to δ through functions in S one might consider the family of Gaussians

g(x, t) = (1/√(2πt)) e^{−x²/2t} ,  t > 0 .

We remarked earlier that the Gaussians are rapidly decreasing functions.

Here’s a plot of some functions in the family for t = 2, 1, 0.5, 0.1, 0.05 and 0.01. The smaller the value of t, the more sharply peaked the function is at 0 (it’s more and more “concentrated” there), while away from 0 the functions are hugging the axis more and more closely. These are the properties we’re trying to capture, approximately.


As an idealization of a function concentrated at x = 0, δ should then be a limit

δ(x) = lim_{t→0} g(x, t) .

This limit doesn’t make sense as a pointwise statement — it doesn’t define a function — but it begins to make sense when one shows how the limit works operationally when “paired” with other functions. The pairing, by definition, is by integration, and to anticipate the second approach to distributions, we’ll write this as

⟨g(x, t), ϕ⟩ = ∫_{−∞}^{∞} g(x, t)ϕ(x) dx .

(Don’t think of this as an inner product. The angle bracket notation is just a good notation for pairing.11) The fundamental result — what it means for the g(x, t) to be “concentrated at 0” as t → 0 — is

lim_{t→0} ∫_{−∞}^{∞} g(x, t)ϕ(x) dx = ϕ(0) .

Now, whereas you’ll have a hard time making sense of lim_{t→0} g(x, t) alone, there’s no trouble making sense of the limit of the integral, and, in fact, no trouble proving the statement just above. Do observe, however, that the statement “the limit of the integral is the integral of the limit” is thus not true in this case.

The limit of the integral makes sense but not the integral of the limit.12 We can and will define the distribution δ by this result, and write

⟨δ, ϕ⟩ = lim_{t→0} ∫_{−∞}^{∞} g(x, t)ϕ(x) dx = ϕ(0) .

I won’t go through the argument for this here, but see Section 4.6.1 for other ways of getting to δ and for a general result along these lines.

11Like one pairs “bra” vectors with “ket” vectors in quantum mechanics to make a ⟨A|B⟩ — a bracket.

12If you read the Appendix on integrals from the preceding lecture, where the validity of such a result is stated as a variant of the Lebesgue dominated convergence theorem, what goes wrong here is that g(x, t)ϕ(x) will not be dominated by an integrable function since g(0, t) is tending to ∞.
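The fundamental limit is easy to check numerically. In this sketch (the helper names and the particular test function ϕ(x) = e^{−x²} are my own choices) the pairings ⟨g(x, t), ϕ⟩ visibly tend to ϕ(0) = 1 as t → 0:

```python
import math

def g(x, t):
    # the Gaussians from the text: g(x, t) = e^{-x^2/2t} / sqrt(2 pi t)
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

def pair(t, phi, half_width=20.0, n=200_000):
    # <g(x, t), phi> = integral of g(x, t) phi(x) dx, by the midpoint rule
    h = 2 * half_width / n
    return h * sum(g(x, t) * phi(x)
                   for x in (-half_width + (k + 0.5) * h for k in range(n)))

phi = lambda x: math.exp(-x * x)   # a Schwartz function with phi(0) = 1

for t in (1.0, 0.1, 0.01, 0.001):
    print(t, pair(t, phi))   # tends to phi(0) = 1 as t -> 0
```

For this particular ϕ the pairing can even be computed exactly, ⟨g(x, t), ϕ⟩ = 1/√(1 + 2t), which makes the convergence to 1 transparent.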


The Gaussians tend to ∞ at x = 0 as t → 0, and that’s why writing simply δ(x) = lim_{t→0} g(x, t) doesn’t make sense. One would have to say (and people do say, though I have a hard time with it) that the delta function has these properties:

• δ(x) = 0 for x ≠ 0

• δ(0) = ∞

• ∫_{−∞}^{∞} δ(x) dx = 1

These reflect the corresponding (genuine) properties of the g(x, t):

• lim_{t→0} g(x, t) = 0 if x ≠ 0

• lim_{t→0} g(0, t) = ∞

• ∫_{−∞}^{∞} g(x, t) dx = 1

The third property is our old friend, the second is clear from the formula, and you can begin to believe the first from the shape of the graphs. The first property is the flip side of “concentrated at a point”, namely to be zero away from the point where the function is concentrated.

The limiting process also works with convolution:

lim_{t→0} (g ∗ ϕ)(a) = lim_{t→0} ∫_{−∞}^{∞} g(a − x, t)ϕ(x) dx = ϕ(a) .

This is written

(δ ∗ ϕ)(a) = ϕ(a)

as shorthand for the limiting process that got us there, and the notation is then pushed so far as to write the delta function itself under the integral, as in

(δ ∗ ϕ)(a) = ∫_{−∞}^{∞} δ(a − x)ϕ(x) dx = ϕ(a) .

Let me declare now that I am not going to try to talk you out of writing this.

The equation

(δ ∗ ϕ)(a) = ϕ(a)

completes the analogy: “δ is to 1 as convolution is to multiplication”.

Why concentrate? Why would one want a function concentrated at a point in the first place? We’ll certainly have plenty of applications of delta functions very shortly, and you’ve probably already seen a variety through classes on systems and signals in EE or on quantum mechanics in physics. Indeed, it would be wrong to hide the origin of the delta function. Heaviside used δ (without the notation) in his applications and reworking of Maxwell’s theory of electromagnetism. In EE applications, starting with Heaviside, you find the “unit impulse” used, as an idealization, in studying how systems respond to sharp, sudden inputs. We’ll come back to this latter interpretation when we talk about linear systems. The symbolism, and the three defining properties of δ listed above, were introduced later by P. Dirac in the service of calculations in quantum mechanics. Because of Dirac’s work, δ is often referred to as the “Dirac δ function”.

For the present, let’s take a look back at the heat equation and how the delta function comes in there.

We’re perfectly set up for that.

We have seen the family of Gaussians

g(x, t) = (1/√(2πt)) e^{−x²/2t} ,  t > 0

before. They arose in solving the heat equation for an “infinite rod”. Recall that the temperature u(x, t) at a point x and time t satisfies the partial differential equation

u_t = (1/2) u_xx .

When an infinite rod (the real line, in other words) is given an initial temperature f (x) then u(x, t) is given by the convolution with g(x, t):

u(x, t) = g(x, t) ∗ f(x) = (1/√(2πt)) e^{−x²/2t} ∗ f(x) = ∫_{−∞}^{∞} (1/√(2πt)) e^{−(x−y)²/2t} f(y) dy .

One thing I didn’t say at the time, knowing that this day would come, is how one recovers the initial temperature f (x) from this formula. The initial temperature is at t = 0, so this evidently requires that we take the limit:

lim_{t→0⁺} u(x, t) = lim_{t→0⁺} g(x, t) ∗ f(x) = (δ ∗ f)(x) = f(x) .

Out pops the initial temperature. Perfect. (Well, there have to be some assumptions on f (x), but that’s another story.)
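Here’s a numerical sketch of that recovery (the initial profile f and the discretization are my own choices; `heat` is not a library routine): convolve the heat kernel with f and watch the temperature at a point return to its initial value as t → 0⁺.

```python
import math

def g(x, t):
    # the heat kernel g(x, t) = e^{-x^2/2t} / sqrt(2 pi t)
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

def heat(x, t, f, half_width=20.0, n=200_000):
    # u(x, t) = (g(., t) * f)(x), the convolution from the text
    h = 2 * half_width / n
    return h * sum(g(x - y, t) * f(y)
                   for y in (-half_width + (k + 0.5) * h for k in range(n)))

# a smooth initial temperature, peaked at x = 1 with f(1) = 1
f = lambda y: math.exp(-(y - 1) ** 2)

for t in (1.0, 0.1, 0.001):
    print(t, heat(1.0, t, f))   # tends to f(1) = 1 as t -> 0+
```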

4.4.2 Distributions as linear functionals

Farewell to vacuum tubes The approach to distributions we’ve just followed, illustrated by defining δ, can be very helpful in particular cases and where there’s a natural desire to have everything look as “classical” as possible. Still and all, I maintain that adopting this approach wholesale to defining and working with distributions is using technology from a bygone era. I haven’t yet defined the collection of tempered distributions T which is supposed to be the answer to all our Fourier prayers, and I don’t know how to do it from a purely “distributions as limits” point of view. It’s time to transistorize.

In the preceding discussion we did wind up by considering a distribution, at least δ, in terms of how it acts when paired with a Schwartz function. We wrote

⟨δ, ϕ⟩ = ϕ(0)

as shorthand for the result of taking the limit of the pairing

⟨g(x, t), ϕ(x)⟩ = ∫_{−∞}^{∞} g(x, t)ϕ(x) dx .

The second approach to defining distributions takes this idea — “the outcome” of a distribution acting on a test function — as a starting point rather than as a conclusion. The question to ask is what aspects of “outcome”, as present in the approach via limits, do we try to capture and incorporate in the basic definition?


Mathematical functions defined on R “live at points”, to use the hip phrase. That is, you plug in a particular point from R, the domain of the function, and you get a particular value in the range, as for instance in the simple case when the function is given by an algebraic expression and you plug values into the expression. Generalized functions — distributions — do not live at points. The domain of a generalized function is not a set of numbers. The value of a generalized function is not determined by plugging in a number from R and determining a corresponding number. Rather, a particular value of a distribution is determined by how it “operates” on a particular test function. The domain of a generalized function is a set of test functions. As they say in Computer Science, helpfully:

• You pass a distribution a test function and it returns a number.

That’s not so outlandish. There are all sorts of operations you’ve run across that take a signal as an argument and return a number. The terminology of “distributions” and “test functions”, from the dawn of the subject, is even supposed to be some kind of desperate appeal to physical reality to make this reworking of the earlier approaches more appealing and less “abstract”. See label 4.5 for a weak attempt at this, but I can only keep up that physical pretense for so long.

Having come this far, but still looking backward a little, recall that we asked which properties of a pairing — integration, as we wrote it in a particular case in the first approach — we want to subsume in the general definition. To get all we need, we need remarkably little. Here’s the definition:

Tempered distributions A tempered distribution T is a complex-valued continuous linear functional on the collection S of Schwartz functions (called test functions). We denote the collection of all tempered distributions by T.

That’s the complete definition, but we can unpack it a bit:

1. If ϕ is in S then T (ϕ) is a complex number. (You pass a distribution a Schwartz function, it returns a complex number.)

• We often write this action of T on ϕ as hT , ϕi and say that T is paired with ϕ. (This terminology and notation are conventions, not commandments.)

2. A tempered distribution is linear operating on test functions:

T(α₁ϕ₁ + α₂ϕ₂) = α₁T(ϕ₁) + α₂T(ϕ₂)

or, in the other notation,

⟨T, α₁ϕ₁ + α₂ϕ₂⟩ = α₁⟨T, ϕ₁⟩ + α₂⟨T, ϕ₂⟩ ,

for test functions ϕ₁, ϕ₂ and complex numbers α₁, α₂.

3. A tempered distribution is continuous: if ϕ_n is a sequence of test functions in S with ϕ_n → ϕ in S then

T(ϕ_n) → T(ϕ) ,  also written  ⟨T, ϕ_n⟩ → ⟨T, ϕ⟩ .

Also note that two tempered distributions T₁ and T₂ are equal if they agree on all test functions:

T₁ = T₂  if  T₁(ϕ) = T₂(ϕ)  (⟨T₁, ϕ⟩ = ⟨T₂, ϕ⟩)  for all ϕ in S .

This isn’t part of the definition, it’s just useful to write down.
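“You pass a distribution a test function and it returns a number” translates directly into code. In this toy sketch (all the names are mine, not standard anywhere) a distribution is just a callable on test functions, δ is evaluation at 0, and linearity and agreement on test functions can be checked directly:

```python
import math

def delta(phi):
    # <delta, phi> = phi(0): pass delta a test function, get back a number
    return phi(0)

def delta_at(a):
    # the shifted delta: <delta_a, phi> = phi(a)
    return lambda phi: phi(a)

phi1 = lambda x: math.exp(-x * x)        # Schwartz functions as test functions
phi2 = lambda x: x * math.exp(-x * x)

# linearity: <T, a1 phi1 + a2 phi2> = a1 <T, phi1> + a2 <T, phi2>
a1, a2 = 2.0, -3.0
combo = lambda x: a1 * phi1(x) + a2 * phi2(x)
print(delta(combo) == a1 * delta(phi1) + a2 * delta(phi2))   # True

# two distributions are equal when they agree on all test functions;
# delta_at(0) agrees with delta on these samples, as it should
print(all(delta_at(0)(p) == delta(p) for p in (phi1, phi2, combo)))   # True
```

Continuity, of course, is not captured by this toy model; that is the part of the definition that needs the notion of convergence ϕ_n → ϕ in S.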

