**Distributions and Their Fourier** **Transforms**

**4.1** **The Day of Reckoning**

We've been playing a little fast and loose with the Fourier transform — applying Fourier inversion, appealing to duality, and all that. "Fast and loose" is an understatement if ever there was one, but it's also true that we haven't done anything "wrong". All of our formulas and all of our applications have been correct, if not fully justified. Nevertheless, we have to come to terms with some fundamental questions. It will take us some time, but in the end we will have settled on a very wide class of signals with these properties:

*• The allowed signals include δ’s, unit steps, ramps, sines, cosines, and all other standard signals that*
the world’s economy depends on.

• The Fourier transform and its inverse are defined for all of these signals.

• Fourier inversion works.

These are the three most important features of the development to come, but we’ll also reestablish some of our specific results and as an added benefit we’ll even finish off differential calculus!

**4.1.1** **A too simple criterion and an example**

It’s not hard to write down an assumption on a function that guarantees the existence of its Fourier transform and even implies a little more than existence.

• If $\int_{-\infty}^{\infty} |f(t)|\, dt < \infty$ then $\mathcal{F}f$ and $\mathcal{F}^{-1}f$ exist and are continuous.

Existence follows from

$$|\mathcal{F}f(s)| = \left|\int_{-\infty}^{\infty} e^{-2\pi i s t} f(t)\, dt\right| \le \int_{-\infty}^{\infty} |e^{-2\pi i s t}|\, |f(t)|\, dt = \int_{-\infty}^{\infty} |f(t)|\, dt < \infty\,.$$

Here we've used that the magnitude of the integral is less than the integral of the magnitude.^{1} There's actually something to say here, but while it's not complicated, I'd just as soon defer this and other comments on "general facts on integrals" to Section 4.3; read it if only lightly — it provides some additional orientation.

Continuity is the little extra information we get beyond existence. Here's the argument. For any $s$ and $s'$ we have

$$|\mathcal{F}f(s) - \mathcal{F}f(s')| = \left|\int_{-\infty}^{\infty} e^{-2\pi i s t} f(t)\, dt - \int_{-\infty}^{\infty} e^{-2\pi i s' t} f(t)\, dt\right| = \left|\int_{-\infty}^{\infty} \big(e^{-2\pi i s t} - e^{-2\pi i s' t}\big) f(t)\, dt\right| \le \int_{-\infty}^{\infty} \big|e^{-2\pi i s t} - e^{-2\pi i s' t}\big|\, |f(t)|\, dt\,.$$

As a consequence of $\int_{-\infty}^{\infty} |f(t)|\, dt < \infty$ we can take the limit as $s' \to s$ inside the integral. If we do that then $|e^{-2\pi i s t} - e^{-2\pi i s' t}| \to 0$, that is,

$$|\mathcal{F}f(s) - \mathcal{F}f(s')| \to 0 \quad \text{as } s' \to s\,,$$

which says that $\mathcal{F}f(s)$ is continuous. The same argument works to show that $\mathcal{F}^{-1}f$ is continuous.^{2}

We haven’t said anything here about Fourier inversion — no such statement appears in the criterion. Let’s look right away at an example.

The very first example we computed, and still an important one, is the Fourier transform of Π. We found directly that

$$\mathcal{F}\Pi(s) = \int_{-\infty}^{\infty} e^{-2\pi i s t}\, \Pi(t)\, dt = \int_{-1/2}^{1/2} e^{-2\pi i s t}\, dt = \operatorname{sinc} s\,.$$

No problem there, no problem whatsoever. The criterion even applies; Π is in $L^1(\mathbb{R})$ since

$$\int_{-\infty}^{\infty} |\Pi(t)|\, dt = \int_{-1/2}^{1/2} 1\, dt = 1\,.$$

Furthermore, the transform $\mathcal{F}\Pi(s) = \operatorname{sinc} s$ is continuous. That's worth remarking on: although the signal jumps (Π has a discontinuity) the Fourier transform does not, just as guaranteed by the preceding result — make this part of your intuition on the Fourier transform vis à vis the signal.
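The computation $\mathcal{F}\Pi = \operatorname{sinc}$ is easy to corroborate numerically. A minimal sketch, assuming NumPy (whose `np.sinc` uses the same $\sin(\pi x)/(\pi x)$ convention as these notes):

```python
import numpy as np

# Numerical check of F Pi(s) = sinc(s): since Pi is 1 on (-1/2, 1/2), the
# transform integral reduces to the integral of e^{-2 pi i s t} over [-1/2, 1/2],
# approximated here by the midpoint rule.
N = 20000
t = (np.arange(N) + 0.5) / N - 0.5      # midpoints of N subintervals of [-1/2, 1/2]
dt = 1.0 / N

def F_Pi(s):
    return np.sum(np.exp(-2j * np.pi * s * t)) * dt

for s in (0.25, 1.3, 2.0, 3.7):
    approx = F_Pi(s)
    exact = np.sinc(s)                  # numpy's sinc is sin(pi x)/(pi x)
    assert abs(approx - exact) < 1e-6
```

The grid size `N` is an arbitrary choice; any reasonably fine grid gives agreement to many digits.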

Appealing to the Fourier inversion theorem and what we called duality, we then said

$$\mathcal{F}\operatorname{sinc}(s) = \int_{-\infty}^{\infty} e^{-2\pi i s t} \operatorname{sinc} t\, dt = \Pi(s)\,.$$

Here we have a problem. The sinc function does not satisfy the integrability criterion. It is my sad duty to inform you that

$$\int_{-\infty}^{\infty} |\operatorname{sinc} t|\, dt = \infty\,.$$

I'll give you two ways of seeing the failure of $|\operatorname{sinc} t|$ to be integrable. First, if sinc did satisfy the criterion $\int_{-\infty}^{\infty} |\operatorname{sinc} t|\, dt < \infty$ then its Fourier transform would be continuous. But its Fourier transform, which has to come out to be Π, is not continuous.

Or, if you don't like that, here's a direct argument. We can find infinitely many intervals where $|\sin \pi t| \ge 1/2$; this happens when $t$ is between $1/6$ and $5/6$, and that repeats on infinitely many intervals, namely on $I_n = [\tfrac{1}{6} + 2n, \tfrac{5}{6} + 2n]$, $n = 0, 1, 2, \ldots$, because $\sin \pi t$ is periodic of period 2. The $I_n$ all have length $2/3$. On $I_n$ we have $|t| \le \tfrac{5}{6} + 2n$, so

$$\frac{1}{|t|} \ge \frac{1}{5/6 + 2n}$$

and

$$\int_{I_n} \frac{|\sin \pi t|}{\pi |t|}\, dt \ge \frac{1}{2\pi} \cdot \frac{1}{5/6 + 2n} \int_{I_n} dt = \frac{1}{2\pi} \cdot \frac{2}{3} \cdot \frac{1}{5/6 + 2n} = \frac{1}{3\pi} \cdot \frac{1}{5/6 + 2n}\,.$$

Then

$$\int_{-\infty}^{\infty} \frac{|\sin \pi t|}{\pi |t|}\, dt \ge \sum_n \int_{I_n} \frac{|\sin \pi t|}{\pi |t|}\, dt = \frac{1}{3\pi} \sum_{n=0}^{\infty} \frac{1}{5/6 + 2n} = \infty\,.$$

It's true that $|\operatorname{sinc} t| = |\sin \pi t|/\pi|t|$ tends to 0 as $t \to \pm\infty$ — the $1/t$ factor makes that happen — but not "fast enough" to make the integral of $|\operatorname{sinc} t|$ converge.

1 Magnitude, not absolute value, because the integral is a complex number.

2 So another general fact we've used here is that we can take the limit inside the integral. Save yourself for other things and let some of these "general facts" ride without insisting on complete justifications — they're everywhere once you let the rigor police back on the beat.
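The divergence also shows up numerically: partial integrals of $|\operatorname{sinc}|$ over $[-T, T]$ keep growing, roughly logarithmically in $T$, with no sign of settling down. A rough numerical sketch, assuming NumPy (the grid resolution is an arbitrary choice):

```python
import numpy as np

# Partial integrals of |sinc t| over [-T, T]: they grow without bound
# (roughly like log T), illustrating that sinc is not absolutely integrable.
def partial_integral(T, pts_per_unit=2000):
    n = int(2 * T * pts_per_unit)
    t = (np.arange(n) + 0.5) * (2 * T / n) - T   # midpoints on [-T, T]
    dt = 2 * T / n
    return np.sum(np.abs(np.sinc(t))) * dt

vals = [partial_integral(T) for T in (10, 100, 1000)]
print(vals)                       # strictly increasing, still gaining area far out
assert vals[0] < vals[1] < vals[2]
assert vals[2] > vals[1] + 0.5    # appreciable new area even between T=100 and T=1000
```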

This is the most basic example in the theory! It’s not clear that the integral defining the Fourier transform of sinc exists, at least it doesn’t follow from the criterion. Doesn’t this bother you? Isn’t it a little embarrassing that multibillion dollar industries seem to depend on integrals that don’t converge?

In fact, there isn't so much of a problem with either Π or sinc. It is true that

$$\int_{-\infty}^{\infty} e^{-2\pi i s t} \operatorname{sinc} s\, ds = \begin{cases} 1 & |t| < \tfrac{1}{2} \\ 0 & |t| > \tfrac{1}{2} \end{cases}$$

However, showing this — evaluating the improper integral that defines the Fourier transform — requires special arguments and techniques. The sinc function oscillates, as do the real and imaginary parts of the complex exponential, and integrating $e^{-2\pi i s t} \operatorname{sinc} s$ involves enough cancellation for the limit

$$\lim_{\substack{a \to -\infty \\ b \to \infty}} \int_a^b e^{-2\pi i s t} \operatorname{sinc} s\, ds$$

to exist.

Thus Fourier inversion, and duality, can be pushed through in this case. At least almost. You'll notice that I didn't say anything about the points $t = \pm 1/2$, where there's a jump in Π in the time domain. In those cases the improper integral does not exist, but with some additional interpretations one might be able to convince a sympathetic friend that

$$\int_{-\infty}^{\infty} e^{-2\pi i (\pm 1/2) s}\, \operatorname{sinc} s\, ds = \tfrac{1}{2}$$

in the appropriate sense (invoking "principal value integrals" — more on this in a later lecture). At best this is post hoc and needs some fast talking.^{3}

The truth is that cancellations that occur in the sinc integral or in its Fourier transform are a very subtle and dicey thing. Such risky encounters are to be avoided. We’d like a more robust, trustworthy theory.
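Subtle or not, the cancellation can at least be observed numerically: the truncated integrals $\int_{-B}^{B} e^{-2\pi i s t} \operatorname{sinc} s\, ds$ do settle down to $\Pi(t)$ away from the jump points. A sketch, assuming NumPy (the cutoff $B$ and the grid resolution are arbitrary choices):

```python
import numpy as np

# Truncated versions of the improper integral  int e^{-2 pi i s t} sinc(s) ds:
# for t away from +-1/2 they approach Pi(t), i.e. 1 inside (-1/2, 1/2), 0 outside.
def inv_partial(t, B, pts_per_unit=400):
    # midpoint-rule approximation of the integral over [-B, B]
    n = int(2 * B * pts_per_unit)
    s = (np.arange(n) + 0.5) * (2 * B / n) - B
    ds = 2 * B / n
    return (np.sum(np.exp(-2j * np.pi * s * t) * np.sinc(s)) * ds).real

for t, expected in [(0.0, 1.0), (0.25, 1.0), (0.75, 0.0), (2.0, 0.0)]:
    val = inv_partial(t, B=2000)
    assert abs(val - expected) < 0.01
```

The convergence is slow (the error decays only like $1/B$), which is the numerical face of the delicate cancellation described above.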

3*One might also then argue that defining Π(±1/2) = 1/2 is the best choice. I don’t want to get into it.*

**The news so far** Here's a quick summary of the situation. The Fourier transform of $f(t)$ is defined when

$$\int_{-\infty}^{\infty} |f(t)|\, dt < \infty\,.$$

We allow $f$ to be complex valued in this definition. The collection of all functions on $\mathbb{R}$ satisfying this condition is denoted by $L^1(\mathbb{R})$, the superscript 1 indicating that we integrate $|f(t)|$ to the first power.^{4} The $L^1$-norm of $f$ is defined by

$$\|f\|_1 = \int_{-\infty}^{\infty} |f(t)|\, dt\,.$$

Many of the examples we worked with are $L^1$-functions — the rect function, the triangle function, the one- or two-sided exponential decay, Gaussians — so our computations of the Fourier transforms in those cases were perfectly justifiable (and correct). Note that $L^1$-functions can have discontinuities, as in the rect function.

The criterion says that if $f \in L^1(\mathbb{R})$ then $\mathcal{F}f$ exists. We can also say

$$|\mathcal{F}f(s)| = \left|\int_{-\infty}^{\infty} e^{-2\pi i s t} f(t)\, dt\right| \le \int_{-\infty}^{\infty} |f(t)|\, dt = \|f\|_1\,.$$

That is:

*• The magnitude of the Fourier transform is bounded by the L*^{1}-norm of the function.

This is a handy estimate to be able to write down — we'll use it shortly. However, a warning: Fourier transforms of $L^1(\mathbb{R})$ functions may themselves not be in $L^1$, as with the sinc function, so we don't know without further work what more can be done, if anything.
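The estimate $|\mathcal{F}f(s)| \le \|f\|_1$ is easy to watch in action. A sketch, assuming NumPy, using the two-sided exponential decay $f(t) = e^{-|t|}$, for which $\|f\|_1 = 2$:

```python
import numpy as np

# Check |F f(s)| <= ||f||_1 for f(t) = e^{-|t|}, whose L^1-norm is 2.
t = np.linspace(-30.0, 30.0, 60001)      # fine grid; the tail beyond |t| = 30 is negligible
dt = t[1] - t[0]
f = np.exp(-np.abs(t))
norm1 = np.sum(f) * dt                   # approximately ||f||_1 = 2 (f is nonnegative)

s_vals = np.linspace(-5.0, 5.0, 101)
Ff = np.array([np.sum(np.exp(-2j * np.pi * sv * t) * f) * dt for sv in s_vals])

assert abs(norm1 - 2.0) < 1e-3
assert np.max(np.abs(Ff)) <= norm1 + 1e-9
# for nonnegative f the maximum of |F f(s)| is attained at s = 0, where F f(0) = ||f||_1
assert abs(np.abs(Ff).max() - norm1) < 1e-9
```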

*The conclusion is that L*^{1}-integrability of a signal is just too simple a criterion on which to build a really
helpful theory. This is a serious issue for us to understand. Its resolution will greatly extend the usefulness
of the methods we have come to rely on.

There are other problems, too. Take, for example, the signal $f(t) = \cos 2\pi t$. As it stands now, this signal does not even have a Fourier transform — does not have a spectrum! — for the integral

$$\int_{-\infty}^{\infty} e^{-2\pi i s t} \cos 2\pi t\, dt$$

does not converge, no way, no how. This is no good.

*Before we bury L*^{1}**(R) as too restrictive for our needs, here’s one more good thing about it. There’s actually**
*a stronger consequence for F f than just continuity.*

• If $\int_{-\infty}^{\infty} |f(t)|\, dt < \infty$ then $\mathcal{F}f(s) \to 0$ as $s \to \pm\infty$.

4And the letter “L” indicating that it’s really the Lebesgue integral that should be employed.

*This is called the Riemann-Lebesgue lemma and it’s more difficult to prove than showing simply that F f*
*is continuous. I’ll comment on it later; see Section 4.19. One might view the result as saying that F f (s) is*
*at least trying to be integrable. It’s continuous and it tends to zero as s → ±∞. Unfortunately, the fact*
*that F f (s) → 0 does not imply that it’s integrable (think of sinc, again).*^{5} If we knew something, or could
*insist on something about the rate at which a signal or its transform tends to zero at ±∞ then perhaps*
we could push on further.

**4.1.2** **The path, the way**

To repeat, we want our theory to encompass the following three points:

*• The allowed signals include δ’s, unit steps, ramps, sines, cosines, and all other standard signals that*
the world’s economy depends on.

• The Fourier transform and its inverse are defined for all of these signals.

• Fourier inversion works.

*Fiddling around with L*^{1}**(R) or substitutes, putting extra conditions on jumps — all have been used. The**
path to success lies elsewhere. It is well marked and firmly established, but it involves a break with the
classical point of view. The outline of how all this is settled goes like this:

1. We single out a collection of functions S for which convergence of the Fourier integrals is assured,
*for which a function and its Fourier transform are both in S, and for which Fourier inversion works.*

Furthermore, Parseval's identity holds:

$$\int_{-\infty}^{\infty} |f(x)|^2\, dx = \int_{-\infty}^{\infty} |\mathcal{F}f(s)|^2\, ds\,.$$

This much is classical; new ideas with new intentions, yes, but not new objects. Perhaps surprisingly, it's not so hard to find a suitable collection S, at least if one knows what one is looking for. But what comes next is definitely not "classical". It was first anticipated and used effectively in an early form by O. Heaviside, developed somewhat, and mostly dismissed, soon after by less talented people, then cultivated by and often associated with the work of P. Dirac, and finally refined by L. Schwartz.

2. S forms a class of test functions which, in turn, serve to define a larger class of generalized functions, or distributions, called (for this class of test functions) the tempered distributions, T. Precisely because S was chosen to be the ideal Fourier-friendly space of classical signals, the tempered distributions are likewise well suited for Fourier methods. The collection of tempered distributions includes, for example, $L^1$- and $L^2$-functions (which can be wildly discontinuous), the sinc function, and complex exponentials (hence periodic functions). But it includes much more, like the delta functions and related objects.

3. The Fourier transform and its inverse will be defined so as to operate on these tempered distributions, and they operate to produce distributions of the same type. Thus the inverse Fourier transform can be applied, and the Fourier inversion theorem holds in this setting.

4. In the case when a tempered distribution "comes from a function" — in a way we'll make precise — the Fourier transform reduces to the usual definition as an integral, when the integral makes sense. However, tempered distributions are more general than functions, so we really will have done something new, and we won't have lost anything in the process.

5*For that matter, a function in L*^{1}**(R) need not tend to zero at ±∞; that’s also discussed in Appendix 1.**

Our goal is to hit the relatively few main ideas in the outline above, suppressing the considerable mass of details. In practical terms this will enable us to introduce delta functions and the like as tools for computation, and to feel a greater measure of confidence in the range of applicability of the formulas.

We’re taking this path because it works, it’s very interesting, and it’s easy to compute with. I especially want you to believe the last point.

We’ll touch on some other approaches to defining distributions and generalized Fourier transforms, but as far as I’m concerned they are the equivalent of vacuum tube technology. You can do distributions in other ways, and some people really love building things with vacuum tubes, but wouldn’t you rather learn something a little more up to date?

**4.2** **The Right Functions for Fourier Transforms: Rapidly Decreasing** **Functions**

Mathematics progresses more by making intelligent definitions than by proving theorems. The hardest work is often in formulating the fundamental concepts in the right way, a way that will then make the deductions from those definitions (relatively) easy and natural. This can take a while to sort out, and a subject might be reworked several times as it matures; when new discoveries are made and one sees where things end up, there's a tendency to go back and change the starting point so that the trip becomes easier. Mathematicians may be more self-conscious about this process, but there are certainly examples in engineering where close attention to the basic definitions has shaped a field — think of Shannon's work on Information Theory, for a particularly striking example.

*Nevertheless, engineers, in particular, often find this tiresome, wanting to do something and not “just talk*
about it”: “Devices don’t have hypotheses”, as one of my colleagues put it. One can also have too much
of a good thing — too many trips back to the starting point to rewrite the rules can make it hard to
follow the game, especially if one has already played by the earlier rules. I’m sympathetic to both of these
criticisms, and for our present work on the Fourier transform I’ll try to steer a course that makes the
definitions reasonable and lets us make steady forward progress.

**4.2.1** **Smoothness and decay**

*To ask “how fast” F f (s) might tend to zero, depending on what additional assumptions we might make*
*about the function f (x) beyond integrability, will lead to our defining “rapidly decreasing functions”,*
*and this is the key. Integrability is too weak a condition on the signal f , but it does imply that F f (s) is*
*continuous and tends to 0 at ±∞. What we’re going to do is study the relationship between the smoothness*
of a function — not just continuity, but how many times it can be differentiated — and the rate at which
its Fourier transform decays at infinity.

We'll always assume that $f(x)$ is absolutely integrable, and so has a Fourier transform. Let's suppose, more stringently, that

• $x f(x)$ is integrable, i.e.,

$$\int_{-\infty}^{\infty} |x f(x)|\, dx < \infty\,.$$

Then $x f(x)$ has a Fourier transform, and so does $-2\pi i x f(x)$, and its Fourier transform is

$$\mathcal{F}(-2\pi i x f(x)) = \int_{-\infty}^{\infty} (-2\pi i x)\, e^{-2\pi i s x} f(x)\, dx = \int_{-\infty}^{\infty} \left(\frac{d}{ds}\, e^{-2\pi i s x}\right) f(x)\, dx = \frac{d}{ds} \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\, dx = \frac{d}{ds}(\mathcal{F}f)(s)$$

(switching $d/ds$ and the integral is justified by the integrability of $|x f(x)|$).

This says that the Fourier transform $\mathcal{F}f(s)$ is differentiable and that its derivative is $\mathcal{F}(-2\pi i x f(x))$. When $f(x)$ is merely integrable we know that $\mathcal{F}f(s)$ is merely continuous, but with the extra assumption on the integrability of $x f(x)$ we conclude that $\mathcal{F}f(s)$ is actually differentiable. (And its derivative is continuous. Why?)
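The identity $\frac{d}{ds}\mathcal{F}f(s) = \mathcal{F}(-2\pi i x f(x))(s)$ can be tested numerically on a signal whose transform we know. A sketch, assuming NumPy, using the Gaussian $f(x) = e^{-\pi x^2}$, whose transform is $\mathcal{F}f(s) = e^{-\pi s^2}$:

```python
import numpy as np

# Check  d/ds F f(s) = F(-2 pi i x f(x))(s)  for f(x) = e^{-pi x^2}.
# Its transform is e^{-pi s^2}, with derivative  -2 pi s e^{-pi s^2}.
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
f = np.exp(-np.pi * x**2)

def transform(g, s):
    # direct Riemann-sum approximation of the Fourier integral of g at frequency s
    return np.sum(np.exp(-2j * np.pi * s * x) * g) * dx

for s in (0.0, 0.5, 1.5):
    lhs = -2.0 * np.pi * s * np.exp(-np.pi * s**2)   # d/ds of e^{-pi s^2}
    rhs = transform(-2j * np.pi * x * f, s)
    assert abs(rhs - lhs) < 1e-8
```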

For one more go-round in this direction, what if $x^2 f(x)$ is integrable? Then, by the same argument,

$$\mathcal{F}\big((-2\pi i x)^2 f(x)\big) = \int_{-\infty}^{\infty} (-2\pi i x)^2 e^{-2\pi i s x} f(x)\, dx = \int_{-\infty}^{\infty} \left(\frac{d^2}{ds^2}\, e^{-2\pi i s x}\right) f(x)\, dx = \frac{d^2}{ds^2} \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\, dx = \frac{d^2}{ds^2}(\mathcal{F}f)(s)\,,$$

and we see that $\mathcal{F}f$ is twice differentiable. (And its second derivative is continuous.)

Clearly we can proceed like this, and as a somewhat imprecise headline we might then announce:

*• Faster decay of f (x) at infinity leads to a greater smoothness of the Fourier transform.*

Now let's take this in another direction, with an assumption on the smoothness of the signal. Suppose $f(x)$ is differentiable, that its derivative is integrable, and that $f(x) \to 0$ as $x \to \pm\infty$. I've thrown in all the assumptions I need to justify the following calculation:

$$\begin{aligned}
\mathcal{F}f(s) &= \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\, dx \\
&= \left[\frac{f(x)\, e^{-2\pi i s x}}{-2\pi i s}\right]_{x=-\infty}^{x=\infty} - \int_{-\infty}^{\infty} \frac{e^{-2\pi i s x}}{-2\pi i s}\, f'(x)\, dx && \text{(integration by parts with } u = f(x),\ dv = e^{-2\pi i s x}\, dx\text{)} \\
&= \frac{1}{2\pi i s} \int_{-\infty}^{\infty} e^{-2\pi i s x} f'(x)\, dx && \text{(using } f(x) \to 0 \text{ as } x \to \pm\infty\text{)} \\
&= \frac{1}{2\pi i s}\, (\mathcal{F}f')(s)
\end{aligned}$$

We then have

$$|\mathcal{F}f(s)| = \frac{1}{2\pi |s|}\, |(\mathcal{F}f')(s)| \le \frac{1}{2\pi |s|}\, \|f'\|_1\,.$$

The last inequality follows from the result: "The Fourier transform is bounded by the $L^1$-norm of the function." This says that $\mathcal{F}f(s)$ tends to 0 at $\pm\infty$ like $1/s$. (Remember that $\|f'\|_1$ is some fixed number here, independent of $s$.) Earlier we commented (without proof) that if $f$ is integrable then $\mathcal{F}f$ tends to 0 at $\pm\infty$, but here with the stronger assumptions we get a stronger conclusion, that $\mathcal{F}f$ tends to zero at a certain rate.

Let's go one step further in this direction. Suppose $f(x)$ is twice differentiable, that its first and second derivatives are integrable, and that $f(x)$ and $f'(x)$ tend to 0 as $x \to \pm\infty$. The same argument gives

$$\begin{aligned}
\mathcal{F}f(s) &= \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\, dx \\
&= \frac{1}{2\pi i s} \int_{-\infty}^{\infty} e^{-2\pi i s x} f'(x)\, dx && \text{(picking up where we were before)} \\
&= \frac{1}{2\pi i s} \left( \left[\frac{f'(x)\, e^{-2\pi i s x}}{-2\pi i s}\right]_{x=-\infty}^{x=\infty} - \int_{-\infty}^{\infty} \frac{e^{-2\pi i s x}}{-2\pi i s}\, f''(x)\, dx \right) && \text{(integration by parts with } u = f'(x),\ dv = e^{-2\pi i s x}\, dx\text{)} \\
&= \frac{1}{(2\pi i s)^2} \int_{-\infty}^{\infty} e^{-2\pi i s x} f''(x)\, dx && \text{(using } f'(x) \to 0 \text{ as } x \to \pm\infty\text{)} \\
&= \frac{1}{(2\pi i s)^2}\, (\mathcal{F}f'')(s)
\end{aligned}$$

Thus

$$|\mathcal{F}f(s)| \le \frac{1}{|2\pi s|^2}\, \|f''\|_1\,,$$

and we see that $\mathcal{F}f(s)$ tends to 0 like $1/s^2$.

The headline:

*• Greater smoothness of f (x), plus integrability, leads to faster decay of the Fourier transform at ∞.*
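Both headlines can be watched side by side numerically, using standard transform pairs (a sketch, assuming NumPy; $\mathcal{F}\Pi = \operatorname{sinc}$ is from this chapter, while $\mathcal{F}\Lambda = \operatorname{sinc}^2$ for the triangle function and the self-transform Gaussian $e^{-\pi t^2}$ are standard pairs assumed from earlier in the course):

```python
import numpy as np

# Smoothness of the signal vs decay of its transform:
#   Pi (jump discontinuity)        ->  sinc s        decays like 1/s
#   Lambda = triangle (continuous) ->  sinc^2 s      decays like 1/s^2
#   Gaussian e^{-pi t^2} (smooth)  ->  e^{-pi s^2}   decays faster than any power
s = np.linspace(10.0, 200.0, 2000)

F_Pi     = np.abs(np.sinc(s))
F_Lambda = np.sinc(s) ** 2
F_gauss  = np.exp(-np.pi * s ** 2)

# s * |sinc s| = |sin(pi s)|/pi stays bounded by 1/pi ...
assert np.max(s * F_Pi) <= 1.0 / np.pi + 1e-9
# ... s^2 * sinc^2(s) stays bounded by 1/pi^2 ...
assert np.max(s ** 2 * F_Lambda) <= 1.0 / np.pi ** 2 + 1e-9
# ... and any power of s times the Gaussian transform still vanishes
assert np.max(s ** 10 * F_gauss) < 1e-100
```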

**Remark on the derivative formula for the Fourier transform** The astute reader will have noticed
that in the course of our work we rederived the derivative formula

$$\mathcal{F}f'(s) = 2\pi i s\, \mathcal{F}f(s)$$

*which we’ve used before, but here we needed the assumption that f (x) → 0, which we didn’t mention*
before. What’s up? With the technology we have available to us now, the derivation we gave, above, is
the correct derivation. That is, it proceeds via integration by parts, and requires some assumption like
*f (x) → 0 as x → ±∞. In homework (and in the solutions to the homework) you may have given a*
derivation that used duality. That only works if Fourier inversion is known to hold. This was OK when the
rigor police were off duty, but not now, on this day of reckoning. Later, when we develop a generalization
of the Fourier transform, we’ll see that the derivative formula again holds without what seem now to be
extraneous conditions.

We could go on as we did above, comparing the consequences of higher differentiability, integrability,
smoothness and decay, bouncing back and forth between the function and its Fourier transform. The great
insight in making use of these observations is that the simplest and most useful way to coordinate all these
*phenomena is to allow for arbitrarily great smoothness and arbitrarily fast decay. We would like to have*
both phenomena in play. Here is the crucial definition.

**Rapidly decreasing functions**

A function $f(x)$ is said to be rapidly decreasing at $\pm\infty$ if

1. It is infinitely differentiable.

2. For all positive integers $m$ and $n$,

$$x^m\, \frac{d^n}{dx^n} f(x) \to 0 \quad \text{as } x \to \pm\infty\,.$$

*In words, any positive power of x times any order derivative of f tends to zero at infinity.*

*Note that m and n are independent in this definition. That is, we insist that, say, the 5th power of x times*
*the 17th derivative of f (x) tends to zero, and that the 100th power of x times the first derivative of f (x)*
tends to zero; and whatever you want.

Are there any such functions? Any infinitely differentiable function that is identically zero outside some
finite interval is one example, and I’ll even write down a formula for one of these later. Another example is
*f (x) = e*^{−x}^{2}. You may already be familiar with the phrase “the exponential grows faster than any power
*of x”, and likewise with the phrase “e*^{−x}^{2} *decays faster than any power of x.”*^{6} In fact, any derivative of
*e*^{−x}^{2} *decays faster than any power of x as x → ±∞, as you can check with L’Hopital’s rule, for example.*

We can express this exactly as in the definition:

$$x^m\, \frac{d^n}{dx^n}\, e^{-x^2} \to 0 \quad \text{as } x \to \pm\infty\,.$$

*There are plenty of other rapidly decreasing functions. We also remark that if f (x) is rapidly decreasing*
*then it is in L*^{1}**(R) and in L**^{2}**(R); check that yourself.**
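You can spot-check the rapid decrease of $e^{-x^2}$ directly. A sketch, assuming NumPy, with the first two derivatives computed by hand and evaluated far out:

```python
import numpy as np

# x^m times the first few derivatives of e^{-x^2} all tend to 0 at infinity.
# Derivatives by hand:  f = e^{-x^2},  f' = -2x f,  f'' = (4x^2 - 2) f.
x = np.array([10.0, 20.0, 25.0])
f  = np.exp(-x**2)
d1 = -2.0 * x * f
d2 = (4.0 * x**2 - 2.0) * f

for m in (1, 5, 20):
    for g in (f, d1, d2):
        vals = np.abs(x**m * g)
        assert vals[0] > vals[1] > vals[2]   # still decreasing as x grows
        assert vals[2] < 1e-50               # and utterly tiny by x = 25
```

The Gaussian factor wins against any fixed power of $x$ once $x$ is large enough; for bigger $m$ the crossover simply happens farther out.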

**An alternative definition** An equivalent definition for a function to be rapidly decreasing is to assume that for any positive integers $m$ and $n$ there is a constant $C_{mn}$ such that

$$\left| x^m\, \frac{d^n}{dx^n} f(x) \right| \le C_{mn} \quad \text{for all } x\,.$$

In words, the $m$th power of $x$ times the $n$th derivative of $f$ remains bounded for all $m$ and $n$, though the constant will depend on which $m$ and $n$ we take. This condition implies the "tends to zero" condition, above. Convince yourself of that, the key being that $m$ and $n$ are arbitrary and independent. We'll use this second, equivalent condition often, and it's a matter of taste which one takes as a definition.

**Let us now praise famous men** It was the French mathematician Laurent Schwartz who singled out
this relatively simple condition to use in the service of the Fourier transform. In his honor the set of rapidly
*decreasing functions is usually denoted by S (a script S) and called the Schwartz class of functions.*

Let’s start to see why this was such a good idea.

**1. The Fourier transform of a rapidly decreasing function is rapidly decreasing.** Let $f(x)$ be a function in S. We want to show that $\mathcal{F}f(s)$ is also in S. The condition involves derivatives of $\mathcal{F}f$, so what comes in is the derivative formula for the Fourier transform and the version of that formula for higher derivatives. As we've already seen,

$$2\pi i s\, \mathcal{F}f(s) = \mathcal{F}\!\left(\frac{d}{dx} f\right)(s)\,.$$

6 I used $e^{-x^2}$ as an example instead of $e^{-x}$ (for which the statement is true as $x \to \infty$) because I wanted to include $x \to \pm\infty$, and I used $e^{-x^2}$ instead of $e^{-|x|}$ because I wanted the example to be smooth; $e^{-|x|}$ has a corner at $x = 0$.

As we also noted,

$$\frac{d}{ds}\, \mathcal{F}f(s) = \mathcal{F}(-2\pi i x f(x))\,.$$

*Because f (x) is rapidly decreasing, the higher order versions of these formulas are valid; the derivations*
require either integration by parts or differentiating under the integral sign, both of which are justified.

That is,

$$(2\pi i s)^n\, \mathcal{F}f(s) = \mathcal{F}\!\left(\frac{d^n}{dx^n} f\right)(s)\,, \qquad \frac{d^n}{ds^n}\, \mathcal{F}f(s) = \mathcal{F}\big((-2\pi i x)^n f(x)\big)\,.$$

(We follow the convention that the zeroth order derivative leaves the function alone.)

Combining these formulas one can show, inductively, that for all nonnegative integers $m$ and $n$,

$$\mathcal{F}\!\left(\frac{d^n}{dx^n} \big((-2\pi i x)^m f(x)\big)\right) = (2\pi i s)^n\, \frac{d^m}{ds^m}\, \mathcal{F}f(s)\,.$$

Note how $m$ and $n$ enter on the two sides of the equation.

We use this last identity together with the estimate for the Fourier transform in terms of the $L^1$-norm of the function. Namely,

$$|s|^n \left| \frac{d^m}{ds^m}\, \mathcal{F}f(s) \right| = (2\pi)^{m-n} \left| \mathcal{F}\!\left(\frac{d^n}{dx^n} \big(x^m f(x)\big)\right) \right| \le (2\pi)^{m-n} \left\| \frac{d^n}{dx^n} \big(x^m f(x)\big) \right\|_1\,.$$

The $L^1$-norm on the right hand side is finite because $f$ is rapidly decreasing. Since the right hand side depends on $m$ and $n$, we have shown that there is a constant $C_{mn}$ with

$$\left| s^n\, \frac{d^m}{ds^m}\, \mathcal{F}f(s) \right| \le C_{mn}\,.$$

This implies that $\mathcal{F}f$ is rapidly decreasing. Done.

**2. Fourier inversion works on S.** We first establish the inversion theorem for a timelimited function in S. Suppose that $f(t)$ is smooth and for some $T$ is identically zero for $|t| \ge T/2$, rather than just tending to zero at $\pm\infty$. In this case we can periodize $f(t)$ to get a smooth, periodic function of period $T$. Expand the periodic function as a converging Fourier series. Then for $-T/2 \le t \le T/2$,

$$\begin{aligned}
f(t) &= \sum_{n=-\infty}^{\infty} c_n\, e^{2\pi i n t / T} \\
&= \sum_{n=-\infty}^{\infty} e^{2\pi i n t / T}\, \frac{1}{T} \int_{-T/2}^{T/2} e^{-2\pi i n x / T} f(x)\, dx \\
&= \sum_{n=-\infty}^{\infty} e^{2\pi i n t / T}\, \frac{1}{T} \int_{-\infty}^{\infty} e^{-2\pi i n x / T} f(x)\, dx \\
&= \sum_{n=-\infty}^{\infty} e^{2\pi i n t / T}\, \mathcal{F}f\!\left(\frac{n}{T}\right) \frac{1}{T}\,.
\end{aligned}$$

Our intention is to let $T$ get larger and larger. What we see is a Riemann sum for the integral

$$\int_{-\infty}^{\infty} e^{2\pi i s t}\, \mathcal{F}f(s)\, ds = \mathcal{F}^{-1}\mathcal{F}f(t)\,,$$

and the Riemann sum converges to the integral because of the smoothness of $f$. (I have not slipped anything past you here, but I don't want to quote the precise results that make all this legitimate.) Thus

$$f(t) = \mathcal{F}^{-1}\mathcal{F}f(t)\,,$$

and the Fourier inversion theorem is established for timelimited functions in S.

*When f is not timelimited we use “windowing”. The idea is to cut f (t) off smoothly.*^{7} The interesting
thing in the present context — for theoretical rather than practical use — is to make the window so smooth
that the “windowed” function is still in S. Some of the details are in Section 4.20, but here’s the setup.

*We take a function c(t) that is identically 1 for −1/2 ≤ t ≤ 1/2, that goes smoothly (infinitely differentiable)*
*down to zero as t goes from 1/2 to 1 and from −1/2 to −1, and is then identically 0 for t ≥ 1 and t ≤ −1.*

*This is a smoothed version of the rectangle function Π(t); instead of cutting off sharply at ±1/2 we bring*
the function smoothly down to zero. You can certainly imagine drawing such a function:

In Section 4.20 I’ll give an explicit formula for this.
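Here is one standard way to build such a window, using the classic bump-function trick with $e^{-1/x}$. This is only a sketch of the construction, not necessarily the formula of Section 4.20 (assuming NumPy):

```python
import numpy as np

# A smooth cutoff c(t): identically 1 for |t| <= 1/2, identically 0 for |t| >= 1,
# infinitely differentiable in between (a standard bump-function construction).
def psi(x):
    # e^{-1/x} for x > 0, and 0 for x <= 0; smooth at 0 with all derivatives 0
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = np.exp(-1.0 / x[pos])
    return out

def smooth_step(x):
    # rises smoothly from 0 (for x <= 0) to 1 (for x >= 1); denominator never vanishes
    return psi(x) / (psi(x) + psi(1.0 - x))

def c(t):
    # map |t| in [1/2, 1] onto the step's transition region [1, 0]
    return smooth_step(2.0 * (1.0 - np.abs(t)))

assert np.allclose(c(np.array([-0.5, 0.0, 0.5])), 1.0)
assert np.allclose(c(np.array([-1.2, -1.0, 1.0, 1.5])), 0.0)
assert 0.0 < c(np.array([0.75]))[0] < 1.0     # strictly between 0 and 1 on the ramp
```

Scaling as in the text, $c_n(t) = c(t/n)$, then gives windows that are 1 on $[-n/2, n/2]$ and vanish outside $[-n, n]$.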

*Now scale c(t) to c*_{n}*(t) = c(t/n). That is, c*_{n}*(t) is 1 for t between −n/2 and n/2, goes smoothly down*
*to 0 between ±n/2 and ±n and is then identically 0 for |t| ≥ n. Next, the function f*_{n}*(t) = c*_{n}*(t) · f (t) is*
a timelimited function in S. Hence the earlier reasoning shows that the Fourier inversion theorem holds
*for f**n* *and F f**n**. The window eventually moves past every t, that is, f**n**(t) → f (t) as n → ∞. Some*
estimates based on the properties of the cut-off function — which I won’t go through — show that the
Fourier inversion theorem also holds in the limit.

**3. Parseval holds in S.** We'll actually derive a more general result than Parseval's identity, namely:

If $f(x)$ and $g(x)$ are complex valued functions in S then

$$\int_{-\infty}^{\infty} f(x)\, \overline{g(x)}\, dx = \int_{-\infty}^{\infty} \mathcal{F}f(s)\, \overline{\mathcal{F}g(s)}\, ds\,.$$

As a special case, if we take $f = g$ then $f(x)\overline{f(x)} = |f(x)|^2$ and the identity becomes

$$\int_{-\infty}^{\infty} |f(x)|^2\, dx = \int_{-\infty}^{\infty} |\mathcal{F}f(s)|^2\, ds\,.$$

7The design of windows, like the design of filters, is as much an art as a science.

To get the first result we'll use the fact that we can recover $g$ from its Fourier transform via the inversion theorem. That is,

$$g(x) = \int_{-\infty}^{\infty} \mathcal{F}g(s)\, e^{2\pi i s x}\, ds\,.$$

The complex conjugate of the integral is the integral of the complex conjugate, hence

$$\overline{g(x)} = \int_{-\infty}^{\infty} \overline{\mathcal{F}g(s)}\, e^{-2\pi i s x}\, ds\,.$$

The derivation is straightforward, using one of our favorite tricks of interchanging the order of integration:

$$\begin{aligned}
\int_{-\infty}^{\infty} f(x)\, \overline{g(x)}\, dx &= \int_{-\infty}^{\infty} f(x) \left( \int_{-\infty}^{\infty} \overline{\mathcal{F}g(s)}\, e^{-2\pi i s x}\, ds \right) dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, \overline{\mathcal{F}g(s)}\, e^{-2\pi i s x}\, ds\, dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, \overline{\mathcal{F}g(s)}\, e^{-2\pi i s x}\, dx\, ds \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i s x}\, dx \right) \overline{\mathcal{F}g(s)}\, ds \\
&= \int_{-\infty}^{\infty} \mathcal{F}f(s)\, \overline{\mathcal{F}g(s)}\, ds\,.
\end{aligned}$$

All of this works perfectly — the initial appeal to the Fourier inversion theorem, switching the order of integration — if $f$ and $g$ are rapidly decreasing.
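Parseval's identity is pleasant to verify numerically for $f = \Pi$ (not itself in S, but the identity extends to it): the time side is $\int |\Pi|^2 = 1$, and the frequency side is $\int \operatorname{sinc}^2$. A sketch, assuming NumPy (the truncation bound and grid are arbitrary choices):

```python
import numpy as np

# Parseval check for f = Pi: time side is  int |Pi(t)|^2 dt = 1  over [-1/2, 1/2];
# frequency side is  int |sinc s|^2 ds, computed here by a large truncated sum.
B = 5000.0
n = 2_000_000
s = (np.arange(n) + 0.5) * (2 * B / n) - B     # midpoints on [-B, B]
ds = 2 * B / n

freq_side = np.sum(np.sinc(s) ** 2) * ds
time_side = 1.0

assert abs(freq_side - time_side) < 1e-3
```

Note that $\operatorname{sinc}^2$, unlike $\operatorname{sinc}$ itself, is absolutely integrable, so here the truncated sums converge without any delicate cancellation.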

**4.3** **A Very Little on Integrals**

This section on integrals, more of a mid-chapter appendix, is not a short course on integration. It’s here to provide a little, but only a little, background explanation for some of the statements made earlier. The star of this section is you. Here you go.

**Integrals are first defined for positive functions** In the general approach to integration (of real-
*valued functions) you first set out to define the integral for nonnegative functions. Why? Because however*
general a theory you’re constructing, an integral is going to be some kind of limit of sums and you’ll want to
know when that kind of limit exists. If you work with positive (or at least nonnegative) functions then the
issues for limits will be about how big the function gets, or about how big the sets are where the function
is or isn’t big. You feel better able to analyze accumulations than to control conspiratorial cancellations.

So you first define your integral for functions $f(x)$ with $f(x) \ge 0$. This works fine. However, you know full well that your definition won't be too useful if you can't extend it to functions which are both positive and negative. Here's how you do this. For any function $f(x)$ you let $f^+(x)$ be its positive part:

$$f^+(x) = \max\{f(x), 0\}\,.$$

Likewise, you let

$$f^-(x) = \max\{-f(x), 0\}$$

be its negative part.^{8} (Tricky: the "negative part" as you've defined it is actually a positive function; taking $-f(x)$ flips over the places where $f(x)$ is negative to be positive. You like that kind of thing.) Then

$$f = f^+ - f^- \qquad \text{while} \qquad |f| = f^+ + f^-\,.$$

You now say that $f$ is integrable if both $f^+$ and $f^-$ are integrable — a condition which makes sense since $f^+$ and $f^-$ are both nonnegative functions — and by definition you set

$$\int f = \int f^+ - \int f^-\,.$$

8 A different use of the notation $f^-$ than we had before, but we'll never use this one again.

(For complex-valued functions you apply this to the real and imaginary parts.) You follow this approach for integrating functions on a finite interval or on the whole real line. Moreover, according to this definition $|f|$ is integrable if $f$ is, because then

$$\int |f| = \int (f^+ + f^-) = \int f^+ + \int f^-$$

and $f^+$ and $f^-$ are each integrable.^{9} It's also true, conversely, that if $|f|$ is integrable then so is $f$. You show this by observing that

$$f^+ \le |f| \qquad \text{and} \qquad f^- \le |f|\,,$$

and this implies that both $f^+$ and $f^-$ are integrable.
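The bookkeeping with $f^+$ and $f^-$ is easy to see on a toy discrete example (a sketch, assuming NumPy):

```python
import numpy as np

# Positive and negative parts:  f+ = max(f, 0),  f- = max(-f, 0).
f  = np.array([3.0, -1.5, 0.0, 2.0, -4.0])
fp = np.maximum(f, 0.0)       # [3.0, 0.0, 0.0, 2.0, 0.0]
fm = np.maximum(-f, 0.0)      # note: fm itself is nonnegative

assert np.array_equal(fp - fm, f)            # f   = f+ - f-
assert np.array_equal(fp + fm, np.abs(f))    # |f| = f+ + f-
# and, in discrete form, the inequality |sum f| <= sum |f|:
assert abs(f.sum()) <= np.abs(f).sum()
```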

• You now know where the implication

$$\int_{-\infty}^{\infty} |f(t)|\, dt < \infty \;\Rightarrow\; \mathcal{F}f \text{ exists}$$

comes from.

You get an easy inequality out of this development:

$$\left| \int f \right| \le \int |f|\,.$$

In words, "the absolute value of the integral is at most the integral of the absolute value". And sure that's true, because $\int f$ may involve cancellations of the positive and negative values of $f$ while $\int |f|$ won't have such cancellations. You don't shirk from a more formal argument:

$$\begin{aligned}
\left| \int f \right| = \left| \int f^+ - \int f^- \right| &\le \left| \int f^+ \right| + \left| \int f^- \right| \\
&= \int f^+ + \int f^- && \text{(since } f^+ \text{ and } f^- \text{ are both nonnegative)} \\
&= \int (f^+ + f^-) = \int |f|\,.
\end{aligned}$$

• You now know where the second inequality in

$$|\mathcal{F}f(s) - \mathcal{F}f(s')| = \left| \int_{-\infty}^{\infty} \big(e^{-2\pi i s t} - e^{-2\pi i s' t}\big) f(t)\, dt \right| \le \int_{-\infty}^{\infty} \big|e^{-2\pi i s t} - e^{-2\pi i s' t}\big|\, |f(t)|\, dt$$

comes from; this came up in showing that $\mathcal{F}f$ is continuous.

9 Some authors reserve the term “summable” for the case when ∫ *|f | < ∞, i.e., for when both* ∫ *f*^{+} and ∫ *f*^{−} *are finite. They*
still define ∫ *f =* ∫ *f*^{+} − ∫ *f*^{−} *but they allow the possibility that one of the integrals on the right may be ∞, in which case*
∫ *f is ∞ or −∞ and they don’t refer to f as summable.*

**sinc stinks** What about the sinc function and trying to make sense of the following equation?

*F sinc(s) =* ∫_{−∞}^{∞} *e*^{−2πist} *sinc t dt*

According to the definitions you just gave, the sinc function is not integrable. In fact, the argument I gave
to show that

∫_{−∞}^{∞} *| sinc t| dt = ∞*
(the second argument) can easily be modified to show that both

∫_{−∞}^{∞} sinc^{+} *t dt = ∞* and ∫_{−∞}^{∞} sinc^{−} *t dt = ∞ .*

So if you wanted to write

∫_{−∞}^{∞} *sinc t dt =* ∫_{−∞}^{∞} sinc^{+} *t dt −* ∫_{−∞}^{∞} sinc^{−} *t dt*

*you’d be faced with ∞ − ∞. Bad. The integral of sinc (and also the integral of F sinc) has to be understood*
as a limit,

lim_{a→−∞, b→∞} ∫_{a}^{b} *e*^{−2πist} *sinc t dt .*

Evaluating this is a classic of contour integration and the residue theorem, which you may have seen in a
*class on “Functions of a Complex Variable”. I won’t do it. You won’t do it. Ahlfors did it: See Complex*
*Analysis, third edition, by Lars Ahlfors, pp. 156–159.*

You can relax now. I’ll take it from here.
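If you’d still like some evidence short of contour integration, here’s a numerical sketch (my own addition, not Ahlfors’s argument): the symmetric integrals of sinc settle down near 1, while the integrals of | sinc | keep growing — roughly like the log of the cutoff — which is the divergence in action.

```python
import numpy as np

# np.sinc is the normalized sinc: sin(pi x)/(pi x).
def midpoint_integral(f, a, b, n=1_000_000):
    # midpoint rule on [a, b] with n cells
    t = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    return float(np.sum(f(t)) * (b - a) / n)

# the symmetric limit settles down near 1 ...
I_50 = midpoint_integral(np.sinc, -50.0, 50.0)
I_200 = midpoint_integral(np.sinc, -200.0, 200.0)
assert abs(I_50 - 1.0) < 0.05 and abs(I_200 - 1.0) < 0.05

# ... while the integral of |sinc| shows no sign of converging
A_50 = midpoint_integral(lambda t: np.abs(np.sinc(t)), -50.0, 50.0)
A_200 = midpoint_integral(lambda t: np.abs(np.sinc(t)), -200.0, 200.0)
assert A_200 > A_50 + 0.3
```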

**Subtlety vs. cleverness.** For the full mathematical theory of Fourier series and Fourier integrals one
needs the Lebesgue integral, as I’ve mentioned before. Lebesgue’s approach to defining the integral allows
a wider class of functions to be integrated and it allows one to establish very general, very helpful results
of the type “the limit of the integral is the integral of the limit”, as in

*f*_{n} *→ f ⇒* lim_{n→∞} ∫_{−∞}^{∞} *f*_{n}*(t) dt =* ∫_{−∞}^{∞} lim_{n→∞} *f*_{n}*(t) dt =* ∫_{−∞}^{∞} *f (t) dt .*

You probably do things like this routinely, and so do mathematicians, but it takes them a year or so of graduate school before they feel good about it. More on this in just a moment.

*The definition of the Lebesgue integral is based on a study of the size, or measure, of the sets where a*
function is big or small, and you don’t wind up writing down the same kinds of “Riemann sums” you
*used in calculus to define the integral. Interestingly, the constructions and definitions of measure theory,*
as Lebesgue and others developed it, were later used in reworking the foundations of probability. But now
take note of the following quote of the mathematician T. Körner from his book *Fourier Analysis*:

Mathematicians find it easier to understand and enjoy ideas which are clever rather than subtle.

Measure theory is subtle rather than clever and so requires hard work to master.

More work than we’re willing, or need, to do. But here’s one more thing:

*The general result allowing one to pull a limit inside the integral sign is the Lebesgue dominated convergence*
*theorem. It says: If f*_{n} *is a sequence of integrable functions that converges pointwise to a function f except*
*possibly on a set of measure 0, and if there is an integrable function g with |f*_{n}*| ≤ g for all n (the*
*“dominated” hypothesis) then f is integrable and*

lim_{n→∞} ∫_{−∞}^{∞} *f*_{n}*(t) dt =* ∫_{−∞}^{∞} *f (t) dt .*

*There’s a variant of this that applies when the integrand depends on a parameter. It goes: If f (x, t*_{0}) =
lim_{t→t_{0}} *f (x, t) for all x, and if there is an integrable function g such that |f (x, t)| ≤ g(x) for all x then*

lim_{t→t_{0}} ∫_{−∞}^{∞} *f (x, t) dx =* ∫_{−∞}^{∞} *f (x, t*_{0}*) dx .*

The situation described in this result comes up in many applications, and it’s good to know that it holds in great generality.
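To see why the “dominated” hypothesis earns its keep, here’s a standard illustration (a numeric sketch of my own, with an *f*_{n} of my choosing): take *f*_{n} = *n* on (0, 1/*n*) and 0 elsewhere. Then *f*_{n} → 0 pointwise, yet every ∫ *f*_{n} = 1, so the limit of the integral is not the integral of the limit — and indeed no integrable *g* dominates all the *f*_{n}.

```python
import numpy as np

# f_n(t) = n on the interval (0, 1/n), and 0 elsewhere.
def f_n(n, t):
    return np.where((t > 0) & (t < 1.0 / n), float(n), 0.0)

t = np.linspace(-1.0, 2.0, 3_000_001)
dt = t[1] - t[0]

# every integral of f_n is (numerically) 1 ...
for n in (10, 100, 1000):
    integral = float(np.sum(f_n(n, t)) * dt)
    assert abs(integral - 1.0) < 0.01

# ... but the pointwise limit at any fixed t != 0 is 0
assert f_n(10**6, 0.5) == 0.0
```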

**Integrals are not always just like sums.** Here’s one way they’re different, and it’s important to realize
this for our work on Fourier transforms. For sums we have the result that

Σ_{n} *a*_{n} *converges* implies *a*_{n} *→ 0 .*

We used this fact together with Parseval’s identity for Fourier series to conclude that the Fourier coefficients tend to zero. You all know the classic counterexample to the converse of the statement:

1*/n* *→ 0* but Σ_{n=1}^{∞} 1*/n* *diverges .*
For integrals, however, it is possible that

∫_{−∞}^{∞} *f (x) dx*

*exists but f (x) does not tend to zero at ±∞. Make f (x) nonzero (make it equal to 1, if you want) on*
*thinner and thinner intervals going out toward infinity. Then f (x) doesn’t decay to zero, but you can make*
the intervals thin enough so that the integral converges. I’ll leave an exact construction up to you.

**How about this example?**

Σ_{n=1}^{∞} *n* Π(*n*^{3}*(x − n)*)
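Here’s a sketch of what that example is doing (Π is the unit rectangle function): a rectangular bump of height *n* and width 1/*n*^{3} sits at each integer *n*. The bump areas are *n* · (1/*n*^{3}) = 1/*n*^{2}, which sum to π^{2}/6, so the integral converges — yet the heights, and hence the function, blow up going out to infinity.

```python
import numpy as np

# Bump n has height n and width 1/n**3, hence area 1/n**2.
N = 2000
n = np.arange(1, N + 1, dtype=float)
areas = n * (1.0 / n**3)       # area of the n-th bump = 1/n^2
heights = n                    # f(n) = n: no decay at infinity

assert abs(areas.sum() - np.pi**2 / 6) < 1e-3   # the areas sum up (to pi^2/6)
assert heights.max() == N                        # ...while the heights blow up
```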

**How shall we test for convergence of integrals?** The answer depends on the context, and different
choices are possible. Since the convergence of Fourier integrals is at stake, the important thing to measure
is the size of a function “at infinity” — does it decay fast enough for the integrals to converge?^{10} Any kind
of measuring requires a “standard”, and for judging the decay (or growth) of a function the easiest and
*most common standard is to measure using powers of x. The “ruler” based on powers of x reads:*

∫_{a}^{∞} *dx/x*^{p} is infinite *if 0 < p ≤ 1*, finite *if p > 1 .*
10For now, at least, let’s assume that the only cause for concern in convergence of integrals is decay of the function at infinity, not some singularity at a finite point.

*You can check this by direct integration. We take the lower limit a to be positive, but a particular value is*
irrelevant since the convergence or divergence of the integral depends on the decay near infinity. You can
*formulate the analogous statements for integrals −∞ to −a.*
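A quick numeric check of the ruler (my own sketch, using a midpoint rule): for *p* = 2 the tail integral settles at the exact value 1, while for *p* = 1 it grows like log of the cutoff.

```python
import numpy as np

def tail_integral(p, A, n=1_000_000):
    # midpoint rule for the integral of x**(-p) from 1 to A
    x = np.linspace(1.0, A, n, endpoint=False) + (A - 1.0) / (2 * n)
    return float(np.sum(x**(-p)) * (A - 1.0) / n)

# p = 2 > 1: the integral of dx/x^2 from 1 to infinity is exactly 1
assert abs(tail_integral(2.0, 10_000) - 1.0) < 0.01

# p = 1: the integral of dx/x from 1 to A is ln A, which keeps growing
assert tail_integral(1.0, 10_000) > tail_integral(1.0, 100) + 4.0
```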

*To measure the decay of a function f (x) at ±∞ we look at*

lim_{x→±∞} *|x|*^{p} *|f (x)| .*

*If, for some p > 1, this is bounded then f (x) is integrable. If there is a 0 < p ≤ 1 for which the limit is*
*unbounded, i.e., equals ∞, then f (x) is not integrable.*

*Standards are good only if they’re easy to use, and powers of x, together with the conditions on their*
integrals are easy to use. You can use these tests to show that every rapidly decreasing function is in both
*L*^{1}**(R) and L**^{2}**(R).**

**4.4** **Distributions**

Our program to extend the applicability of the Fourier transform has several steps. We took the first step last time:

We defined S, the collection of rapidly decreasing functions. In words, these are the infinitely
*differentiable functions whose derivatives decrease faster than any power of x at infinity. These*
functions have the properties that:

*1. If f (x) is in S then F f (s) is in S.*

*2. If f (x) is in S then F*^{−1}*F f = f .*

We’ll sometimes refer to the functions in S simply as Schwartz functions.

The next step is to use the functions in S to define a broad class of “generalized functions”, or as we’ll say,
*tempered distributions T , which will include S as well as some nonintegrable functions, sine and cosine, δ*
functions, and much more, and for which the two properties, above, continue to hold.

I want to give a straightforward, no frills treatment of how to do this. There are two possible approaches.

1. Tempered distributions defined as limits of functions in S.

This is the “classical” (vacuum tube) way of defining generalized functions, and it pretty much applies only to the delta function, and constructions based on the delta function. This is an important enough example, however, to make the approach worth our while.

The other approach, the one we’ll develop more fully, is:

2. Tempered distributions defined via operating on functions in S.

*We also use a different terminology and say that tempered distributions are paired with functions*
in S, returning a number for the pairing of a distribution with a Schwartz function.

In both cases it’s fair to say that “distributions are what distributions do”, in that fundamentally they are
*defined by how they act on “genuine” functions, those in S. In the case of “distributions as limits”, the*
nature of the action will be clear but the kind of objects that result from the limiting process is sort of
hazy. (That’s the problem with this approach.) In the case of “distributions as operators” the nature of

the objects is clear, but just how they are supposed to act is sort of hazy. (And that’s the problem with this approach, but it’s less of a problem.) You may find the second approach conceptually more difficult, but removing the “take a limit” aspect from center stage really does result in a clearer and computationally easier setup. The second approach is actually present in the first, but there it’s cluttered up by framing the discussion in terms of approximations and limits. Take your pick which point of view you prefer, but it’s best if you’re comfortable with both.

**4.4.1** **Distributions as limits**

The first approach is to view generalized functions as some kind of limit of ordinary functions. Here we’ll work with functions in S, but other functions can be used; see Appendix 3.

*Let’s consider the delta function as a typical and important example. You probably met δ as a mathe-*
matical, idealized impulse. You learned: “It’s concentrated at the point zero, actually infinite at the point
zero, and it vanishes elsewhere.” You probably learned to represent this graphically as a spike:

*Don’t worry, I don’t want to disabuse you of these ideas, or of the picture. I just want to refine things*
somewhat.

*As an approximation to δ through functions in S one might consider the family of Gaussians*

*g(x, t) =* (1/√(2π*t*)) *e*^{−x^{2}/2t} *,* *t > 0 .*
We remarked earlier that the Gaussians are rapidly decreasing functions.

*Here’s a plot of some functions in the family for t = 2, 1, 0.5, 0.1, 0.05 and 0.01. The smaller the value*
*of t, the more sharply peaked the function is at 0 (it’s more and more “concentrated” there), while away*
from 0 the functions are hugging the axis more and more closely. These are the properties we’re trying to
capture, approximately.

*As an idealization of a function concentrated at x = 0, δ should then be a limit*

*δ(x) =* lim_{t→0} *g(x, t) .*

This limit doesn’t make sense as a pointwise statement — it doesn’t define a function — but it begins to
*make sense when one shows how the limit works operationally when “paired” with other functions. The*
pairing, by definition, is by integration, and to anticipate the second approach to distributions, we’ll write
this as

*hg(x, t), ϕi =* ∫_{−∞}^{∞} *g(x, t)ϕ(x) dx .*

(Don’t think of this as an inner product. The angle bracket notation is just a good notation for pairing.^{11})
*The fundamental result — what it means for the g(x, t) to be “concentrated at 0” as t → 0 — is*

lim_{t→0} ∫_{−∞}^{∞} *g(x, t)ϕ(x) dx = ϕ(0) .*

Now, whereas you’ll have a hard time making sense of lim_{t→0} *g(x, t) alone, there’s no trouble making sense*
of the limit of the integral, and, in fact, no trouble proving the statement just above. Do observe, however,
that the statement “the limit of the integral is the integral of the limit” is thus not true in this case.

The limit of the integral makes sense but not the integral of the limit.^{12}
*We can and will define the distribution δ by this result, and write*

*hδ, ϕi =* lim_{t→0} ∫_{−∞}^{∞} *g(x, t)ϕ(x) dx = ϕ(0) .*
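You can watch this happen numerically. Here’s a sketch (my own addition, with a test function ϕ of my choosing): pair the Gaussians with ϕ on a grid and watch the pairings head to ϕ(0).

```python
import numpy as np

def g(x, t):
    # the family of Gaussians g(x, t) = (1/sqrt(2 pi t)) exp(-x^2 / 2t)
    return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

def phi(x):
    # a sample rapidly decreasing test function (my choice)
    return np.exp(-x**2) * np.cos(3 * x)

x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]

# the pairings <g(., t), phi> for shrinking t
pairings = [float(np.sum(g(x, t) * phi(x)) * dx) for t in (1.0, 0.1, 0.01, 0.001)]
errors = [abs(p - phi(0.0)) for p in pairings]

# the pairings approach phi(0) = 1 as t -> 0
assert errors[-1] < 0.02
assert errors[0] > errors[-1]
```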

*I won’t go through the argument for this here, but see Section 4.6.1 for other ways of getting to δ and for*
a general result along these lines.

11*Like one pairs “bra” vectors with “ket” vectors in quantum mechanics to make a hA|Bi — a bracket.*

12 If you read the Appendix on integrals from the preceding lecture, where the validity of such a result is stated as a variant
*of the Lebesgue dominated convergence theorem, what goes wrong here is that g(x, t)ϕ(x) will not be dominated by an*
*integrable function since g(0, t) is tending to ∞.*

*The Gaussians tend to ∞ at x = 0 as t → 0, and that’s why writing simply δ(x) =* lim_{t→0} *g(x, t) doesn’t*
*make sense. One would have to say (and people do say, though I have a hard time with it) that the delta*
function has these properties:

*• δ(x) = 0 for x ≠ 0*

*• δ(0) = ∞*

• ∫_{−∞}^{∞} *δ(x) dx = 1*

*These reflect the corresponding (genuine) properties of the g(x, t):*

• lim_{t→0} *g(x, t) = 0 if x ≠ 0*

• lim_{t→0} *g(0, t) = ∞*

• ∫_{−∞}^{∞} *g(x, t) dx = 1*

The third property is our old friend, the second is clear from the formula, and you can begin to believe the
first from the shape of the graphs. The first property is the flip side of “concentrated at a point”, namely
*to be zero away from the point where the function is concentrated.*

The limiting process also works with convolution:

lim_{t→0} *(g ∗ ϕ)(a) =* lim_{t→0} ∫_{−∞}^{∞} *g(a − x, t)ϕ(x) dx = ϕ(a) .*
This is written

*(δ ∗ ϕ)(a) = ϕ(a)*

as shorthand for the limiting process that got us there, and the notation is then pushed so far as to write the delta function itself under the integral, as in

*(δ ∗ ϕ)(a) =* ∫_{−∞}^{∞} *δ(a − x)ϕ(x) dx = ϕ(a) .*
*Let me declare now that I am not going to try to talk you out of writing this.*

The equation

*(δ ∗ ϕ)(a) = ϕ(a)*

*completes the analogy: “δ is to 1 as convolution is to multiplication”.*

**Why concentrate?** Why would one want a function concentrated at a point in the first place? We’ll
*certainly have plenty of applications of delta functions very shortly, and you’ve probably already seen a*
variety through classes on systems and signals in EE or on quantum mechanics in physics. Indeed, it
*would be wrong to hide the origin of the delta function. Heaviside used δ (without the notation) in his*
applications and reworking of Maxwell’s theory of electromagnetism. In EE applications, starting with
Heaviside, you find the “unit impulse” used, as an idealization, in studying how systems respond to sharp,
sudden inputs. We’ll come back to this latter interpretation when we talk about linear systems. The
*symbolism, and the three defining properties of δ listed above, were introduced later by P. Dirac in the*

*service of calculations in quantum mechanics. Because of Dirac’s work, δ is often referred to as the “Dirac*
*δ function”.*

For the present, let’s take a look back at the heat equation and how the delta function comes in there.

We’re perfectly set up for that.

We have seen the family of Gaussians

*g(x, t) =* (1/√(2π*t*)) *e*^{−x^{2}/2t} *,* *t > 0*

*before. They arose in solving the heat equation for an “infinite rod”. Recall that the temperature u(x, t)*
*at a point x and time t satisfies the partial differential equation*

*u*_{t} = (1/2) *u*_{xx} *.*

*When an infinite rod (the real line, in other words) is given an initial temperature f (x) then u(x, t) is given*
*by the convolution with g(x, t):*

*u(x, t) = g(x, t) ∗ f (x) =* (1/√(2π*t*)) *e*^{−x^{2}/2t} *∗ f (x) =* ∫_{−∞}^{∞} (1/√(2π*t*)) *e*^{−(x−y)^{2}/2t} *f (y) dy .*

One thing I didn’t say at the time, knowing that this day would come, is how one recovers the initial
*temperature f (x) from this formula. The initial temperature is at t = 0, so this evidently requires that we*
take the limit:

lim_{t→0^{+}} *u(x, t) =* lim_{t→0^{+}} *g(x, t) ∗ f (x) = (δ ∗ f )(x) = f (x) .*
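You can watch the initial temperature pop back out numerically. Here’s a sketch (my own, with a sample initial temperature *f* of my choosing): form the convolution *u(x, t) = g(x, t) ∗ f (x)* on a grid and shrink *t*.

```python
import numpy as np

def g(x, t):
    # the heat kernel g(x, t) = (1/sqrt(2 pi t)) exp(-x^2 / 2t)
    return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

def f(x):
    # a sample initial temperature (my choice)
    return 1.0 / (1.0 + x**2)

y = np.linspace(-30.0, 30.0, 60_001)
dy = y[1] - y[0]

def u(x0, t):
    # u(x0, t) = integral of g(x0 - y, t) f(y) dy
    return float(np.sum(g(x0 - y, t) * f(y)) * dy)

# as t -> 0+ the temperature at each point returns to f
for x0 in (-1.0, 0.0, 2.0):
    assert abs(u(x0, 0.001) - f(x0)) < 1e-2
```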

*Out pops the initial temperature. Perfect. (Well, there have to be some assumptions on f (x), but that’s*
another story.)

**4.4.2** **Distributions as linear functionals**

**Farewell to vacuum tubes** *The approach to distributions we’ve just followed, illustrated by defining δ,*
can be very helpful in particular cases and where there’s a natural desire to have everything look as

“classical” as possible. Still and all, I maintain that adopting this approach wholesale to defining and working with distributions is using technology from a bygone era. I haven’t yet defined the collection of tempered distributions T which is supposed to be the answer to all our Fourier prayers, and I don’t know how to do it from a purely “distributions as limits” point of view. It’s time to transistorize.

*In the preceding discussion we did wind up by considering a distribution, at least δ, in terms of how it acts*
when paired with a Schwartz function. We wrote

*hδ, ϕi = ϕ(0)*
as shorthand for the result of taking the limit of the pairing

*hg(x, t), ϕ(x)i =* ∫_{−∞}^{∞} *g(x, t)ϕ(x) dx .*

The second approach to defining distributions takes this idea — “the outcome” of a distribution acting on a test function — as a starting point rather than as a conclusion. The question to ask is what aspects of “outcome”, as present in the approach via limits, do we try to capture and incorporate in the basic definition?

**Mathematical functions defined on R “live at points”, to use the hip phrase. That is, you plug in a**
**particular point from R, the domain of the function, and you get a particular value in the range, as for**
instance in the simple case when the function is given by an algebraic expression and you plug values into
the expression. Generalized functions — distributions — do not live at points. The domain of a generalized
function is not a set of numbers. The value of a generalized function is not determined by plugging in a
**number from R and determining a corresponding number. Rather, a particular value of a distribution is**
*determined by how it “operates” on a particular test function. The domain of a generalized function is a*
set of test functions. As they say in Computer Science, helpfully:

• You pass a distribution a test function and it returns a number.

That’s not so outlandish. There are all sorts of operations you’ve run across that take a signal as an argument and return a number. The terminology of “distributions” and “test functions”, from the dawn of the subject, is even supposed to be some kind of desperate appeal to physical reality to make this reworking of the earlier approaches more appealing and less “abstract”. See Section 4.5 for a weak attempt at this, but I can only keep up that physical pretense for so long.

Having come this far, but still looking backward a little, recall that we asked which properties of a pairing

— integration, as we wrote it in a particular case in the first approach — do we want to subsume in the general definition. To get all we need, we need remarkably little. Here’s the definition:

**Tempered distributions** *A tempered distribution T is a complex-valued continuous linear functional*
*on the collection S of Schwartz functions (called test functions ). We denote the collection of all tempered*
distributions by T .

That’s the complete definition, but we can unpack it a bit:

*1. If ϕ is in S then T (ϕ) is a complex number. (You pass a distribution a Schwartz function, it returns*
a complex number.)

*• We often write this action of T on ϕ as hT , ϕi and say that T is paired with ϕ. (This terminology*
and notation are conventions, not commandments.)

2. A tempered distribution is linear operating on test functions:

*T (α*_{1}*ϕ*_{1} *+ α*_{2}*ϕ*_{2}*) = α*_{1}*T (ϕ*_{1}*) + α*_{2}*T (ϕ*_{2})

or, in the other notation,

*hT , α*_{1}*ϕ*_{1} *+ α*_{2}*ϕ*_{2}*i = α*_{1}*hT , ϕ*_{1}*i + α*_{2}*hT , ϕ*_{2}*i ,*

*for test functions ϕ*_{1}*, ϕ*_{2} *and complex numbers α*_{1}*, α*_{2}.

*3. A tempered distribution is continuous: if ϕ*_{n} *is a sequence of test functions in S with ϕ*_{n} *→ ϕ in S*
then

*T (ϕ*_{n}*) → T (ϕ) ,* also written *hT , ϕ*_{n}*i → hT , ϕi .*

*Also note that two tempered distributions T*1 *and T*2 are equal if they agree on all test functions:

*T*_{1}*= T*_{2} if *T*_{1}*(ϕ) = T*_{2}*(ϕ)* *(hT*_{1}*, ϕi = hT*_{2}*, ϕi)* *for all ϕ in S .*
This isn’t part of the definition, it’s just useful to write down.
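If the computer-science phrasing appeals to you — you pass a distribution a test function, it returns a number — here’s a minimal sketch (the names are my own; only linearity is checked, not continuity) of distributions modeled as plain callables on test functions:

```python
import math

# A distribution is just a function that eats a test function and returns a number.
def delta(phi):
    return phi(0.0)                      # <delta, phi> = phi(0)

def shifted_delta(a):
    return lambda phi: phi(a)            # <delta_a, phi> = phi(a)

# sample test functions (my choices; Schwartz-like on paper)
phi1 = lambda x: math.exp(-x**2)
phi2 = lambda x: x * math.exp(-x**2)

# linearity: <T, a1*phi1 + a2*phi2> = a1*<T, phi1> + a2*<T, phi2>
alpha1, alpha2 = 2.0, -3.0
combo = lambda x: alpha1 * phi1(x) + alpha2 * phi2(x)
assert delta(combo) == alpha1 * delta(phi1) + alpha2 * delta(phi2)

# the pairing really does just "evaluate"
assert shifted_delta(1.5)(phi2) == phi2(1.5)
```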