**Chapter 3**

**Convolution**

**3.1** **A ∗ is Born**

How can we use one signal to modify another? Some of the properties of the Fourier transform that we have already derived can be thought of as addressing this question. The easiest is the result on additivity, according to which

*F (f + g) = F f + F g.*

*Adding the signal g(t) to the signal f (t) adds the amounts F g(s) to the frequency components F f (s).*

*(Symmetrically, f (t) modifies g(t) in the same way.) The spectrum of f + g may be more or less “complicated” than the spectrum of f and g alone, and it’s an elementary operation in both the time domain and*
the frequency domain that produces or eliminates the complications. It’s an operation that’s also easily
undone: See some frequencies you don’t like in the spectrum (a bad buzz)? Then try adding something in
or subtracting something out and see what the signal looks like.

We can view the question of using one signal to modify another in either the time domain or in the frequency
domain, sometimes with equal ease and sometimes with one point of view preferred. We just looked at
*sums; what about products? The trivial case is multiplying by a constant, as in F (af )(s) = aF f (s). The*
energies of the harmonics are all affected by the same amount, so, thinking of music for example, the signal
*sounds the same, only louder or softer. It’s much less obvious how to scale the harmonics separately. That*
is, as a question “in the frequency domain”, we ask:

*Is there some combination of the signals f (t) and g(t) so that in the frequency domain the*
Fourier transform is

*F g(s)F f (s) ?*

*In other words, in the time domain can we combine the signal g(t) with the signal f (t) so that*
*the frequency components F f (s) of f (t) are scaled by the frequency components F g(s) of g(t)?*

*(Once again this is symmetric — we could say that the frequency components F g(s) are scaled*
*by the frequency components F f (s).)*

Let’s check this out, and remember that the rigor police are off duty. No arrests will be made for unstated assumptions, divergent integrals, etc.

*The product of the Fourier transforms of f (t) and g(t) is*

$$
F g(s)\, F f(s) = \int_{-\infty}^{\infty} e^{-2\pi i s t} g(t)\,dt \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\,dx\,.
$$

We used different variables of integration in the two integrals because we’re going to combine the product
into an iterated integral.^{1}

$$
\begin{aligned}
\int_{-\infty}^{\infty} e^{-2\pi i s t} g(t)\,dt \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\,dx
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-2\pi i s t}\, e^{-2\pi i s x}\, g(t) f(x)\,dt\,dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-2\pi i s (t+x)}\, g(t) f(x)\,dt\,dx \\
&= \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{-2\pi i s (t+x)}\, g(t)\,dt\right) f(x)\,dx
\end{aligned}
$$

*Now make the change of variable u = t + x in the inner integral. Then t = u − x, du = dt, and the limits*
are the same. The result is

$$
\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{-2\pi i s (t+x)}\, g(t)\,dt\right) f(x)\,dx
= \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{-2\pi i s u}\, g(u - x)\,du\right) f(x)\,dx\,.
$$
Next, switch the order of integration:

$$
\begin{aligned}
\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{-2\pi i s u}\, g(u - x)\,du\right) f(x)\,dx
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-2\pi i s u}\, g(u - x) f(x)\,du\,dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-2\pi i s u}\, g(u - x) f(x)\,dx\,du \\
&= \int_{-\infty}^{\infty} e^{-2\pi i s u} \left(\int_{-\infty}^{\infty} g(u - x) f(x)\,dx\right) du
\end{aligned}
$$

*Look at what’s happened here. The inner integral is a function of u. Let’s set it up on its own:*

$$
h(u) = \int_{-\infty}^{\infty} g(u - x) f(x)\,dx\,.
$$
*Then the outer integral produces the Fourier transform of h:*

$$
\int_{-\infty}^{\infty} e^{-2\pi i s u} \left(\int_{-\infty}^{\infty} g(u - x) f(x)\,dx\right) du
= \int_{-\infty}^{\infty} e^{-2\pi i s u}\, h(u)\,du = F h(s)\,.
$$

*Switching the variable name for h from h(u) to h(t) (solely for psychological comfort), we have discovered*
*that the signals f (t) and g(t) are combined into a signal*

$$
h(t) = \int_{-\infty}^{\infty} g(t - x) f(x)\,dx\,.
$$
In other words,

*F h(s) = F g(s)F f (s) .*
Remarkable.

We have solved our problem. The only thing to do is to realize what we’ve done and declare it to the world. We make the following definition:

1If you’re uneasy with this (never mind issues of convergence) you might convince yourself that it’s correct by working your way backwards from the double integral to the product of the two single integrals.


• **Convolution defined** *The convolution of two functions g(t) and f (t) is the function*

$$
h(t) = \int_{-\infty}^{\infty} g(t - x) f(x)\,dx\,.
$$

We use the notation

$$
(g * f)(t) = \int_{-\infty}^{\infty} g(t - x) f(x)\,dx\,.
$$

We can now proudly announce:

• **Convolution Theorem** *F (g ∗ f )(s) = F g(s)F f (s)*

• In words: Convolution in the time domain corresponds to multiplication in the frequency domain.^{2}
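The theorem is easy to sanity-check numerically in the discrete setting. Here is a sketch (an illustrative aside with randomly chosen data, not part of the development): for the DFT, circular convolution in "time" matches pointwise multiplication in "frequency" exactly.

```python
import numpy as np

# Discrete sanity check of the Convolution Theorem (randomly chosen data):
# the DFT of a circular convolution equals the product of the DFTs.
rng = np.random.default_rng(0)
n = 64
f = rng.standard_normal(n)
g = rng.standard_normal(n)

# Circular convolution computed straight from the definition
conv = np.array([sum(g[(m - k) % n] * f[k] for k in range(n)) for m in range(n)])

lhs = np.fft.fft(conv)
rhs = np.fft.fft(g) * np.fft.fft(f)
assert np.allclose(lhs, rhs)
```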
Recall that when we studied Fourier series, convolution came up in the form

$$
(g * f)(t) = \int_{0}^{1} g(t - x) f(x)\,dx\,.
$$

*In that setting, for the integral to make sense, i.e., to be able to evaluate g(t − x) at points outside the*
*interval from 0 to 1, we had to assume that g was periodic. That’s not an issue in the present setting,*
*where we assume that f (t) and g(t) are defined for all t, so the factors in the integral*

$$
\int_{-\infty}^{\infty} g(t - x) f(x)\,dx
$$

are defined everywhere. There may be questions to raise about whether the integral converges, and there are, but at least the setup makes sense.

**Remark on notation, again** *It’s common to see people write the convolution as g(t) ∗ f (t), putting*
*the variable t in each of g and f . There are times when that’s OK, even sometimes preferable to introducing*
a lot of extra notation, but in general I think it’s a bad idea because it can lead to all sorts of abuses and
*possible mistakes. For example, what’s g(2t) ∗ f (t)? If you plugged in too casually you might write this as*

the integral

$$
\int_{-\infty}^{\infty} g(2t - x) f(x)\,dx\,.
$$
*That’s wrong. The right answer in convolving g(2t) and f (t) is*

$$
\int_{-\infty}^{\infty} g(2(t - x)) f(x)\,dx = \int_{-\infty}^{\infty} g(2t - 2x) f(x)\,dx\,.
$$
Make sure you understand why the first is wrong and the second is right.^{3}

2What we’ve just gone through is the same sort of thing we did when we “found” the formula for the Fourier coefficients for a periodic function. Remember the principle: First suppose the problem is solved and see what the answer must be. The second step, assuming the first one works, is to turn that solution into a definition and then announce to the world that you have solved your original problem based on your brilliant definition. Mathematicians, in particular, are very good at presenting their results and writing their books in this way — do step one in secret and tell the world only step two. It’s extremely irritating.

3The way to be unambiguous about this is to say something like: “Let’s define h(t) = g(2t); then (h ∗ f )(t) = $\int_{-\infty}^{\infty} h(t - x) f(x)\,dx = \dots$” I concede that this is too much of a hassle in most cases. Just be careful.

Let’s see a quick application of our brilliant new discovery. As an exercise you can show (by hand) that
*(Π ∗ Π)(x) = Λ(x)*

Recall that Λ is the triangle function. Applying the Convolution Theorem, we find that
$$
F \Lambda(s) = F(\Pi * \Pi)(s) = \operatorname{sinc} s \cdot \operatorname{sinc} s = \operatorname{sinc}^2 s\,,
$$

just like before. Was there any doubt?
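If you'd rather check the exercise numerically than by hand, here is a sketch (the grid spacing and tolerance are arbitrary choices): sample Π on a fine grid, approximate the convolution integral with np.convolve, and compare against Λ.

```python
import numpy as np

# Numerical check that rect convolved with itself is the triangle:
# (Π ∗ Π)(x) = Λ(x). Grid spacing and tolerance are arbitrary choices.
dx = 0.001
x = np.arange(-2, 2, dx)
rect = np.where(np.abs(x) < 0.5, 1.0, 0.0)

# np.convolve computes a sum of products; multiply by dx for the integral
conv = np.convolve(rect, rect, mode="same") * dx
triangle = np.maximum(0.0, 1.0 - np.abs(x))

assert np.max(np.abs(conv - triangle)) < 0.01
```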

**Convolving in the frequency domain** If you look at the argument for the convolution theorem
*F (g ∗ f ) = F g · F f , you’ll see that we could have carried the whole thing out for the inverse Fourier*
transform, and given the symmetry between the Fourier transform and its inverse that’s not surprising.

That is, we also have

$$
F^{-1}(g * f) = F^{-1}g \cdot F^{-1}f\,.
$$

What’s more interesting, and doesn’t follow without a little additional argument, is this:

*F (gf )(s) = (F g ∗ F f )(s) .*
In words:

• Multiplication in the time domain corresponds to convolution in the frequency domain.

Here’s how the derivation goes. We’ll need one of the duality formulas, the one that says
*F (F f )(s) = f (−s)* or *F (F f ) = f*^{−} without the variable.

*To derive the identity F (gf ) = F g ∗ F f , we write, for convenience, h = F f and k = F g. Then we’re to*
show

*F (gf ) = k ∗ h .*

The one thing we know is how to take the Fourier transform of a convolution, so, in the present notation,
*F (k ∗ h) = (F k)(F h). But now F k = F F g = g*^{−}*, from the identity above, and likewise F h = F F f = f*^{−}.
*So F (k ∗ h) = g*^{−}*f*^{−}*= (gf )*^{−}, or

*gf = F (k ∗ h)*^{−}*.*

Now, finally, take the Fourier transform of both sides of this last equation and appeal to the F F identity again:

*F (gf ) = F (F (k ∗ h)*^{−}*) = k ∗ h = F g ∗ F f .*
We’re done.
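The discrete analogue of this identity is worth seeing once (an illustrative sketch, not part of the text): for the DFT, the transform of a pointwise product is 1/n times the circular convolution of the two transforms, the 1/n coming from the DFT's normalization convention.

```python
import numpy as np

# Discrete analogue of F(gf) = Fg * Ff: the DFT of a pointwise product
# equals (1/n) times the circular convolution of the DFTs.
rng = np.random.default_rng(0)
n = 32
f = rng.standard_normal(n)
g = rng.standard_normal(n)

F = np.fft.fft(f)
G = np.fft.fft(g)
# Circular convolution of the transforms, from the definition
circ = np.array([sum(G[(m - k) % n] * F[k] for k in range(n)) for m in range(n)])

assert np.allclose(np.fft.fft(g * f), circ / n)
```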

**Remark** *You may wonder why we didn’t start by trying to prove F (gf )(s) = (F g ∗ F f )(s) rather than*
*F (g ∗ f ) = (F f )(F g) as we did. That is, it seems more “natural” to multiply signals in the time domain*
*and see what effect this has in the frequency domain, so why not work with F (f g) directly? But write the*
*integral for F (gf ); there’s nothing you can do with it to get toward F g ∗ F f .*


**3.2** **What is Convolution, Really?**

There’s not a single answer to that question. Those of you who have had a course in “Signals and Systems”

*probably saw convolution in connection with Linear Time Invariant Systems and the impulse response for*
such a system. (This already came up in connection with our solution of the heat equation.) That’s a very
natural setting for convolution and we’ll consider it later, after we have the machinery of delta functions
et al.

The fact is that convolution is used in many ways and for many reasons, and it can be a mistake to try to attach to it one particular meaning or interpretation. This multitude of interpretations and applications is somewhat like the situation with the definite integral. When you learned about the integral, chances are that it was introduced via an important motivating problem, typically recovering the distance traveled from the velocity, or finding the area under a curve. That’s fine, but the integral is really a much more general and flexible concept than those two sample problems might suggest. You do yourself no service if every time you think to use an integral you think only of one of those problems. Likewise, you do yourself no service if you insist on one particular interpretation of convolution.

*To pursue the analogy with the integral a little bit further, in pretty much all applications of the integral*
there is a general method at work: cut the problem into small pieces where it can be solved approximately,
sum up the solution for the pieces, and pass to a limit.^{4} There is also often a general method to working, or
*seeking to work with convolutions: usually there’s something that has to do with smoothing and averaging,*
understood broadly. You see this in both the continuous case (which we’re doing now) and the discrete
case (which we’ll do later).

For example, in using Fourier series to solve the heat equation on a circle, we saw that the solution was expressed as a convolution of the initial heat distribution with the Green’s function (or fundamental solution). That’s a smoothing and averaging interpretation (both!) of the convolution. It’s also a linear systems interpretation of convolution, where the system is described by the heat equation.

In brief, we’ll get to know the convolution by seeing it in action:

*• Convolution is what convolution does.*

That’s probably the best answer to the question in the heading to this section.

**3.2.1** **But can I visualize convolution? or “Flip this, buddy”**

I’m tempted to say don’t bother. Again for those of you who have seen convolution in earlier courses, you’ve probably heard the expression “flip and drag”. For

$$
(g * f)(t) = \int_{-\infty}^{\infty} g(t - x) f(x)\,dx
$$
here’s what this means.

*• Fix a value t. The graph of the function g(x − t) has the same shape as g(x) but shifted to the right*
*by t. Then forming g(t − x) flips the graph (left-right) about the line x = t. If the most interesting*
*or important features of g(x) are near x = 0, e.g., if it’s sharply peaked there, then those features*
*are shifted to x = t for the function g(t − x) (but there’s the extra “flip” to keep in mind).*

4This goes back to Archimedes, who called his paper on the subject “The Method”.

*• Multiply the two functions f (x) and g(t − x) and integrate with respect to x. Remember that the*
*value of the convolution (g ∗ f )(t) is not just the product of the values of f and the flipped and shifted*
*g, it’s the integral of the product — much harder to visualize. Integrating the product sums up these*
values, that’s the “dragging” part.

**Smoothing and averaging** I prefer to think of the convolution operation as using one function to
*smooth and average the other. (Say g is used to smooth f in g ∗ f .) In many common applications g(x)*
is a positive function, concentrated near 0, with total area 1,

$$
\int_{-\infty}^{\infty} g(x)\,dx = 1,
$$

*like a sharply peaked Gaussian, for example (stay tuned). Then g(t − x) is concentrated near t and still*
*has area 1. For a fixed t, forming the integral*

$$
\int_{-\infty}^{\infty} g(t - x) f(x)\,dx
$$

*is like taking a weighted average of the values of f (x) near x = t, weighted by the values of (the flipped and shifted) g. (It’s a legitimate weighted average because* $\int_{-\infty}^{\infty} g(x)\,dx = 1$*.)*

*That’s the averaging part of the description: Computing the convolution g ∗ f at t replaces the value f (t)*
*by a weighted average of the values of f near t. Where does the smoothing come in? Here’s where.*

*• Changing t (“dragging” g(t − x) through different values of t) repeats this operation.*

*Again take the case of an averaging-type function g(t), as above. At a given value of t, (g ∗ f )(t) is a weighted average of values of f near t. Move t a little to a point t′. Then (g ∗ f )(t′) is a weighted average of values of f near t′, which will include values of f that entered into the average near t. Thus the values of the convolutions (g ∗ f )(t) and (g ∗ f )(t′) will likely be closer to each other than are the values f (t) and f (t′). That is, (g ∗ f )(t) is “smoothing” f as t varies — there’s less of a change between values of the convolution than between values of f .*

We’ll study this in more detail later, but you’ve already seen at least one example of smoothing. The rect
*function Π(x) is discontinuous — it has jumps at ±1/2. The convolution Π ∗ Π is the triangle function*
*Λ, which is continuous — the jumps at the endpoints have been smoothed out. There’s still a corner, but*
*there’s no discontinuity.*

In fact, as an aphorism we can state

*• The convolution g ∗ f is at least as smooth a function as g and f are separately.*
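The smoothing effect is easy to see in code (a sketch; the signal, the noise level, and the kernel width are all arbitrary choices): convolving a noisy square-ish wave with a unit-area Gaussian sharply reduces the sample-to-sample variation.

```python
import numpy as np

# Smoothing by convolution with a unit-area Gaussian kernel.
# The signal, noise level, and kernel width are arbitrary choices.
dt = 0.01
t = np.arange(-5, 5, dt)
rng = np.random.default_rng(1)
noisy = np.sign(np.sin(2 * np.pi * 0.3 * t)) + 0.3 * rng.standard_normal(len(t))

sigma = 0.1
kernel = np.exp(-t**2 / (2 * sigma**2))
kernel /= kernel.sum() * dt                  # normalize: total area 1

smoothed = np.convolve(noisy, kernel, mode="same") * dt

# The smoothed signal varies less from sample to sample than the original
assert np.std(np.diff(smoothed)) < np.std(np.diff(noisy))
```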

**A smear job, too** Now, be a little careful in how you think about this averaging and smoothing process.

*Computing any value of (g ∗ f )(t) involves all of the values of g and all of the values of f , and adding the*
*products of corresponding values of g and f with one of the functions flipped and dragged. If both f (t) and*
*g(t) become identically zero after awhile then the convolution g ∗ f will also be identically zero outside of*
*some interval. But if either f (t) or g(t) does not become identically zero then neither will the convolution.*

In addition to averaging and smoothing the convolution also “smears” out the factors — not a becoming description, but an accurate one.


Definitely keep the general description we’ve just gone through in mind, but as far as visualizing the convolution of any two old functions, I think it’s of dubious value to beat yourself up trying to do that.

It’s hard geometrically, and it’s hard computationally, in the sense that you have to calculate some tedious integrals. (You do have to do a few of these in your life — hence the homework assignment — but only a few.) For developing further intuition, I do recommend the Johns Hopkins web page on signals, systems and control:

http://www.jhu.edu/~signals/

There you’ll find a Java applet called “Joy of Convolution” (and many other things). It will allow you
*to select sample curves f (t) and g(t), or to draw your own curves with a mouse, and then produce the*
*convolution (g ∗ f )(t).*

By the way, of course you can try to get some intuition for how the convolution looks by thinking of what’s
*happening in the frequency domain. It’s not so far fetched to try to imagine the Fourier transforms F f ,*
*F g, and their product, and then imagine the inverse transform to get you g ∗ f .*

**3.3** **Properties of Convolution: It’s a Lot like Multiplication**

Convolution behaves in many ways (not all ways) like multiplication. For example, it is commutative:

*f ∗ g = g ∗ f .*

*So although it looks like the respective roles of f and g are different — one is “flipped and dragged”, the*
other isn’t — in fact they share equally in the end result.

*Do we have to prove this? Not among friends. After all, we defined the convolution so that the convolution*
*theorem holds, that is so that F (g ∗ f ) = F gF f . But g and f enter symmetrically on the right hand side,*
*so g ∗ f = f ∗ g — g(t) can be used to modify f (t) or f (t) can be used to modify g(t).*

Nevertheless, the commutativity property is easy to check from the definition:

$$
\begin{aligned}
(f * g)(t) &= \int_{-\infty}^{\infty} f(t - u)\, g(u)\,du \\
&= \int_{-\infty}^{\infty} g(t - v)\, f(v)\,dv \qquad \text{(making the substitution } v = t - u\text{)} \\
&= (g * f)(t)\,.
\end{aligned}
$$

The same idea, a change of variable but with more bookkeeping, establishes that convolution is associative (an exercise for you in integrals):

*(f ∗ g) ∗ h = f ∗ (g ∗ h) .*
Much more easily one gets that

*f ∗ (g + h) = f ∗ g + f ∗ h .*

The corresponding statements are easily verified in the frequency domain.
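All three rules are easy to confirm for discrete convolution, where np.convolve computes the finite analogue of the integral (an illustrative sketch with random data):

```python
import numpy as np

# Discrete check of the algebraic rules for convolution (random data).
rng = np.random.default_rng(2)
f = rng.standard_normal(32)
g = rng.standard_normal(32)
h = rng.standard_normal(32)

# commutativity: f * g = g * f
assert np.allclose(np.convolve(f, g), np.convolve(g, f))
# associativity: (f * g) * h = f * (g * h)
assert np.allclose(np.convolve(np.convolve(f, g), h),
                   np.convolve(f, np.convolve(g, h)))
# distributivity: f * (g + h) = f * g + f * h
assert np.allclose(np.convolve(f, g + h),
                   np.convolve(f, g) + np.convolve(f, h))
```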

How about a “1”? Is there a function which is to convolution as 1 is to multiplication? Is there a function
*g such that*

*(g ∗ f )(t) = f (t),* *for all functions f ?*

*What property would such a g have? Take Fourier transforms of both sides:*

*F f (s)F g(s) = F f (s) .*
*Then g(x) must be such that*

*F g(s) = 1 .*

*Is there such a g? Applying the inverse Fourier transform would lead to*
$$
\int_{-\infty}^{\infty} e^{2\pi i s x}\,dx\,,
$$

and that integral does not exist — even I wouldn’t try to slip that by the rigor police. Something is up here. Maybe Fourier inversion doesn’t work in this case, or else there’s no classical function whose Fourier transform is 1, or something. In fact, though the integral does not exist in any sense, the problem of a

“1 for convolution” leads exactly to the delta function, or unit impulse — not a classical function, but a

“generalized” function. We’ll return to that shortly.

*How about “division”? Suppose we know h and g in*
*h = f ∗ g*

*and we want to solve for f . Again, taking Fourier transforms we would say*
$$
F h = F f \cdot F g \;\Longrightarrow\; F f = \frac{F h}{F g}\,.
$$

*We’d like the convolution quotient to be the inverse Fourier transform of F h/F g. But there are problems*
*caused by places where F g = 0, along with the usual problems with the integral for the inverse Fourier*
transform to exist.

*Solving for f (t) is the deconvolution problem, which is extremely important in applications. Many times*
*a noisy signal comes to you in the form h = f ∗ g; the signal is f , the noise is g, you receive h. You make*
some assumptions about the nature of the noise, usually statistical assumptions, and you want to separate
the signal from the noise. You want to deconvolve.
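Here is a toy version of deconvolution (a sketch with hypothetical, noise-free data): convolve with a kernel whose DFT never vanishes, then divide it back out in the frequency domain. With real noise this naive division blows up wherever F g is small, which is why practical methods (Wiener filtering, for example) regularize the quotient.

```python
import numpy as np

# Naive frequency-domain deconvolution (noise-free toy example).
# This only works because the DFT of g has no zeros here.
rng = np.random.default_rng(3)
n = 128
f = rng.standard_normal(n)
g = np.exp(-np.arange(n) / 5.0)                 # kernel whose DFT never vanishes
h = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g))  # circular convolution f * g

# "Divide out" g in the frequency domain to recover f
f_recovered = np.fft.ifft(np.fft.fft(h) / np.fft.fft(g)).real
assert np.allclose(f_recovered, f)
```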

**Other identities** It’s not hard to combine the various rules we have and develop an algebra of convolu-
tions. Such identities can be of great use — it beats calculating integrals. Here’s an assortment. (Lower
and uppercase letters are Fourier pairs.)

$$
\begin{aligned}
\bigl((f \cdot g) * (h \cdot k)\bigr)(t) &\;\longleftrightarrow\; \bigl((F * G) \cdot (H * K)\bigr)(s) \\
\bigl((f + g) \cdot (h + k)\bigr)(t) &\;\longleftrightarrow\; \bigl((F + G) * (H + K)\bigr)(s) \\
\bigl(f \cdot (g * h)\bigr)(t) &\;\longleftrightarrow\; \bigl(F * (G \cdot H)\bigr)(s)
\end{aligned}
$$
You can write down others. Be confident — careful, but confident.

**3.4** **Convolution in Action I: A Little Bit on Filtering**

“Filtering” is a generic term for just about any operation one might want to apply to a signal. We have
to be reasonable, of course — there’s usually some feature of the signal that one wants to enhance or
*eliminate, and one expects something of the original signal to be recognizable or recoverable after it’s been*


filtered. Most filters are described as somehow modifying the spectral content of a signal, and they are
thus set up as an operation on the Fourier transform of a signal. We’ll take up this topic in more detail
*when we discuss linear time invariant (LTI) systems, but it’s worthwhile saying a little bit now because the*
most common filters operate through multiplication in the frequency domain, hence through convolution
in the time domain.

The features are:

*• An input signal v(t)*

*• An output signal w(t)*

*• The operation that produces w(t) from v(t) in the time domain is convolution with a function h(t):*

$$
w(t) = (h * v)(t) = \int_{-\infty}^{\infty} h(t - x)\, v(x)\,dx
$$

With this description the Fourier transforms of the input and output are related by multiplication in the frequency domain:

*W (s) = H (s)V (s) ,*

where, following tradition, we denote the Fourier transforms by the corresponding capital letters. In this
*context h(t) is usually called the impulse response* ^{5} *and H (s) is called the transfer function. It seems to*
*be a matter of course always to denote the impulse response by h(t) and always to denote the transfer*
*function by H (s). Who am I to do otherwise?*

*Remember that h(t), hence H (s), is “fixed” in this discussion. It’s wired into the circuit or coded into the*
*software and it does what it does to any input you may give it. Filters based on convolution are usually*
*designed to have a specific effect on the spectrum of an input, and so to design a filter is to design a transfer*
function. The operations, which you’re invited to draw a block diagram for, are thus

*Input → Fourier transform → Multiply by H → Inverse Fourier transform = output*

We want to see some examples of this today — filters that are in day-to-day use and the principles that go into their design.

One preliminary comment about how the spectra of the input and output are related. Write
$$
V(s) = |V(s)|\, e^{i\phi_V(s)}, \qquad \phi_V(s) = \tan^{-1}\frac{\operatorname{Im} V(s)}{\operatorname{Re} V(s)}\,,
$$

*so the phase of V (s) is φ*_{V}*(s), with similar notations for the phases of W (s) and H (s). Then*

$$
\begin{aligned}
|W(s)|\, e^{i\phi_W(s)} &= |H(s)|\, e^{i\phi_H(s)}\, |V(s)|\, e^{i\phi_V(s)} \\
&= |H(s)|\,|V(s)|\, e^{i(\phi_H(s) + \phi_V(s))}\,.
\end{aligned}
$$
*Thus the magnitudes multiply and the phases add:*

$$
|W(s)| = |H(s)|\,|V(s)|, \qquad \phi_W(s) = \phi_V(s) + \phi_H(s)\,.
$$

5Because, as we’ll see, it is how the system “responds” to a unit impulse.

*Multiplying V (s) by H (s) can’t make the spectrum of V (s) any bigger*^{6}*, but it can make the spectrum smaller by zeroing out parts of it. Furthermore, there is no phase change when φ_H(s) = 0, and this happens when H (s) is real. In this case only the amplitude is changed when the signal goes through the*
filter. Common examples of filters that do both of these things — modify some part of the magnitude of
the spectrum with no phase change — are lowpass, bandpass, highpass, and notch filters, to which we’ll
now turn.

**3.4.1** **Designing filters**

**Lowpass filters** *An ideal lowpass filter cuts off all frequencies above a certain amount ν_c (“c” for “cutoff ”) and lets all frequencies below ν_c pass through unchanged. (Hence the description “lowpass”.) If we write*
the operation as

$$
w(t) = (h * v)(t), \qquad W(s) = H(s)\, V(s)\,,
$$
then the transfer function we want is

$$
H(s) =
\begin{cases}
1 & |s| < \nu_c \\
0 & |s| \ge \nu_c
\end{cases}
$$

*Multiplying V (s) by H (s) leaves the spectrum of v unchanged for |s| < ν_c and eliminates the other*
frequencies. The transfer function is just a scaled rect function, and we can write it (to remind you) as

$$
H(s) = \Pi_{2\nu_c}(s) = \Pi(s/2\nu_c) =
\begin{cases}
1 & |s/2\nu_c| < \tfrac{1}{2} \\
0 & |s/2\nu_c| \ge \tfrac{1}{2}
\end{cases}
=
\begin{cases}
1 & |s| < \nu_c \\
0 & |s| \ge \nu_c
\end{cases}
$$

In the time domain the impulse response is the inverse Fourier transform of $\Pi_{2\nu_c}$, and this is

$$
h(t) = 2\nu_c \operatorname{sinc}(2\nu_c t)\,.
$$

*By the way, why is this called just a “lowpass filter”; aren’t the frequencies below −ν_c also eliminated and so not “passed” by the filter? Yes, but remember that for real signals v(t) (which is where this is applied) one has the symmetry relation* $V(-s) = \overline{V(s)}$*. The positive and negative frequencies combine in reconstructing the real signal in the inverse Fourier transform, much like what happens with Fourier series. Thus one wants to pass the frequencies with −ν_c < s < ν_c and eliminate the frequencies with s ≥ ν_c and s ≤ −ν_c.*

*And, by the way, why is this called an ideal lowpass filter? Because the cutoff is a sharp one — right at a particular frequency ν_c. In practice this cannot be achieved, and much of the original art of filter design is concerned with useful approximations to a sharp cutoff.*
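Here is an ideal lowpass filter acting on sampled data (the sampling rate, cutoff, and test signal are arbitrary choices for this sketch): multiply the DFT by the rect-shaped transfer function and invert.

```python
import numpy as np

# Ideal lowpass filter on sampled data. Sampling rate, cutoff, and the
# test signal are arbitrary choices.
fs = 100.0
t = np.arange(200) / fs                       # 2 seconds of samples
v = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 30 * t)

V = np.fft.fft(v)
freqs = np.fft.fftfreq(len(t), d=1 / fs)
nu_c = 10.0                                   # cutoff: pass |s| < 10 Hz
H = (np.abs(freqs) < nu_c).astype(float)      # rect-shaped transfer function
w = np.fft.ifft(H * V).real

# The 30 Hz component is eliminated; the 3 Hz component passes unchanged
target = np.sin(2 * np.pi * 3 * t)
assert np.max(np.abs(w - target)) < 0.01
```

Both tones sit exactly on DFT bins here (the fundamental is 0.5 Hz), so the ideal filter separates them perfectly; with leakage the picture is messier.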

**Bandpass filters** Another very common filter passes a particular band of frequencies through unchanged
*and eliminates all others. This is the ideal bandpass filter. Its transfer function, B(s), can be constructed*
*by shifting and combining the transfer function H (s) for the lowpass filter.*

*We center our bandpass filter at ±ν*_{0} *and cut off frequencies more than ν*_{c}*above and below ν*_{0}; just as for
the lowpass filter we pass symmetric bands of positive frequencies and negative frequencies, and eliminate

6In s, that is; the spectrum of the output takes up no more of R than the spectrum of the input. One says that no new frequencies are added to the spectrum.


everything else. That is, we define the transfer function of a bandpass filter to be
$$
B(s) =
\begin{cases}
1 & \nu_0 - \nu_c < |s| < \nu_0 + \nu_c \\
0 & \text{otherwise}
\end{cases}
= H(s - \nu_0) + H(s + \nu_0)
$$
Here’s the graph: two boxes of height 1 and width 2ν_c, centered at +ν_0 and −ν_0.

*From the representation of B(s) in terms of H (s) it’s easy to find the impulse response, b(t). That’s given by*

$$
\begin{aligned}
b(t) &= h(t)\, e^{2\pi i \nu_0 t} + h(t)\, e^{-2\pi i \nu_0 t} \qquad \text{(using the shift theorem, or the modulation theorem)} \\
&= 4\nu_c \cos(2\pi \nu_0 t)\, \operatorname{sinc}(2\nu_c t)\,.
\end{aligned}
$$

[Plot of b(t) for ν_0 = 10 and ν_c = 2 omitted.]

Now, tell the truth, do you really think you could just flip and drag and figure out what the convolution looks like of that thing with some other thing?

**Highpass filters** The twin to an ideal lowpass filter is an ideal high pass filter, where all frequencies
*above a cutoff frequency ν_c are passed through unchanged and all frequencies below are eliminated. You*
might use this, for example, if there’s a slow “drift” in your data that suggests a low frequency disturbance
or noise that you may want to eliminate. Highpass filters are used on images to sharpen edges and details
(associated with high spatial frequencies) and eliminate blurring (associated with low spatial frequencies).

The graph of the transfer function for an ideal highpass filter looks like:

[Graph: value 1 for |s| > ν_c and 0 for |s| < ν_c.]

It’s easy to write a formula for this; it’s just

*High(s) = 1 − Π*_{2ν}_{c}*(s) ,*

*where ν_c is the cutoff frequency.*^{7} At this point we’re stuck. We can’t find the impulse response because
*we haven’t yet gained the knowledge that the inverse Fourier transform of 1 is the δ function. Think of*
the highpass filter as the evil twin of the lowpass filter.

**Notch filters** *The evil twin of a bandpass filter is a notch filter. The effect of a notch filter is to eliminate*
frequencies within a given band (the “notch”) and to pass frequencies outside that band. To get the transfer
function we just subtract a bandpass transfer function from 1. Using the one we already have:

*Notch(s) = 1 − B(s) = 1 − (H (s − ν*_{0}*) + H (s + ν*_{0}*)) .*

*This will eliminate the positive frequencies between ν_0 − ν_c and ν_0 + ν_c, and the corresponding negative frequencies between −ν_0 − ν_c and −ν_0 + ν_c, and pass all frequencies outside of these two bands.*

You can draw your own graph of that.

Unfortunately, for the impulse response we’re in the same position here as we were for the highpass filter.

*We cannot write down the impulse response without recourse to δ’s, so this will have to wait.*

**3.5** **Convolution in Action II: Differential Equations**

One of the most common uses of convolution and the Fourier transform is in solving differential equations.

*Solving differential equations was Fourier’s original motivation for Fourier series and the use of the Fourier*
transform to this end has continued to exercise a strong influence on the theory and the applications. We’ll
consider several illustrations, from a simple ordinary differential equation to problems associated with the
heat equation. We’ll also revisit the problem of a signal propagating along a cable.

7OK, this High(s) is 1 at the endpoints ±ν_c instead of 0, but that makes no practical difference. On the other hand, this is a further argument for defining Π to have value 1/2 at the endpoints, for then the transfer functions for the lowpass and highpass filters agree in how they cut.


**The derivative formula** To put the Fourier transform to work, we need a formula for the Fourier
transform of the derivative, and as you found in homework:

$$
(F f')(s) = 2\pi i s\, F f(s)\,.
$$

We see that differentiation has been transformed into multiplication, another remarkable feature of the Fourier transform and another reason for its usefulness. Formulas for higher derivatives also hold, and the result is:

$$
(F f^{(n)})(s) = (2\pi i s)^n\, F f(s)\,.
$$
We’ll come back to these formulas in another context a little later.
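The discrete counterpart of the derivative formula is spectral differentiation (a sketch; the grid size and test function are arbitrary choices): on a period-1 grid, multiply the DFT by 2πis and invert, which differentiates band-limited functions exactly.

```python
import numpy as np

# Spectral differentiation on a period-1 grid: multiply the DFT by
# 2*pi*i*s and invert. Exact (to roundoff) for band-limited f.
n = 256
x = np.arange(n) / n
f = np.sin(2 * np.pi * 3 * x) + np.cos(2 * np.pi * 5 * x)

s = np.fft.fftfreq(n, d=1 / n)                # integer frequencies
df = np.fft.ifft(2j * np.pi * s * np.fft.fft(f)).real

exact = (6 * np.pi * np.cos(2 * np.pi * 3 * x)
         - 10 * np.pi * np.sin(2 * np.pi * 5 * x))
assert np.max(np.abs(df - exact)) < 1e-8
```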

*In general, a differential operator can be thought of as a polynomial in d/dx, say of the form*
*P*

_{d}*dx*

*= a**n*

_{d}*dx*

*n*

*+ a**n−1*

_{d}*dx*

*n−1*

*+ · · · + a*1

*d*
*dx**+ a*0*,*
*and when applied to a function f (x) the result is*

*a*_{n}*f*^{(n)}*+ a*_{n−1}*f*^{(n−1)}*+ · · · + a*_{1}*f*^{0}*+ a*_{0}*f .*

*If we now take the Fourier transform of this expression, we wind up with the Fourier transform of f*
*multiplied by the corresponding n-th degree polynomial evaluated at 2πis.*

F

*P*

_{d}*dx*

*f*

*(s) = P (2πis) F f (s)*

*= a**n**(2πis)*^{n}*+ a**n−1**(2πis)*^{n−1}*+ · · · + a*1*(2πis) + a*0

*F f (s) .*
Don’t underestimate how important this is.

**A simple ordinary differential equation and how to solve it** You might like starting off with the
classic second order, ordinary differential equation

$$
u'' - u = -f
$$

Maybe you’ve looked at a different form of this equation, but I’m writing it this way to make the subsequent
*calculations a little easier. f (t) is a given function and you want to find u(t).*

Take the Fourier transform of both sides:

$$
\begin{aligned}
(2\pi i s)^2\, F u - F u &= -F f \\
-4\pi^2 s^2\, F u - F u &= -F f \\
(1 + 4\pi^2 s^2)\, F u &= F f
\end{aligned}
$$
*So we can solve for F u as*

$$
F u = \frac{1}{1 + 4\pi^2 s^2}\, F f\,,
$$

*and — with a little struggle — we recognize* $1/(1 + 4\pi^2 s^2)$ *as the Fourier transform of* $\tfrac{1}{2} e^{-|t|}$*, that is,*

$$
F u = F\!\left(\tfrac{1}{2} e^{-|t|}\right) \cdot F f\,.
$$

The right hand side is the product of two Fourier transforms. Therefore, according to the convolution theorem,

$$
u(t) = \tfrac{1}{2} e^{-|t|} * f(t)\,.
$$
Written out in full this is

$$
u(t) = \frac{1}{2} \int_{-\infty}^{\infty} e^{-|t - \tau|} f(\tau)\,d\tau\,.
$$

And there you have the two-sided exponential decay in action, as well as convolution.
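You can verify the solution formula numerically (a sketch; the Gaussian forcing f, the grid, and the tolerance are arbitrary choices): build u as the convolution by a Riemann sum and check that u″ − u ≈ −f by finite differences, away from the truncation boundary.

```python
import numpy as np

# Numerical check that u = (1/2) e^{-|t|} * f solves u'' - u = -f.
# The forcing f, grid, and tolerance are arbitrary choices.
t = np.linspace(-20, 20, 4001)
dt = t[1] - t[0]
f = np.exp(-t**2)

kernel = 0.5 * np.exp(-np.abs(t))
u = np.convolve(kernel, f, mode="same") * dt   # Riemann sum for the integral

u_pp = np.gradient(np.gradient(u, dt), dt)     # second derivative, O(dt^2)
mid = np.abs(t) < 10                           # ignore boundary effects
assert np.max(np.abs((u_pp - u + f)[mid])) < 1e-2
```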

**The heat equation** Remember the heat equation? In one spatial dimension, the equation that describes
*the rates of change of the temperature u(x, t) of the body at a point x and time t (with some normalization*
of the constants associated with the material) is the partial differential equation

$$
u_t = \tfrac{1}{2}\, u_{xx}\,.
$$

In our earlier work on Fourier series we considered heat flow on a circle, and we looked for solutions that are periodic functions of $x$ on the interval $[0, 1]$, so $u$ was to satisfy $u(x + 1, t) = u(x, t)$. This time we want
to consider the problem of heat flow on the “infinite rod”. A rod of great length (effectively of infinite
*length) is provided with an initial temperature distribution f (x) and we want to find a solution u(x, t) of*
the heat equation with

*u(x, 0) = f (x) .*

*Both f (x) and u(x, t) are defined for −∞ < x < ∞, and there is no assumption of periodicity. Knowing*
the Fourier transform of the Gaussian is essential for the treatment we’re about to give.

*The idea is to take the Fourier transform of both sides of the heat equation, “with respect to x”. The*
Fourier transform of the right hand side of the equation, $\tfrac12 u_{xx}(x,t)$, is

$$\tfrac12\, \mathcal{F}u_{xx}(s,t) = \tfrac12 (2\pi i s)^2\, \mathcal{F}u(s,t) = -2\pi^2 s^2\, \mathcal{F}u(s,t)\,,$$

*from the derivative formula. Observe that the “frequency variable” s is now in the first slot of the trans-*
*formed function and that the time variable t is just going along for the ride. For the left hand side, u**t**(x, t),*
we do something different. We have

$$\mathcal{F}u_t(s,t) = \int_{-\infty}^{\infty} u_t(x,t)\, e^{-2\pi i s x}\, dx \quad \text{(Fourier transform in } x\text{)}$$
$$= \int_{-\infty}^{\infty} \frac{\partial}{\partial t} u(x,t)\, e^{-2\pi i s x}\, dx$$
$$= \frac{\partial}{\partial t} \int_{-\infty}^{\infty} u(x,t)\, e^{-2\pi i s x}\, dx = \frac{\partial}{\partial t}\, \hat{u}(s,t)\,.$$

Thus taking the Fourier transform (with respect to $x$) of both sides of the equation

$$u_t = \tfrac12 u_{xx}$$

leads to

$$\frac{\partial\, \mathcal{F}u(s,t)}{\partial t} = -2\pi^2 s^2\, \mathcal{F}u(s,t)\,.$$

*This is a differential equation in t — an ordinary differential equation, despite the partial derivative symbol*

— and we can solve it:

$$\mathcal{F}u(s,t) = \mathcal{F}u(s,0)\, e^{-2\pi^2 s^2 t}\,.$$
*What is the initial condition, F u(s, 0)?*

$$\mathcal{F}u(s,0) = \int_{-\infty}^{\infty} u(x,0)\, e^{-2\pi i s x}\, dx = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i s x}\, dx = \mathcal{F}f(s)\,.$$

Putting it all together,

$$\mathcal{F}u(s,t) = \mathcal{F}f(s)\, e^{-2\pi^2 s^2 t}\,.$$

3.5 Convolution in Action II: Differential Equations 109

We recognize (we are good) that the exponential factor on the right hand side is the Fourier transform of the Gaussian,

$$g(x,t) = \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}\,.$$
We then have a product of two Fourier transforms,

$$\mathcal{F}u(s,t) = \mathcal{F}g(s,t)\, \mathcal{F}f(s)\,,$$

and we invert this to obtain a convolution in the spatial domain:

$$u(x,t) = g(x,t) * f(x) = \left(\frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}\right) * f(x) \quad \text{(convolution in } x\text{)}$$
or, written out,

$$u(x,t) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi t}}\, e^{-(x-y)^2/2t}\, f(y)\, dy\,.$$

*It’s reasonable to believe that the temperature u(x, t) of the rod at a point x at a time t > 0 is some kind*
*of averaged, smoothed version of the initial temperature f (x) = u(x, 0). That’s convolution at work.*
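Here is a numerical sketch of the convolution solution. Taking the initial temperature $f$ to be the heat kernel at time 1 (my choice), the solution at time $t$ should be the heat kernel at time $1 + t$, since Gaussian variances add under convolution:

```python
import numpy as np

# u(x,t) = g(x,t) * f(x) with g(x,t) = exp(-x²/2t)/√(2πt).
# With f = g(·,1), the result should be g(·, 1+t): Gaussians convolve
# to Gaussians and their variances add. Grid sizes are illustrative.

def heat_kernel(x, t):
    return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

x = np.linspace(-15, 15, 6001)
dx = x[1] - x[0]
f = heat_kernel(x, 1.0)                      # initial temperature u(x, 0)

t = 0.5
u = np.convolve(heat_kernel(x, t), f, mode='same') * dx
assert np.max(np.abs(u - heat_kernel(x, 1.0 + t))) < 1e-6
```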

The function

$$g(x,t) = \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}$$

is called the *heat kernel* (or *Green's function*, or *fundamental solution*) for the heat equation for the infinite rod. Here are plots of $g(x,t)$, as a function of $x$, for $t = 1, 0.5, 0.1, 0.05, 0.01$.

[Figure: $g(x,t)$ plotted against $x$ for $t = 1, 0.5, 0.1, 0.05, 0.01$; the curves grow taller and narrower as $t$ decreases.]

*You can see that the curves are becoming more concentrated near x = 0. Nevertheless, they are doing so*
in a way that keeps the area under each curve equal to 1. For

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}\, dx = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} e^{-\pi u^2}\, \sqrt{2\pi t}\, du \quad \text{(substituting } u = x/\sqrt{2\pi t}\text{)}$$
$$= \int_{-\infty}^{\infty} e^{-\pi u^2}\, du = 1\,.$$
*We’ll see later that the g(x, t) serve as an approximation to the δ function as t → 0.*
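A quick numerical check of the unit-area claim (the grid choices are illustrative, and the $t$ values match the plotted curves):

```python
import numpy as np

# Verify that each heat-kernel curve has unit area, matching the
# substitution u = x/√(2πt) above. Grid choices are mine.

for t in (1.0, 0.5, 0.1, 0.05, 0.01):
    x = np.linspace(-10, 10, 200001)
    g = np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)
    area = np.sum(g) * (x[1] - x[0])    # Riemann sum for the integral
    assert abs(area - 1.0) < 1e-8
```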

You might ask at this point: Didn’t we already solve the heat equation? Is what we did then related to what we just did now? Indeed we did and indeed they are: see Section 3.5.

**More on diffusion — back to the cable** Recall from our earlier discussion that William Thomson
appealed to the heat equation to study the delay in a signal sent along a long, undersea telegraph cable. The
physical intuition, as of the mid 19th century, was that charge “diffused” along the cable. To reconstruct
part of Thomson’s solution (essentially) we must begin with a slightly different setup. The equation is the
same

$$u_t = \tfrac12 u_{xx}\,,$$

so we’re choosing constants as above and not explicitly incorporating physical parameters such as resistance per length, capacitance per length, etc., but the initial and boundary conditions are different.

We consider a semi-infinite rod, having one end (at $x = 0$) but effectively extending infinitely in the positive $x$-direction. Instead of an initial distribution of temperature along the entire rod, we consider a source of heat (or voltage) $f(t)$ at the end $x = 0$. Thus we have the boundary condition

$$u(0,t) = f(t)\,.$$
We suppose that

*u(x, 0) = 0 ,*

*meaning that at t = 0 there’s no temperature (or charge) in the rod. We also assume that u(x, t) and its*
*derivatives tend to zero as x → ∞. Finally, we set*

*u(x, t) = 0* for *x < 0*

*so that we can regard u(x, t) as defined for all x. We want a solution that expresses u(x, t), the temperature*
*(or voltage) at a position x > 0 and time t > 0 in terms of the initial temperature (or voltage) f (t) at the*
*endpoint x = 0.*

*The analysis of this is really involved. It’s quite a striking formula that works out in the end, but, be*
warned, the end is a way off. Proceed only if interested.

*First take the Fourier transform of u(x, t) with respect to x (the notation ˆu seems more natural here):*

$$\hat{u}(s,t) = \int_{-\infty}^{\infty} e^{-2\pi i s x}\, u(x,t)\, dx\,.$$
Then, using the heat equation,

$$\frac{\partial}{\partial t}\, \hat{u}(s,t) = \int_{-\infty}^{\infty} e^{-2\pi i s x}\, \frac{\partial}{\partial t} u(x,t)\, dx = \int_{-\infty}^{\infty} e^{-2\pi i s x}\, \frac{\partial^2}{\partial x^2}\left(\tfrac12 u(x,t)\right) dx\,.$$


We need only integrate from 0 to $\infty$ since $u(x,t)$ is identically 0 for $x < 0$. We integrate by parts once:

$$\int_0^{\infty} e^{-2\pi i s x}\, \tfrac12 \frac{\partial^2}{\partial x^2} u(x,t)\, dx = \tfrac12 \left[ e^{-2\pi i s x}\, \frac{\partial}{\partial x} u(x,t) \right]_{x=0}^{x=\infty} + \pi i s \int_0^{\infty} \frac{\partial}{\partial x} u(x,t)\, e^{-2\pi i s x}\, dx$$
$$= -\tfrac12 u_x(0,t) + \pi i s \int_0^{\infty} \frac{\partial}{\partial x} u(x,t)\, e^{-2\pi i s x}\, dx\,,$$

*taking the boundary conditions on u(x, t) into account. Now integrate by parts a second time:*

$$\int_0^{\infty} \frac{\partial}{\partial x} u(x,t)\, e^{-2\pi i s x}\, dx = \left[ e^{-2\pi i s x}\, u(x,t) \right]_{x=0}^{x=\infty} + 2\pi i s \int_0^{\infty} e^{-2\pi i s x}\, u(x,t)\, dx$$
$$= -u(0,t) + 2\pi i s \int_0^{\infty} e^{-2\pi i s x}\, u(x,t)\, dx$$
$$= -f(t) + 2\pi i s \int_{-\infty}^{\infty} e^{-2\pi i s x}\, u(x,t)\, dx$$

(we drop the bottom limit back to $-\infty$ to bring back the Fourier transform)

$$= -f(t) + 2\pi i s\, \hat{u}(s,t)\,.$$

Putting these calculations together yields

$$\frac{\partial}{\partial t}\, \hat{u}(s,t) = -\tfrac12 u_x(0,t) - \pi i s\, f(t) - 2\pi^2 s^2\, \hat{u}(s,t)\,.$$

*Now, this is a linear, first order, ordinary differential equation (in t) for ˆu. It’s of the general type*
$$y'(t) + P(t)\, y(t) = Q(t)\,,$$

and if you cast your mind back and search for knowledge from the dim past you will recall that to solve such an equation you multiply both sides by the integrating factor

$$e^{\int_0^t P(\tau)\, d\tau}\,,$$

which produces

$$\left( y(t)\, e^{\int_0^t P(\tau)\, d\tau} \right)' = e^{\int_0^t P(\tau)\, d\tau}\, Q(t)\,.$$

*From here you get y(t) by direct integration. For our particular application we have*

$$P(t) = 2\pi^2 s^2 \quad \text{(a constant as far as we're concerned, since there's no } t\text{)}$$
$$Q(t) = -\tfrac12 u_x(0,t) - \pi i s\, f(t)\,.$$

The integrating factor is $e^{2\pi^2 s^2 t}$ and we're to solve^8

$$\left( e^{2\pi^2 s^2 t}\, \hat{u}(s,t) \right)' = e^{2\pi^2 s^2 t} \left( -\tfrac12 u_x(0,t) - \pi i s\, f(t) \right).$$

Write $\tau$ for $t$ and integrate both sides from 0 to $t$ with respect to $\tau$:

$$e^{2\pi^2 s^2 t}\, \hat{u}(s,t) - \hat{u}(s,0) = \int_0^t e^{2\pi^2 s^2 \tau} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) d\tau\,.$$

^8 I want to carry this out so you don't miss anything.

But $\hat{u}(s,0) = 0$ since $u(x,0)$ is identically 0, so

$$\hat{u}(s,t) = e^{-2\pi^2 s^2 t} \int_0^t e^{2\pi^2 s^2 \tau} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) d\tau$$
$$= \int_0^t e^{-2\pi^2 s^2 (t-\tau)} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) d\tau\,.$$
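The $t$-dependence here is just the general integrating-factor formula $y(t) = \int_0^t e^{-a(t-\tau)}\, q(\tau)\, d\tau$ for $y' + ay = q$, $y(0) = 0$, with $a = 2\pi^2 s^2$. As a quick sanity check of that formula, here is a sketch with an $a$ and $q$ of my own choosing (the exact solution of $y' + 3y = \sin t$, $y(0) = 0$, is $(3\sin t - \cos t + e^{-3t})/10$):

```python
import numpy as np

# Check y(t) = ∫₀ᵗ e^{-a(t-τ)} q(τ) dτ solves y' + a y = q with y(0) = 0,
# for the illustrative case a = 3, q(τ) = sin τ.

a = 3.0

def y_formula(t, n=200001):
    # trapezoidal approximation to ∫₀ᵗ e^{-a(t-τ)} sin(τ) dτ
    tau = np.linspace(0.0, t, n)
    dtau = tau[1] - tau[0]
    g = np.exp(-a * (t - tau)) * np.sin(tau)
    return dtau * (np.sum(g) - 0.5 * (g[0] + g[-1]))

t = 2.0
exact = (3 * np.sin(t) - np.cos(t) + np.exp(-a * t)) / 10
assert abs(y_formula(t) - exact) < 1e-8
```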

We need to take the inverse transform of this to get $u(x,t)$. Be not afraid:

$$u(x,t) = \int_{-\infty}^{\infty} e^{2\pi i s x}\, \hat{u}(s,t)\, ds$$
$$= \int_{-\infty}^{\infty} e^{2\pi i s x} \int_0^t e^{-2\pi^2 s^2 (t-\tau)} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) d\tau\, ds$$
$$= \int_0^t \int_{-\infty}^{\infty} e^{2\pi i s x}\, e^{-2\pi^2 s^2 (t-\tau)} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) ds\, d\tau\,.$$

Appearances to the contrary, this is not hopeless. Let’s pull out the inner integral for further examination:

$$\int_{-\infty}^{\infty} e^{2\pi i s x}\, e^{-2\pi^2 s^2 (t-\tau)} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) ds$$
$$= -\tfrac12 u_x(0,\tau) \int_{-\infty}^{\infty} e^{2\pi i s x}\, e^{-2\pi^2 s^2 (t-\tau)}\, ds \;-\; \pi i\, f(\tau) \int_{-\infty}^{\infty} e^{2\pi i s x}\, s\, e^{-2\pi^2 s^2 (t-\tau)}\, ds\,.$$
The first integral is the inverse Fourier transform of a Gaussian; we want to find $\mathcal{F}^{-1}\big( e^{-2\pi^2 s^2 (t-\tau)} \big)$. Recall the formulas

$$\mathcal{F}\left( \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2} \right) = e^{-2\pi^2 \sigma^2 s^2}\,, \qquad \mathcal{F}\left( e^{-x^2/2\sigma^2} \right) = \sigma\sqrt{2\pi}\, e^{-2\pi^2 \sigma^2 s^2}\,.$$
Apply this with

$$\sigma = \sqrt{t - \tau}\,.$$
Then, using duality and evenness of the Gaussian, we have

$$\int_{-\infty}^{\infty} e^{2\pi i s x}\, e^{-2\pi^2 s^2 (t-\tau)}\, ds = \mathcal{F}^{-1}\big( e^{-2\pi^2 s^2 (t-\tau)} \big) = \frac{e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)}}\,.$$
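This inversion is easy to check numerically; the following sketch approximates the $s$-integral by a Riemann sum (the value $a = t - \tau$, the $x$ values, and the grid are illustrative choices of mine):

```python
import numpy as np

# Check ∫ e^{2πisx} e^{-2π²s²a} ds = e^{-x²/2a} / √(2πa), with a = t - τ.
# The s-integral is approximated by a Riemann sum on a wide fine grid.

a = 0.7
s = np.linspace(-20, 20, 40001)
ds = s[1] - s[0]
for x in (0.0, 0.5, 1.3):
    lhs = np.sum(np.exp(2j * np.pi * s * x) * np.exp(-2 * np.pi**2 * s**2 * a)) * ds
    rhs = np.exp(-x**2 / (2 * a)) / np.sqrt(2 * np.pi * a)
    assert abs(lhs - rhs) < 1e-8
```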

In the second integral we want to find $\mathcal{F}^{-1}\big( s\, e^{-2\pi^2 s^2 (t-\tau)} \big)$. For this, note that

$$s\, e^{-2\pi^2 s^2 (t-\tau)} = -\frac{1}{4\pi^2 (t-\tau)}\, \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)}$$
and hence

$$\int_{-\infty}^{\infty} e^{2\pi i s x}\, s\, e^{-2\pi^2 s^2 (t-\tau)}\, ds = \mathcal{F}^{-1}\left( -\frac{1}{4\pi^2 (t-\tau)}\, \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)} \right) = -\frac{1}{4\pi^2 (t-\tau)}\, \mathcal{F}^{-1}\left( \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)} \right).$$
We know how to take the inverse Fourier transform of a derivative, or rather we know how to take the
(forward) Fourier transform, and that’s all we need by another application of duality. We use, for a general
*function f ,*

$$\mathcal{F}^{-1}f' = (\mathcal{F}f')^{-} = (2\pi i x\, \mathcal{F}f)^{-} = -2\pi i x\, (\mathcal{F}f)^{-} = -2\pi i x\, \mathcal{F}^{-1}f\,.$$


Apply this to

$$\mathcal{F}^{-1}\left( \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)} \right) = -2\pi i x\, \mathcal{F}^{-1}\big( e^{-2\pi^2 s^2 (t-\tau)} \big) = -2\pi i x\, \frac{1}{\sqrt{2\pi(t-\tau)}}\, e^{-x^2/2(t-\tau)} \quad \text{(from our earlier calculation, fortunately)}.$$

Then

$$-\frac{1}{4\pi^2 (t-\tau)}\, \mathcal{F}^{-1}\left( \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)} \right) = \frac{2\pi i x}{4\pi^2 (t-\tau)}\cdot \frac{e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)}} = \frac{i}{2\pi}\, \frac{x\, e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)^3}}\,.$$

That is,

$$\mathcal{F}^{-1}\big( s\, e^{-2\pi^2 s^2 (t-\tau)} \big) = \frac{i}{2\pi}\, \frac{x\, e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)^3}}\,.$$

Finally getting back to the expression for $u(x,t)$, we can combine what we've calculated for the inverse Fourier transforms and write

$$u(x,t) = -\tfrac12 \int_0^t u_x(0,\tau)\, \mathcal{F}^{-1}\big( e^{-2\pi^2 s^2 (t-\tau)} \big)\, d\tau - \pi i \int_0^t f(\tau)\, \mathcal{F}^{-1}\big( s\, e^{-2\pi^2 s^2 (t-\tau)} \big)\, d\tau$$
$$= -\tfrac12 \int_0^t u_x(0,\tau)\, \frac{e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)}}\, d\tau + \tfrac12 \int_0^t f(\tau)\, \frac{x\, e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)^3}}\, d\tau\,.$$

We're almost there. We'd like to eliminate $u_x(0,\tau)$ from this formula and express $u(x,t)$ in terms of $f(t)$ only. This can be accomplished by a very clever, and I'd say highly nonobvious, observation. We know that $u(x,t)$ is zero for $x < 0$; we defined it to be so. Hence the integral expression for $u(x,t)$ is zero for $x < 0$. Because of the evenness and oddness in $x$ of the two integrands, this has a consequence for the values of the integrals when $x$ is positive. (The first integrand is even in $x$ and the second is odd in $x$.) In fact, the integrals are equal!

Let me explain what happens in a general situation, stripped down, so you can see the idea. Suppose we have

$$\Phi(x,t) = \int_0^t \varphi(x,\tau)\, d\tau + \int_0^t \psi(x,\tau)\, d\tau$$

where we know that: $\Phi(x,t)$ is zero for $x < 0$; $\varphi(x,\tau)$ is even in $x$; $\psi(x,\tau)$ is odd in $x$. Take $a > 0$. Then $\Phi(-a,t) = 0$, hence, using the evenness of $\varphi(x,\tau)$ and the oddness of $\psi(x,\tau)$,

$$0 = \int_0^t \varphi(-a,\tau)\, d\tau + \int_0^t \psi(-a,\tau)\, d\tau = \int_0^t \varphi(a,\tau)\, d\tau - \int_0^t \psi(a,\tau)\, d\tau\,.$$
We conclude that for all $a > 0$,

$$\int_0^t \varphi(a,\tau)\, d\tau = \int_0^t \psi(a,\tau)\, d\tau\,,$$
and hence for $x > 0$ (writing $x$ for $a$),

$$\Phi(x,t) = \int_0^t \varphi(x,\tau)\, d\tau + \int_0^t \psi(x,\tau)\, d\tau = 2\int_0^t \psi(x,\tau)\, d\tau = 2\int_0^t \varphi(x,\tau)\, d\tau \quad \text{(either } \varphi \text{ or } \psi \text{ could be used)}.$$

We apply this in our situation with

$$\varphi(x,\tau) = -\tfrac12 u_x(0,\tau)\, \frac{e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)}}\,, \qquad \psi(x,\tau) = \tfrac12 f(\tau)\, \frac{x\, e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)^3}}\,.$$