**Chapter 3**

**Convolution**

**3.1** **A ∗ is Born**

How can we use one signal to modify another? Some of the properties of the Fourier transform that we have already derived can be thought of as addressing this question. The easiest is the result on additivity, according to which

*F (f + g) = F f + F g.*

*Adding the signal g(t) to the signal f (t) adds the amounts F g(s) to the frequency components F f (s).*

*(Symmetrically, f (t) modifies g(t) in the same way.) The spectrum of f + g may be more or less “complicated” than the spectrum of f and g alone, and it’s an elementary operation in both the time domain and*
the frequency domain that produces or eliminates the complications. It’s an operation that’s also easily
undone: See some frequencies you don’t like in the spectrum (a bad buzz)? Then try adding something in
or subtracting something out and see what the signal looks like.

We can view the question of using one signal to modify another in either the time domain or in the frequency
domain, sometimes with equal ease and sometimes with one point of view preferred. We just looked at
*sums; what about products? The trivial case is multiplying by a constant, as in F (af )(s) = aF f (s). The*
energies of the harmonics are all affected by the same amount, so, thinking of music for example, the signal
*sounds the same, only louder or softer. It’s much less obvious how to scale the harmonics separately. That*
is, as a question “in the frequency domain”, we ask:

*Is there some combination of the signals f (t) and g(t) so that in the frequency domain the*
Fourier transform is

*F g(s)F f (s) ?*

*In other words, in the time domain can we combine the signal g(t) with the signal f (t) so that*
*the frequency components F f (s) of f (t) are scaled by the frequency components F g(s) of g(t)?*

*(Once again this is symmetric — we could say that the frequency components F g(s) are scaled*
*by the frequency components F f (s).)*

Let’s check this out, and remember that the rigor police are off duty. No arrests will be made for unstated assumptions, divergent integrals, etc.

*The product of the Fourier transforms of f (t) and g(t) is*

$$
F g(s)\, F f(s) = \int_{-\infty}^{\infty} e^{-2\pi i s t} g(t)\,dt \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\,dx\,.
$$

We used different variables of integration in the two integrals because we’re going to combine the product
into an iterated integral.^{1}

$$
\begin{aligned}
\int_{-\infty}^{\infty} e^{-2\pi i s t} g(t)\,dt \int_{-\infty}^{\infty} e^{-2\pi i s x} f(x)\,dx
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-2\pi i s t}\, e^{-2\pi i s x}\, g(t) f(x)\,dt\,dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-2\pi i s (t+x)}\, g(t) f(x)\,dt\,dx \\
&= \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{-2\pi i s (t+x)}\, g(t)\,dt\right) f(x)\,dx
\end{aligned}
$$

*Now make the change of variable u = t + x in the inner integral. Then t = u − x, du = dt, and the limits*
are the same. The result is

$$
\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{-2\pi i s (t+x)}\, g(t)\,dt\right) f(x)\,dx
= \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{-2\pi i s u}\, g(u - x)\,du\right) f(x)\,dx\,.
$$
Next, switch the order of integration:

$$
\begin{aligned}
\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{-2\pi i s u}\, g(u - x)\,du\right) f(x)\,dx
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-2\pi i s u}\, g(u - x) f(x)\,du\,dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-2\pi i s u}\, g(u - x) f(x)\,dx\,du \\
&= \int_{-\infty}^{\infty} e^{-2\pi i s u} \left(\int_{-\infty}^{\infty} g(u - x) f(x)\,dx\right) du
\end{aligned}
$$

*Look at what’s happened here. The inner integral is a function of u. Let’s set it up on its own:*

$$
h(u) = \int_{-\infty}^{\infty} g(u - x) f(x)\,dx\,.
$$
*Then the outer integral produces the Fourier transform of h:*

$$
\int_{-\infty}^{\infty} e^{-2\pi i s u} \left(\int_{-\infty}^{\infty} g(u - x) f(x)\,dx\right) du
= \int_{-\infty}^{\infty} e^{-2\pi i s u}\, h(u)\,du = F h(s)\,.
$$

*Switching the variable name for h from h(u) to h(t) (solely for psychological comfort), we have discovered*
*that the signals f (t) and g(t) are combined into a signal*

$$
h(t) = \int_{-\infty}^{\infty} g(t - x) f(x)\,dx\,.
$$
In other words,

*F h(s) = F g(s)F f (s) .*
Remarkable.

We have solved our problem. The only thing to do is to realize what we’ve done and declare it to the world. We make the following definition:

1If you’re uneasy with this (never mind issues of convergence) you might convince yourself that it’s correct by working your way backwards from the double integral to the product of the two single integrals.


• **Convolution defined** *The convolution of two functions g(t) and f (t) is the function*

$$
h(t) = \int_{-\infty}^{\infty} g(t - x) f(x)\,dx\,.
$$

We use the notation

$$
(g * f)(t) = \int_{-\infty}^{\infty} g(t - x) f(x)\,dx\,.
$$

We can now proudly announce:

• **Convolution Theorem** *F (g ∗ f )(s) = F g(s)F f (s)*

• In words: Convolution in the time domain corresponds to multiplication in the frequency domain.^{2}
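The theorem is easy to sanity-check numerically in the discrete setting. Here is a sketch (an illustrative aside with randomly chosen data, not part of the development): for the DFT, circular convolution in "time" matches pointwise multiplication in "frequency" exactly.

```python
import numpy as np

# Discrete sanity check of the Convolution Theorem (randomly chosen data):
# the DFT of a circular convolution equals the product of the DFTs.
rng = np.random.default_rng(0)
n = 64
f = rng.standard_normal(n)
g = rng.standard_normal(n)

# Circular convolution computed straight from the definition
conv = np.array([sum(g[(m - k) % n] * f[k] for k in range(n)) for m in range(n)])

lhs = np.fft.fft(conv)
rhs = np.fft.fft(g) * np.fft.fft(f)
assert np.allclose(lhs, rhs)
```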
Recall that when we studied Fourier series, convolution came up in the form

$$
(g * f)(t) = \int_{0}^{1} g(t - x) f(x)\,dx\,.
$$

*In that setting, for the integral to make sense, i.e., to be able to evaluate g(t − x) at points outside the*
*interval from 0 to 1, we had to assume that g was periodic. That’s not an issue in the present setting,*
*where we assume that f (t) and g(t) are defined for all t, so the factors in the integral*

$$
\int_{-\infty}^{\infty} g(t - x) f(x)\,dx
$$

are defined everywhere. There may be questions to raise about whether the integral converges, and there are, but at least the setup makes sense.

**Remark on notation, again** *It’s common to see people write the convolution as g(t) ∗ f (t), putting*
*the variable t in each of g and f . There are times when that’s OK, even sometimes preferable to introducing*
a lot of extra notation, but in general I think it’s a bad idea because it can lead to all sorts of abuses and
*possible mistakes. For example, what’s g(2t) ∗ f (t)? If you plugged in too casually you might write this as*

the integral

$$
\int_{-\infty}^{\infty} g(2t - x) f(x)\,dx\,.
$$
*That’s wrong. The right answer in convolving g(2t) and f (t) is*

$$
\int_{-\infty}^{\infty} g(2(t - x)) f(x)\,dx = \int_{-\infty}^{\infty} g(2t - 2x) f(x)\,dx\,.
$$
Make sure you understand why the first is wrong and the second is right.^{3}

2What we’ve just gone through is the same sort of thing we did when we “found” the formula for the Fourier coefficients for a periodic function. Remember the principle: First suppose the problem is solved and see what the answer must be. The second step, assuming the first one works, is to turn that solution into a definition and then announce to the world that you have solved your original problem based on your brilliant definition. Mathematicians, in particular, are very good at presenting their results and writing their books in this way — do step one in secret and tell the world only step two. It’s extremely irritating.

3The way to be unambiguous about this is to say something like: “Let’s define h(t) = g(2t); then (h ∗ f )(t) = $\int_{-\infty}^{\infty} h(t - x) f(x)\,dx = \dots$” I concede that this is too much of a hassle in most cases. Just be careful.

Let’s see a quick application of our brilliant new discovery. As an exercise you can show (by hand) that
*(Π ∗ Π)(x) = Λ(x)*

Recall that Λ is the triangle function. Applying the Convolution Theorem, we find that
$$
F \Lambda(s) = F(\Pi * \Pi)(s) = \operatorname{sinc} s \cdot \operatorname{sinc} s = \operatorname{sinc}^2 s\,,
$$

just like before. Was there any doubt?
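If you'd rather check the exercise numerically than by hand, here is a sketch (the grid spacing and tolerance are arbitrary choices): sample Π on a fine grid, approximate the convolution integral with np.convolve, and compare against Λ.

```python
import numpy as np

# Numerical check that rect convolved with itself is the triangle:
# (Π ∗ Π)(x) = Λ(x). Grid spacing and tolerance are arbitrary choices.
dx = 0.001
x = np.arange(-2, 2, dx)
rect = np.where(np.abs(x) < 0.5, 1.0, 0.0)

# np.convolve computes a sum of products; multiply by dx for the integral
conv = np.convolve(rect, rect, mode="same") * dx
triangle = np.maximum(0.0, 1.0 - np.abs(x))

assert np.max(np.abs(conv - triangle)) < 0.01
```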

**Convolving in the frequency domain** If you look at the argument for the convolution theorem
*F (g ∗ f ) = F g · F f , you’ll see that we could have carried the whole thing out for the inverse Fourier*
transform, and given the symmetry between the Fourier transform and its inverse that’s not surprising.

That is, we also have

$$
F^{-1}(g * f) = F^{-1}g \cdot F^{-1}f\,.
$$

What’s more interesting, and doesn’t follow without a little additional argument, is this:

*F (gf )(s) = (F g ∗ F f )(s) .*
In words:

• Multiplication in the time domain corresponds to convolution in the frequency domain.

Here’s how the derivation goes. We’ll need one of the duality formulas, the one that says
*F (F f )(s) = f (−s)* or *F (F f ) = f*^{−} without the variable.

*To derive the identity F (gf ) = F g ∗ F f , we write, for convenience, h = F f and k = F g. Then we’re to*
show

*F (gf ) = k ∗ h .*

The one thing we know is how to take the Fourier transform of a convolution, so, in the present notation,
*F (k ∗ h) = (F k)(F h). But now F k = F F g = g*^{−}*, from the identity above, and likewise F h = F F f = f*^{−}.
*So F (k ∗ h) = g*^{−}*f*^{−}*= (gf )*^{−}, or

*gf = F (k ∗ h)*^{−}*.*

Now, finally, take the Fourier transform of both sides of this last equation and appeal to the F F identity again:

*F (gf ) = F (F (k ∗ h)*^{−}*) = k ∗ h = F g ∗ F f .*
We’re done.
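The discrete analogue of this identity is worth seeing once (an illustrative sketch, not part of the text): for the DFT, the transform of a pointwise product is 1/n times the circular convolution of the two transforms, the 1/n coming from the DFT's normalization convention.

```python
import numpy as np

# Discrete analogue of F(gf) = Fg * Ff: the DFT of a pointwise product
# equals (1/n) times the circular convolution of the DFTs.
rng = np.random.default_rng(0)
n = 32
f = rng.standard_normal(n)
g = rng.standard_normal(n)

F = np.fft.fft(f)
G = np.fft.fft(g)
# Circular convolution of the transforms, from the definition
circ = np.array([sum(G[(m - k) % n] * F[k] for k in range(n)) for m in range(n)])

assert np.allclose(np.fft.fft(g * f), circ / n)
```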

**Remark** *You may wonder why we didn’t start by trying to prove F (gf )(s) = (F g ∗ F f )(s) rather than*
*F (g ∗ f ) = (F f )(F g) as we did. That is, it seems more “natural” to multiply signals in the time domain*
*and see what effect this has in the frequency domain, so why not work with F (f g) directly? But write the*
*integral for F (gf ); there’s nothing you can do with it to get toward F g ∗ F f .*


**3.2** **What is Convolution, Really?**

There’s not a single answer to that question. Those of you who have had a course in “Signals and Systems”

*probably saw convolution in connection with Linear Time Invariant Systems and the impulse response for*
such a system. (This already came up in connection with our solution of the heat equation.) That’s a very
natural setting for convolution and we’ll consider it later, after we have the machinery of delta functions
et al.

The fact is that convolution is used in many ways and for many reasons, and it can be a mistake to try to attach to it one particular meaning or interpretation. This multitude of interpretations and applications is somewhat like the situation with the definite integral. When you learned about the integral, chances are that it was introduced via an important motivating problem, typically recovering the distance traveled from the velocity, or finding the area under a curve. That’s fine, but the integral is really a much more general and flexible concept than those two sample problems might suggest. You do yourself no service if every time you think to use an integral you think only of one of those problems. Likewise, you do yourself no service if you insist on one particular interpretation of convolution.

*To pursue the analogy with the integral a little bit further, in pretty much all applications of the integral*
there is a general method at work: cut the problem into small pieces where it can be solved approximately,
sum up the solution for the pieces, and pass to a limit.^{4} There is also often a general method to working, or
*seeking to work with convolutions: usually there’s something that has to do with smoothing and averaging,*
understood broadly. You see this in both the continuous case (which we’re doing now) and the discrete
case (which we’ll do later).

For example, in using Fourier series to solve the heat equation on a circle, we saw that the solution was expressed as a convolution of the initial heat distribution with the Green’s function (or fundamental solution). That’s a smoothing and averaging interpretation (both!) of the convolution. It’s also a linear systems interpretation of convolution, where the system is described by the heat equation.

In brief, we’ll get to know the convolution by seeing it in action:

*• Convolution is what convolution does.*

That’s probably the best answer to the question in the heading to this section.

**3.2.1** **But can I visualize convolution? or “Flip this, buddy”**

I’m tempted to say don’t bother. Again for those of you who have seen convolution in earlier courses, you’ve probably heard the expression “flip and drag”. For

$$
(g * f)(t) = \int_{-\infty}^{\infty} g(t - x) f(x)\,dx
$$
here’s what this means.

*• Fix a value t. The graph of the function g(x − t) has the same shape as g(x) but shifted to the right*
*by t. Then forming g(t − x) flips the graph (left-right) about the line x = t. If the most interesting*
*or important features of g(x) are near x = 0, e.g., if it’s sharply peaked there, then those features*
*are shifted to x = t for the function g(t − x) (but there’s the extra “flip” to keep in mind).*

4This goes back to Archimedes, who called his paper on the subject “The Method”.

*• Multiply the two functions f (x) and g(t − x) and integrate with respect to x. Remember that the*
*value of the convolution (g ∗ f )(t) is not just the product of the values of f and the flipped and shifted*
*g, it’s the integral of the product — much harder to visualize. Integrating the product sums up these*
values, that’s the “dragging” part.

**Smoothing and averaging** I prefer to think of the convolution operation as using one function to
*smooth and average the other. (Say g is used to smooth f in g ∗ f .) In many common applications g(x)*
is a positive function, concentrated near 0, with total area 1,

$$
\int_{-\infty}^{\infty} g(x)\,dx = 1,
$$

*like a sharply peaked Gaussian, for example (stay tuned). Then g(t − x) is concentrated near t and still*
*has area 1. For a fixed t, forming the integral*

$$
\int_{-\infty}^{\infty} g(t - x) f(x)\,dx
$$

*is like taking a weighted average of the values of f (x) near x = t, weighted by the values of (the flipped and shifted) g. (It’s a legitimate weighted average because* $\int_{-\infty}^{\infty} g(x)\,dx = 1$*.)*

*That’s the averaging part of the description: Computing the convolution g ∗ f at t replaces the value f (t)*
*by a weighted average of the values of f near t. Where does the smoothing come in? Here’s where.*

*• Changing t (“dragging” g(t − x) through different values of t) repeats this operation.*

*Again take the case of an averaging-type function g(t), as above. At a given value of t, (g ∗ f )(t) is a weighted average of values of f near t. Move t a little to a point t′. Then (g ∗ f )(t′) is a weighted average of values of f near t′, which will include values of f that entered into the average near t. Thus the values of the convolutions (g ∗ f )(t) and (g ∗ f )(t′) will likely be closer to each other than are the values f (t) and f (t′). That is, (g ∗ f )(t) is “smoothing” f as t varies — there’s less of a change between values of the convolution than between values of f .*

We’ll study this in more detail later, but you’ve already seen at least one example of smoothing. The rect
*function Π(x) is discontinuous — it has jumps at ±1/2. The convolution Π ∗ Π is the triangle function*
*Λ, which is continuous — the jumps at the endpoints have been smoothed out. There’s still a corner, but*
*there’s no discontinuity.*

In fact, as an aphorism we can state

*• The convolution g ∗ f is at least as smooth a function as g and f are separately.*
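The smoothing effect is easy to see in code (a sketch; the signal, the noise level, and the kernel width are all arbitrary choices): convolving a noisy square-ish wave with a unit-area Gaussian sharply reduces the sample-to-sample variation.

```python
import numpy as np

# Smoothing by convolution with a unit-area Gaussian kernel.
# The signal, noise level, and kernel width are arbitrary choices.
dt = 0.01
t = np.arange(-5, 5, dt)
rng = np.random.default_rng(1)
noisy = np.sign(np.sin(2 * np.pi * 0.3 * t)) + 0.3 * rng.standard_normal(len(t))

sigma = 0.1
kernel = np.exp(-t**2 / (2 * sigma**2))
kernel /= kernel.sum() * dt                  # normalize: total area 1

smoothed = np.convolve(noisy, kernel, mode="same") * dt

# The smoothed signal varies less from sample to sample than the original
assert np.std(np.diff(smoothed)) < np.std(np.diff(noisy))
```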

**A smear job, too** Now, be a little careful in how you think about this averaging and smoothing process.

*Computing any value of (g ∗ f )(t) involves all of the values of g and all of the values of f , and adding the*
*products of corresponding values of g and f with one of the functions flipped and dragged. If both f (t) and*
*g(t) become identically zero after awhile then the convolution g ∗ f will also be identically zero outside of*
*some interval. But if either f (t) or g(t) does not become identically zero then neither will the convolution.*

In addition to averaging and smoothing the convolution also “smears” out the factors — not a becoming description, but an accurate one.


Definitely keep the general description we’ve just gone through in mind, but as far as visualizing the convolution of any two old functions, I think it’s of dubious value to beat yourself up trying to do that.

It’s hard geometrically, and it’s hard computationally, in the sense that you have to calculate some tedious integrals. (You do have to do a few of these in your life — hence the homework assignment — but only a few.) For developing further intuition, I do recommend the Johns Hopkins web page on signals, systems and control:

http://www.jhu.edu/~signals/

There you’ll find a Java applet called “Joy of Convolution” (and many other things). It will allow you
*to select sample curves f (t) and g(t), or to draw your own curves with a mouse, and then produce the*
*convolution (g ∗ f )(t).*

By the way, of course you can try to get some intuition for how the convolution looks by thinking of what’s
*happening in the frequency domain. It’s not so far fetched to try to imagine the Fourier transforms F f ,*
*F g, and their product, and then imagine the inverse transform to get you g ∗ f .*

**3.3** **Properties of Convolution: It’s a Lot like Multiplication**

Convolution behaves in many ways (not all ways) like multiplication. For example, it is commutative:

*f ∗ g = g ∗ f .*

*So although it looks like the respective roles of f and g are different — one is “flipped and dragged”, the*
other isn’t — in fact they share equally in the end result.

*Do we have to prove this? Not among friends. After all, we defined the convolution so that the convolution*
*theorem holds, that is so that F (g ∗ f ) = F gF f . But g and f enter symmetrically on the right hand side,*
*so g ∗ f = f ∗ g — g(t) can be used to modify f (t) or f (t) can be used to modify g(t).*

Nevertheless, the commutativity property is easy to check from the definition:

$$
\begin{aligned}
(f * g)(t) &= \int_{-\infty}^{\infty} f(t - u)\, g(u)\,du \\
&= \int_{-\infty}^{\infty} g(t - v)\, f(v)\,dv \qquad \text{(making the substitution } v = t - u\text{)} \\
&= (g * f)(t)\,.
\end{aligned}
$$

The same idea, a change of variable but with more bookkeeping, establishes that convolution is associative (an exercise for you in integrals):

*(f ∗ g) ∗ h = f ∗ (g ∗ h) .*
Much more easily one gets that

*f ∗ (g + h) = f ∗ g + f ∗ h .*

The corresponding statements are easily verified in the frequency domain.
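All three rules are easy to confirm for discrete convolution, where np.convolve computes the finite analogue of the integral (an illustrative sketch with random data):

```python
import numpy as np

# Discrete check of the algebraic rules for convolution (random data).
rng = np.random.default_rng(2)
f = rng.standard_normal(32)
g = rng.standard_normal(32)
h = rng.standard_normal(32)

# commutativity: f * g = g * f
assert np.allclose(np.convolve(f, g), np.convolve(g, f))
# associativity: (f * g) * h = f * (g * h)
assert np.allclose(np.convolve(np.convolve(f, g), h),
                   np.convolve(f, np.convolve(g, h)))
# distributivity: f * (g + h) = f * g + f * h
assert np.allclose(np.convolve(f, g + h),
                   np.convolve(f, g) + np.convolve(f, h))
```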

How about a “1”? Is there a function which is to convolution as 1 is to multiplication? Is there a function
*g such that*

*(g ∗ f )(t) = f (t),* *for all functions f ?*

*What property would such a g have? Take Fourier transforms of both sides:*

*F f (s)F g(s) = F f (s) .*
*Then g(x) must be such that*

*F g(s) = 1 .*

*Is there such a g? Applying the inverse Fourier transform would lead to*
$$
\int_{-\infty}^{\infty} e^{2\pi i s x}\,dx\,,
$$

and that integral does not exist — even I wouldn’t try to slip that by the rigor police. Something is up here. Maybe Fourier inversion doesn’t work in this case, or else there’s no classical function whose Fourier transform is 1, or something. In fact, though the integral does not exist in any sense, the problem of a

“1 for convolution” leads exactly to the delta function, or unit impulse — not a classical function, but a

“generalized” function. We’ll return to that shortly.

*How about “division”? Suppose we know h and g in*
*h = f ∗ g*

*and we want to solve for f . Again, taking Fourier transforms we would say*
$$
F h = F f \cdot F g \;\Longrightarrow\; F f = \frac{F h}{F g}\,.
$$

*We’d like the convolution quotient to be the inverse Fourier transform of F h/F g. But there are problems*
*caused by places where F g = 0, along with the usual problems with the integral for the inverse Fourier*
transform to exist.

*Solving for f (t) is the deconvolution problem, which is extremely important in applications. Many times*
*a noisy signal comes to you in the form h = f ∗ g; the signal is f , the noise is g, you receive h. You make*
some assumptions about the nature of the noise, usually statistical assumptions, and you want to separate
the signal from the noise. You want to deconvolve.
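Here is a toy version of deconvolution (a sketch with hypothetical, noise-free data): convolve with a kernel whose DFT never vanishes, then divide it back out in the frequency domain. With real noise this naive division blows up wherever F g is small, which is why practical methods (Wiener filtering, for example) regularize the quotient.

```python
import numpy as np

# Naive frequency-domain deconvolution (noise-free toy example).
# This only works because the DFT of g has no zeros here.
rng = np.random.default_rng(3)
n = 128
f = rng.standard_normal(n)
g = np.exp(-np.arange(n) / 5.0)                 # kernel whose DFT never vanishes
h = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g))  # circular convolution f * g

# "Divide out" g in the frequency domain to recover f
f_recovered = np.fft.ifft(np.fft.fft(h) / np.fft.fft(g)).real
assert np.allclose(f_recovered, f)
```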

**Other identities** It’s not hard to combine the various rules we have and develop an algebra of convolu-
tions. Such identities can be of great use — it beats calculating integrals. Here’s an assortment. (Lower
and uppercase letters are Fourier pairs.)

$$
\begin{aligned}
\bigl((f \cdot g) * (h \cdot k)\bigr)(t) &\;\longleftrightarrow\; \bigl((F * G) \cdot (H * K)\bigr)(s) \\
\bigl((f + g) \cdot (h + k)\bigr)(t) &\;\longleftrightarrow\; \bigl((F + G) * (H + K)\bigr)(s) \\
\bigl(f \cdot (g * h)\bigr)(t) &\;\longleftrightarrow\; \bigl(F * (G \cdot H)\bigr)(s)
\end{aligned}
$$
You can write down others. Be confident — careful, but confident.

**3.4** **Convolution in Action I: A Little Bit on Filtering**

“Filtering” is a generic term for just about any operation one might want to apply to a signal. We have
to be reasonable, of course — there’s usually some feature of the signal that one wants to enhance or
*eliminate, and one expects something of the original signal to be recognizable or recoverable after it’s been*


filtered. Most filters are described as somehow modifying the spectral content of a signal, and they are
thus set up as an operation on the Fourier transform of a signal. We’ll take up this topic in more detail
*when we discuss linear time invariant (LTI) systems, but it’s worthwhile saying a little bit now because the*
most common filters operate through multiplication in the frequency domain, hence through convolution
in the time domain.

The features are:

*• An input signal v(t)*

*• An output signal w(t)*

*• The operation that produces w(t) from v(t) in the time domain is convolution with a function h(t):*

$$
w(t) = (h * v)(t) = \int_{-\infty}^{\infty} h(t - x)\, v(x)\,dx
$$

With this description the Fourier transforms of the input and output are related by multiplication in the frequency domain:

*W (s) = H (s)V (s) ,*

where, following tradition, we denote the Fourier transforms by the corresponding capital letters. In this
*context h(t) is usually called the impulse response* ^{5} *and H (s) is called the transfer function. It seems to*
*be a matter of course always to denote the impulse response by h(t) and always to denote the transfer*
*function by H (s). Who am I to do otherwise?*

*Remember that h(t), hence H (s), is “fixed” in this discussion. It’s wired into the circuit or coded into the*
*software and it does what it does to any input you may give it. Filters based on convolution are usually*
*designed to have a specific effect on the spectrum of an input, and so to design a filter is to design a transfer*
function. The operations, which you’re invited to draw a block diagram for, are thus

*Input → Fourier transform → Multiply by H → Inverse Fourier transform = output*

We want to see some examples of this today — filters that are in day-to-day use and the principles that go into their design.

One preliminary comment about how the spectra of the input and output are related. Write
$$
V(s) = |V(s)|\, e^{i\phi_V(s)}, \qquad \phi_V(s) = \tan^{-1}\frac{\operatorname{Im} V(s)}{\operatorname{Re} V(s)}\,,
$$

*so the phase of V (s) is φ*_{V}*(s), with similar notations for the phases of W (s) and H (s). Then*

$$
\begin{aligned}
|W(s)|\, e^{i\phi_W(s)} &= |H(s)|\, e^{i\phi_H(s)}\, |V(s)|\, e^{i\phi_V(s)} \\
&= |H(s)|\,|V(s)|\, e^{i(\phi_H(s) + \phi_V(s))}\,.
\end{aligned}
$$
*Thus the magnitudes multiply and the phases add:*

$$
|W(s)| = |H(s)|\,|V(s)|, \qquad \phi_W(s) = \phi_V(s) + \phi_H(s)\,.
$$

5Because, as we’ll see, it is how the system “responds” to a unit impulse.

*Multiplying V (s) by H (s) can’t make the spectrum of V (s) any bigger*^{6}*, but it can make the spectrum smaller by zeroing out parts of it. Furthermore, there is no phase change when φ_H(s) = 0, and this happens when H (s) is real. In this case only the amplitude is changed when the signal goes through the*
filter. Common examples of filters that do both of these things — modify some part of the magnitude of
the spectrum with no phase change — are lowpass, bandpass, highpass, and notch filters, to which we’ll
now turn.

**3.4.1** **Designing filters**

**Lowpass filters** *An ideal lowpass filter cuts off all frequencies above a certain amount ν_c (“c” for “cutoff ”) and lets all frequencies below ν_c pass through unchanged. (Hence the description “lowpass”.) If we write*
the operation as

$$
w(t) = (h * v)(t), \qquad W(s) = H(s)\, V(s)\,,
$$
then the transfer function we want is

$$
H(s) =
\begin{cases}
1 & |s| < \nu_c \\
0 & |s| \ge \nu_c
\end{cases}
$$

*Multiplying V (s) by H (s) leaves the spectrum of v unchanged for |s| < ν_c and eliminates the other*
frequencies. The transfer function is just a scaled rect function, and we can write it (to remind you) as

$$
H(s) = \Pi_{2\nu_c}(s) = \Pi(s/2\nu_c) =
\begin{cases}
1 & |s/2\nu_c| < \tfrac{1}{2} \\
0 & |s/2\nu_c| \ge \tfrac{1}{2}
\end{cases}
=
\begin{cases}
1 & |s| < \nu_c \\
0 & |s| \ge \nu_c
\end{cases}
$$

In the time domain the impulse response is the inverse Fourier transform of $\Pi_{2\nu_c}$, and this is

$$
h(t) = 2\nu_c \operatorname{sinc}(2\nu_c t)\,.
$$

*By the way, why is this called just a “lowpass filter”; aren’t the frequencies below −ν_c also eliminated and so not “passed” by the filter? Yes, but remember that for real signals v(t) (which is where this is applied) one has the symmetry relation* $V(-s) = \overline{V(s)}$*. The positive and negative frequencies combine in reconstructing the real signal in the inverse Fourier transform, much like what happens with Fourier series. Thus one wants to pass the frequencies with −ν_c < s < ν_c and eliminate the frequencies with s ≥ ν_c and s ≤ −ν_c.*

*And, by the way, why is this called an ideal lowpass filter? Because the cutoff is a sharp one — right at a particular frequency ν_c. In practice this cannot be achieved, and much of the original art of filter design is concerned with useful approximations to a sharp cutoff.*
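Here is an ideal lowpass filter acting on sampled data (the sampling rate, cutoff, and test signal are arbitrary choices for this sketch): multiply the DFT by the rect-shaped transfer function and invert.

```python
import numpy as np

# Ideal lowpass filter on sampled data. Sampling rate, cutoff, and the
# test signal are arbitrary choices.
fs = 100.0
t = np.arange(200) / fs                       # 2 seconds of samples
v = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 30 * t)

V = np.fft.fft(v)
freqs = np.fft.fftfreq(len(t), d=1 / fs)
nu_c = 10.0                                   # cutoff: pass |s| < 10 Hz
H = (np.abs(freqs) < nu_c).astype(float)      # rect-shaped transfer function
w = np.fft.ifft(H * V).real

# The 30 Hz component is eliminated; the 3 Hz component passes unchanged
target = np.sin(2 * np.pi * 3 * t)
assert np.max(np.abs(w - target)) < 0.01
```

Both tones sit exactly on DFT bins here (the fundamental is 0.5 Hz), so the ideal filter separates them perfectly; with leakage the picture is messier.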

**Bandpass filters** Another very common filter passes a particular band of frequencies through unchanged
*and eliminates all others. This is the ideal bandpass filter. Its transfer function, B(s), can be constructed*
*by shifting and combining the transfer function H (s) for the lowpass filter.*

*We center our bandpass filter at ±ν*_{0} *and cut off frequencies more than ν*_{c}*above and below ν*_{0}; just as for
the lowpass filter we pass symmetric bands of positive frequencies and negative frequencies, and eliminate

6In s, that is; the spectrum of the output takes up no more of R than the spectrum of the input. One says that no new frequencies are added to the spectrum.


everything else. That is, we define the transfer function of a bandpass filter to be
$$
B(s) =
\begin{cases}
1 & \nu_0 - \nu_c < |s| < \nu_0 + \nu_c \\
0 & \text{otherwise}
\end{cases}
= H(s - \nu_0) + H(s + \nu_0)
$$
Here’s the graph: two boxes of height 1 and width 2ν_c, centered at +ν_0 and −ν_0.

*From the representation of B(s) in terms of H (s) it’s easy to find the impulse response, b(t). That’s given by*

$$
\begin{aligned}
b(t) &= h(t)\, e^{2\pi i \nu_0 t} + h(t)\, e^{-2\pi i \nu_0 t} \qquad \text{(using the shift theorem, or the modulation theorem)} \\
&= 4\nu_c \cos(2\pi \nu_0 t)\, \operatorname{sinc}(2\nu_c t)\,.
\end{aligned}
$$

[Plot of b(t) for ν_0 = 10 and ν_c = 2 omitted.]

Now, tell the truth, do you really think you could just flip and drag and figure out what the convolution looks like of that thing with some other thing?

**Highpass filters** The twin to an ideal lowpass filter is an ideal high pass filter, where all frequencies
*above a cutoff frequency ν_c are passed through unchanged and all frequencies below are eliminated. You*
might use this, for example, if there’s a slow “drift” in your data that suggests a low frequency disturbance
or noise that you may want to eliminate. Highpass filters are used on images to sharpen edges and details
(associated with high spatial frequencies) and eliminate blurring (associated with low spatial frequencies).

The graph of the transfer function for an ideal highpass filter looks like:

[Graph: value 1 for |s| > ν_c and 0 for |s| < ν_c.]

It’s easy to write a formula for this; it’s just

*High(s) = 1 − Π*_{2ν}_{c}*(s) ,*

*where ν_c is the cutoff frequency.*^{7} At this point we’re stuck. We can’t find the impulse response because
*we haven’t yet gained the knowledge that the inverse Fourier transform of 1 is the δ function. Think of*
the highpass filter as the evil twin of the lowpass filter.

**Notch filters** *The evil twin of a bandpass filter is a notch filter. The effect of a notch filter is to eliminate*
frequencies within a given band (the “notch”) and to pass frequencies outside that band. To get the transfer
function we just subtract a bandpass transfer function from 1. Using the one we already have:

*Notch(s) = 1 − B(s) = 1 − (H (s − ν*_{0}*) + H (s + ν*_{0}*)) .*

*This will eliminate the positive frequencies between ν_0 − ν_c and ν_0 + ν_c, and the corresponding negative frequencies between −ν_0 − ν_c and −ν_0 + ν_c, and pass all frequencies outside of these two bands.*

You can draw your own graph of that.

Unfortunately, for the impulse response we’re in the same position here as we were for the highpass filter.

*We cannot write down the impulse response without recourse to δ’s, so this will have to wait.*

**3.5** **Convolution in Action II: Differential Equations**

One of the most common uses of convolution and the Fourier transform is in solving differential equations.

*Solving differential equations was Fourier’s original motivation for Fourier series and the use of the Fourier*
transform to this end has continued to exercise a strong influence on the theory and the applications. We’ll
consider several illustrations, from a simple ordinary differential equation to problems associated with the
heat equation. We’ll also revisit the problem of a signal propagating along a cable.

7OK, this High(s) is 1 at the endpoints ±ν_c instead of 0, but that makes no practical difference. On the other hand, this is a further argument for defining Π to have value 1/2 at the endpoints, for then the transfer functions for the lowpass and highpass filters agree in how they cut.


**The derivative formula** To put the Fourier transform to work, we need a formula for the Fourier
transform of the derivative, and as you found in homework:

$$
(F f')(s) = 2\pi i s\, F f(s)\,.
$$

We see that differentiation has been transformed into multiplication, another remarkable feature of the Fourier transform and another reason for its usefulness. Formulas for higher derivatives also hold, and the result is:

$$
(F f^{(n)})(s) = (2\pi i s)^n\, F f(s)\,.
$$
We’ll come back to these formulas in another context a little later.
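The discrete counterpart of the derivative formula is spectral differentiation (a sketch; the grid size and test function are arbitrary choices): on a period-1 grid, multiply the DFT by 2πis and invert, which differentiates band-limited functions exactly.

```python
import numpy as np

# Spectral differentiation on a period-1 grid: multiply the DFT by
# 2*pi*i*s and invert. Exact (to roundoff) for band-limited f.
n = 256
x = np.arange(n) / n
f = np.sin(2 * np.pi * 3 * x) + np.cos(2 * np.pi * 5 * x)

s = np.fft.fftfreq(n, d=1 / n)                # integer frequencies
df = np.fft.ifft(2j * np.pi * s * np.fft.fft(f)).real

exact = (6 * np.pi * np.cos(2 * np.pi * 3 * x)
         - 10 * np.pi * np.sin(2 * np.pi * 5 * x))
assert np.max(np.abs(df - exact)) < 1e-8
```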

*In general, a differential operator can be thought of as a polynomial in d/dx, say of the form*
*P*

_{d}*dx*

*= a**n*

_{d}*dx*

*n*

*+ a**n−1*

_{d}*dx*

*n−1*

*+ · · · + a*1

*d*
*dx**+ a*0*,*
*and when applied to a function f (x) the result is*

*a*_{n}*f*^{(n)}*+ a*_{n−1}*f*^{(n−1)}*+ · · · + a*_{1}*f*^{0}*+ a*_{0}*f .*

*If we now take the Fourier transform of this expression, we wind up with the Fourier transform of f*
*multiplied by the corresponding n-th degree polynomial evaluated at 2πis.*

F

*P*

_{d}*dx*

*f*

*(s) = P (2πis) F f (s)*

*= a**n**(2πis)*^{n}*+ a**n−1**(2πis)*^{n−1}*+ · · · + a*1*(2πis) + a*0

*F f (s) .*
Don’t underestimate how important this is.

**A simple ordinary differential equation and how to solve it** You might like starting off with the
classic second order, ordinary differential equation

$$
u'' - u = -f
$$

Maybe you’ve looked at a different form of this equation, but I’m writing it this way to make the subsequent
*calculations a little easier. f (t) is a given function and you want to find u(t).*

Take the Fourier transform of both sides:

$$
\begin{aligned}
(2\pi i s)^2\, F u - F u &= -F f \\
-4\pi^2 s^2\, F u - F u &= -F f \\
(1 + 4\pi^2 s^2)\, F u &= F f
\end{aligned}
$$
*So we can solve for F u as*

$$
F u = \frac{1}{1 + 4\pi^2 s^2}\, F f\,,
$$

*and — with a little struggle — we recognize* $1/(1 + 4\pi^2 s^2)$ *as the Fourier transform of* $\tfrac{1}{2} e^{-|t|}$*, that is,*

$$
F u = F\!\left(\tfrac{1}{2} e^{-|t|}\right) \cdot F f\,.
$$

The right hand side is the product of two Fourier transforms. Therefore, according to the convolution theorem,

$$
u(t) = \tfrac{1}{2} e^{-|t|} * f(t)\,.
$$
Written out in full this is

$$
u(t) = \frac{1}{2} \int_{-\infty}^{\infty} e^{-|t - \tau|} f(\tau)\,d\tau\,.
$$

And there you have the two-sided exponential decay in action, as well as convolution.
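You can verify the solution formula numerically (a sketch; the Gaussian forcing f, the grid, and the tolerance are arbitrary choices): build u as the convolution by a Riemann sum and check that u″ − u ≈ −f by finite differences, away from the truncation boundary.

```python
import numpy as np

# Numerical check that u = (1/2) e^{-|t|} * f solves u'' - u = -f.
# The forcing f, grid, and tolerance are arbitrary choices.
t = np.linspace(-20, 20, 4001)
dt = t[1] - t[0]
f = np.exp(-t**2)

kernel = 0.5 * np.exp(-np.abs(t))
u = np.convolve(kernel, f, mode="same") * dt   # Riemann sum for the integral

u_pp = np.gradient(np.gradient(u, dt), dt)     # second derivative, O(dt^2)
mid = np.abs(t) < 10                           # ignore boundary effects
assert np.max(np.abs((u_pp - u + f)[mid])) < 1e-2
```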

**The heat equation** Remember the heat equation? In one spatial dimension, the equation that describes
*the rates of change of the temperature u(x, t) of the body at a point x and time t (with some normalization*
of the constants associated with the material) is the partial differential equation

$$
u_t = \tfrac{1}{2}\, u_{xx}\,.
$$

In our earlier work on Fourier series we considered heat flow on a circle, and we looked for solutions that are periodic functions of $x$ on the interval $[0, 1]$, so $u$ was to satisfy $u(x + 1, t) = u(x, t)$. This time we want
to consider the problem of heat flow on the “infinite rod”. A rod of great length (effectively of infinite
*length) is provided with an initial temperature distribution f (x) and we want to find a solution u(x, t) of*
the heat equation with

*u(x, 0) = f (x) .*

*Both f (x) and u(x, t) are defined for −∞ < x < ∞, and there is no assumption of periodicity. Knowing*
the Fourier transform of the Gaussian is essential for the treatment we’re about to give.

*The idea is to take the Fourier transform of both sides of the heat equation, “with respect to x”. The*
Fourier transform of the right hand side of the equation, $\tfrac12 u_{xx}(x,t)$, is

$$\tfrac12\, \mathcal{F}u_{xx}(s,t) = \tfrac12 (2\pi i s)^2\, \mathcal{F}u(s,t) = -2\pi^2 s^2\, \mathcal{F}u(s,t)\,,$$

*from the derivative formula. Observe that the “frequency variable” s is now in the first slot of the trans-*
*formed function and that the time variable t is just going along for the ride. For the left hand side, u**t**(x, t),*
we do something different. We have

$$\mathcal{F}u_t(s,t) = \int_{-\infty}^{\infty} u_t(x,t)\, e^{-2\pi i s x}\, dx \quad \text{(Fourier transform in } x\text{)}$$
$$= \int_{-\infty}^{\infty} \frac{\partial}{\partial t} u(x,t)\, e^{-2\pi i s x}\, dx$$
$$= \frac{\partial}{\partial t} \int_{-\infty}^{\infty} u(x,t)\, e^{-2\pi i s x}\, dx = \frac{\partial}{\partial t}\, \hat{u}(s,t)\,.$$

Thus taking the Fourier transform (with respect to $x$) of both sides of the equation

$$u_t = \tfrac12 u_{xx}$$

leads to

$$\frac{\partial\, \mathcal{F}u(s,t)}{\partial t} = -2\pi^2 s^2\, \mathcal{F}u(s,t)\,.$$

*This is a differential equation in t — an ordinary differential equation, despite the partial derivative symbol*

— and we can solve it:

$$\mathcal{F}u(s,t) = \mathcal{F}u(s,0)\, e^{-2\pi^2 s^2 t}\,.$$
*What is the initial condition, F u(s, 0)?*

$$\mathcal{F}u(s,0) = \int_{-\infty}^{\infty} u(x,0)\, e^{-2\pi i s x}\, dx = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i s x}\, dx = \mathcal{F}f(s)\,.$$

Putting it all together,

$$\mathcal{F}u(s,t) = \mathcal{F}f(s)\, e^{-2\pi^2 s^2 t}\,.$$

3.5 Convolution in Action II: Differential Equations 109

We recognize (we are good) that the exponential factor on the right hand side is the Fourier transform of the Gaussian,

$$g(x,t) = \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}\,.$$
We then have a product of two Fourier transforms,

$$\mathcal{F}u(s,t) = \mathcal{F}g(s,t)\, \mathcal{F}f(s)\,,$$

and we invert this to obtain a convolution in the spatial domain:

$$u(x,t) = g(x,t) * f(x) = \left(\frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}\right) * f(x) \quad \text{(convolution in } x\text{)}$$
or, written out,

$$u(x,t) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi t}}\, e^{-(x-y)^2/2t}\, f(y)\, dy\,.$$

*It’s reasonable to believe that the temperature u(x, t) of the rod at a point x at a time t > 0 is some kind*
*of averaged, smoothed version of the initial temperature f (x) = u(x, 0). That’s convolution at work.*
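Here is a numerical sketch of the convolution solution. Taking the initial temperature $f$ to be the heat kernel at time 1 (my choice), the solution at time $t$ should be the heat kernel at time $1 + t$, since Gaussian variances add under convolution:

```python
import numpy as np

# u(x,t) = g(x,t) * f(x) with g(x,t) = exp(-x²/2t)/√(2πt).
# With f = g(·,1), the result should be g(·, 1+t): Gaussians convolve
# to Gaussians and their variances add. Grid sizes are illustrative.

def heat_kernel(x, t):
    return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

x = np.linspace(-15, 15, 6001)
dx = x[1] - x[0]
f = heat_kernel(x, 1.0)                      # initial temperature u(x, 0)

t = 0.5
u = np.convolve(heat_kernel(x, t), f, mode='same') * dx
assert np.max(np.abs(u - heat_kernel(x, 1.0 + t))) < 1e-6
```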

The function

$$g(x,t) = \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}$$

is called the *heat kernel* (or *Green's function*, or *fundamental solution*) for the heat equation for the infinite rod. Here are plots of $g(x,t)$, as a function of $x$, for $t = 1, 0.5, 0.1, 0.05, 0.01$.

[Figure: $g(x,t)$ plotted against $x$ for $t = 1, 0.5, 0.1, 0.05, 0.01$; the curves grow taller and narrower as $t$ decreases.]

*You can see that the curves are becoming more concentrated near x = 0. Nevertheless, they are doing so*
in a way that keeps the area under each curve equal to 1. For

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}\, dx = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} e^{-\pi u^2}\, \sqrt{2\pi t}\, du \quad \text{(substituting } u = x/\sqrt{2\pi t}\text{)}$$
$$= \int_{-\infty}^{\infty} e^{-\pi u^2}\, du = 1\,.$$
*We’ll see later that the g(x, t) serve as an approximation to the δ function as t → 0.*
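A quick numerical check of the unit-area claim (the grid choices are illustrative, and the $t$ values match the plotted curves):

```python
import numpy as np

# Verify that each heat-kernel curve has unit area, matching the
# substitution u = x/√(2πt) above. Grid choices are mine.

for t in (1.0, 0.5, 0.1, 0.05, 0.01):
    x = np.linspace(-10, 10, 200001)
    g = np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)
    area = np.sum(g) * (x[1] - x[0])    # Riemann sum for the integral
    assert abs(area - 1.0) < 1e-8
```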

You might ask at this point: Didn’t we already solve the heat equation? Is what we did then related to what we just did now? Indeed we did and indeed they are: see Section 3.5.

**More on diffusion — back to the cable** Recall from our earlier discussion that William Thomson
appealed to the heat equation to study the delay in a signal sent along a long, undersea telegraph cable. The
physical intuition, as of the mid 19th century, was that charge “diffused” along the cable. To reconstruct
part of Thomson’s solution (essentially) we must begin with a slightly different setup. The equation is the
same

$$u_t = \tfrac12 u_{xx}\,,$$

so we’re choosing constants as above and not explicitly incorporating physical parameters such as resistance per length, capacitance per length, etc., but the initial and boundary conditions are different.

We consider a semi-infinite rod, having one end (at $x = 0$) but effectively extending infinitely in the positive $x$-direction. Instead of an initial distribution of temperature along the entire rod, we consider a source of heat (or voltage) $f(t)$ at the end $x = 0$. Thus we have the boundary condition

$$u(0,t) = f(t)\,.$$
We suppose that

*u(x, 0) = 0 ,*

*meaning that at t = 0 there’s no temperature (or charge) in the rod. We also assume that u(x, t) and its*
*derivatives tend to zero as x → ∞. Finally, we set*

*u(x, t) = 0* for *x < 0*

*so that we can regard u(x, t) as defined for all x. We want a solution that expresses u(x, t), the temperature*
*(or voltage) at a position x > 0 and time t > 0 in terms of the initial temperature (or voltage) f (t) at the*
*endpoint x = 0.*

*The analysis of this is really involved. It’s quite a striking formula that works out in the end, but, be*
warned, the end is a way off. Proceed only if interested.

*First take the Fourier transform of u(x, t) with respect to x (the notation ˆu seems more natural here):*

$$\hat{u}(s,t) = \int_{-\infty}^{\infty} e^{-2\pi i s x}\, u(x,t)\, dx\,.$$
Then, using the heat equation,

$$\frac{\partial}{\partial t}\, \hat{u}(s,t) = \int_{-\infty}^{\infty} e^{-2\pi i s x}\, \frac{\partial}{\partial t} u(x,t)\, dx = \int_{-\infty}^{\infty} e^{-2\pi i s x}\, \frac{\partial^2}{\partial x^2}\left(\tfrac12 u(x,t)\right) dx\,.$$


We need only integrate from 0 to $\infty$ since $u(x,t)$ is identically 0 for $x < 0$. We integrate by parts once:

$$\int_0^{\infty} e^{-2\pi i s x}\, \tfrac12 \frac{\partial^2}{\partial x^2} u(x,t)\, dx = \tfrac12 \left[ e^{-2\pi i s x}\, \frac{\partial}{\partial x} u(x,t) \right]_{x=0}^{x=\infty} + \pi i s \int_0^{\infty} \frac{\partial}{\partial x} u(x,t)\, e^{-2\pi i s x}\, dx$$
$$= -\tfrac12 u_x(0,t) + \pi i s \int_0^{\infty} \frac{\partial}{\partial x} u(x,t)\, e^{-2\pi i s x}\, dx\,,$$

*taking the boundary conditions on u(x, t) into account. Now integrate by parts a second time:*

$$\int_0^{\infty} \frac{\partial}{\partial x} u(x,t)\, e^{-2\pi i s x}\, dx = \left[ e^{-2\pi i s x}\, u(x,t) \right]_{x=0}^{x=\infty} + 2\pi i s \int_0^{\infty} e^{-2\pi i s x}\, u(x,t)\, dx$$
$$= -u(0,t) + 2\pi i s \int_0^{\infty} e^{-2\pi i s x}\, u(x,t)\, dx$$
$$= -f(t) + 2\pi i s \int_{-\infty}^{\infty} e^{-2\pi i s x}\, u(x,t)\, dx$$

(we drop the bottom limit back to $-\infty$ to bring back the Fourier transform)

$$= -f(t) + 2\pi i s\, \hat{u}(s,t)\,.$$

Putting these calculations together yields

$$\frac{\partial}{\partial t}\, \hat{u}(s,t) = -\tfrac12 u_x(0,t) - \pi i s\, f(t) - 2\pi^2 s^2\, \hat{u}(s,t)\,.$$

*Now, this is a linear, first order, ordinary differential equation (in t) for ˆu. It’s of the general type*
$$y'(t) + P(t)\, y(t) = Q(t)\,,$$

and if you cast your mind back and search for knowledge from the dim past you will recall that to solve such an equation you multiply both sides by the integrating factor

$$e^{\int_0^t P(\tau)\, d\tau}\,,$$

which produces

$$\left( y(t)\, e^{\int_0^t P(\tau)\, d\tau} \right)' = e^{\int_0^t P(\tau)\, d\tau}\, Q(t)\,.$$

*From here you get y(t) by direct integration. For our particular application we have*

$$P(t) = 2\pi^2 s^2 \quad \text{(a constant as far as we're concerned, since there's no } t\text{)}$$
$$Q(t) = -\tfrac12 u_x(0,t) - \pi i s\, f(t)\,.$$

The integrating factor is $e^{2\pi^2 s^2 t}$ and we're to solve^8

$$\left( e^{2\pi^2 s^2 t}\, \hat{u}(s,t) \right)' = e^{2\pi^2 s^2 t} \left( -\tfrac12 u_x(0,t) - \pi i s\, f(t) \right).$$

Write $\tau$ for $t$ and integrate both sides from 0 to $t$ with respect to $\tau$:

$$e^{2\pi^2 s^2 t}\, \hat{u}(s,t) - \hat{u}(s,0) = \int_0^t e^{2\pi^2 s^2 \tau} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) d\tau\,.$$

^8 I want to carry this out so you don't miss anything.

But $\hat{u}(s,0) = 0$ since $u(x,0)$ is identically 0, so

$$\hat{u}(s,t) = e^{-2\pi^2 s^2 t} \int_0^t e^{2\pi^2 s^2 \tau} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) d\tau$$
$$= \int_0^t e^{-2\pi^2 s^2 (t-\tau)} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) d\tau\,.$$
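The $t$-dependence here is just the general integrating-factor formula $y(t) = \int_0^t e^{-a(t-\tau)}\, q(\tau)\, d\tau$ for $y' + ay = q$, $y(0) = 0$, with $a = 2\pi^2 s^2$. As a quick sanity check of that formula, here is a sketch with an $a$ and $q$ of my own choosing (the exact solution of $y' + 3y = \sin t$, $y(0) = 0$, is $(3\sin t - \cos t + e^{-3t})/10$):

```python
import numpy as np

# Check y(t) = ∫₀ᵗ e^{-a(t-τ)} q(τ) dτ solves y' + a y = q with y(0) = 0,
# for the illustrative case a = 3, q(τ) = sin τ.

a = 3.0

def y_formula(t, n=200001):
    # trapezoidal approximation to ∫₀ᵗ e^{-a(t-τ)} sin(τ) dτ
    tau = np.linspace(0.0, t, n)
    dtau = tau[1] - tau[0]
    g = np.exp(-a * (t - tau)) * np.sin(tau)
    return dtau * (np.sum(g) - 0.5 * (g[0] + g[-1]))

t = 2.0
exact = (3 * np.sin(t) - np.cos(t) + np.exp(-a * t)) / 10
assert abs(y_formula(t) - exact) < 1e-8
```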

We need to take the inverse transform of this to get $u(x,t)$. Be not afraid:

$$u(x,t) = \int_{-\infty}^{\infty} e^{2\pi i s x}\, \hat{u}(s,t)\, ds$$
$$= \int_{-\infty}^{\infty} e^{2\pi i s x} \int_0^t e^{-2\pi^2 s^2 (t-\tau)} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) d\tau\, ds$$
$$= \int_0^t \int_{-\infty}^{\infty} e^{2\pi i s x}\, e^{-2\pi^2 s^2 (t-\tau)} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) ds\, d\tau\,.$$

Appearances to the contrary, this is not hopeless. Let’s pull out the inner integral for further examination:

$$\int_{-\infty}^{\infty} e^{2\pi i s x}\, e^{-2\pi^2 s^2 (t-\tau)} \left( -\tfrac12 u_x(0,\tau) - \pi i s\, f(\tau) \right) ds$$
$$= -\tfrac12 u_x(0,\tau) \int_{-\infty}^{\infty} e^{2\pi i s x}\, e^{-2\pi^2 s^2 (t-\tau)}\, ds \;-\; \pi i\, f(\tau) \int_{-\infty}^{\infty} e^{2\pi i s x}\, s\, e^{-2\pi^2 s^2 (t-\tau)}\, ds\,.$$
The first integral is the inverse Fourier transform of a Gaussian; we want to find $\mathcal{F}^{-1}\big( e^{-2\pi^2 s^2 (t-\tau)} \big)$. Recall the formulas

$$\mathcal{F}\left( \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2} \right) = e^{-2\pi^2 \sigma^2 s^2}\,, \qquad \mathcal{F}\left( e^{-x^2/2\sigma^2} \right) = \sigma\sqrt{2\pi}\, e^{-2\pi^2 \sigma^2 s^2}\,.$$
Apply this with

$$\sigma = \sqrt{t - \tau}\,.$$
Then, using duality and evenness of the Gaussian, we have

$$\int_{-\infty}^{\infty} e^{2\pi i s x}\, e^{-2\pi^2 s^2 (t-\tau)}\, ds = \mathcal{F}^{-1}\big( e^{-2\pi^2 s^2 (t-\tau)} \big) = \frac{e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)}}\,.$$
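This inversion is easy to check numerically; the following sketch approximates the $s$-integral by a Riemann sum (the value $a = t - \tau$, the $x$ values, and the grid are illustrative choices of mine):

```python
import numpy as np

# Check ∫ e^{2πisx} e^{-2π²s²a} ds = e^{-x²/2a} / √(2πa), with a = t - τ.
# The s-integral is approximated by a Riemann sum on a wide fine grid.

a = 0.7
s = np.linspace(-20, 20, 40001)
ds = s[1] - s[0]
for x in (0.0, 0.5, 1.3):
    lhs = np.sum(np.exp(2j * np.pi * s * x) * np.exp(-2 * np.pi**2 * s**2 * a)) * ds
    rhs = np.exp(-x**2 / (2 * a)) / np.sqrt(2 * np.pi * a)
    assert abs(lhs - rhs) < 1e-8
```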

In the second integral we want to find $\mathcal{F}^{-1}\big( s\, e^{-2\pi^2 s^2 (t-\tau)} \big)$. For this, note that

$$s\, e^{-2\pi^2 s^2 (t-\tau)} = -\frac{1}{4\pi^2 (t-\tau)}\, \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)}$$
and hence

$$\int_{-\infty}^{\infty} e^{2\pi i s x}\, s\, e^{-2\pi^2 s^2 (t-\tau)}\, ds = \mathcal{F}^{-1}\left( -\frac{1}{4\pi^2 (t-\tau)}\, \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)} \right) = -\frac{1}{4\pi^2 (t-\tau)}\, \mathcal{F}^{-1}\left( \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)} \right).$$
We know how to take the inverse Fourier transform of a derivative, or rather we know how to take the
(forward) Fourier transform, and that’s all we need by another application of duality. We use, for a general
*function f ,*

$$\mathcal{F}^{-1}f' = (\mathcal{F}f')^{-} = (2\pi i x\, \mathcal{F}f)^{-} = -2\pi i x\, (\mathcal{F}f)^{-} = -2\pi i x\, \mathcal{F}^{-1}f\,.$$


Apply this to

$$\mathcal{F}^{-1}\left( \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)} \right) = -2\pi i x\, \mathcal{F}^{-1}\big( e^{-2\pi^2 s^2 (t-\tau)} \big) = -2\pi i x\, \frac{1}{\sqrt{2\pi(t-\tau)}}\, e^{-x^2/2(t-\tau)} \quad \text{(from our earlier calculation, fortunately)}.$$

Then

$$-\frac{1}{4\pi^2 (t-\tau)}\, \mathcal{F}^{-1}\left( \frac{d}{ds}\, e^{-2\pi^2 s^2 (t-\tau)} \right) = \frac{2\pi i x}{4\pi^2 (t-\tau)}\cdot \frac{e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)}} = \frac{i}{2\pi}\, \frac{x\, e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)^3}}\,.$$

That is,

$$\mathcal{F}^{-1}\big( s\, e^{-2\pi^2 s^2 (t-\tau)} \big) = \frac{i}{2\pi}\, \frac{x\, e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)^3}}\,.$$

Finally getting back to the expression for $u(x,t)$, we can combine what we've calculated for the inverse Fourier transforms and write

$$u(x,t) = -\tfrac12 \int_0^t u_x(0,\tau)\, \mathcal{F}^{-1}\big( e^{-2\pi^2 s^2 (t-\tau)} \big)\, d\tau - \pi i \int_0^t f(\tau)\, \mathcal{F}^{-1}\big( s\, e^{-2\pi^2 s^2 (t-\tau)} \big)\, d\tau$$
$$= -\tfrac12 \int_0^t u_x(0,\tau)\, \frac{e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)}}\, d\tau + \tfrac12 \int_0^t f(\tau)\, \frac{x\, e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)^3}}\, d\tau\,.$$

We're almost there. We'd like to eliminate $u_x(0,\tau)$ from this formula and express $u(x,t)$ in terms of $f(t)$ only. This can be accomplished by a very clever, and I'd say highly nonobvious, observation. We know that $u(x,t)$ is zero for $x < 0$; we defined it to be so. Hence the integral expression for $u(x,t)$ is zero for $x < 0$. Because of the evenness and oddness in $x$ of the two integrands, this has a consequence for the values of the integrals when $x$ is positive. (The first integrand is even in $x$ and the second is odd in $x$.) In fact, the integrals are equal!

Let me explain what happens in a general situation, stripped down, so you can see the idea. Suppose we have

$$\Phi(x,t) = \int_0^t \varphi(x,\tau)\, d\tau + \int_0^t \psi(x,\tau)\, d\tau$$

where we know that: $\Phi(x,t)$ is zero for $x < 0$; $\varphi(x,\tau)$ is even in $x$; $\psi(x,\tau)$ is odd in $x$. Take $a > 0$. Then $\Phi(-a,t) = 0$, hence, using the evenness of $\varphi(x,\tau)$ and the oddness of $\psi(x,\tau)$,

$$0 = \int_0^t \varphi(-a,\tau)\, d\tau + \int_0^t \psi(-a,\tau)\, d\tau = \int_0^t \varphi(a,\tau)\, d\tau - \int_0^t \psi(a,\tau)\, d\tau\,.$$
We conclude that for all $a > 0$,

$$\int_0^t \varphi(a,\tau)\, d\tau = \int_0^t \psi(a,\tau)\, d\tau\,,$$
and hence for $x > 0$ (writing $x$ for $a$),

$$\Phi(x,t) = \int_0^t \varphi(x,\tau)\, d\tau + \int_0^t \psi(x,\tau)\, d\tau = 2\int_0^t \psi(x,\tau)\, d\tau = 2\int_0^t \varphi(x,\tau)\, d\tau \quad \text{(either } \varphi \text{ or } \psi \text{ could be used)}.$$

We apply this in our situation with

$$\varphi(x,\tau) = -\tfrac12 u_x(0,\tau)\, \frac{e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)}}\,, \qquad \psi(x,\tau) = \tfrac12 f(\tau)\, \frac{x\, e^{-x^2/2(t-\tau)}}{\sqrt{2\pi(t-\tau)^3}}\,.$$