This book is a course of lectures on the mathematics of actuarial science. The idea behind the lectures is to deduce, as far as possible, interesting material on contingent present values and life tables directly from calculus and common-sense notions, illustrated through word problems. Both the Interest Theory and Probability related to life tables are treated as wonderful concrete applications of the calculus. The lectures require no background beyond a third semester of calculus, but the prerequisite calculus courses must have been solidly understood. It is a truism of pre-actuarial advising that students who have not done really well in and digested the calculus ought not to consider actuarial studies.
It is not assumed that the student has seen a formal introduction to probability. Notions of relative frequency and average are introduced first with reference to the ensemble of a cohort life-table, the underlying formal random experiment being random selection from the cohort life-table population (or, in the context of probabilities and expectations for ‘lives aged x’, from the subset of $l_x$ members of the population who survive to age $x$). The calculation of expectations of functions of a time-to-death random variable is rooted in the concrete notion of life-table average, which is then approximated by suitable idealized failure densities and integrals. Later, in discussing Binomial random variables and the Law of Large Numbers, the combinatorial and probabilistic interpretations of binomial coefficients are derived from the Binomial Theorem, which the student is assumed to know as a topic in calculus (Taylor series identification of coefficients of a polynomial). The general notions of expectation and probability are introduced, but for example the Law of Large Numbers for binomial variables is treated (rigorously) as a topic involving calculus inequalities and summation of finite series. This approach allows introduction of the numerically and conceptually useful large-deviation inequalities for binomial random variables to explain just how unlikely it is for binomial (e.g., life-table) counts to deviate much percentage-wise from expectations when the underlying population of trials is large.
The reader is also not assumed to have worked previously with the Theory of Interest. These lectures present the Theory of Interest as a mathematical problem-topic, which is rather unlike what is done in typical finance courses.
Getting the typical Interest problems (such as the exercises on mortgage refinancing and present values of various payoff schemes) into correct format for numerical answers is often not easy even for good mathematics students.
The main goal of these lectures is to reach, by a conceptual route, mathematical topics in Life Contingencies, Premium Calculation and Demography not usually seen until rather late in the trajectory of quantitative Actuarial Examinations. Such an approach can allow undergraduates with solid preparation in calculus (not necessarily mathematics or statistics majors) to explore their possible interests in business and actuarial science. It also allows the majority of such students, who will choose some other avenue (from economics to operations research to statistics) for the exercise of their quantitative talents, to know something concrete and mathematically coherent about the topics and ideas actually useful in Insurance.
A secondary goal of the lectures has been to introduce varied topics of applied mathematics as part of a reasoned development of ideas related to survival data. As a result, material is included on statistics of biomedical studies and on reliability which would not ordinarily find its way into an actuarial course. A further result is that mathematical topics, from differential equations to maximum likelihood estimators based on complex life-table data, which seldom fit coherently into undergraduate programs of study, are ‘vertically integrated’ into a single course.
While the material in these lectures is presented systematically, it is not separated by chapters into unified topics such as Interest Theory, Probability Theory, Premium Calculation, etc. Instead the introductory material from probability and interest theory is interleaved, and later, various mathematical ideas are introduced as needed to advance the discussion. No book at this level can claim to be fully self-contained, but every attempt has been made to develop the mathematics to fit the actuarial applications as they arise logically.
The coverage of the main body of each chapter is primarily ‘theoretical’.
At the end of each chapter is an Exercise Set and a short section of Worked Examples to illustrate the kinds of word problems which can be solved by the techniques of the chapter. The Worked Examples sections show how the ideas and formulas work smoothly together, and they highlight the most important and frequently used formulas.
Chapter 1
Basics of Probability and the Theory of Interest
The first lectures supply some background on elementary Probability Theory and basic Theory of Interest. The reader who has not previously studied these subjects may get a brief overview here, but will likely want to supplement this Chapter with reading in any of a number of calculus-based introductions to probability and statistics, such as Larson (1982), Larsen and Marx (1985), or Hogg and Tanis (1997), and in the basics of the Theory of Interest as covered in the text of Kellison (1970) or Chapter 1 of Gerber (1997).
1.1 Probability, Lifetimes, and Expectation
In the cohort life-table model, imagine a number $l_0$ of individuals born simultaneously and followed until death, resulting in data $d_x$, $l_x$ for each age $x = 0, 1, 2, \ldots$, where
\[ l_x = \text{number of lives aged } x \text{ (i.e., alive at birthday } x\text{)} \]
and
\[ d_x = l_x - l_{x+1} = \text{number dying between ages } x \text{ and } x+1. \]
Now, allowing the age-variable $x$ to take all real values, not just whole numbers, treat $S(x) = l_x/l_0$ as a piecewise continuously differentiable non-increasing function called the “survivor” or “survival” function. Then for all positive real $x$ and $t$, $S(x) - S(x+t)$ is the fraction of the initial cohort which fails between time $x$ and $x+t$, and
\[ \frac{S(x) - S(x+t)}{S(x)} = \frac{l_x - l_{x+t}}{l_x} \]
denotes the fraction of those alive at exact age $x$ who fail before $x+t$.
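The two definitions above can be checked numerically. The following is a minimal sketch, assuming only the first four $l_x$ values of the Illustrative Life Table (Table 1.1); the function names `deaths` and `fraction_failing` are illustrative and not part of the text.

```python
# First rows of l_x copied from Table 1.1 (ages 0..3 only).
l = {0: 100000, 1: 97371, 2: 97230, 3: 97123}

def deaths(l, x):
    """d_x = l_x - l_{x+1}: number dying between ages x and x+1."""
    return l[x] - l[x + 1]

def fraction_failing(l, x, t):
    """(l_x - l_{x+t}) / l_x: fraction of those alive at age x who die before x+t."""
    return (l[x] - l[x + t]) / l[x]

d = {x: deaths(l, x) for x in range(3)}
q = fraction_failing(l, 1, 2)

print(d)  # agrees with the d_x column of Table 1.1 for ages 0, 1, 2
print(q)  # fraction of lives aged 1 who die before age 3
```

Because the fraction is a ratio of two $l$ values, the cohort size $l_0$ cancels, which is why the same number is obtained from $S$ or from $l$.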
Question: what do probabilities have to do with the life table and survival function?
To answer this, we first introduce probability as simply a relative frequency, using numbers from a cohort life-table like that of the accompanying Illustrative Life Table. In response to a probability question, we supply the fraction of the relevant life-table population, to obtain identities like
\[ \Pr(\text{life aged 29 dies between exact ages 35 and 41 or between 52 and 60}) = \frac{S(35) - S(41) + S(52) - S(60)}{S(29)} = \frac{(l_{35} - l_{41}) + (l_{52} - l_{60})}{l_{29}}, \]
where our convention is that a life aged 29 is one of the cohort surviving to the 29th birthday.
The idea here is that all of the lifetimes covered by the life table are understood to be governed by an identical “mechanism” of failure, and that any probability question about a single lifetime is really a question concerning the fraction of those lives about which the question is asked (e.g., those alive at age x) whose lifetimes will satisfy the stated property (e.g., die either between 35 and 41 or between 52 and 60). This “frequentist” notion of probability of an event as the relative frequency with which the event occurs in a large population of (independent) identical units is associated with the phrase “law of large numbers”, which will be discussed later. For now, remark only that the life table population should be large for the ideas presented so far to make good sense. See Table 1.1 for an illustration of a cohort life-table with realistic numbers.
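As a hedged illustration of the relative-frequency calculation, the sketch below evaluates the probability that a life aged 29 dies between exact ages 35 and 41 or between 52 and 60. The $l_x$ values are copied from Table 1.1; the helper name `prob_dies_between` is an assumption made for this example.

```python
# l_x values copied from Table 1.1 for the ages needed here.
l = {29: 94428, 35: 93475, 41: 92020, 52: 85393, 60: 75221}

def prob_dies_between(l, age_now, a, b):
    """Fraction of lives aged `age_now` who die between exact ages a and b."""
    return (l[a] - l[b]) / l[age_now]

# The two age intervals do not overlap, so their relative frequencies add.
p = prob_dies_between(l, 29, 35, 41) + prob_dies_between(l, 29, 52, 60)

print(round(p, 4))  # approximately 0.1231
```

The numerator counts 1455 deaths between ages 35 and 41 plus 10172 deaths between ages 52 and 60, out of the 94428 members of the cohort alive at the 29th birthday.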
Table 1.1: Illustrative Life-Table, simulated to resemble realistic US (Male) life-table. For details of simulation, see Section 3.4 below.

Age x     l_x    d_x     Age x     l_x    d_x
    0  100000   2629        40   92315    295
    1   97371    141        41   92020    332
    2   97230    107        42   91688    408
    3   97123     63        43   91280    414
    4   97060     63        44   90866    464
    5   96997     69        45   90402    532
    6   96928     69        46   89870    587
    7   96859     52        47   89283    680
    8   96807     54        48   88603    702
    9   96753     51        49   87901    782
   10   96702     33        50   87119    841
   11   96669     40        51   86278    885
   12   96629     47        52   85393    974
   13   96582     61        53   84419   1082
   14   96521     86        54   83337   1088
   15   96435    105        55   82249   1213
   16   96330     83        56   81036   1344
   17   96247    125        57   79692   1423
   18   96122    133        58   78269   1476
   19   95989    149        59   76793   1572
   20   95840    154        60   75221   1696
   21   95686    138        61   73525   1784
   22   95548    163        62   71741   1933
   23   95385    168        63   69808   2022
   24   95217    166        64   67786   2186
   25   95051    151        65   65600   2261
   26   94900    149        66   63339   2371
   27   94751    166        67   60968   2426
   28   94585    157        68   58542   2356
   29   94428    133        69   56186   2702
   30   94295    160        70   53484   2548
   31   94135    149        71   50936   2677
   32   93986    152        72   48259   2811
   33   93834    160        73   45448   2763
   34   93674    199        74   42685   2710
   35   93475    187        75   39975   2848
   36   93288    212        76   37127   2832
   37   93076    228        77   34295   2835
   38   92848    272        78   31460   2803
   39   92576    261

Note: see any basic probability textbook, such as Larson (1982), Larsen and Marx (1985), or Hogg and Tanis (1997) for formal definitions of the notions of sample space, event, probability, and conditional probability. The main ideas which are necessary to understand the discussion so far are really matters of common sense when applied to relative frequency but require formal axioms when used more generally:
• Probabilities are numbers between 0 and 1 assigned to subsets of the entire range of possible outcomes (in the examples, subsets of the interval of possible human lifetimes measured in years).
• The probability P (A ∪ B) of the union A ∪ B of disjoint (i.e., nonoverlapping) sets A and B is necessarily the sum of the separate probabilities P (A) and P (B).
• When probabilities are requested with reference to a smaller universe of possible outcomes, such as B = lives aged 29, rather than all members of a cohort population, the resulting conditional probabilities of events A are written P (A | B) and calculated as P (A ∩ B)/P (B), where A ∩ B denotes the intersection or overlap of the two events A, B.
• Two events A, B are defined to be independent when P (A ∩ B) = P (A)·P (B) or — equivalently, as long as P (B) > 0 — the conditional probability P (A|B) expressing the probability of A if B were known to have occurred, is the same as the (unconditional) probability P (A).
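The bulleted rules can be exercised on life-table relative frequencies. The sketch below is an illustration under stated assumptions: it takes $l_x$ values from Table 1.1, lets $A$ be the event of death between ages 35 and 41 or between 52 and 60, and lets $B$ be survival to age 29. The function names are hypothetical, and exact rational arithmetic is used so that equalities can be tested without rounding.

```python
from fractions import Fraction

# l_x values copied from Table 1.1; l[0] is the size of the cohort.
l = {0: 100000, 29: 94428, 35: 93475, 41: 92020, 52: 85393, 60: 75221}

def p_die_between(a, b):
    """Unconditional probability that a newborn dies between ages a and b."""
    return Fraction(l[a] - l[b], l[0])

def p_survive_to(x):
    """P(T >= x) = l_x / l_0."""
    return Fraction(l[x], l[0])

# Additivity: the two age intervals are disjoint, so probabilities add.
p_A = p_die_between(35, 41) + p_die_between(52, 60)

# B = "life aged 29". A is contained in B, so P(A and B) = P(A).
p_B = p_survive_to(29)
p_A_and_B = p_A
p_A_given_B = p_A_and_B / p_B

# Independence would require P(A and B) == P(A) * P(B); it fails here.
independent = (p_A_and_B == p_A * p_B)

print(p_A_given_B, float(p_A_given_B), independent)
```

The conditional probability reduces to $(l_{35} - l_{41} + l_{52} - l_{60})/l_{29}$, because the factors of $l_0$ cancel. The events are not independent, as expected: knowing that a life survives to age 29 raises the chance of death in the later age intervals.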
The life-table data, and the mechanism by which members of the population die, are summarized first through the survivor function $S(x)$, which at integer values of $x$ agrees with the ratios $l_x/l_0$. Note that $S(x)$ has values between 0 and 1, and can be interpreted as the probability for a single individual to survive at least $x$ time units. Since fewer people are alive at larger ages, $S(x)$ is a decreasing function of $x$, and in applications $S(x)$ should be piecewise continuously differentiable (largely for convenience, and because any analytical expression which would be chosen for $S(x)$ in practice will be piecewise smooth). In addition, by definition, $S(0) = 1$. Another way of summarizing the probabilities of survival given by this function is to define the density function
\[ f(x) = -\frac{dS}{dx}(x) = -S'(x) \]
as the (absolute) rate of decrease of the function $S$. Then, by the fundamental theorem of calculus, for any ages $a < b$,
\[ P(\text{life aged 0 dies between ages } a \text{ and } b) = \frac{l_a - l_b}{l_0} = S(a) - S(b) = \int_a^b \big(-S'(x)\big)\,dx = \int_a^b f(x)\,dx \tag{1.1} \]
which has the very helpful geometric interpretation that the probability of dying within the interval $[a, b]$ is equal to the area under the curve $y = f(x)$ over the $x$-interval $[a, b]$. Note also that the ‘probability’ rule which assigns the integral $\int_A f(x)\,dx$ to the set $A$ (which may be an interval, a union of intervals, or a still more complicated set) obviously satisfies the first two of the bulleted axioms displayed above.
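Equation (1.1), which says that $S(a) - S(b)$ equals the area under $f = -S'$ over $[a, b]$, can be checked numerically. The sketch below assumes a Weibull-type survival function $S(x) = \exp(-(x/80)^4)$, chosen purely for illustration and not taken from the text or from Table 1.1. The midpoint-rule integrator `integrate` is likewise an assumption of this example.

```python
import math

# Assumed survival model (illustration only): S(x) = exp(-(x / SCALE) ** SHAPE).
SCALE, SHAPE = 80.0, 4.0

def S(x):
    """Survival function of the assumed model; S(0) = 1 and S decreases."""
    return math.exp(-((x / SCALE) ** SHAPE))

def f(x):
    """Density f(x) = -S'(x), differentiated by hand for the assumed S."""
    return (SHAPE / SCALE) * (x / SCALE) ** (SHAPE - 1) * S(x)

def integrate(func, a, b, n=10_000):
    """Midpoint Riemann sum of func over [a, b] using n equal subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

a, b = 35.0, 41.0
area = integrate(f, a, b)   # area under the density over [a, b]
direct = S(a) - S(b)        # probability of death between ages a and b

print(area, direct)  # the two values should agree closely
```

Any other smooth decreasing $S$ with $S(0) = 1$ could be substituted, provided `f` is changed to match its derivative.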
The terminal age $\omega$ of a life table is an integer value large enough that $S(\omega)$ is negligibly small, but no value $S(t)$ for $t < \omega$ is zero. For practical purposes, no individual lives to the $\omega$-th birthday. While $\omega$ is finite in real life-tables and in some analytical survival models, most theoretical forms for $S(x)$ have no finite age $\omega$ at which $S(\omega) = 0$, and in those forms $\omega = \infty$ by convention.
Now we are ready to define some terms and motivate the notion of expectation. Think of the age $T$ at which a specified newly born member of the population will die as a random variable, which for present purposes means a variable which takes various values $x$ with probabilities governed by the life table data $l_x$ and the survivor function $S(x)$ or density function $f(x)$ in a formula like the one just given in equation (1.1). Suppose there is a contractual amount $Y$ which must be paid (say, to the heirs of that individual) at the time $T$ of death of the individual, and suppose that the contract provides a specific function $Y = g(T)$ according to which this payment depends on (the whole-number part of) the age $T$ at which death occurs.
What is the average value of such a payment over all individuals whose lifetimes are reflected in the life-table? Since $d_x = l_x - l_{x+1}$ individuals (out of the original $l_0$) die at ages between $x$ and $x+1$, thereby generating a payment $g(x)$, the total payment to all individuals in the life-table can be written as
\[ \sum_x (l_x - l_{x+1})\, g(x). \]
Thus the average payment, at least under the assumption that $Y = g(T)$ depends only on the largest whole number $[T]$ less than or equal to $T$, is
\[ \sum_x \frac{l_x - l_{x+1}}{l_0}\, g(x) = \sum_x \big(S(x) - S(x+1)\big)\, g(x) = \sum_x \int_x^{x+1} f(t)\, g(t)\, dt = \int_0^\infty f(t)\, g(t)\, dt. \]
Here the passage from the second to the third expression uses equation (1.1) together with the fact that $g(t) = g(x)$ for $x \le t < x+1$.

This quantity, the total contingent payment over the whole cohort divided by the number in the cohort, is called the expectation of the random payment $Y = g(T)$ in this special case, and can be interpreted as the weighted average of all of the different payments $g(x)$ actually received, where the weights are just the relative frequencies in the life table with which those payments are received. More generally, if the restriction that $g(t)$ depends only on the integer part $[t]$ of $t$ were dropped, then the expectation of $Y = g(T)$ would be given by the same formula
\[ E(Y) = E(g(T)) = \int_0^\infty f(t)\, g(t)\, dt. \]
The last displayed integral, like all expectation formulas, can be understood as a weighted average of values $g(T)$ obtained over a population, with weights equal to the probabilities of obtaining those values. Recall from the Riemann-integral construction in Calculus that the integral $\int f(t) g(t)\,dt$ can be regarded approximately as the sum over very small time-intervals $[t, t+\Delta]$ of the quantities $f(t) g(t) \Delta$. Each such quantity is the base $\Delta$ of a rectangle multiplied by its height $f(t) g(t)$, and the rectangle closely covers the area under the graph of the function $fg$ over the interval $[t, t+\Delta]$. The term $f(t) g(t) \Delta$ can alternatively be interpreted as the product of the value $g(t)$, which is essentially equal to any of the values $g(T)$ that can be realized when $T$ falls within the interval $[t, t+\Delta]$, multiplied by $f(t)\Delta$. The latter quantity is, by the Fundamental Theorem of the Calculus, approximately equal for small $\Delta$ to the area under the function $f$ over the interval $[t, t+\Delta]$, and that area is by definition equal to the probability with which $T \in [t, t+\Delta]$. In summary, $E(Y) = \int_0^\infty g(t) f(t)\,dt$ is the average of values $g(T)$ obtained for lifetimes $T$ within small intervals $[t, t+\Delta]$, weighted by the probabilities of approximately $f(t)\Delta$ with which those $T$ and $g(T)$ values are obtained. The expectation is a weighted average because the weights $f(t)\Delta$ sum to the integral $\int_0^\infty f(t)\,dt = 1$.
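The weighted-average interpretation can be made concrete with a small cohort. The following sketch uses a hypothetical six-row life table and a hypothetical payment schedule `g`, neither of which comes from the text. It compares the weighted sum, with weights $(l_x - l_{x+1})/l_0$, against a brute-force average over every individual's payment.

```python
# Hypothetical toy cohort (not from Table 1.1): l_x for ages 0..5, all dead by age 5.
l = [1000, 900, 700, 400, 100, 0]

def g(x):
    """Hypothetical payment depending only on the whole-number age at death."""
    return 100.0 * (x + 1)

# Weights (l_x - l_{x+1}) / l_0 are the relative frequencies of death at age x.
weights = [(l[x] - l[x + 1]) / l[0] for x in range(len(l) - 1)]
expectation = sum(w * g(x) for x, w in enumerate(weights))

# The same number, obtained by listing every individual's payment and averaging.
payments = [g(x) for x in range(len(l) - 1) for _ in range(l[x] - l[x + 1])]
brute_force = sum(payments) / len(payments)

print(sum(weights), expectation, brute_force)
```

The weights sum to 1 (up to floating-point rounding) because every member of the cohort dies at some age in the table, and both averages come out to 310 for this toy schedule.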
The same idea and formula can be applied to the restricted population of lives aged $x$. The resulting quantity is then called the conditional expected value of $g(T)$ given that $T \ge x$. The formula will be different in two ways: first, the range of integration is from $x$ to $\infty$, because of the restriction to individuals in the life-table who have survived to exact age $x$; second, the density $f(t)$ must be replaced by $f(t)/S(x)$, the so-called conditional density given $T \ge x$, which is found as follows. From the definition of conditional probability, for $t \ge x$,
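\[ P(t \le T \le t+\Delta \mid T \ge x) = \frac{P(\{t \le T \le t+\Delta\} \cap \{T \ge x\})}{P(T \ge x)} = \frac{P(t \le T \le t+\Delta)}{P(T \ge x)} = \frac{S(t) - S(t+\Delta)}{S(x)}. \]
Thus the density which can be used to calculate conditional probabilities $P(a \le T \le b \mid T \ge x)$ for $x < a < b$ is
\[ \lim_{\Delta \to 0} \frac{1}{\Delta}\, P(t \le T \le t+\Delta \mid T \ge x) = \lim_{\Delta \to 0} \frac{S(t) - S(t+\Delta)}{S(x)\,\Delta} = -\frac{S'(t)}{S(x)} = \frac{f(t)}{S(x)}. \]
The result of all of this discussion of conditional expected values is the formula, with associated weighted-average interpretation:
\[ E(g(T) \mid T \ge x) = \frac{1}{S(x)} \int_x^\infty g(t)\, f(t)\, dt. \]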
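The conditional expectation described above, namely the integral of $g(t)\,f(t)/S(x)$ over $t$ from $x$ to $\infty$, can be approximated by a Riemann sum. The sketch below assumes an exponential survival function $S(t) = e^{-\mu t}$ with $\mu = 0.02$, an illustrative choice not taken from the text. For that model the expected age at death given survival to age $x$ is known to be $x + 1/\mu$, which gives a value to compare against. The truncation point `upper` and the step count `n` are likewise assumptions of this example.

```python
import math

MU = 0.02  # assumed constant death rate; S(t) = exp(-MU * t) is an illustration only

def S(t):
    """Survival function of the assumed exponential model."""
    return math.exp(-MU * t)

def f(t):
    """Density f(t) = -S'(t) for the assumed exponential model."""
    return MU * math.exp(-MU * t)

def conditional_expectation(g, x, upper=2000.0, n=200_000):
    """Approximate (1/S(x)) * integral from x to `upper` of g(t) f(t) dt.

    Uses a midpoint Riemann sum. `upper` truncates the infinite range and
    must be large enough that the neglected tail is negligible for this g.
    """
    h = (upper - x) / n
    total = sum(g(x + (k + 0.5) * h) * f(x + (k + 0.5) * h) for k in range(n))
    return h * total / S(x)

x = 30.0
total_weight = conditional_expectation(lambda t: 1.0, x)  # conditional density integrates to 1
e_age_at_death = conditional_expectation(lambda t: t, x)  # E(T | T >= x)

print(total_weight, e_age_at_death, x + 1.0 / MU)
```

Dividing by $S(x)$ is what makes the conditional weights sum to 1 over the restricted range $[x, \infty)$, so the result remains a weighted average.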
Since payments based upon unpredictable occurrences or contingencies for insured lives can occur at different times, we study next the Theory of Interest, which is concerned with valuing streams of payments made over time. The general model in the case of constant interest is as follows. Compounding at time-intervals $h = 1/m$, with nominal interest rate $i^{(m)}$, means that a unit amount accumulates to $(1 + i^{(m)}/m)$ after a time $h = 1/m$. The principal or account value $1 + i^{(m)}/m$ at time $1/m$ accumulates over the time-interval from $1/m$ until $2/m$ to $(1 + i^{(m)}/m) \cdot (1 + i^{(m)}/m) = (1 + i^{(m)}/m)^2$. Similarly, by induction, a unit amount accumulates to $(1 + i^{(m)}/m)^n = (1 + i^{(m)}/m)^{Tm}$ after the time $T = nh$ which is a multiple of $n$ whole units of $h$. In the