Introduction to Algorithms, Third Edition


5.1 The hiring problem

Suppose that you need to hire a new office assistant. Your previous attempts at hiring have been unsuccessful, and you decide to use an employment agency. The employment agency sends you one candidate each day. You interview that person and then decide either to hire that person or not. You must pay the employment agency a small fee to interview an applicant. To actually hire an applicant is more costly, however, since you must fire your current office assistant and pay a substantial hiring fee to the employment agency. You are committed to having, at all times, the best possible person for the job. Therefore, you decide that, after interviewing each applicant, if that applicant is better qualified than the current office assistant, you will fire the current office assistant and hire the new applicant. You are willing to pay the resulting price of this strategy, but you wish to estimate what that price will be.

The procedure HIRE-ASSISTANT, given below, expresses this strategy for hiring in pseudocode. It assumes that the candidates for the office assistant job are numbered 1 through n. The procedure assumes that you are able to, after interviewing candidate i, determine whether candidate i is the best candidate you have seen so far. To initialize, the procedure creates a dummy candidate, numbered 0, who is less qualified than each of the other candidates.

HIRE-ASSISTANT(n)
1  best = 0                // candidate 0 is a least-qualified dummy candidate
2  for i = 1 to n
3      interview candidate i
4      if candidate i is better than candidate best
5          best = i
6          hire candidate i
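As a rough illustration, the pseudocode above might be rendered in Python as follows. The function name hire_assistant and the representation of candidates as a list of distinct numeric ranks (higher is better) are assumptions made for this sketch; the procedure itself requires only that any candidate can be compared with the current best.

```python
def hire_assistant(ranks):
    """Return the 1-based numbers of the candidates hired, in order.

    `ranks` is a hypothetical input: ranks[i - 1] is the quality of
    candidate i, with distinct values and higher meaning better.
    """
    best = 0                      # candidate 0 is a least-qualified dummy
    best_rank = float("-inf")     # the dummy loses every comparison
    hired = []
    for i, rank in enumerate(ranks, start=1):
        # "Interviewing" candidate i is modeled as reading their rank.
        if rank > best_rank:      # candidate i is better than candidate best
            best = i              # mirrors line 5 of the pseudocode
            best_rank = rank
            hired.append(i)       # hire candidate i
    return hired


print(hire_assistant([2, 5, 1, 4, 6, 3]))  # [1, 2, 5]
```

In this run, candidates 1, 2, and 5 are hired because each is better than everyone interviewed before them.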

The cost model for this problem differs from the model described in Chapter 2. We focus not on the running time of HIRE-ASSISTANT, but instead on the costs incurred by interviewing and hiring. On the surface, analyzing the cost of this algorithm may seem very different from analyzing the running time of, say, merge sort. The analytical techniques used, however, are identical whether we are analyzing cost or running time. In either case, we are counting the number of times certain basic operations are executed.

Interviewing has a low cost, say c_i, whereas hiring is expensive, costing c_h. Letting m be the number of people hired, the total cost associated with this algorithm is O(c_i n + c_h m). No matter how many people we hire, we always interview n candidates and thus always incur the cost c_i n associated with interviewing. We therefore concentrate on analyzing c_h m, the hiring cost. This quantity varies with each run of the algorithm.
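To make the cost model concrete, the following sketch evaluates the interviewing cost plus the hiring cost for a few values of m. The function name total_cost and the fee amounts are made up for illustration and do not come from the text.

```python
def total_cost(n, m, c_i, c_h):
    """Total cost of interviewing n candidates and hiring m of them."""
    if not 0 <= m <= n:
        raise ValueError("the number of hires m must satisfy 0 <= m <= n")
    return c_i * n + c_h * m


# Hypothetical fees: 10 per interview, 500 per hire.
n = 20
for m in (1, 5, 20):
    print(m, total_cost(n, m, c_i=10, c_h=500))
# The interviewing part, 10 * 20 = 200, is the same in every row;
# only the hiring part changes with m.
```

The fixed interviewing term is why the analysis concentrates on the hiring term alone.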

This scenario serves as a model for a common computational paradigm. We often need to find the maximum or minimum value in a sequence by examining each element of the sequence and maintaining a current “winner.” The hiring problem models how often we update our notion of which element is currently winning.

Worst-case analysis

In the worst case, we actually hire every candidate that we interview. This situation occurs if the candidates come in strictly increasing order of quality, in which case we hire n times, for a total hiring cost of O(c_h n).
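The worst case can be checked with a small experiment. The sketch below assumes candidates are given as a list of distinct numeric ranks, and the helper name count_hires is hypothetical. It uses the observation that a candidate is hired exactly when the running maximum changes.

```python
from itertools import accumulate


def count_hires(ranks):
    """Number of hires for candidates arriving with the given distinct ranks.

    A candidate is hired exactly when they beat everyone seen so far,
    so the hires correspond to the distinct values of the running maximum.
    """
    return len(set(accumulate(ranks, max)))


n = 8
increasing = list(range(1, n + 1))   # worst case: each candidate beats the last
decreasing = list(range(n, 0, -1))   # the best candidate arrives first

print(count_hires(increasing))  # 8
print(count_hires(decreasing))  # 1
```

Strictly increasing quality produces n hires, while the opposite order produces only one.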

Of course, the candidates do not always come in increasing order of quality. In fact, we have no idea about the order in which they arrive, nor do we have any control over this order. Therefore, it is natural to ask what we expect to happen in a typical or average case.

Probabilistic analysis

Probabilistic analysis is the use of probability in the analysis of problems. Most commonly, we use probabilistic analysis to analyze the running time of an algorithm. Sometimes we use it to analyze other quantities, such as the hiring cost in procedure HIRE-ASSISTANT. In order to perform a probabilistic analysis, we must use knowledge of, or make assumptions about, the distribution of the inputs. Then we analyze our algorithm, computing an average-case running time, where we take the average over the distribution of the possible inputs. Thus we are, in effect, averaging the running time over all possible inputs. When reporting such a running time, we will refer to it as the average-case running time.

We must be very careful in deciding on the distribution of inputs. For some problems, we may reasonably assume something about the set of all possible inputs, and then we can use probabilistic analysis as a technique for designing an efficient algorithm and as a means for gaining insight into a problem. For other problems, we cannot describe a reasonable input distribution, and in these cases we cannot use probabilistic analysis.

For the hiring problem, we can assume that the applicants come in a random order. What does that mean for this problem? We assume that we can compare any two candidates and decide which one is better qualified; that is, there is a total order on the candidates. (See Appendix B for the definition of a total order.) Thus, we can rank each candidate with a unique number from 1 through n, using rank(i) to denote the rank of applicant i, and adopt the convention that a higher rank corresponds to a better qualified applicant. The ordered list ⟨rank(1), rank(2), ..., rank(n)⟩ is a permutation of the list ⟨1, 2, ..., n⟩. Saying that the applicants come in a random order is equivalent to saying that this list of ranks is equally likely to be any one of the n! permutations of the numbers 1 through n. Alternatively, we say that the ranks form a uniform random permutation; that is, each of the possible n! permutations appears with equal probability.
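For very small n, the average over a uniform random permutation can be computed directly by listing all n! orderings and weighting each equally. The sketch below does this for the number of hires; the function names count_hires and average_hires are hypothetical, and the brute-force enumeration is only an illustration of what "averaging over the input distribution" means, not the analysis itself.

```python
from fractions import Fraction
from itertools import permutations


def count_hires(ranks):
    """Count the candidates who are better than all earlier candidates."""
    hires = 0
    best_rank = 0                 # the dummy candidate ranks below everyone
    for rank in ranks:
        if rank > best_rank:
            best_rank = rank
            hires += 1
    return hires


def average_hires(n):
    """Exact average number of hires over all n! orderings of ranks 1..n."""
    counts = [count_hires(p) for p in permutations(range(1, n + 1))]
    return Fraction(sum(counts), len(counts))


print(average_hires(3))  # 11/6
```

For n = 3 the six orderings produce 3, 2, 2, 2, 1, and 1 hires, giving an average of 11/6. Enumeration grows as n! and is practical only for small n.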

Section 5.2 contains a probabilistic analysis of the hiring problem.

Randomized algorithms

In order to use probabilistic analysis, we need to know something about the distribution of the inputs. In many cases, we know very little about the input distribution. Even if we do know something about the distribution, we may not be able to model this knowledge computationally. Yet we often can use probability and randomness as a tool for algorithm design and analysis, by making the behavior of part of the algorithm random.

In the hiring problem, it may seem as if the candidates are being presented to us in a random order, but we have no way of knowing whether or not they really are. Thus, in order to develop a randomized algorithm for the hiring problem, we must have greater control over the order in which we interview the candidates. We will, therefore, change the model slightly. We say that the employment agency has n candidates, and they send us a list of the candidates in advance. On each day, we choose, randomly, which candidate to interview. Although we know nothing about the candidates (besides their names), we have made a significant change. Instead of relying on a guess that the candidates come to us in a random order, we have instead gained control of the process and enforced a random order.
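One way to picture the changed model is to shuffle the agency's list ourselves before interviewing. The following is a minimal sketch under that assumption; the name randomized_hire_assistant, the rank-list representation, and the use of Python's random.shuffle as the source of the random order are choices made here for illustration.

```python
import random


def randomized_hire_assistant(ranks, rng=random):
    """Interview the candidates in a random order; return the number hired.

    `ranks` is a hypothetical list of distinct numeric ranks (higher is
    better). `rng` may be the random module or a random.Random instance.
    """
    order = list(ranks)           # copy, so the caller's list is unchanged
    rng.shuffle(order)            # we enforce the random order ourselves
    hires = 0
    best_rank = float("-inf")     # dummy candidate, worse than everyone
    for rank in order:
        if rank > best_rank:
            best_rank = rank
            hires += 1
    return hires


# Even if the agency's list is in the worst-case (increasing) order,
# the number of hires now depends on our shuffle, not on that order.
agency_list = list(range(1, 101))
trials = [randomized_hire_assistant(agency_list) for _ in range(1000)]
print(sum(trials) / len(trials))
```

The printed average varies from run to run, since it depends on the random choices rather than on the order in which the agency listed the candidates.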

More generally, we call an algorithm randomized if its behavior is determined not only by its input but also by values produced by a random-number generator. We shall assume that we have at our disposal a random-number generator RANDOM. A call to RANDOM(a, b) returns an integer between a and b, inclusive, with each such integer being equally likely. For example, RANDOM(0, 1) produces 0 with probability 1/2, and it produces 1 with probability 1/2. A call to RANDOM(3, 7) returns either 3, 4, 5, 6, or 7, each with probability 1/5. Each integer returned by RANDOM is independent of the integers returned on previous calls. You may imagine RANDOM as rolling a (b - a + 1)-sided die to obtain its output. (In practice, most programming environments offer a pseudorandom-number generator: a deterministic algorithm returning numbers that “look” statistically random.)
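In Python, a stand-in for RANDOM might be sketched with the standard library's random.randint, which also includes both endpoints. The wrapper name random_int is hypothetical, and the underlying generator is pseudorandom, so this only approximates the idealized RANDOM assumed in the text.

```python
import random
from collections import Counter


def random_int(a, b):
    """Stand-in for RANDOM(a, b): an integer in [a, b], both ends included."""
    return random.randint(a, b)   # randint includes both endpoints


draws = Counter(random_int(3, 7) for _ in range(50_000))
for value in sorted(draws):
    print(value, draws[value] / 50_000)   # each frequency should be near 1/5
```

With many draws, each of the five values 3 through 7 should appear with relative frequency close to 1/5.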

When analyzing the running time of a randomized algorithm, we take the expectation of the running time over the distribution of values returned by the random-number generator. We distinguish these algorithms from those in which the input is random by referring to the running time of a randomized algorithm as an expected running time. In general, we discuss the average-case running time when the probability distribution is over the inputs to the algorithm, and we discuss the expected running time when the algorithm itself makes random choices.

Exercises

5.1-1

Show that the assumption that we are always able to determine which candidate is best, in line 4 of procedure HIRE-ASSISTANT, implies that we know a total order on the ranks of the candidates.

5.1-2 ★

Describe an implementation of the procedure RANDOM(a, b) that only makes calls to RANDOM(0, 1). What is the expected running time of your procedure, as a function of a and b?

5.1-3 ★

Suppose that you want to output 0 with probability 1/2 and 1 with probability 1/2. At your disposal is a procedure BIASED-RANDOM, that outputs either 0 or 1. It outputs 1 with some probability p and 0 with probability 1 - p, where 0 < p < 1, but you do not know what p is. Give an algorithm that uses BIASED-RANDOM as a subroutine, and returns an unbiased answer, returning 0 with probability 1/2 and 1 with probability 1/2. What is the expected running time of your algorithm as a function of p?
