
Utility Functions


partly because we can notate it with a single character: “+”. Even if we believe that a construct is expensive, we will often prefer it to a cheaper one if it will cut our writing effort in half.

In any language, the “tendency towards brevity” will cause trouble unless it is allowed to vent itself in new utilities. The shortest idioms are rarely the most efficient ones. If we want to know whether one list is longer than another, raw Lisp will tempt us to write

(> (length x) (length y))

If we want to map a function over several lists, we will likewise be tempted to join them together first:

(mapcar fn (append x y z))

Such examples show that it’s especially important to write utilities for situations we might otherwise handle inefficiently. A language augmented with the right utilities will lead us to write more abstract programs. If these utilities are properly defined, it will also lead us to write more efficient ones.

A collection of utilities will certainly make programming easier. But they can do more than that: they can make you write better programs. The muses, like cooks, spring into action at the sight of ingredients. This is why artists like to have a lot of tools and materials in their studios. They know that they are more likely to start something new if they have what they need ready at hand. The same phenomenon appears with programs written bottom-up. Once you have written a new utility, you may find yourself using it more than you would have expected.

The following sections describe several classes of utility functions. They do not by any means represent all the different types of functions you might add to Lisp. However, all the utilities given as examples are ones that have proven their worth in practice.

4.3 Operations on Lists

Lists were originally Lisp’s main data structure. Indeed, the name “Lisp” comes from “LISt Processing.” It is as well not to be misled by this historical fact, however. Lisp is not inherently about processing lists any more than Polo shirts are for Polo. A highly optimized Common Lisp program might never see a list.

It would still be a list, though, at least at compile-time. The most sophisticated programs, which use lists less at runtime, use them proportionately more at compile-time, when generating macro expansions. So although the role of lists is decreased in modern dialects, operations on lists can still make up the greater part of a Lisp program.

(proclaim '(inline last1 single append1 conc1 mklist))

(defun last1 (lst)
  (car (last lst)))

(defun single (lst)
  (and (consp lst) (not (cdr lst))))

(defun append1 (lst obj)
  (append lst (list obj)))

(defun conc1 (lst obj)
  (nconc lst (list obj)))

(defun mklist (obj)
  (if (listp obj) obj (list obj)))

Figure 4.1: Small functions which operate on lists.

Figures 4.1 and 4.2 contain a selection of functions which build or examine lists. Those given in Figure 4.1 are among the smallest utilities worth defining. For efficiency, they should all be declared inline (page 26).

The first, last1, returns the last element in a list. The built-in function last returns the last cons in a list, not the last element. Most of the time one uses it to get the last element, by saying (car (last ...)). Is it worth writing a new utility for such a case? Yes, when it effectively replaces one of the built-in operators.

Notice that last1 does no error-checking. In general, none of the code defined in this book will do error-checking. Partly this is just to make the examples clearer. But in shorter utilities it is reasonable not to do any error-checking anyway. If we try:

> (last1 "blub")

>>Error: "blub" is not a list.

Broken at LAST...

the error will be caught by last itself. When utilities are small, they form a layer of abstraction so thin that it starts to be transparent. As one can see through a thin layer of ice, one can see through utilities like last1 to interpret errors which arise in the underlying functions.

The function single tests whether something is a list of one element. Lisp programs need to make this test rather often. At first one might be tempted to use the natural translation from English:

(= (length lst) 1)

Written this way, the test would be very inefficient. We know all we need to know as soon as we’ve looked past the first element.
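The difference can be seen in a short sketch. The definition of single is the one from Figure 4.1; the name single-by-length and the sample list *big* are assumptions made up for this illustration.

```lisp
;; From Figure 4.1: looks only at the first cons and its cdr.
(defun single (lst)
  (and (consp lst) (not (cdr lst))))

;; The "natural translation" (hypothetical name): walks the whole list
;; just to compare its length with 1.
(defun single-by-length (lst)
  (= (length lst) 1))

;; Sample data, assumed for illustration only.
(defparameter *big* (make-list 100000 :initial-element 'x))

(single '(a))             ; true
(single *big*)            ; false, after inspecting a single cons
(single-by-length *big*)  ; false, but only after traversing every cons
```

Both tests agree on proper lists; the first simply stops as soon as the answer is known.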

Next come append1 and conc1. Both attach a new element to the end of a list, the latter destructively. These functions are small, but so frequently needed that they are worth defining. Indeed, append1 has been predefined in previous Lisp dialects.

So has mklist, which was predefined in (at least) Interlisp. Its purpose is to ensure that something is a list. Many Lisp functions are written to return either a single value or a list of values. Suppose that lookup is such a function, and that we want to collect the results of calling it on all the elements of a list called data. We can do so by writing:

(mapcan #'(lambda (d) (mklist (lookup d))) data)
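As a concrete sketch, suppose lookup behaves as below. This lookup is purely hypothetical: it returns a single value for one key, a freshly made list for another, and nil otherwise. Only mklist comes from Figure 4.1.

```lisp
;; From Figure 4.1.
(defun mklist (obj)
  (if (listp obj) obj (list obj)))

;; Hypothetical lookup, assumed for illustration: sometimes a single
;; value, sometimes a newly consed list, sometimes nothing.
(defun lookup (key)
  (case key
    (a 1)
    (b (list 2 3))
    (t nil)))

;; Every result is turned into a list, and mapcan splices them together.
(mapcan #'(lambda (d) (mklist (lookup d)))
        '(a b c))
;; returns (1 2 3)
```

Note that nil is already a list, so keys with no result simply contribute nothing.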

Figure 4.2 contains some larger examples of list utilities. The first, longer, is useful from the point of view of efficiency as well as abstraction. It compares two sequences and returns true only if the first is longer. When comparing the lengths of two lists, it is tempting to do just that:

(> (length x) (length y))

This idiom is inefficient because it requires the program to traverse the entire length of both lists. If one list is much longer than the other, all the effort of traversing the difference in their lengths will be wasted. It is faster to do as longer does and traverse the two lists in parallel.

Embedded within longer is a recursive function to compare the lengths of two lists. Since longer is for comparing lengths, it should work for anything that you could give as an argument to length. But the possibility of comparing lengths in parallel only applies to lists, so the internal function is only called if both arguments are lists.
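A few calls make the two cases visible. The definition is the one from Figure 4.2, repeated so that the sketch stands alone; the sample arguments are assumptions.

```lisp
;; From Figure 4.2.
(defun longer (x y)
  (labels ((compare (x y)
             (and (consp x)
                  (or (null y)
                      (compare (cdr x) (cdr y))))))
    (if (and (listp x) (listp y))
        (compare x y)
        (> (length x) (length y)))))

(longer '(a b c) '(a b))  ; true: stops as soon as the second list runs out
(longer '(a b) '(a b))    ; false: equal lengths
(longer "abc" "ab")       ; true: not lists, so falls back on length
(longer "abc" '(a))       ; true: mixed arguments also fall back on length
```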

(defun longer (x y)
  (labels ((compare (x y)
             (and (consp x)
                  (or (null y)
                      (compare (cdr x) (cdr y))))))
    (if (and (listp x) (listp y))
        (compare x y)
        (> (length x) (length y)))))

(defun filter (fn lst)
  (let ((acc nil))
    (dolist (x lst)
      (let ((val (funcall fn x)))
        (if val (push val acc))))
    (nreverse acc)))

(defun group (source n)
  (if (zerop n) (error "zero length"))
  (labels ((rec (source acc)
             (let ((rest (nthcdr n source)))
               (if (consp rest)
                   (rec rest (cons (subseq source 0 n) acc))
                   (nreverse (cons source acc))))))
    (if source (rec source nil) nil)))

Figure 4.2: Larger functions that operate on lists.

The next function, filter, is to some what remove-if-not is to find-if. The built-in remove-if-not returns all the values that might have been returned if you called find-if with the same function on successive cdrs of a list. Analogously, filter returns what some would have returned for successive cdrs of the list:

> (filter #'(lambda (x) (if (numberp x) (1+ x)))
          '(a 1 2 b 3 c d 4))
(2 3 4 5)

You give filter a function and a list, and get back a list of whatever non-nil values are returned by the function as it is applied to the elements of the list.

Notice that filter uses an accumulator in the same way as the tail-recursive functions described in Section 2.8. Indeed, the aim in writing a tail-recursive function is to have the compiler generate code in the shape of filter. For filter, the straightforward iterative definition is simpler than the tail-recursive one. The combination of push and nreverse in the definition of filter is the standard Lisp idiom for accumulating a list.
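For comparison, a tail-recursive rendering might look like the following sketch. The name filter-rec is hypothetical, and this is only one possible way to write it, not a definition taken from the figures.

```lisp
;; A tail-recursive counterpart to filter (hypothetical name).
;; The accumulator plays the same role as acc in the iterative version.
(defun filter-rec (fn lst)
  (labels ((rec (lst acc)
             (if (null lst)
                 (nreverse acc)
                 (let ((val (funcall fn (car lst))))
                   (rec (cdr lst)
                        (if val (cons val acc) acc))))))
    (rec lst nil)))

(filter-rec #'(lambda (x) (if (numberp x) (1+ x)))
            '(a 1 2 b 3 c d 4))
;; returns (2 3 4 5)
```

The local function and the explicit passing of the accumulator are what make this version the longer of the two.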

The last function in Figure 4.2 is for grouping lists into sublists. You give group a list l and a number n, and it will return a new list in which the elements of l are grouped into sublists of length n. The remainder is put in a final sublist. Thus if we give 2 as the second argument, we get an assoc-list:

> (group '(a b c d e f g) 2)
((A B) (C D) (E F) (G))

This function is written in a rather convoluted way in order to make it tail-recursive (Section 2.8). The principle of rapid prototyping applies to individual functions as well as to whole programs. When writing a function like flatten, it can be a good idea to begin with the simplest possible implementation. Then, once the simpler version works, you can replace it if necessary with a more efficient tail-recursive or iterative version. If it’s short enough, the initial version could be left as a comment to describe the behavior of its replacement. (Simpler versions of group and several other functions in Figures 4.2 and 4.3 are included in the note on page 389.)
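As an illustration of such a first draft, here is a sketch of a plainly recursive group. The name simple-group is hypothetical, and this is not necessarily the version given in the note referred to above.

```lisp
;; A straightforward, non-tail-recursive draft of group (hypothetical name).
(defun simple-group (source n)
  (if (zerop n) (error "zero length"))
  (if (nthcdr n source)                 ; more than n elements remain
      (cons (subseq source 0 n)
            (simple-group (nthcdr n source) n))
      (if source (list source) nil)))   ; final, possibly shorter, sublist

(simple-group '(a b c d e f g) 2)
;; returns ((A B) (C D) (E F) (G))
```

A draft like this is easy to check by eye, and can later serve as a comment describing what the tail-recursive version is supposed to do.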


The definition of group is unusual in that it checks for at least one error: a second argument of 0, which would otherwise send the function into an infinite recursion.

In one respect, the examples in this book deviate from usual Lisp practice: to make the chapters independent of one another, the code examples are as much as possible written in raw Lisp. Because it is so useful in defining macros, group is an exception, and will reappear at several points in later chapters.

The functions in Figure 4.2 all work their way along the top-level structure of a list. Figure 4.3 shows two examples of functions that descend into nested lists.

The first, flatten, was also predefined in Interlisp. It returns a list of all the atoms that are elements of a list, or elements of its elements, and so on:

> (flatten '(a (b c) ((d e) f)))
(A B C D E F)

The other function in Figure 4.3, prune, is to remove-if as copy-tree is to copy-list. That is, it recurses down into sublists:

> (prune #'evenp '(1 2 (3 (4 5) 6) 7 8 (9)))
(1 (3 (5)) 7 (9))

Every leaf for which the function returns true is removed.

(defun flatten (x)
  (labels ((rec (x acc)
             (cond ((null x) acc)
                   ((atom x) (cons x acc))
                   (t (rec (car x) (rec (cdr x) acc))))))
    (rec x nil)))

(defun prune (test tree)
  (labels ((rec (tree acc)
             (cond ((null tree) (nreverse acc))
                   ((consp (car tree))
                    (rec (cdr tree)
                         (cons (rec (car tree) nil) acc)))
                   (t (rec (cdr tree)
                           (if (funcall test (car tree))
                               acc
                               (cons (car tree) acc)))))))
    (rec tree nil)))

Figure 4.3: Doubly-recursive list utilities.

4.4 Search

This section gives some examples of functions for searching lists. Common Lisp provides a rich set of built-in operators for this purpose, but some tasks are still difficult, or at least difficult to perform efficiently. We saw this in the hypothetical case described on page 41. The first utility in Figure 4.4, find2, is the one we defined in response to it.

The next utility, before, is written with similar intentions. It tells you if one object is found before another in a list:

> (before 'b 'd '(a b c d))
(B C D)

It is easy enough to do this sloppily in raw Lisp:

(< (position 'b '(a b c d)) (position 'd '(a b c d)))

But the latter idiom is inefficient and error-prone: inefficient because we don’t need to find both objects, only the one that occurs first; and error-prone because if either object isn’t in the list, nil will be passed as an argument to <. Using before fixes both problems.
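Both problems can be seen in a small sketch. The definition of before is the one from Figure 4.4; the variable *lst* and the helper raw-before-p are hypothetical names introduced only for this illustration.

```lisp
;; From Figure 4.4.
(defun before (x y lst &key (test #'eql))
  (and lst
       (let ((first (car lst)))
         (cond ((funcall test y first) nil)
               ((funcall test x first) lst)
               (t (before x y (cdr lst) :test test))))))

(defparameter *lst* (list 'a 'b 'c 'd))   ; sample data

(before 'b 'd *lst*)   ; returns (B C D), having stopped at B
(before 'b 'z *lst*)   ; returns (B C D): the search for Z never happens
(position 'z *lst*)    ; returns NIL, which the raw idiom would pass to <

;; Hypothetical wrapper around the raw idiom, so that the failure can be
;; observed without landing in the debugger. In ordinary (safe) code,
;; calling < on NIL signals an error, which the handler turns into :ERROR.
(defun raw-before-p (x y lst)
  (handler-case (< (position x lst) (position y lst))
    (error () :error)))

(raw-before-p 'b 'd *lst*)   ; true, after two separate searches
(raw-before-p 'b 'z *lst*)   ; the missing element leads to an error
```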

Since before is similar in spirit to a test for membership, it is written to resemble the built-in member function. Like member it takes an optional test argument, which defaults to eql. Also, instead of simply returning t, it tries to return potentially useful information: the cdr beginning with the object given as the first argument.

(defun find2 (fn lst)
  (if (null lst)
      nil
      (let ((val (funcall fn (car lst))))
        (if val
            (values (car lst) val)
            (find2 fn (cdr lst))))))

(defun before (x y lst &key (test #'eql))
  (and lst
       (let ((first (car lst)))
         (cond ((funcall test y first) nil)
               ((funcall test x first) lst)
               (t (before x y (cdr lst) :test test))))))

(defun after (x y lst &key (test #'eql))
  (let ((rest (before y x lst :test test)))
    (and rest (member x rest :test test))))

(defun duplicate (obj lst &key (test #'eql))
  (member obj (cdr (member obj lst :test test))
          :test test))

(defun split-if (fn lst)
  (let ((acc nil))
    (do ((src lst (cdr src)))
        ((or (null src) (funcall fn (car src)))
         (values (nreverse acc) src))
      (push (car src) acc))))

Figure 4.4: Functions which search lists.

Note that before returns true if we encounter the first argument before encountering the second. Thus it will return true if the second argument doesn’t occur in the list at all:

> (before 'a 'b '(a))
(A)

We can perform a more exacting test by calling after, which requires that both its arguments occur in the list:

> (after 'a 'b '(b a d))
(A D)
> (after 'a 'b '(a))
NIL

If (member o l) finds o in the list l, it also returns the cdr of l beginning with o. This return value can be used, for example, to test for duplication. If o is duplicated in l, then it will also be found in the cdr of the list returned by member. This idiom is embodied in the next utility, duplicate:

> (duplicate 'a '(a b c a d))
(A D)

Other utilities to test for duplication could be written on the same principle.

More fastidious language designers are shocked that Common Lisp uses nil to represent both falsity and the empty list. It does cause trouble sometimes (see Section 14.2), but it is convenient in functions like duplicate. In questions of sequence membership, it seems natural to represent falsity as the empty sequence.

The last function in Figure 4.4 is also a kind of generalization of member. While member returns the cdr of the list beginning with the element it finds, split-if returns both halves. This utility is mainly used with lists that are ordered in some respect:

> (split-if #'(lambda (x) (> x 4))
            '(1 2 3 4 5 6 7 8 9 10))
(1 2 3 4)
(5 6 7 8 9 10)
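Since the two halves come back as multiple values, a caller who wants both has to receive them explicitly. The following sketch shows one way to do so with multiple-value-bind; the variable names low and high are assumptions, and the definition is repeated from Figure 4.4.

```lisp
;; From Figure 4.4.
(defun split-if (fn lst)
  (let ((acc nil))
    (do ((src lst (cdr src)))
        ((or (null src) (funcall fn (car src)))
         (values (nreverse acc) src))
      (push (car src) acc))))

;; Capture both halves and package them in a single list.
(multiple-value-bind (low high)
    (split-if #'(lambda (x) (> x 4))
              '(1 2 3 4 5 6 7 8 9 10))
  (list low high))
;; returns ((1 2 3 4) (5 6 7 8 9 10))
```

Used in a context that expects one value, a call to split-if yields only the first half.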

Figure 4.5 contains search functions of another kind: those which compare elements against one another. The first, most, looks at one element at a time. It takes a list and a scoring function, and returns the element with the highest score. In case of ties, the element occurring first wins.

> (most #'length '((a b) (a b c) (a) (e f g)))
(A B C)

For convenience, most also returns the score of the winner.
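A caller can pick up the score with multiple-value-bind, as in this sketch. The definition is the one from Figure 4.5; the variable names winner and score are assumptions.

```lisp
;; From Figure 4.5.
(defun most (fn lst)
  (if (null lst)
      (values nil nil)
      (let* ((wins (car lst))
             (max (funcall fn wins)))
        (dolist (obj (cdr lst))
          (let ((score (funcall fn obj)))
            (when (> score max)
              (setq wins obj
                    max score))))
        (values wins max))))

;; Receive both the winning element and its score.
(multiple-value-bind (winner score)
    (most #'length '((a b) (a b c) (a) (e f g)))
  (list winner score))
;; returns ((A B C) 3)
```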

A more general kind of search is provided by best. This utility also takes a function and a list, but here the function must be a predicate of two arguments. It returns the element which, according to the predicate, beats all the others.

(defun most (fn lst)
  (if (null lst)
      (values nil nil)
      (let* ((wins (car lst))
             (max (funcall fn wins)))
        (dolist (obj (cdr lst))
          (let ((score (funcall fn obj)))
            (when (> score max)
              (setq wins obj
                    max score))))
        (values wins max))))

(defun best (fn lst)
  (if (null lst)
      nil
      (let ((wins (car lst)))
        (dolist (obj (cdr lst))
          (if (funcall fn obj wins)
              (setq wins obj)))
        wins)))

(defun mostn (fn lst)
  (if (null lst)
      (values nil nil)
      (let ((result (list (car lst)))
            (max (funcall fn (car lst))))
        (dolist (obj (cdr lst))
          (let ((score (funcall fn obj)))
            (cond ((> score max)
                   (setq max score
                         result (list obj)))
                  ((= score max)
                   (push obj result)))))
        (values (nreverse result) max))))

Figure 4.5: Search functions which compare elements.

> (best #'> '(1 2 3 4 5))
5

We can think of best as being equivalent to car of sort, but much more efficient. It is up to the caller to provide a predicate which defines a total order on the elements of the list. Otherwise the order of the elements will influence the result; as before, in case of ties, the first element wins.
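The equivalence with car of sort can be checked directly, as in the sketch below. The definition of best is the one from Figure 4.5; the sample list *nums* is an assumption. Because sort is destructive, the raw idiom has to work on a copy if the original list is still needed.

```lisp
;; From Figure 4.5.
(defun best (fn lst)
  (if (null lst)
      nil
      (let ((wins (car lst)))
        (dolist (obj (cdr lst))
          (if (funcall fn obj wins)
              (setq wins obj)))
        wins)))

(defparameter *nums* (list 3 1 4 1 5 9 2 6))   ; sample data

(best #'> *nums*)                     ; returns 9 after one pass, no consing
(car (sort (copy-list *nums*) #'>))   ; returns 9 after copying and sorting
```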

Finally, mostn takes a function and a list and returns a list of all the elements for which the function yields the highest score (along with the score itself):

> (mostn #'length '((a b) (a b c) (a) (e f g)))
((A B C) (E F G))

4.5 Mapping

Another widely used class of Lisp functions are the mapping functions, which apply a function to a sequence of arguments. Figure 4.6 shows some examples of new mapping functions. The first three are for applying a function to a range of numbers without having to cons up a list to contain them. The first two, map0-n and map1-n, work for ranges of positive integers:

> (map0-n #'1+ 5)
(1 2 3 4 5 6)

Both are written using the more general mapa-b, which works for any range of numbers:

> (mapa-b #'1+ -2 0 .5)
(-1 -0.5 0.0 0.5 1.0)

Following mapa-b is the still more general map->, which works for sequences of objects of any kind. The sequence begins with the object given as the second argument, the end of the sequence is defined by the function given as the third argument, and successors are generated by the function given as the fourth argument. With map-> it is possible to navigate arbitrary data structures, as well as operate on sequences of numbers. We could define mapa-b in terms of map-> as follows:

(defun mapa-b (fn a b &optional (step 1))
  (map-> fn
         a
         #'(lambda (x) (> x b))
         #'(lambda (x) (+ x step))))
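Two further uses of map-> are sketched below, one numeric and one that walks a data structure. The definition is the one given in Figure 4.6; the particular sequences are assumptions chosen for illustration.

```lisp
;; From Figure 4.6.
(defun map-> (fn start test-fn succ-fn)
  (do ((i start (funcall succ-fn i))
       (result nil))
      ((funcall test-fn i) (nreverse result))
    (push (funcall fn i) result)))

;; A numeric sequence with a non-additive successor: powers of two up to 100.
(map-> #'identity
       1
       #'(lambda (x) (> x 100))
       #'(lambda (x) (* x 2)))
;; returns (1 2 4 8 16 32 64)

;; Navigating a structure: step through a property list two cells at a
;; time and collect the keys.
(map-> #'car
       '(:a 1 :b 2 :c 3)
       #'null
       #'cddr)
;; returns (:A :B :C)
```

In the second call the "numbers" are successive tails of the list, which is the sense in which map-> is not restricted to ranges.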

(defun map0-n (fn n)
  (mapa-b fn 0 n))

(defun map1-n (fn n)
  (mapa-b fn 1 n))

(defun mapa-b (fn a b &optional (step 1))
  (do ((i a (+ i step))
       (result nil))
      ((> i b) (nreverse result))
    (push (funcall fn i) result)))

(defun map-> (fn start test-fn succ-fn)
  (do ((i start (funcall succ-fn i))
       (result nil))
      ((funcall test-fn i) (nreverse result))
    (push (funcall fn i) result)))

(defun mappend (fn &rest lsts)
  (apply #'append (apply #'mapcar fn lsts)))

(defun mapcars (fn &rest lsts)
  (let ((result nil))
    (dolist (lst lsts)
      (dolist (obj lst)
        (push (funcall fn obj) result)))
    (nreverse result)))

(defun rmapcar (fn &rest args)
  (if (some #'atom args)
      (apply fn args)
      (apply #'mapcar
             #'(lambda (&rest args)
                 (apply #'rmapcar fn args))
             args)))

Figure 4.6: Mapping functions.


For efficiency, the built-in mapcan is destructive. It could be duplicated by:

(defun our-mapcan (fn &rest lsts)
  (apply #'nconc (apply #'mapcar fn lsts)))

Because mapcan splices together lists with nconc, the lists returned by the first argument had better be newly created, or the next time we look at them they might be altered. That’s why nicknames (page 41) was defined as a function which “builds a list” of nicknames. If it simply returned a list stored elsewhere, it wouldn’t have been safe to use mapcan. Instead we would have had to splice the returned lists with append. For such cases, mappend offers a nondestructive alternative to mapcan.
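The danger is easy to reproduce. In the sketch below, *nicknames* and stored-nicknames are hypothetical stand-ins for a function that returns a list stored elsewhere; they are not the nicknames function from page 41. Only mappend is taken from Figure 4.6.

```lisp
;; From Figure 4.6.
(defun mappend (fn &rest lsts)
  (apply #'append (apply #'mapcar fn lsts)))

;; Hypothetical table of stored nickname lists. It is built with LIST so
;; that the destructive call below alters fresh conses, not literal data.
(defparameter *nicknames*
  (list (list 'robert 'rob 'bob)
        (list 'thomas 'tom)))

;; Hypothetical accessor: returns the stored list itself, not a copy.
(defun stored-nicknames (name)
  (cdr (assoc name *nicknames*)))

(mappend #'stored-nicknames '(robert thomas))  ; returns (ROB BOB TOM)
(stored-nicknames 'robert)                     ; still returns (ROB BOB)

(mapcan #'stored-nicknames '(robert thomas))   ; returns (ROB BOB TOM)
(stored-nicknames 'robert)                     ; now returns (ROB BOB TOM)
```

After the call to mapcan, the entry for robert has had the entry for thomas spliced onto its end. With mappend the table is left as it was.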

The next utility, mapcars, is for cases where we want to mapcar a function over several lists. If we have two lists of numbers and we want to get a single list of the square roots of both, using raw Lisp we could say

(mapcar #'sqrt (append list1 list2))

but this conses unnecessarily. We append together list1 and list2 only to discard the result immediately. With mapcars we can get the same result from:

(mapcars #'sqrt list1 list2)

and do no unnecessary consing.
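A quick check that the two forms agree is sketched below. The definition of mapcars is the one from Figure 4.6; the variables *list1* and *list2* and their contents are assumptions.

```lisp
;; From Figure 4.6.
(defun mapcars (fn &rest lsts)
  (let ((result nil))
    (dolist (lst lsts)
      (dolist (obj lst)
        (push (funcall fn obj) result)))
    (nreverse result)))

(defparameter *list1* (list 1.0 4.0))    ; sample data
(defparameter *list2* (list 9.0 16.0))   ; sample data

(mapcar #'sqrt (append *list1* *list2*))  ; builds an intermediate list first
(mapcars #'sqrt *list1* *list2*)          ; same result, no intermediate list
;; both return (1.0 2.0 3.0 4.0)
```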

The final function in Figure 4.6 is a version of mapcar for trees. Its name, rmapcar, is short for “recursive mapcar,” and what mapcar does on flat lists, it does on trees:

> (rmapcar #'princ '(1 2 (3 4 (5) 6) 7 (8 9)))
123456789
(1 2 (3 4 (5) 6) 7 (8 9))

Like mapcar, it can take more than one list argument.

> (rmapcar #'+ '(1 (2 (3) 4)) '(10 (20 (30) 40)))
(11 (22 (33) 44))

Several of the functions which appear later on ought really to call rmapcar, including rep on page 324.

To some extent, traditional list mapping functions may be rendered obsolete by the new series macros introduced in CLTL2. For example,

(mapa-b #'fn a b c)

could be rendered

(collect (#Mfn (scan-range :from a :upto b :by c)))

However, there is still some call for mapping functions. A mapping function may in some cases be clearer or more elegant. Some things we could express with map-> might be difficult to express using series. Finally, mapping functions, as functions, can be passed as arguments.

4.6 I/O

Figure 4.7 contains three examples of I/O utilities. The need for this kind of utility varies from program to program. Those in Figure 4.7 are just a representative sample. The first is for the case where you want users to be able to type in expressions without parentheses; it reads a line of input and returns it as a list:

> (readlist)
Call me "Ed"
(CALL ME "Ed")

The call to values ensures that we get only one value back (read-from-string itself returns a second value that is irrelevant in this case).

(defun readlist (&rest args)
  (values (read-from-string
           (concatenate 'string "("
                        (apply #'read-line args)
                        ")"))))

(defun prompt (&rest args)
  (apply #'format *query-io* args)
  (read *query-io*))

(defun break-loop (fn quit &rest args)
  (format *query-io* "Entering break-loop.~%")
  (loop
    (let ((in (apply #'prompt args)))
      (if (funcall quit in)
          (return)
          (format *query-io* "~A~%" (funcall fn in))))))

Figure 4.7: I/O functions.
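Because readlist passes its arguments on to read-line, it can also be tried without a terminal by handing it a string stream. The sketch below assumes the standard readtable, under which symbols are read in upper case; the definition is repeated from Figure 4.7.

```lisp
;; From Figure 4.7.
(defun readlist (&rest args)
  (values (read-from-string
           (concatenate 'string "("
                        (apply #'read-line args)
                        ")"))))

;; Read the "typed" line from a string instead of standard input.
(with-input-from-string (in (format nil "Call me \"Ed\"~%"))
  (readlist in))
;; returns (CALL ME "Ed")
```

The same approach makes it possible to exercise utilities like this one in automated tests.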
