Pointers and Arrays

Josh Ko (柯向上), 2005.12.28

Core Concept

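The relationship between pointers and arrays is arguably one of the most interesting designs in the C language. Yet learners of C/C++ often miss the key point and end up confused about how pointers and arrays stand in for each other. Put plainly, in practice arrays obey just one rule: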

An array name appearing in an expression can be implicitly converted to a pointer to the first element of the array.

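That is the whole rule. (In fact, this article could end right here ☺.) For the rest of the article I will call it "(the) Fundamental Rule". Beyond it, arrays have no other operations. Every other operation or usage is derived by combining this rule with the other basic rules of C/C++.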
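As a minimal sketch of the Fundamental Rule in action, the following C++ fragment uses an array name where a pointer is expected and checks that the result points at element 0. The function name decays_to_first_element is made up for illustration and is not part of the original article.

```cpp
#include <cassert>

// Hypothetical helper, for illustration only: reports whether the array name,
// used in an expression, yields a pointer to the first element.
bool decays_to_first_element() {
    int a[5] = {10, 20, 30, 40, 50};
    int* p = a;            // array-to-pointer conversion applies to a here
    assert(*p == 10);      // p designates a[0]
    return p == &a[0];
}
```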

So-Called "Array Subscripting"

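I said that beyond the Fundamental Rule "arrays have no other operations", so the first question thrown back at me is bound to be: "Then what about a[k]?" In fact, the subscripting operator (operator[]) has never been an operation on arrays; it is an operation on pointers. When we write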

a[k]

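we have an expression, so by the Fundamental Rule a can be converted to a pointer, and subscripting is then applied to that pointer. The rest is better known: for a pointer p, writing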

p[k]

is exactly equivalent to

*(p + k)

which accesses element k of array a. And because addition of a pointer and an integer is commutative, the expression above can be written as

*(k + p)

which is in turn equivalent to

k[p]

So if we write

k[a]

to access element k of a, that works without any problem either. At this point it becomes rather hard to explain operator[] as an operation on arrays. ☺
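The chain of equivalences can be checked directly. The sketch below is only an illustration: the helper name subscript_forms_agree is hypothetical, and it assumes k is a valid index into a five-element array. It compares addresses, so it confirms that all four spellings designate the same object rather than merely equal values.

```cpp
#include <cassert>

// Hypothetical helper: checks that the different spellings designate the same
// element. k is assumed to be a valid index, 0 <= k < 5.
bool subscript_forms_agree(int k) {
    int a[5] = {10, 20, 30, 40, 50};
    int* p = a;                    // what a converts to in an expression
    assert(0 <= k && k < 5);
    return &a[k] == p + k          // a[k] is *(a + k)
        && &p[k] == &*(k + p)      // pointer + integer commutes
        && &k[p] == &a[k]          // hence k[p] ...
        && &k[a] == &a[k];         // ... and k[a] name the same object
}
```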

Passing Arrays as Arguments

First we must understand that one array cannot be used to initialize another. That is, in

int a[5];
int c[5] = a; // error: an array cannot be initialized from another array

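the second line does not compile. Function argument passing, in turn, works by initializing each parameter with its argument. For example: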

void f(int j){
    // ...
}

int i;
// ...
f(i);

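When the call is made, f's parameter j is initialized with the value of the corresponding argument i. An array cannot be initialized from another array in this way, so an array cannot serve as a function parameter.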

What about this, then:

void g(int[]); // or: void g(int[N]);

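So what is this? The parameter is a wolf in sheep's clothing (for many beginners, anyway ☺); we know it cannot be an array parameter. As many people are aware, the parameter is in fact a pointer: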

void g(int*);

When we call g:

int a[5];
g(a);

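a appears in an expression and g's parameter is a pointer, so the Fundamental Rule steps in and performs the array-to-pointer conversion. What is actually passed to the function is a pointer to element 0 of a.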
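A small sketch can make this visible. The functions below are hypothetical and written only for this illustration: one writes through its "array" parameter, the other returns the address it received. If the parameter were a real array copy, neither result would reach back into the caller's array.

```cpp
#include <cassert>

// Declared with array syntax, but the parameter is really an int*.
void set_first(int arr[], int value) {
    arr[0] = value;      // writes through the pointer into the caller's array
}

// The bound 5 is ignored; arr is a const int*, and no copy of the array exists.
const int* address_received(const int arr[5]) {
    return arr;
}

// Hypothetical helper tying the two observations together.
bool caller_array_is_shared() {
    int a[5] = {1, 2, 3, 4, 5};
    set_first(a, 42);
    assert(a[0] == 42);                     // the caller's element changed
    return address_received(a) == &a[0];    // the callee saw &a[0]
}
```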

Once C++ references enter the picture, things look somewhat different, yet they still stay within the Fundamental Rule. The following technique is quite common:

template<typename T, size_t N>
inline size_t array_size(const T (&)[N]){
    return N;
}

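This function template returns the number of elements in a static array. Its parameter is a reference to array, which can be initialized with an array, so a call initializes the parameter with the genuine array itself. Someone may ask: "Then the Fundamental Rule was never used, was it?" True, the Fundamental Rule is not used here, but it is not wrong either: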

An array name appearing in an expression can be implicitly converted to a pointer to the first element of the array.

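In other words, an array can be converted when necessary, but in the array_size example the array fits without any conversion; actually performing the conversion would cause trouble. Here is another example, from C++ Templates ([6], p. 169):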

template<typename T>
const T& max(const T& a, const T& b);

std::cout << max("Apple", "Pear") << std::endl;

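The caller evidently expects the two string literals "Apple" and "Pear" to be passed as const char*. But both parameters of max are references, so no array-to-pointer conversion takes place: the arguments keep their array types const char[6] and const char[5], T would have to be char[6] for one argument and char[5] for the other, the two deductions conflict, and template argument deduction fails.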
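The difference between the two ways of passing can be observed with a few type traits. This is only a sketch: the three function templates and the helper literals_deduce_differently are hypothetical names introduced here, and they report what T was deduced as rather than reproducing max itself.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// By-reference parameter: no array-to-pointer conversion, so T is an array type.
template<typename T>
bool deduced_as_array(const T&) { return std::is_array<T>::value; }

// Number of elements in the array type deduced for T (0 if T is not an array).
template<typename T>
std::size_t deduced_extent(const T&) { return std::extent<T>::value; }

// By-value parameter: the argument is converted first, so T is a pointer type.
template<typename T>
bool deduced_as_pointer(T) { return std::is_pointer<T>::value; }

bool literals_deduce_differently() {
    assert(deduced_as_array("Apple") && deduced_as_array("Pear"));
    assert(deduced_as_pointer("Apple"));
    // "Apple" holds 6 chars including '\0', "Pear" holds 5: two distinct array types.
    return deduced_extent("Apple") == 6 && deduced_extent("Pear") == 5;
}
```

Because the two extents differ, a single T cannot satisfy both reference parameters at once, which is exactly the conflict described above.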

Another example is the C idiom for obtaining the number of elements in an array:

#define ARRAYSIZE(a) (sizeof(a) / sizeof((a)[0]))

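sizeof(a) is also an expression, but here a is not converted; if a were converted to a pointer first, sizeof would measure the pointer and the whole expression would give the wrong answer.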

Of course, code like this, in which no array-to-pointer conversion takes place, is comparatively rare.
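The two element-counting techniques from this section can be put side by side. The sketch below repeats the macro and the template from the text (with size_t spelled std::size_t); the helper both_counts_are_seven is a hypothetical name added for illustration.

```cpp
#include <cassert>
#include <cstddef>

// The C idiom from the text: the operand of sizeof is not converted to a pointer.
#define ARRAYSIZE(a) (sizeof(a) / sizeof((a)[0]))

// The C++ template from the text, spelled with std::size_t.
template<typename T, std::size_t N>
inline std::size_t array_size(const T (&)[N]) {
    return N;
}

// Hypothetical helper for illustration.
bool both_counts_are_seven() {
    double d[7] = {};
    assert(sizeof(d) == 7 * sizeof(double));   // size of the whole array
    return ARRAYSIZE(d) == 7 && array_size(d) == 7;
}
```

One design difference is worth noting: given a pointer instead of an array, array_size is rejected at compile time, whereas ARRAYSIZE compiles and quietly yields the pointer size divided by the element size.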

The Evidence

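In case anyone remains unconvinced, I quote a few authoritative passages as supporting evidence. First, the array-to-pointer conversion as described in the C++03 Standard [1]: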

4.2 Array-to-pointer conversion [conv.array] (p. 60)

An lvalue or rvalue of type “array of N T” or “array of unknown bound of T” can be converted to an rvalue of type “pointer to T.” The result is a pointer to the first element of the array.

This is precisely the rigorous version of the Fundamental Rule. Next, the part about the subscript operator:

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “pointer to T” and the other shall have enumeration or integral type. The result is an lvalue of type “T.” The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)).

The C99 Standard [2] words this part in a more interesting way:

6.5.2.1 Array subscripting (p. 70)

Constraints

One of the expressions shall have type “pointer to object type”, the other expression shall have integer type, and the result has type “type”.

Semantics

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

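At the level of high-level semantics this is certainly right, since pointer arithmetic is meant to be applied to pointers to array elements. At the level of low-level semantics, however, operator[] can still only be applied to pointers. Right below this passage, the C99 Standard adds an example: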

EXAMPLE

Consider the array object defined by the declaration int x[3][5]; Here x is a 3 x 5 array of ints; more precisely, x is an array of three element objects, each of which is an array of five ints.

In the expression x[i], which is equivalent to (*((x)+(i))), x is first converted to a pointer to the initial array of five ints. Then i is adjusted according to the type of x, which conceptually entails multiplying i by the size of the object to which the pointer points, namely an array of five int objects. The results are added and indirection is applied to yield an array of five ints. When used in the expression x[i][j], that array is in turn converted to a pointer to the first of the ints, so x[i][j] yields an int.

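The "that array" in the last sentence refers to x[i]. The example shows that any array is always converted to a pointer before subscripting is applied, which matches my earlier description.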
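The two conversions in the standard's example can be spelled out step by step. This is a sketch under stated assumptions: the helper name two_step_subscript_matches is hypothetical, and i and j are assumed to be in range for a 3 x 5 array.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper; i and j are assumed in range (0 <= i < 3, 0 <= j < 5).
bool two_step_subscript_matches(int i, int j) {
    int x[3][5] = {};
    // x[i] is itself an array of five ints (an lvalue, hence int (&)[5] here).
    static_assert(std::is_same<decltype(x[0]), int (&)[5]>::value,
                  "x[i] has type int[5]");
    assert(0 <= i && i < 3 && 0 <= j && j < 5);
    x[i][j] = 99;
    int (*row)[5] = x + i;   // x converts to "pointer to array of five ints"
    int* elem = *row + j;    // *row, an int[5], converts to int* in turn
    return elem == &x[i][j] && *elem == 99 && *(*(x + i) + j) == 99;
}
```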

Finally, the part of the C++03 Standard on "array parameters" in function declarations:

8.3 Meaning of declarators / 8.3.5 Functions [dcl.fct] (p. 139)

... The type of a function is determined using the following rules. The type of each parameter is determined from its own decl-specifier-seq and declarator. After determining the type of each parameter, any parameter of type “array of T” or “function returning T” is adjusted to be “pointer to T” or “pointer to function returning T,” respectively. ...

So there is no such thing as an "array parameter".
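The adjustment can be confirmed by comparing function types at compile time. The sketch below is only an illustration; the typedef IntFn and the helper array_parameter_is_pointer are hypothetical names introduced here.

```cpp
#include <cassert>
#include <type_traits>

// "Array of T" parameters are adjusted to "pointer to T", with or without a bound.
static_assert(std::is_same<void(int[]), void(int*)>::value, "int[] becomes int*");
static_assert(std::is_same<void(int[5]), void(int*)>::value, "int[5] becomes int*");

// "Function returning T" parameters are adjusted to "pointer to function returning T".
typedef int IntFn();   // hypothetical alias for "function returning int"
static_assert(std::is_same<void(IntFn), void(IntFn*)>::value, "IntFn becomes IntFn*");

// A reference to array is a different parameter type and is left alone.
static_assert(!std::is_same<void(int (&)[5]), void(int*)>::value, "reference stays");

// Hypothetical helper so the same fact can also be checked at run time.
bool array_parameter_is_pointer() {
    assert((std::is_same<void(int[5]), void(int*)>::value));
    return std::is_same<void(int[]), void(int*)>::value;
}
```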

A Bit of History

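As a coda, here is a little history of how arrays and pointers evolved in C. C's predecessor is B, and B's predecessor is BCPL; the latter two have no notion of type (they are typeless). In the world of BCPL and B, memory is a linear sequence of fixed-size cells. Variables are basically treated as integers, and no operation distinguishes types: on seeing operator+, for example, the machine's integer addition is performed on the two cells, and so on.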

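The memory model of BCPL and B is linear, and because cells are untyped, the value stored in a cell may be plain numeric data or the index of another cell in memory. Applying unary operator* to a cell therefore treats the value stored there as an index and jumps to the location that index denotes. Since every cell is an integer, adding some value to a cell's value and then applying unary operator* accesses a nearby cell. This is pointers and pointer arithmetic in their most primitive form.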

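From this we can also see that the commutativity of pointer-plus-integer addition historically evolved by layering types on top of the commutativity of integer addition (BCPL/B having no notion of type). *(a + k) looks rather verbose, so BCPL abbreviated it as a!k, while B's abbreviation is a[k]. This has an apt counterpart in mathematics: aₖ (a with subscript k), which is the usual idea of "taking a subscript" of a C array.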

The array semantics of BCPL and B are interesting. In B, the statement

auto V[10];

allocates 11 cells: first a cell named V is allocated, then 10 more consecutive cells, and finally the index of the array's first cell is stored in V. So *V is the first element of the array, which is *(V + 0), which is V[0].

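C inherited the array design of BCPL and B, namely accessing an array through a pointer to its first element, so interpreting a[i] in the usual way gives the impression that array elements are numbered from 0. V itself is a variable, so in B we can even write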

V = V - 1;

and the index bounds of "the array V" become 1 to 10.

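In C, the semantics of arrays changed considerably, with almost no effect on how arrays are used. Creating an array in BCPL or B requires actually storing a pointer to the array's first element. C no longer stores this pointer; instead, the array name itself is treated as a pointer to the array's first element whenever it appears in an expression. Apart from assignment to an array name, the syntax of every other array operation is unaffected. To quote the original text [4] directly: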

... (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as

struct {
    int inumber;
    char name[14];
};

I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?

The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today’s C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.

This invention enabled most existing B code to continue to work, despite the underlying shift in the language’s semantics. The few programs that assigned new values to an array name to adjust its origin – possible in B and BCPL, meaningless in C – were easily repaired. More important, the new language retained a coherent and workable (if unusual) explanation of the semantics of arrays, while opening the way to a more comprehensive type structure.

Closing Remarks

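From my own experience learning C/C++, studying a programming language's syntax and semantics (the Fundamental Rule, for example), and observing its historical evolution (the many changes from typeless BCPL/B to typed C) and major design decisions (the evolution of array semantics across BCPL, B, and C), leads to a deeper understanding of the language and a surer hand in using it. Many basic questions that beginners ask (such as "Why do array indices start at 0?") become clear this way. I recommend [4] and [5], which describe the evolution and major decisions of C and C++ respectively; both are well worth reading.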

References

[1] ISO/IEC 14882:2003, International Standard – Programming Languages – C++.

[2] ISO/IEC 9899:1999, International Standard – Programming Languages – C.

[3] The C Programming Language, 2/e, by Brian W. Kernighan and Dennis M. Ritchie, Prentice Hall PTR 1988.

[4] The Development of the C Language, by Dennis M. Ritchie, Second History of Programming Languages conference, Cambridge, Mass., April, 1993.

[5] The Design and Evolution of C++, by Bjarne Stroustrup, Addison-Wesley 1994.

[6] C++ Templates 全覽 (Chinese edition), translated by 侯捷, 榮耀, and 姜宏, 碁峰 2004.

Original edition: C++ Templates: The Complete Guide, by David Vandevoorde and Nicolai M. Josuttis, Addison-Wesley 2002.