Some Reflections on the Mark-up and Analysis of Dūnhuáng Manuscripts:



Exemplified by the Platform Sūtra

Christoph Anderl (University of Ghent)

Kevin Dippner (Malakoff High School)

Øystein Krogh Visted (Jiao Xie Center for Chinese Culture and Language)

Abstract

This paper deals with several questions and problems related to the editing, digitization and analysis of Buddhist Dūnhuáng texts. The Dūnhuáng corpus of Chán (Zen) manuscripts is the most important source for the study of the early history of this Chinese Buddhist school. The authors discuss paleographic and textual features of the manuscripts and investigate several possibilities of TEI-compatible mark-up concerning the collation, translation, annotation, and semantic and syntactic analysis of this type of manuscript literature, in addition to methods of transformations into visual media. The approaches are exemplified by an experimental mark-up of the Dūnhuáng versions of the Platform Sūtra. In the second part of the paper, the newly initiated Chan Database Project is introduced and collaborative methods of dealing with Chán literature are discussed. In the appendix to the paper, the system of phonetic loans, as well as scribal conventions and errors in the manuscript versions of the Platform Sūtra are described and compared.


Keywords: Platform Sūtra of the Sixth Patriarch (Liùzǔ Tánjīng), Dūnhuáng Manuscripts, Phonetic Loan Characters, Analytic Mark-up, Zen Buddhism


Chung-Hwa Buddhist Journal Volume 25 (2012)


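Christoph Anderl (University of Ghent), Kevin Dippner (Malakoff High School), Øystein Krogh Visted (Jiao Xie Center for Chinese Culture and Language)

Abstract (translated from the Chinese)

This article addresses problems in the editing, digitization, and analysis of Buddhist Dūnhuáng documents, among which the Chán collections are an important resource for studying the early history of this Chinese Buddhist school. The authors discuss the palaeographic and textual features of the manuscripts and explore various possibilities for mark-up compatible with the Text Encoding Initiative (TEI); besides transformation into visual media, these concern the collation, translation, annotation, and semantic and syntactic analysis of this type of literature. The approach is exemplified by an experimental mark-up of the Dūnhuáng versions of the Platform Sūtra. The second part of the article introduces the newly initiated Chan Database Project and discusses collaborative methods of dealing with Chán collections …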





The Significance of Dūnhuáng Manuscripts


Around 1900, thousands of manuscripts were found behind a wall of Mògāo 莫高 Cave 16/17 (Dūnhuáng, Gānsù Province, China). Soon afterwards, most of the manuscripts were removed from China by expeditions from Great Britain, France, Russia, and Japan.

Today, the majority of the Dūnhuáng manuscripts are stored at various institutions such as the British Library (Stein Collection) and the Bibliothèque Nationale (Pelliot Collection), as well as collections in Russia (The Institute of Oriental Manuscripts), Japan (e.g., Ryūkoku Univ., Kyoto), and China (e.g., The Dūnhuáng Academy, The National Library of China in Běijīng; Běijīng University Library; there are also collections in Tiānjīn, Shànghǎi, and other places in China).2 Especially since World War II, ‘Dūnhuáng studies’ have developed into a major field of research, and today numerous individual scholars and institutions are investigating the textual and iconographic materials from a variety of perspectives.

The manuscripts are one of our most important sources for the study of medieval Chinese religion and culture. Whereas most of the Chinese manuscripts consist of copies of canonical Buddhist scriptures, there is also a significant number of texts on popular religion, as well as sectarian texts. Many of these non-canonical texts were not transmitted after the Táng Dynasty, and the Dūnhuáng materials give us a unique window for studying Buddhist history, doctrine, and practice from ca. the 7th to the 10th centuries.

Texts of the early Chán 禪 Schools, Esoteric Buddhism, Buddho-Daoist texts, ‘popular’ Chinese religion and related topics (including devotional and ritual texts, almanacs, prognostication and astronomical texts, talisman manuals, etc.) have received special attention among scholars.

Until the discovery of the Dūnhuáng texts, our understanding of the early history of Chán was to a great degree based on much later Sòng Dynasty materials and the retrospective understanding of Táng Chán during that period.3 The Dūnhuáng Chán texts revolutionized the study of the early period in the evolution of Chán. However, despite the immense progress of Chán studies from the 1970s to the 1990s, there are still many texts which have not been properly edited, analyzed, or translated, and many problems pertaining to the texts have not been solved.4

1 We want to thank the two anonymous reviewers of the article for their many helpful comments.

2 For a very good introduction to Dūnhuáng studies and the history of the manuscripts, see the following webpage (‘The International Dunhuang Project’): http://idp.bl.uk/. Tens of thousands of manuscripts and manuscript fragments are digitized in high quality and freely downloadable (the digitization of the Pelliot and Stein collections is nearly complete, whereas only parts of the Russian and Chinese collections are included so far). The digitized manuscripts are most conveniently found by manuscript number; the other search functions of the webpage are unfortunately only at a rudimentary stage.

3 The Sòng versions of Táng materials were often heavily revised and altered, and, retrospectively, a Sòng Dynasty understanding of the development of the Chán School(s) was imposed on earlier materials. Táng texts which did not fit the doctrinal or historiographic standards of the Sòng Dynasty were often not transmitted at all (on “text sanitation” during the transition period from Táng to Sòng, see for example Anderl 2012a, 16-26).

4 E.g., the interdependence between texts; there are also few properly collated and annotated texts at this point, and many textual and philological problems have only been touched upon.

The Scholarly Value of Dūnhuáng Manuscripts

The manuscripts are not only an important source for the study of medieval Chinese Buddhism but also for research on the development of the semantics and syntax of medieval Chinese, including colloquial grammatical constructions (classifier constructions, plural formation, coverb constructions, sentence finals, etc.).

Certain types of Dūnhuáng manuscripts contain a considerable number of vernacular elements, most importantly the so-called Transformation Texts (biànwén 變文)5 and related genres. Certain types of Chán treatises also contain important information on the development of medieval vernacular Chinese (e.g., the treatises attributed to Shénhuì and his disciples, and the Lìdài fǎbǎo jì 歷代法寶記).6 As such, these materials are important sources for the study of the transition from treatises written in Buddhist Hybrid Chinese to more vernacular types of narratives (many of these texts are characterized by a considerable portion of passages with direct speech).7

Copied by hand, the manuscripts are equally important for the study of palaeography during the Táng period, in addition to scribal conventions and errors, the study of phonetic loans, dialects, and vernacularisms. Medieval manuscripts are a significant source for reconstructing the development of Middle Chinese with its colloquial vocabulary and vernacular grammatical constructions. Many grammaticalized function words still current in Modern Mandarin and other modern varieties of Chinese originated during the late Táng (or, more precisely, surfaced in texts during that time). Thus, some manuscripts contain many early written forms of function words used in spoken Chinese.

5 On the genre of Transformation Texts, see for example Mair (1989).

6 For a recent excellent study of that text, see Adamek (2007).

7 Naturally, vernacular elements appear in passages recording direct speech, which reflect the spoken word to some degree. This can also be observed in another early vernacular text dating from the middle of the 10th century, the Zǔtáng jí 祖堂集 (ZTJ). In this text, the frame narratives usually use a more conservative language, whereas many of the passages in direct speech are written in the vernacular (on aspects of the language of ZTJ, see Anderl 2004; more generally, on the features of vernacular Chán texts, see Anderl 2012a).

Since many of these function words represented words used in the spoken language, existing Chinese characters were borrowed for their phonetic value. It was usually not before the Sòng period that specific characters were created to represent these colloquial words. A good example is the pronoun shénme 什麼 (什么), which was written in various forms in Dūnhuáng manuscripts and related sources, e.g., 是沒 (Dūnbó 77), 是摩 / 甚摩 (Stein 2503), 甚謨 (Stein 2669), 甚物 / 甚沒 (Bǎolín zhuàn 寶林傳, 801 AD), 甚麼 (10th cent.). Dūnhuáng Chán materials reflect different degrees of colloquialism, depending on the period in which they were written and the genre to which they belong.
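How such variant spellings might be handled in an analytical module can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the table contains nothing beyond the forms cited in the paragraph above, the function name `normalize_loans` is hypothetical and not part of the CDP tools, and plain substring matching is assumed to be good enough for a first pass.

```python
# Written forms of the colloquial pronoun cited in the paragraph above,
# all mapped to the later standard form. Illustrative table only.
SHENME_VARIANTS = ["是沒", "是摩", "甚摩", "甚謨", "甚物", "甚沒", "甚麼"]
STANDARD = "什麼"


def normalize_loans(text: str) -> tuple[str, list[tuple[int, str]]]:
    """Replace attested variant spellings by the standard form.

    Returns the normalized text together with (offset, original form)
    pairs, so that the manuscript reading is never lost.
    """
    found: list[tuple[int, str]] = []
    out: list[str] = []
    i = 0
    while i < len(text):
        for variant in SHENME_VARIANTS:
            if text.startswith(variant, i):
                found.append((i, variant))   # remember what the scribe wrote
                out.append(STANDARD)
                i += len(variant)
                break
        else:
            out.append(text[i])              # ordinary character, copy as is
            i += 1
    return "".join(out), found


# Latin letters stand in for the surrounding manuscript text.
normalized, originals = normalize_loans("A是沒B甚謨")
```

A real module would need context to rule out accidental sequences (e.g., 是 followed by 沒 in their ordinary senses); the sketch deliberately ignores this.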

The Chan Database Project (CDP)

The recently initiated CDP8 aims at electronically publishing Chán texts with a critical apparatus and a set of analytical modules. In this paper, certain strategies and problems concerning this aim will be discussed. Although a variety of Chán texts (including the printed editions starting from the Sòng Dynasty) are included in this project, one of the major challenges will be the technical and analytical framework for the publication of the corpus of the Dūnhuáng Chán manuscripts. In this paper, only a few problems will be addressed and illustrated by an experimental edition of the Dūnhuáng manuscripts of the famous Platform sūtra.9 The aim was the production of a collated and annotated version of the Dūnhuáng Platform sūtra which allowed annotations and comments on several aspects of the text.

One of the motivations for initiating such a project was the realization that, despite the importance of the manuscripts for Buddhist and linguistic studies described above, many important manuscript texts still lack authoritative, collated editions, and the philological and linguistic aspects have often been somewhat neglected in the study of the materials. Many studies of Chinese Buddhist texts in the West thus seem to contrast with the approach taken in research on Sanskrit Buddhist texts and Gāndhārī manuscripts, for example, which shows a strong emphasis on thoroughly edited texts and philological studies.10 Far from being an end in itself, thorough philological research on the texts will reflect back on our understanding of their contents and help to contextualize them historically and intertextually.11

8 This project was originally initiated by the late John McRae, Christian Wittern, and Christoph Anderl, and aims at creating and applying tools for editing and analyzing Chán/Zen Buddhist texts, as well as organizing collaboration within the field of Chán/Zen Buddhist text studies.

9 This work on the Platform sūtra edition was originally started as a master class on Buddhist Dūnhuáng texts at Oslo University taught by Christoph Anderl, with Christian Wittern (Kyoto University) supervising the work on TEI compatibility and programming. The basic programming and transformation of the XML mark-up was done by Kevin Dippner. The mark-up and analysis was done by Anderl and Visted. We want to thank all participants of the course for their helpful comments.

10 An exception to this tendency is the study of (early) Buddhist translation literature in China; these studies are deeply influenced by the philological approach of Sanskrit/Pāli studies.

11 Specifically, modern Chán Buddhist studies in the West often seem somewhat reluctant to approach texts also from a linguistic and philological angle, occasionally resulting in interpretations and translations based on a fragmentary understanding of the language they are written in. Part of the problem may be that there is hardly any systematic training in the semantics and syntax of Buddhist Hybrid or Medieval Vernacular Chinese at Western universities. These types of texts are in many respects fundamentally different from texts written in ‘Literary Chinese’ (for a good contrastive case study, see for example Harbsmeier 2012; for a grammar of the vernacular language of the 10th century, see Anderl 2004).

Some Important Features of the Manuscript Texts

Variant Characters

The study of character variants has developed into a significant subfield in the study of Dūnhuáng manuscripts, and the materials are important sources for the study of the orthography and writing conventions of the Táng period. The history of many ‘non-standard’ characters is extremely complex and important for deciphering the texts.

Historically, many Chinese characters which served as models for establishing the abbreviated characters in the language reforms of 20th century China were actually based on ‘vulgar’ (and other) forms of Táng and Sòng characters, in addition to ‘ancient’ forms of characters which were revived during these periods. After the Táng, Dūnhuáng texts gradually ceased to circulate in China, and many forms of characters typical of Dūnhuáng writing conventions were forgotten or became obsolete. On the other hand, many character forms were transmitted to Japan and continued to circulate there until modern times.12 By recording the palaeographic features of the manuscripts and collecting them in a database, the development of the Chinese characters during these periods can be studied in a more systematic way.13 In addition, orthography and calligraphy can be an important factor in dating the copies of the manuscripts.

In many Dūnhuáng materials, multiple forms of the same character can appear in the very same text. Below, there are a few examples of character forms appearing in the beginning section of the Stein (left) and Dūnbó (right) versions of the Platform sūtra:14

12 Interesting examples are the contractions (for púsà 菩薩 ‘bodhisattva’), (for nièpán 涅槃 ‘nirvāṇa’), and (for pútí 菩提 ‘bodhi’) which were widely used in Dūnhuáng texts but eventually ceased to be used in China. However, these characters continued to circulate in Japan and are nowadays even frequently recognized by non-specialists! For a list of special characters used in Japanese Buddhist manuscripts, see Ui (1983). The history of many Dūnhuáng variants needs further investigation. Dictionaries such as the Lóngkān shǒujìng 龍龕手鏡 (10th century) were criticized by scholars of subsequent periods for containing unusual Chinese character forms. However, after the discovery of the Dūnhuáng manuscripts in 1900 it became clear that this dictionary was compiled to provide the reader with the correct pronunciation of characters, as well as with a reference to non-standard characters widely circulating in handwritten manuscripts and inscriptions. Even for early Sòng Buddhists themselves, it had become difficult to understand texts written in countless different forms of characters. Establishing the ‘correct’ (zhèng 正) pronunciation and form was of great concern for the Buddhist scholars during the Táng and later periods; on the one hand for philological reasons (many Buddhist scholars had an amazingly high level of insight into the phonological, palaeographic, and semantic aspects of texts), on the other hand based on the assumption that only correctly pronounced characters/words were soteriologically efficient (especially in the dhāraṇī and mantra texts which became greatly popular among all Buddhists from the 8th century onwards).

13 On a discussion of character databases, see the article by Christian Wittern in this volume.

14 There are differences in character shapes both internally (i.e., within the same text) and in comparison to the other manuscripts.

Scribal Errors and Conventions

In contrast to the often heavily edited and revised printed Chán scriptures of the Sòng period (many of which were eventually integrated into the official Buddhist canon sanctioned by the imperial court), Dūnhuáng Chán manuscripts were copied by hand. Besides giving us information about the early stages of a text’s formation, they are a rich source for studying scribal conventions during different periods of the Táng dynasty, as well as the errors and inaccuracies typical of the process of copying. The study and identification of these typical errors and misreadings (for a few examples, see below) facilitate the reading of handwritten manuscripts and the identification of corrupt passages. Dūnhuáng manuscripts are also a rich source for studying conventions of adding diacritics and markers to the texts. When texts were edited during the Sòng dynasty, these markers (including section markers) were usually removed. Thus, Táng dynasty manuscripts give us important information not only on the process of copying but also on the conventions of reading the texts15 (often, markers were inserted by the reader or monastery librarian rather than the copyist).16 A rich source of errors is the similarity of characters in their handwritten forms, which are confused with each other in the process of copying.

Dūnhuáng manuscripts are also a very important source for the oral features of texts and the phonetic loans used in them (for a list of phonetic loans in the Platform sūtra, see the Appendix to the article). An important subtype is dialect phonetic loans, which appear in a number of manuscripts and usually reflect the language of the Northwestern regions during the Táng Dynasty.

Some Important Aspects in the Digitization of Buddhist Manuscripts

The digitization of Buddhist texts and the availability of manuscript facsimiles have progressed immensely in recent years. This opens up the possibility of developing tools for enhancing our understanding of these texts and manuscripts through an analytical ‘fine-reading’.

Analytical Modules

The multi-faceted features (paleography, orthography, linguistic and Buddhological aspects, etc.) of manuscript study call for flexible approaches in the study of the materials.17 The development and implementation of XML-based mark-up seems to accommodate many needs in this respect, including analytic ‘modules’ for different purposes, the possibility of constant revision, multiple transformations and visualizations, as well as an interactive dialogue with the ‘text consumer’ or fellow researcher.18

15 E.g., there are ‘performance markers’ (text portions usually inserted with smaller characters) in the manuscripts, suggesting that the scripture was used in ritual contexts related to the bestowal of the precepts/commandments. The inserted passage informs the reader how often sets of precepts have to be recited in unison during the ceremony. These markers are usually not extant in the Sòng editions.

16 For an interesting study of these markers, see Galambos (forthcoming). For a more thorough forthcoming study on these features of the Platform sūtra, see Anderl (2012b). In this paper, I also try to show that a thorough philological approach can unravel new aspects of a text. Concretely, a study of the textual features, internal structure, and intertextual relations (i.e., certain features typical for ‘esoteric’ texts can be found) of the Platform manuscripts suggests certain re-evaluations of the text, for example, the possibility that the title Tánjīng 壇經 (Platform sūtra) originally did not refer to the text itself at all, but rather to the Diamond sūtra, a text which was especially important in the Platform rituals of conferring the Mahāyāna precepts at large congregations. As such, the text itself possibly originated as a commentary on the Diamond sūtra, and the Platform sūtra only gradually developed an ‘internal’ reference to itself (for a detailed forthcoming study, see Anderl 2012b).
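The idea of separate analytic ‘modules’, and the difficulty noted in footnote 18 that strictly nested XML cannot hold annotations which cross each other, can be made concrete with a minimal Python sketch of stand-off annotation. Everything in it is an assumption made for illustration (the `Span` class, the layer names, and the placeholder text); it does not describe how the CDP stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    layer: str   # analytical module, e.g. "syntax" or "term" (hypothetical names)
    start: int   # offset of the first character (inclusive)
    end: int     # offset after the last character (exclusive)
    label: str


def overlaps_without_nesting(a: Span, b: Span) -> bool:
    """True if the two spans cross each other, which inline XML cannot nest."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def active_spans(spans, layers):
    """Return only the annotations of the modules a reader has switched on."""
    return [s for s in spans if s.layer in layers]


text = "ABCDEFGH"  # placeholder for a line of base text
spans = [
    Span("syntax", 0, 5, "clause"),
    Span("term", 3, 8, "technical term"),
]

crossing = overlaps_without_nesting(spans[0], spans[1])
only_terms = active_spans(spans, {"term"})
```

Because each layer only points into the base text by offsets, the two crossing annotations coexist without conflict, and a reader-facing view can activate or deactivate a layer by filtering on its name.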

Some Objectives for the Study of Chán Texts

- Web-based editions of important Chán manuscripts and texts can be permanently updated, extended, and revised.

- Once developed, the edited texts can be analyzed by a set of analytical tools (e.g., syntactic analysis, terminology/dictionary tools, ‘text dependency’ analysis, character analysis).

- Chán materials in non-Chinese languages (e.g., Tibetan, Uighur, Tangut, etc.), which are of great importance for the development of this branch of Buddhism in the East Asian context, have so far been rather neglected in Chán studies.

- Manuscripts give us a unique insight into the processes of text production and reproduction (as opposed to extant printed texts edited and ‘sanitized’ during the Sòng period, for example). A thorough documentation of these features is the basis of a better understanding of these processes. A documentation of textual features is not only important for palaeographic and linguistic studies but also in the framework of religious studies; e.g., the textual build-up and structure can give us important information on the development of a text, which in turn might reflect the evolution of doctrines or lineage systems, for example. In addition, the study of textual features can be important for the dating of texts, as well as for linking and ‘contextualizing’ them within a corpus/group of texts.19

- The analysis of Chinese characters: The Táng Dynasty witnessed the emergence of numerous new character forms (specifically vulgar and abbreviated forms of Chinese characters).

- Syntactic analysis (see below).

- The development of Chán terminology: The mark-up and registration of Chán terminology in the relevant texts can provide researchers with important information on the evolution of terms.

- A ‘text dependency’ module will enable the mark-up of relationships between texts and parallel passages. This will facilitate the study of the often complex relations between texts or text portions and also aid in the dating of the manuscript texts. Such a tool would also help researchers to retrace the origin, development, and interdependence of themes, topics, ideas, and concepts as they appear in texts from various periods. Ideally, instead of marking up text portions or narrative sections by hand, dependent texts could be automatically identified by sets of overlapping items.

- Dictionary module (e.g., the linking with internal referential databases or external databases such as the DDB).20

17 A similar approach was taken in a recently initiated database project on Buddhist narratives at the Ruhr University Bochum (The Mercator/Ceres Database of Buddhist Narratives; edited by Christoph Anderl and Jessie Pons). Given the diversity of the materials (both textual and iconographic materials, in addition to information on locations), a system of dynamically interconnected sets of sub-collections was used in the XML database. According to specific needs arising during the concrete work with the iconographic and textual materials, custom-tailored tools and modules are developed and implemented (e.g., input masks for subsets of data, analytical tools, visualizations, etc.). The ca. 20 sub-databases are held together by a system of ‘labels’ for narratives, texts/manuscripts, and places (which can be interconnected with each other). The internal research database has been online since 2011, whereas a public version will be published in November 2012.

18 As is also pointed out in other contributions, the XML approach also involves certain difficulties, such as the necessity to follow a strictly hierarchical build-up and nesting. Thus, multiple mark-up of the same text might overlap and violate this rule. A ‘module’ approach could facilitate the work on the text, i.e., different aspects of the same text are analyzed and marked up separately (“stand-off” mark-up; as a by-product, the reader can activate or deactivate specific modules when reading the text). Another problem is naturally the time-consuming aspect of applying analytical mark-up to texts. As such, questions of quantity versus analytical quality have to be constantly considered and balanced.

19 See also the Appendix to the paper: the study of manuscript features can give us important information on the actual function of texts, e.g., the emphasis on ‘orality’ and ritual functions (as indicated by ‘performance markers’ which were often removed in edited and printed versions of texts).

20 On the Digital Dictionary of Buddhism (DDB), see Charles Muller’s article in this volume.
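The automatic identification of dependent texts “by sets of overlapping items”, as envisaged for the ‘text dependency’ module, could for instance be approximated with overlapping character n-grams. The Python sketch below is one possible reading of that idea, not the project’s method: the choice of character 4-grams, the Jaccard measure, the threshold of 0.3, and all function names are assumptions made for illustration.

```python
def char_ngrams(text: str, n: int = 4) -> set[str]:
    """Set of overlapping character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return {chars[i:i + n] for i in range(len(chars) - n + 1)}


def overlap_score(a: str, b: str, n: int = 4) -> float:
    """Jaccard similarity of the two n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def candidate_parallels(passages: dict[str, str], threshold: float = 0.3, n: int = 4):
    """Yield (id_a, id_b, score) for passage pairs at or above the threshold."""
    ids = sorted(passages)
    for i, id_a in enumerate(ids):
        for id_b in ids[i + 1:]:
            score = overlap_score(passages[id_a], passages[id_b], n)
            if score >= threshold:
                yield id_a, id_b, score


# Latin placeholders stand in for passages of Chinese text.
demo = {"p1": "abcdefgh", "p2": "abcdefgx", "p3": "zzzzzzzz"}
pairs = list(candidate_parallels(demo))
```

Pairs found this way would only be candidates for an editor to confirm; phonetic loans and variant characters in the manuscripts would lower the score of genuinely parallel passages unless the texts are normalized first.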


Illustration 1: Library building at Haein-sa 海印寺 where the Tripiṭaka Koreana is stored (second Koryŏ 高麗 edition; also referred to as Chaejo Taejanggyŏng 再調大藏經). The project was initiated in 1236 by King Kojong 高宗 in order to secure help from Buddhas and Bodhisattvas against a pending invasion of Korea by foreign armies (i.e., a project in the context of ‘state-protecting Buddhism’). The work of carving the 81,258 woodblocks (most of them carved on both sides, amounting to 162,516 surfaces) lasted until 1251. One woodblock measures ca. 67 x 23 cm and is ca. 3 cm thick, weighing around 3.5 kg. There are typically 23 lines carved on each surface, each line consisting of 14 Chinese characters (ca. 322 per surface), totaling about 52,330,000 characters. ZTJ, which is one of our main sources of early Chán historiography, disappeared from China during the Sòng dynasty but survived in Korea, where it was carved in the 15th century as part of the ‘supplementary canon’ of the Tripiṭaka Koreana on 386 surfaces (ca. 190,000 characters). However, the text was never printed before the printing blocks were rediscovered in Korea at the beginning of the 20th century. Today, the canon is still stored in the library building, which dates back to the 15th century. There was an attempt to move the printing blocks to a modern library facility, but within weeks the woodblocks started to decay and had to be returned to the old building. The original building appears to have been designed intuitively to provide ideal storage conditions (e.g., windows of different sizes ensure natural ventilation; a special kind of moisture-absorbing clay covers the floor; the way the woodblocks are arranged on shelves; etc.).21

21 Photograph by C. Anderl; on the background of the printing of ZTJ, see Anderl (2004, 1:2-52).



Illustration 2: Detail of a printing-block of ZTJ; scribes outlined each character on the woodblock in mirror-writing and afterwards the wood surrounding each character was chiseled out; the tool marks are still recognizable on the blocks; the wood (birch tree) is of exceptional hardness and was especially prepared for carving during a process lasting several years (photograph by C. Anderl).

Work-steps in the Establishment of a Chán Database:

- Determining the text corpus22

- Input and text collation

- Linking of facsimiles with digital editions

- Basic mark-up and linking the text with reference materials (e.g. information on proper names, Buddhist terms, etc.)

- Development and implementation of analytical modules (terminology, syntactic analysis, text dependency,…)

- Collaboration, development of (multiple-user) ‘interfaces’,23 specific projects, etc.

22 The most important groups of materials consist of (1) Dūnhuáng texts; (2) the printed texts of ‘classical’ Sòng Dynasty Chán (including primarily historical transmission texts (chuándēng lù 傳燈錄), recorded sayings texts (yǔlù 語錄), and collections (gōngàn 公案)); (3) materials which complement and contextualize the above materials, e.g., letter-exchanges between monks and officials, descriptions of Chán Buddhism in non-Buddhist materials, funeral and pagoda inscriptions, imperial edicts, Neo-Confucian yǔlù, ritual texts, texts on monastic rules, iconographic materials, lineage charts and other diagrams, etc. Another important aspect is the inclusion of non-Chinese materials (e.g., in Tibetan, Tangut, Uighur). Whereas the corpus of (2) is relatively easy to determine, it is considerably more difficult to pinpoint the relevant Dūnhuáng manuscript materials. The point of departure is the list of texts in Yanagida Seizan’s Zenseki kaidai 禪籍解題 (Nishitani, Keiji 西谷啟治/Yanagida, Seizan 柳田聖山 1974, 445-514). This list was recently expanded by Tanaka, Ryoshū; see also Sørensen (1989) for a discussion of early Chán materials (with an emphasis on the esoteric texts). More research is needed on the manuscripts stored in the minor collections (e.g., the collections of the Peking University and the Peking National Library, and those in Shànghǎi, Tiānjīn, Dūnhuáng, etc.).

Illustration 3: Experimental transformation of a Zǔtáng jí mark-up into an edited text parallel to the woodblock facsimile. Circled items mark place and personal names, respectively, and can be connected to referential databases on proper names. In addition, the edited text was linked with an XML version of Anderl’s grammar on ZTJ. Entries in the grammar are automatically matched with the text and the grey dots on top make the grammatical annotations by Anderl visible (the initial mark-up of ZTJ and the transformation/programming was done by Christian Wittern; this version of ZTJ is currently off-line).

23 The implementation of input- and analysis-interfaces for specific tasks can facilitate the work on the mark-up considerably, as compared to the time-consuming work in programs such as Oxygen.



Illustration 4: This diagram shows the complex interrelation between the manuscript and printed versions of the Platform sūtra (the diagram is drawn based on Yáng Zēngwén’s reconstruction of the genealogy of the text).24

The Mark-up of the Platform Sūtra:


Many Chán texts exist in several versions, having varying textual features. An important issue for analytical web editions will be the collation of these manuscripts and the inclusion of other important witnesses (on the Platform sūtra versions, see ill. 4; for a short description, see the bibliography).25

In the concrete work on the Platform scripture, one specific problem concerned the question of how the label <lem> should be applied. All manuscripts of the Dūnhuáng text contain a great number of errors, phonetic loans, and corrupt passages. The <lem> label was therefore used, somewhat atypically, for marking an ‘ideal’ reading of the text; it is thus the ‘reconstruction’ of an ideal textual version according to the view of the editors. The differing readings of the other witnesses are added with the <rdg> label. In future versions of the web publication, there will be the choice to read the text according to one specific manuscript version or to read an ‘ideal’ text with notes on the readings of the differing versions.

24 Yáng (1993, 297) and Lǐ (1999a, 19).

25 In the work on the text, it was attempted to include all extant manuscript witnesses (Or.8210/S.5475, Dūnbó 77, BD.48; the Lǚshùn manuscript was recently ‘rediscovered’ in China; however, no facsimile reproductions were accessible during the work on the text), in addition to occasional references to Sòng printed versions. For a description of the manuscripts, see Anderl (2012b); for the Sòng editions, see Schlütter (2007, 394-405).
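The planned choice between reading one manuscript version and reading the ‘ideal’ text can be sketched in a few lines of Python using only the standard library. The sample below is invented (Latin letters stand in for Chinese characters), the function names are hypothetical, and the TEI namespace is assumed; the fallback to <lem> for a witness without its own <rdg> is a simplification of this sketch, not a documented rule of the edition.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample; the witness identifiers follow the examples in this paper.
SAMPLE = (
    '<s xmlns="http://www.tei-c.org/ns/1.0">AB'
    '<app><lem wit="#Stein_5475">C</lem>'
    '<rdg wit="#Dunbo_77" type="errShape">X</rdg></app>D</s>'
)


def read_text(elem, witness=None):
    """Flatten an element to plain text.

    witness=None gives the editors' 'ideal' text (all <lem> readings);
    otherwise the <rdg> of that witness is used wherever one exists.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == TEI + "app":
            parts.append(_choose_reading(child, witness))
        elif child.tag != TEI + "witDetail":      # notes are not part of the text
            parts.append(read_text(child, witness))
        parts.append(child.tail or "")
    return "".join(parts)


def _choose_reading(app, witness):
    """Pick the reading of one <app> for the requested witness."""
    if witness is not None:
        for rdg in app.findall(TEI + "rdg"):
            if witness in rdg.get("wit", "").split():
                return read_text(rdg, witness)
    lem = app.find(TEI + "lem")
    return read_text(lem, witness) if lem is not None else ""


root = ET.fromstring(SAMPLE)
ideal = read_text(root)
dunbo = read_text(root, "#Dunbo_77")
stein = read_text(root, "#Stein_5475")
```

Since the <lem> is here an editorial reconstruction rather than a base manuscript, a production version would have to decide what to show for a witness that is named in neither <lem> nor <rdg>.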

Illustration 5: Portion of the Platform sūtra mark-up and manuscript collation in Oxygen. Note that sentence and phrase borders are generated with the <s> and <phr> tags. The basic mark-up contains references to personal names (‘persName’, subdivided into several categories), titles (‘roleName’, with subdivisions), place names (‘placeName’), and terms (‘term’, with subdivisions).

The collation within the apparatus <app> includes references to an ‘ideal’ reading according to the editors and mostly based on a manuscript witness. If all manuscripts have ‘corrupt’ readings, then a <lem> reading according to a later Sòng edition and/or the editors is established (e.g., <lem wit="#Editor">). Notes on the collation and the witnesses are inserted with <witDetail>, including references to the secondary literature. Additions, notes, deletions, etc. are also recorded in the manuscript description.



Example of Recording and Commenting Different Readings:

<app><lem wit="#Stein_5475"> </lem><rdg wit="#Dunbo_77" type="errShape" xml:id="w093-02"> </rdg><witDetail target="#w093-02" wit="#Dunbo_77">The characters 葉 and 業 are frequently confused with each other in Dūnhuáng treatises. Note that they have the same pronunciation and are at the same time similar in shape. As such, this is a “mixture” of errShape and phonLoan, or a case where characters are habitually interchanged although they have no direct connection with each other.</witDetail></app>

Within the apparatus (<app>) the lemma (<lem>) establishes the ‘correct’ reading according to the witness “#Stein_5475”, whereas the ‘corrupt’ reading in the Dunbo_77 manuscript (wit=“#Dunbo_77”) is cited within <rdg>, with a reference to the type of corruption (type=“errShape”, i.e., based on a confusion of handwritten characters).

Details on the type of corruption are provided in <witDetail>.
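The choice between reading one manuscript version and reading the ‘ideal’ text can be illustrated with a minimal Python sketch. It is not the project's XSLT transformation; it only shows how an <app> entry might be resolved for a chosen witness. The sample sentence, the context characters 前 and 後, and the assignment of 業 to <lem> and 葉 to <rdg> are assumptions made for illustration (the printed example does not show which character stands in which witness), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical sample: the characters and their assignment to witnesses are assumed.
SAMPLE = (
    '<s xmlns="http://www.tei-c.org/ns/1.0">前'
    '<app><lem wit="#Stein_5475">業</lem>'
    '<rdg wit="#Dunbo_77" type="errShape">葉</rdg>'
    '<witDetail wit="#Dunbo_77">editorial note</witDetail></app>'
    '後</s>'
)


def resolve_app(app, witness):
    """Return the reading of one <app>: the <rdg> of `witness` if present, else the <lem>."""
    if witness is not None:
        for rdg in app.findall("tei:rdg", NS):
            # @wit may list several witnesses separated by whitespace.
            if witness in (rdg.get("wit") or "").split():
                return read_text(rdg, witness)
    lem = app.find("tei:lem", NS)
    return read_text(lem, witness) if lem is not None else ""


def read_text(elem, witness=None):
    """Return the text of `elem`; witness=None selects the editors' 'ideal' reading."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}app":
            parts.append(resolve_app(child, witness))
        else:
            parts.append(read_text(child, witness))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
ideal = read_text(root)                   # editors' reading
dunbo = read_text(root, "#Dunbo_77")      # reading of the Dūnbó manuscript
```

A witness that has no <rdg> of its own falls back to the lemma, which matches the convention that only differing readings are recorded; <witDetail> notes are ignored when the running text is assembled.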

Example of Recording a Scribal Intervention:

<app><lem wit="#Stein_5475 #Huixin"></lem><rdg wit="#Dunbo_77" type="annotation" hand="reader" rend="small"><add place="right"></add></rdg></app>

In this example the ‘correct’ reading (<lem>) is indicated as the absence of a character (by the lack of any information between the <lem></lem> tags), which is incorrectly inserted in the Dunbo_77 manuscript on the right side (place=“right”) by an unidentified ‘reader’ of the manuscript (hand=“reader”; this can be, for example, the copyist himself, a later reader, or a temple librarian who archived the manuscript), rendered in small characters (rend=“small”).

XSL defining the transformation into HTML for the <app> element (including <lem>, <rdg>, <witDetail>, etc.), with inserted programming commands in JavaScript:

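<xsl:template match="tei:app">
  <!-- Container holding the variant readings and notes of this apparatus entry -->
  <div class="balloonstyle" id="{generate-id(.)}">
    <xsl:apply-templates select="tei:rdg"/>
    <xsl:apply-templates select="tei:witDetail"/>
  </div>
  <!-- The lemma is rendered as a link; clicking it calls right_side() with the id of the
       preceding page break of Stein 5475 and the id of this apparatus entry -->
  <a rel="{generate-id(.)}" onclick="right_side('{generate-id((preceding::tei:pb[@ed='#Stein_5475'])[last()])}','{generate-id(.)}');"><xsl:apply-templates select="tei:lem"/></a>
</xsl:template>

<xsl:template match="tei:lem">
  <!-- Lemma readings are shown in green -->
  <font color="00bb00"><xsl:apply-templates/></font>
</xsl:template>

<xsl:template match="tei:rdg">
  <script type="text/javascript">document.write(getWitName("<xsl:value-of select="@wit"/>"));</script>
  <!-- The select attribute of the following value-of is truncated in the source; @type is an assumption -->
  <script type="text/javascript">document.write(getRdgErrorType("<xsl:value-of select="@type"/>"));</script>
  <xsl:text>: </xsl:text>
  <!-- [one or more lines lost in the source] -->
  <br />
</xsl:template>

<xsl:template match="tei:witDetail">
  <!-- [template body lost in the source] -->
</xsl:template>

<xsl:template match="tei:teiHeader">
  <!-- Collect the descriptions of all witnesses for display in a pop-up window -->
  <xsl:variable name="witnesstext"><xsl:apply-templates select="//tei:witness"/></xsl:variable>
  <script type="text/javascript">
    function newWindow() {
      var generator=window.open('','vindu','height=500,width=600,scrollbars=1');
      <!-- [line lost in the source] -->
      generator.document.write('&lt;html>&lt;head>&lt;title>Witness details&lt;/title>&lt;/head>');
      generator.document.write('&lt;body bgcolor="#aaaaaa"><h2>Witness details</h2><br/><xsl:value-of select="normalize-space($witnesstext)"/>');
      <!-- [line lost in the source] -->
    }
  </script>
  <a href="javascript:newWindow();"><div align="center"><b>View witness details</b></div></a>
</xsl:template>

<xsl:template match="tei:witness">
  <xsl:text disable-output-escaping="yes">&lt;h3></xsl:text><xsl:value-of select="@xml:id"/><xsl:text>&lt;/h3></xsl:text>
  <!-- Replace apostrophes with double quotes so that the text does not end the single-quoted JavaScript string -->
  <xsl:variable name="a">'</xsl:variable>
  <xsl:variable name="b">"</xsl:variable>
  <xsl:value-of select="translate(., $a, $b)"/>
</xsl:template>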




Illustration 6: A ‘tripartite’ visualization of the marked-up text: on the left, the facsimile reproduction of the manuscript passage; in the middle, the collated version of the text, in which circled passages indicate parts where the manuscripts have different readings. The ‘ideal’ reading (<lem>) of the text can be chosen, or one of the readings recorded in the <rdg> section. By clicking on the green text portions, the information on different readings is projected to the right column. Proper names are underlined. Translations and notes in the middle can be shown or hidden. In upcoming versions, the digitized text will be arranged vertically. Mark-up and text collation by C. Anderl and Ø. K. Visted; transformation/programming by K. Dippner (with support from C. Wittern). In order to encourage scholarly collaboration and permanent revision of the entries, future versions envisage a ‘comment box’ (concretely, the above entry could be modified by noting that wú 吾 did not actually become “obsolete” after the Hàn but that the usage of the pronoun decreased until the Middle Táng period).

- As part of the collation process, the differences between the witnesses were analyzed and categorized (phonetic loans; erroneous characters because of similar shapes; added characters; scribal interventions, etc.). Since this type of mark-up is very time-consuming, other possibilities for collating texts should be considered, e.g., producing electronic transcriptions of the different manuscripts, which are then ‘overlaid’ on one another so that a record of the differences is generated automatically. As a second step, these differences have to be ‘manually’ analyzed. In addition, specific interfaces for mark-up work could be developed.
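The automatic first step suggested above might look roughly as follows. This is a minimal sketch using Python's standard difflib module, not a tool used in the project; the function name collate and the two variant transcriptions are invented for illustration and do not reproduce actual manuscript readings.

```python
from difflib import SequenceMatcher


def collate(base, other):
    """Compare two transcriptions character by character.

    Returns a list of (operation, base_segment, other_segment) tuples for every
    span in which the two transcriptions differ.
    """
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    differences = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            differences.append((op, base[i1:i2], other[j1:j2]))
    return differences


# Hypothetical transcriptions of the same passage in two witnesses.
base_text = "五祖弘忍和尚問惠能"
substituted = "五祖弘忍和上問惠能"   # one character replaced (invented variant)
shortened = "五祖弘忍問惠能"         # two characters missing (invented variant)

substitution_diff = collate(base_text, substituted)
omission_diff = collate(base_text, shortened)
```

The output only records where and how the transcriptions differ; deciding whether a difference is a phonetic loan, a shape-based error, or a scribal intervention remains the ‘manual’ second step.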


Typology of Textual Features in Manuscript Collations:

- General ‘visual’ features, i.e., information about paper features, writing tools, text arrangement, general character size, characters per column/line, alignment of columns/lines, features of the title section, calligraphic/paleographic information: the description of these important features is difficult to integrate in the formalized collation itself; alternatively, more ‘narrative’ descriptions of manuscript sections could be useful, or an integration in the ‘head’ section of the mark-up. As a useful aspect of the ‘tripartite’ visual presentation of the material, these features can be directly viewed in the facsimile reproduction shown on the left.

- Markers and scribal interventions26 (punctuation, repetition markers, markers for reversing reading sequence (e.g. ), markers for superfluous characters (e.g. ), scratched out characters ( 27), empty spaces, inserted characters, small-sized characters): information on these features is integrated in the ‘collation’ part of the manuscripts.

Example of a passage with characters inserted to the right side of the column/line: as an interesting feature, the text in small characters also includes repetition markers (rm) which do not mark the repetition of a single character, but of the group of characters preceding them (in addition, this group extends beyond sentence borders). This being the case, the passage must be analyzed in the following way:

…五祖[弘忍rm 和尚 rm]問惠能… > … 五祖弘忍和尚。弘忍和尚問惠能…
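The reading rule behind this example can be sketched in Python. The sketch assumes that the transcription has already been split into segments, each flagged as carrying a repetition marker or not; this representation and the function name expand_repetition are illustrative assumptions, not part of the project's mark-up. Editorial punctuation such as 。 is not generated.

```python
def expand_repetition(segments):
    """Expand repetition markers that apply to a whole group of characters.

    `segments` is a list of (text, marked) pairs. Consecutive marked segments
    form one group, which is read once in full and then repeated in full.
    """
    output = []
    group = []
    for text, marked in segments:
        if marked:
            group.append(text)
            continue
        if group:
            output.append("".join(group) * 2)
            group = []
        output.append(text)
    if group:  # a marked group at the very end of the passage
        output.append("".join(group) * 2)
    return "".join(output)


# The passage discussed above: 弘忍 and 和尚 each carry a repetition marker.
passage = [("五祖", False), ("弘忍", True), ("和尚", True), ("問惠能", False)]
expanded = expand_repetition(passage)
```

Reading each marked segment separately would instead yield 弘忍弘忍和尚和尚, which is why the markers have to be interpreted at the level of the group.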

- Textual variations and ‘deviations’: this includes information on ‘missing’ characters, superfluous characters, corrupted characters,28 phonetic loans, and the wrong sequence of characters. An important aspect here is not only the recording of these deviations but also reflection on their typology and causes.29 Other variations encountered consist of the replacement of characters by (near-)synonyms or the replacement of a term/concept by a related term/concept.

26 It is sometimes difficult to decide by which ‘hand’ these interventions were inserted: by the copyist himself (who read through his copy of the manuscript), by an owner/reader, or by a temple librarian. Sometimes, manuscripts have layers of interventions and annotations.

27 Stein 5475:03.01; Stein 5475:20.04.03.

28 Corruptions are often caused by the speed of the copying process and by the copyist’s decreasing concentration in the course of copying a text. Many of the corruptions are inherited from one copy to the next, and in some cases even become fixed parts of a text. One special type of corruption concerns ‘miscopying by context’, i.e., the copyist copies a character which appears in the columns/lines to the right or left. Another corruption could be called ‘miscopying based on conventionalized sequences’ and often appears in disyllabic terms/words: the copyist replaces a somewhat unusual character combination with one which is ‘fixed’ in his mind, e.g., frequently used Buddhist terms.

29 For a typology of phonetic loan characters and the miscopying based on vernacular, handwritten forms of the characters, see the Appendix.

Examples of Frequently Miscopied Characters, Based on Their Handwritten Forms

令 > 今 (Stein 5475: 04-01-09)

伐 > 代 (Stein 5475: 05-03-02; etc.30)

特 > 持 (Stein 5475: 04-02-05)

白 > 自 (Stein 5475: 05-02-10; 05-04-02)

偈 > 但 ( > Stein 5475: 09-01)

記 > 訖 (Stein 5475: 04-11-17)31

Some of the Many Handwritten ‘Vulgar’ Forms of Characters Found in the Platform Manuscripts:32

zuì 最 (modification/replacement of the determinative and right part of the phoneticum)

bān 般 (modification of the upper right part of the phoneticum, typical for handwritten/inscribed forms during that period)

jīng 經 (abbreviation of the phonetic part)

xiàng 相 (replacement of the determinative and modification of the phoneticum)33

jiān 兼 (modification/replacement of the lower part of the character)

shēng 昇

30 This error can be found throughout the manuscript! For a thorough list of errors of this type, see the table in the Appendix.

31 Note that the error is also motivated by the fact that the compound 集記 appeared earlier in the manuscript (‘error generated by the context’).

32 Recently, many good reference works on Dūnhuáng variant characters have been published in the PRC. A very good resource is also ‘The Dictionary of Chinese Character Variants’ (, recording more than 100,000 different variants and providing references to dozens of historical dictionaries (of major importance in this respect is the 10th century Lóngkān shǒujìng 龍龕手鏡).

33 In the handwriting of many Dūnhuáng manuscripts, the number of strokes within ‘boxes’ is often modified, and structural elements such as 目 and 日 become undistinguishable.



zuò 座 (modification of the left upper part of the phoneticum, 人 > 口; typically the same modification appears in other characters containing the phoneticum 坐; compare also the right upper part of bān above.)

xué 學 (a typical way of writing 學 in certain Dūnhuáng manuscripts; it is no coincidence that the replacement wén 文 ‘pattern; Chinese character; literature’ is chosen for the character meaning ‘to study’; this is actually an ancient form of this character.)

zōng 宗 (an odd variant form of this character, replacing the determinative and modifying the phonetic part)

zhǐ 旨 (‘slight’ modification of the upper part)

dì 遞 (a radical abbreviation of the phonetic part)

- The edition should be flexible enough to allow annotations and comments on several levels (multiple translations; multiple comments; linguistic analysis,…). These modules can be made visible or hidden, according to the interests of the reader.

Tripartite Structure

An important question is how to ideally structure and visualize the edition of such a text. In this respect, too, the flexibility of XML is convenient, since different types of visualization can be generated according to specific purposes (e.g., printed editions, different types of web editions, ‘working’ editions, etc.). For our project, the following solution was chosen: on the left side, a reproduction of the original (restricted by copyright limitations; in the text version only the Stein version is visible); in the middle, the edited and collated text; on the right side, the annotations to the textual features (see ill. 6).

Some Notes on Syntactic Analysis

One of the challenges of the CDP is to find proper methods for recording the textual and linguistic features of Dūnhuáng texts, in addition to providing other analytical tools.

Many manuscripts pose great problems in terms of linguistic analysis, not least because many texts have heterogeneous (hybrid) features, i.e., they integrate a variety of syntactic and semantic features based on a variety of styles, genres, and periods of language development. The section on grammatical mark-up in the TEI manuals is in this respect not yet fully developed and may also have to be better adapted to non-European languages.34 For consistent syntactic mark-up it would also be necessary to develop visual aids and interfaces for specific analytical purposes.

Ideally, there should be the possibility for a layered analysis which covers different features of a text, e.g., the mark-up of syntactic units and the relationship between them, the identification and analysis of grammatical function words, the marking of modal and style features, etc. These reflections on useful grammatical analysis are still in a very tentative stage since considerable technical problems are involved.

In terms of Literary Chinese/Buddhist Chinese, an ‘immediate constituent’ approach to the analysis of sentences seems to be useful, since the sentence structure fits well with the hierarchical structure of XML mark-up. In this approach, the syntactic units are identified and the relationships between them determined. This kind of approach could be enormously useful as an aid for producing more analytical approaches to Buddhist texts and eventually more reliable translations.
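How an immediate-constituent analysis maps onto nested mark-up can be shown with a small Python sketch. The segmentation of the sentence and the values of the ‘type’ attribute are illustrative assumptions only and do not reproduce the categories used in the project; the element names <s>, <phr> and <w> follow the usage described in this paper.

```python
import xml.etree.ElementTree as ET


def build(node):
    """Turn a (tag, attributes, children) tuple into an XML element.

    `children` is either a string (the text of a word) or a list of further
    (tag, attributes, children) tuples (the immediate constituents).
    """
    tag, attributes, children = node
    element = ET.Element(tag, attributes)
    if isinstance(children, str):
        element.text = children
    else:
        for child in children:
            element.append(build(child))
    return element


# Hypothetical analysis: sentence > phrases > words.
sentence = ("s", {}, [
    ("phr", {"type": "subject"}, [
        ("w", {"type": "title"}, "五祖"),
        ("w", {"type": "name"}, "弘忍"),
        ("w", {"type": "title"}, "和尚"),
    ]),
    ("phr", {"type": "predicate"}, [
        ("w", {"type": "verb"}, "問"),
        ("w", {"type": "name"}, "惠能"),
    ]),
])

root = build(sentence)
markup = ET.tostring(root, encoding="unicode")
plain_text = "".join(root.itertext())
```

Because every constituent is fully contained in its parent, the plain text can always be recovered by reading the leaves in order, and each level of the break-down corresponds to one level of nesting in the XML tree.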

Another promising approach is the implementation of an underlying narrative grammar in XML format which is linked to the texts (as described in the example above, where in a collaborative effort a marked-up version of ZTJ by Wittern was linked to an XML version of Anderl’s grammar of the text).35

In the course of the work on the Platform sūtra, several possibilities concerning the linguistic mark-up were considered. However, these considerations are only at an experimental stage (one problem is the time-consuming nature of this mark-up).

34 For a very interesting approach for the mark-up of Old Japanese see the article by Kerri L Russell and Stephen Wright Horn in this volume.

35 After the transformation, the XML file of the grammatical notes still has to be ‘cleaned up’ for the next version.


Illustration 7: Mark-up of a sentence in the Platform sūtra; <s> and <phr> are used in order to indicate the phrase structure, and constituents are broken down to word level (<w>), specified with ‘type’ and ‘subtype’; further specification by the ‘function’ and ‘ana’ attributes; ‘next’ and ‘prev’ are untypically (in terms of their definition in the TEI manual) used to indicate relations between immediate constituents; in future versions, this will be replaced by ‘links’ (which will be used to define the relations between the phrases).

Illustration 8: Possible ‘visualization’ of a grammatical mark-up based on the immediate constituent analysis; successive analytical ‘break-down’: sentence level, phrase level, word level, etc. The relationship between the constituents is indicated by a set of symbols.



Appendix: A Comparison of Some Textual Features of the Platform Manuscripts

Conventions Used in the Table with Notes on the ‘Northwestern’ Dialect

In the table below, the variations in the use of Chinese characters in the four manuscripts are compared.36 The addition and deletion of characters and other aspects of important differences between the manuscripts are not taken into account here.37 The focus is on phonetic loans, alterations of parts of the characters (such as the determinative or phonetic parts of the Chinese characters) and on mistakes made by the copyists based on similar (and often ‘vernacular’) shapes of the characters in the handwritings. There is also a minor category marked with ‘c’, indicating mistakes based on the context in which the characters appear.38

In addition to the registration of the ‘dialect phonetic loans’, an attempt was made to analyze the system of ‘regular phonetic loans’ as well. Occasionally, it was difficult to determine whether a character variation was caused by an alteration of the determinative part (a very common phenomenon encountered in Dūnhuáng manuscripts) or should rather be interpreted as a phonetic substitution. It can be shown that, except for the rather high number of dialect loans and a small number of other uncommon phonetic loans, the manuscripts of the Platform sūtra generally use a system of more or less established phonetic substitutions, some having a very long tradition. As such, the use of phonetic loan characters is by no means arbitrary in the manuscripts.39

Attention has been given to the uncommon phonetic loans based on the dialect of the Northwestern region during the late Táng period. These loans are marked with ‘*’, and references to explanations in Dèng and Róng (1999) are provided. These loans are of great importance for determining the regional character of the manuscript copies and the differences among them in the use of this kind of loan. Although the Stein, Dūnbó and Běijīng manuscripts all use dialect loans, it is very obvious that they are most commonly used in the Stein manuscript (i.e., the ‘*’ appears most frequently in the ‘S.’ column of the table). The abundant use of regular and dialect loans also shows the important role of ‘orality’ in this type of manuscript, i.e., the recording of the ‘sound’ of these texts was more important than focusing on orthography and finding the ‘standardized’ characters.

36 In the table, the Dūnbó 77 manuscript is abbreviated to ‘D.’, Stein 5475 to ‘S.’, the Běijīng manuscript to ‘B.’, the Lǚshùn manuscript to ‘L.’ (for a discussion of these manuscript copies, see Anderl 2012b). To the left, the assumed ‘correct’ character is listed. References to the later Kōshōji (‘K.’, reflecting the Huìxīn version, based on Yampolsky’s edition) and Zōngbǎo (‘Z.’) editions are only provided occasionally for purposes of comparison. This also nicely illustrates how loans and mistakes were ‘normalized’ or ‘sanitized’ in the Sòng versions of the Platform sūtra (on these issues, see also Schlütter 1989 and Anderl 2012a, 16-26). The characters are usually listed according to their first appearance in the manuscripts; however, phenomena such as phonetic loans which are related to each other are grouped together (the characters taken out of their order of appearance are marked with ‘/’). This method aims at allowing a more direct comparison and illustrating ‘clusters’ of phonetic loans, for example.

37 Concerning this aspect of the manuscripts, see Anderl (2012b).

38 E.g., the case when the copyist mistakenly inserts a character which also appears in the right or left line/column.

39 References to two large dictionaries on phonetic loans have been used in the analysis of the system of loan characters (Loan 1 and Loan 2, see the bibliography).

This phenomenon can be observed in many Dūnhuáng manuscripts but seems to be especially current in texts originating during the Táng period (as, for example, the Chán treatises).40 As such, there is an abundant use of phonetic loans in this rather short text, in addition to exchanges of parts of the characters such as the determinatives (for example, in Dūnhuáng manuscripts the exchange between the ‘tree’ 木 and ‘hand’ 扌 determinatives is frequently encountered), the many passages where characters are mistakenly left out or added, and the many corrupt passages based on the copyists’ misreading of the handwritten characters. These are all factors which make parts of the Dūnhuáng versions of the Platform sūtra difficult to decipher and understand.

The corrupt characters based on copyists’ errors are marked with ‘#’ in the table.

Although it is clear that the Stein manuscript has a larger number of corrupt characters in this category, the Dūnbó manuscript nevertheless also contains plenty of mistakes based on misreadings and wrong interpretations of character forms.41 A comparison of the use of phonetic loans and the number and type of corrupt characters also shows that the Dūnbó and Běijīng manuscripts are clearly closer to each other concerning their textual features (although by no means identical!).42

Many confusions concerning the copying of characters are caused by the use of ‘vernacular’ forms of characters and the structural similarities between them. A thorough analysis of the orthography and paleographic features cannot be included within the scope of this paper. Generally, it can be observed that there are major differences concerning the calligraphy and choice of character forms between the Stein and Běijīng manuscripts. In addition to the differences between the individual manuscripts, there are also significant internal differences, i.e., several forms of the same character are used in the same manuscript. The calligraphy of the Dūnbó manuscript (and also the Běijīng manuscript) is without doubt more ‘tidy’ and somewhat less ‘vernacular’ than that of the Stein manuscript.

40 Luó Chángpéi 羅常培 (1933) was one of the first who tried to reconstruct the North-Western dialect based on a selection of Buddhist scriptures. However, the sources he had available for this purpose were rather limited. Later on, these dialect studies were expanded based on the identification of an ever-growing number of Dūnhuáng manuscripts in which dialect loans were detected. The most important scholar in this respect is Takata Tokio (e.g., Takata 1987 and 1988). He discerns two types of dialect which can be detected in Dūnhuáng materials. The first is the dialect based on the language of Cháng’ān, the capital of Táng China. The ‘standard’ colloquial language of that time was based on this dialect, which was also current in Dūnhuáng until it came under the control of Tibet (787 AD). The other is the Héxī 河西 dialect, also referred to as the North-Western (Xīběi 西北) dialect, which started to prosper after relations with the central government of China were cut.

According to Takata, the dialect was also influenced by elements of the Tibetan language (e.g., zhū 諸 was pronounced ‘ci’). The usage of the dialect was at its height after 851, when Dūnhuáng became a quasi-independent area.

Typical of the dialect loans used in the Dūnhuáng Platform sūtra, especially the Stein version, is that syllables with a nasal final ‘-ng’ are not distinguished from those without, resulting in homophones such as mí 迷 – míng 名, tǐ 體 – tīng 聽, dì 第 – dìng 定, xī 西 – xīng 星, lǐ 禮 – lìng 令, etc. In addition, the initial consonants (shēngmǔ 聲母) of the 端 – 定 and the 審 – 心 categories are not differentiated, nor are the finals (rhymes) of the 侵 and 庚 groups (see Dèng and Róng 1999, 25-26; for other studies concerning the Northwestern dialect, see for example Shào Róngfēn 1963; for more bibliographic references, see Dèng and Róng 1999, 39-40).

More recently, Takata (2000) has drawn attention to the heavy influence of the Tibetan language during the period of the Tibetan occupation of Dūnhuáng and during the 10th century, when Dūnhuáng was quasi-independent and communication with Central China was reduced to a minimum. Large copying projects were initiated by the Tibetans (especially during 815-841, ibid.:7) and bilingual (Chinese-Tibetan) communities were prospering. Eventually, many Chinese would even use the Tibetan writing system for writing Chinese! “What is important here is the fact that the tradition of writing Chinese and the Tibetan script established during the period of Tibetan rule was still maintained in the tenth century under Return-to-Allegiance Army of the Cáo.” (ibid.:9). The developments outlined by Takata may well be among the factors that are reflected in the complex textual features of the late copies of the Platform sūtra, which include many oral and dialect features, a particular system of phonetic loans, vernacular and often faulty orthography, and all kinds of textual corruptions.
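The loss of the distinction between syllables with and without the nasal final ‘-ng’ can be made concrete with a small Python sketch. It operates purely on the modern pīnyīn glosses given in the note above and is in no way a reconstruction of Táng-period phonology; the helper names strip_tones and merge_key are hypothetical, and tone differences are deliberately ignored.

```python
import unicodedata


def strip_tones(syllable):
    """Remove tone diacritics from a pīnyīn syllable (e.g. 'tǐ' -> 'ti')."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def merge_key(syllable):
    """Key under which syllables with and without final '-ng' fall together."""
    base = strip_tones(syllable).lower()
    return base[:-2] if base.endswith("ng") else base


# The loan pairs cited from the Platform sūtra manuscripts.
pairs = [("mí", "míng"), ("tǐ", "tīng"), ("dì", "dìng"), ("xī", "xīng"), ("lǐ", "lìng")]
merged = [merge_key(a) == merge_key(b) for a, b in pairs]
```

Each of the cited pairs falls together under this simplified key, which is the pattern that makes the characters usable as loans for one another in the dialect.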

41 Especially in Chinese secondary literature, the Stein manuscript is referred to as a ‘bad copy’ (èběn 惡本), as opposed to the ‘good’ Dūnbó and Běijīng manuscripts. Another aspect of this judgment is the fact that the number of mistakenly added or deleted characters is somewhat smaller in the Dūnbó manuscript, in addition to the much more even style of writing and text arrangement and the use of less distorted character forms as compared to the Stein manuscript. The Stein manuscript, on the other hand, often gives the impression that it was copied in a hasty and sloppy way.

42 A quantitative analysis is also difficult in this respect since in the Běijīng manuscript only ca. one third of the text is extant.



