
A Study of “Original Enlightenment”

Lewis Lancaster

University of California, Berkeley, Emeritus

With the assistance of Howie Lan and Ping Auyeung, University of California, Berkeley

The following discourse is based on a somewhat different approach to the study of a doctrinal term in Buddhist texts. The information presented here has been gathered through a software interface developed at the University of California, Berkeley by team members Lewis Lancaster, Howie Lan, and Ping Auyeung. A two-year grant of support (2007–2009)1 was given by the National Science Foundation for the development of this tool. We have collaborated with the Institute of Tripitaka Koreana2 in Seoul, and they have generously shared scanned images of rubbings taken from the printing blocks at Hae-in Monastery. The “software” makes use of a digital version of the previous publication by Lancaster and Sungbae Park, The Korean Buddhist Canon: A Descriptive Catalogue.3 The digital version of the catalogue is the work of Charles Muller of Tokyo University, who has made it freely available on the internet.4 The interface project has also been a part of the Electronic Cultural Atlas Initiative (ECAI)5 and received support from that group’s Atlas of Chinese Religions research, which was funded by the Luce Foundation and is in collaboration with the GIS Center at Academia Sinica in Taiwan.6 Continued research on developing the interface used for this article is being pursued in cooperation with the School for Creative Media,7 ALiVE,8 and the Halliday Centre at the Department of Chinese Translation and Linguistics at City University of Hong Kong.9 Additional support comes from collaboration with research in the University of California at Berkeley College of Engineering.10 Future expansion of the analytics is being done through cooperation with staff and faculty members at Carnegie-Mellon University,11 Rutgers University,12 and UCLA13 through a separate grant from the National Science Foundation.14 This is an indication of the need for teamwork and collaboration as we find ways to make use of technology and computation in the humanities. The approach followed below could not have been possible without the technical help of Howie Lan and additional support from Ping Auyeung, and therefore they are rightly listed as co-authors.

The search and retrieval strategy described below focuses on the history of the appearance of the term/compound 本覺, often translated as “original enlightenment.” It is an expression that has been studied in great detail and was chosen for this reason.

The works of Jacqueline Stone15 and Robert Buswell16 contain valuable accounts of the ways in which the term was used and interpreted from the seventh century onward. There is no attempt here to reconstruct the careful discourse laid out in those volumes. Rather, in this article, I have chosen to focus on the earliest appearances of the term as an attempt to trace the history of a word in Buddhist texts. This follows up on a lecture that I gave more than two decades ago under the title of “The Question of Apocryphal Words in the Chinese Buddhist Texts.”17 At that time, I suggested that the term 本覺 should be considered “apocryphal” since no Sanskrit equivalent could be determined. In this paper, I attempt to address the issue once again in greater detail, using some of the tools that are under construction for tracing the patterns of occurrences of vocabulary in the canonic texts.

As we test the effectiveness of a new approach, it is important to match the computer and computational results with previous knowledge. Because of the need to compare different strategies of research, we are focusing on this particular compound that has a long history within the scholarly literature of the Buddhist tradition and studies.

The initial and crucial question that arises from the method is whether quantification of data has a role to play in the study of doctrinal matters. What can be accomplished by computation of occurrences of a term and displaying a report in visual form? In part, the motivation behind the development of this interface software has been the awareness that the deluge of data created by digital technology requires new ways of retrieval and analysis. Scholars need tools that can help them quickly and efficiently use thousands of items identified by digital search.

The “work flow,” that is, the procedure by which a scholar approaches a digital search and the results of that search, forms the structure of this article. The steps that have been used are one method of handling the technology and the data. The “work flow” procedure used for this software is somewhat different from the ordinary way of research, and thus it is necessary to outline the particular strategies in some detail.

When approaching a research problem, we have to make immediate decisions about how to proceed. Academic training is structured along the lines of procedural steps. A major component of our past training and practice has been directed toward use of library reference assistance, based on codex collections of the data. However, in the digital age, where thousands or even millions of data items are available, our former methods have begun to falter. Today, a pressing question is how a scholar should arrange, classify, and analyze search results from large data sets and/or the internet. The older library reference system of using aids for research that point to sources of information is not so helpful in the digital arena, where we can go directly to the data without the intermediate step of consulting collateral documents.

It is often difficult, without the considered judgment of the compilers of reference works, to determine the nature of the data that we access through the internet and digital sets of information. Can software and new methods of approaching data through computation help us deal with these issues of verifying the accuracy of retrieved information? In other words, can we use computation of the data itself to solve problems such as determining accuracy of the data?

In the example being described in this paper, we turn attention to the digital version of the thirteenth-century Korean printing block edition of the Buddhist texts. It contains more than 52 million characters carved onto nearly 166,000 wooden surfaces, each producing a page of text when transferred to paper. There are other digital data sets that incorporate the readings of this edition, such as SAT and CBETA (see below). While it is recognized that volumes 1–55 of the Taishō shinshū daizōkyō18 print edition of the twentieth century are primarily based on the Koryŏ woodblock version, the editors of the Tokyo edition added dozens of texts known in Japan but not found in the Northern Sung corpus as recorded in the Koryŏ. For this reason, the patterns of terminology found in SAT and CBETA, each based on the Taishō, will not be identical to those found in the Koryŏ version. The conclusions reached in this article are limited to the results obtained from the Korean block prints, which constitute the oldest complete set of original printing blocks for a version of the Chinese canon.

As mentioned above, in the past, scholars have approached a corpus such as the Korean canon through references in the form of catalogues, dictionaries, glossaries, concordances, and bibliographies. This type of research has been little changed in the field of Buddhist studies since the nineteenth century. A change is occurring in the contemporary world because a revolution in technology allows us to search and retrieve from the whole of material in the digital format. As a result, a complete inventory of every word or phrase is available; sometimes the examples number in the thousands. The older references based on codex publications are ill-suited to deal with this superfluity of data.

In the comments below, the computational approach combined with visual analytics is explored as one way of handling reference questions in the digital age.

For the chosen example of how the new approach might be used, we start with the term 本覺. Our interest is in all occurrences of this combination of glyphs adjacent to one another. The goal of the research is to see the term in its total context within the Hae-in Monastery version of the canon and to determine the origin and uses of the glyphs in 1,514 texts. The previous approach of having scholars do a manual page-by-page reading and collecting has serious limitations when applied to more than 160,000 pages. In the last decade of the twentieth century, search and retrieval of target words and phrases was transformed for Buddhist scholars using Chinese language texts. Digital versions of the canonic material make it possible to find all references and display the results in a menu listing each line in which a target word occurs. In many cases, such a menu contains thousands of line references. While the current search for a term identifies all of these lines, the numbers can be large enough to require days or weeks of study by a scholar to understand the pattern of the occurrences. We now need tools that take us beyond this current state of the art. The functions of one such tool are described in the search and computation discussed in this article. We have not given a name to this tool since it is still in development. It will be referred to as the “software” (in quotation marks) for the present.

When we use the “software” to make a search in the Korean edition of the Buddhist canon for our target glyphs 本 and 覺, we find that they occur adjacent to one another 763 times. Even though the number is large, it is a great advance over having to deal with the total number of glyphs in the whole of the corpus. Identifying 763 specific sites within 52,000,000 glyphs is a major accomplishment. Nonetheless, 763 occurrences is still a significant amount of data to handle, and the effort required to go through those hundreds of lines and analyze them is time-consuming. The effort being made through the new “software” is directed toward taking these 763 examples and helping scholars analyze and classify them so that significant occurrences and patterns will be identified in the shortest time possible.

As a first step with the “software” interface, we look for the number of times that each of the two glyphs/characters appears in the corpus (see fig. 1). As the search is made, a report appears in visual form on a “ribbon” of blue dots, where each of the blue dots represents one of the 52 million glyphs. The dots are arranged by “panes” that correspond to the more than 160,000 pages of the version preserved in Korea. The dot is an abstract image that permits the user to see patterns of occurrence without the barrier of complex display of natural language glyph constructions such as in the Google report.

It is at this first step that we note the distinct shift in methodology.

The initial move on the part of the scholar is to turn directly to the data itself rather than to reference works. As mentioned above, this is accomplished because the “software” provides a process of searching through the entire corpus at once. We have not gone through a reference work that points to data residing in another volume located at a separate site.

In order to proceed with the “work flow,” the user is shown the visual pattern of occurrence of 本覺 on the “ribbon of blue dots” (see fig. 2). This pattern is made visible by changing the color of the dots that represent the target word from blue to red. Across the blue background, a pattern of red dots alerts us to the occurrences of the target word. This visual becomes the first factor in the scholar’s “work flow” planning. It shows that the glyphs are adjacent to one another in a scattered pattern, marked by heavy concentration in a few places and by single isolated occurrences throughout the canon. Securing this much information within a few seconds can be compared to the hours of effort it would take to construct such a pattern, even with an internet search that returns all examples of the term. In other words, an enormous amount of data is being displayed quickly and visually. We can “see” the occurrences of our target search within the 52 million glyphs and immediately understand the nature of the pattern. This is very different from the current internet search based on Google algorithms, where we have hundreds of individual items listed in a long series of “pages” (see fig. 3). This is not a criticism of the present technology. It has been a great boon to Buddhist scholars that the digital versions of the Chinese are freely supplied by the CBETA19 and SAT20 sites. These efforts have advanced our research manyfold. We are deeply indebted to Fagu Buddhist College in Taiwan and Tokyo University for providing this service.

As with all digital technology, there is no point at which it can be said to be complete or finished. Data in the computer is always dependent on our continued efforts to preserve, disseminate, and access it.

The new “software” interface being described here is an attempt to take the search and retrieval function to another level of speed and analysis. Visualizations of data can take many forms. In the window below (fig. 3), the glyphs have been shown as individual “blue dots,” and the search, retrieval, and display were constructed based on “place of appearance” for each of the 52 million glyphs.

In the next visual, we explore the pattern of occurrence from the perspective of each text rather than each word and page. In this window (fig. 4) the 1,514 texts that make up the corpus of the Hae-in Monastery printing blocks are represented by a grid of squares. If the term in our search is present in any one of the texts, the square that represents it changes color to show the presence of the term (once or multiple times) in that text. Such a view of the 1,514 divisions is in contrast to the “ribbon of blue dots” that displays 52 million characters and 166,000 pages (fig. 2). Rather than looking at an image based on characters or pages, we have the possibility of a relatively smaller image exhibiting the search results, showing only a report from each of the texts that make up the corpus. These visuals, whether based on each glyph or each text, are intended to provide different lenses for viewing the pattern of occurrence for our two glyphs.
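The two lenses described above, one dot per glyph and one square per text, can be illustrated with a minimal sketch. This is not the project's implementation: the function names, the fixed `pane_size`, the letters `R` and `b` standing for red and blue dots, and the toy corpus are all assumptions made for illustration, and the filler glyph 〇 stands in for any other character rather than quoting the canon.

```python
from typing import Dict, List

TERM = "本覺"


def glyph_flags(text: str, term: str = TERM) -> List[bool]:
    """Return one flag per glyph; True marks a glyph inside an occurrence of term."""
    flags = [False] * len(text)
    start = text.find(term)
    while start != -1:
        for i in range(start, start + len(term)):
            flags[i] = True
        start = text.find(term, start + 1)
    return flags


def render_ribbon(flags: List[bool], pane_size: int) -> List[str]:
    """Split the flags into fixed-size panes; 'R' is a red dot, 'b' a blue dot."""
    dots = "".join("R" if hit else "b" for hit in flags)
    return [dots[i:i + pane_size] for i in range(0, len(dots), pane_size)]


def text_flags(corpus: Dict[str, str], term: str = TERM) -> Dict[str, bool]:
    """Return one flag per text; True marks a text that contains term at least once."""
    return {text_id: term in body for text_id, body in corpus.items()}


# Toy corpus (hypothetical): three short "texts" built from filler glyphs.
toy_corpus = {"T1": "〇本覺〇〇本〇覺", "T2": "覺本〇〇", "T3": "〇〇〇本覺"}

for text_id, body in toy_corpus.items():
    print(text_id, render_ribbon(glyph_flags(body), pane_size=4))
print(text_flags(toy_corpus))
```

Note that T2 contains both glyphs but never in the adjacent order 本覺, so it stays blue in both views; only adjacency in the right order counts as a hit.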

Attention turns to the “work flow,” which is determined based on the illustrations shown above. In order to understand the factual basis for the visual patterns, the “software” can provide the user with the following computations. A visual can be presented with all words counted and displayed by number of occurrences in a bar graph (see fig. 5).

Figure 1. Rubbings of printing blocks shown abstracted into pane of “blue dots”

Figure 2. New interface showing all pages of the canon abstracted into a ribbon of “blue dots”

Figure 3. CBETA search results with pages of line references and ribbon of “blue dots” with target word showing as a red dot

Figure 4. Full view of current interface of “software” showing multitude of search results

Figure 5. Computation of graph of total count of each character/glyph in the canon

Figure 6. Computation of occurrence by date of translation

Figure 7a. Compound occurrence by year showing “profile” graph

Figure 7b. “Profile” graph of companion compound

Figure 8. Scanned image of rubbing block appearing with target word highlighted

Figure 9. Computation and analysis graph showing five distinct segments of canon

COMPUTATION STEP ONE:

COUNTING ALL OCCURRENCES OF EACH GLYPH 本 AND 覺

Now that we have the overall pattern of occurrence (763 places scattered throughout the whole of the set with sizable clustering at a few points), the next step is to discover the significance of that visual pattern. The “software” provides assistance in the following fashion: the visual pattern based on computation can be used to give us a determination of the inner relationships of the glyphs that are the constituent elements of the data. Our inquiry for both glyphs gives the information:

本 is found in 1,180 texts with 71,833 hits
覺 is found in 1,182 texts with 69,527 hits.

In this count, we have determined that the two glyphs appear individually in large numbers throughout the 1,514 texts. Another computation reports that these individual glyphs are contained in 78% of the texts. Thus, the visual view of the text squares (fig. 4), with a large number of these squares colored to report occurrence, is based on this numerical computation.
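The single-glyph report of Step One (number of texts, number of hits, share of texts) can be sketched as follows. The function name `glyph_report` and the toy corpus are hypothetical; only the figures 1,180, 1,182, and 1,514 come from the report above, and the last two lines simply recompute the roughly 78% coverage from them.

```python
from typing import Dict, Tuple


def glyph_report(corpus: Dict[str, str], glyph: str) -> Tuple[int, int, float]:
    """Return (texts containing glyph, total hits, percent of texts containing it)."""
    hits = sum(body.count(glyph) for body in corpus.values())
    texts = sum(1 for body in corpus.values() if glyph in body)
    share = 100.0 * texts / len(corpus)
    return texts, hits, share


# Toy corpus (hypothetical); "〇" is a filler glyph, not canon text.
toy_corpus = {"A": "本〇本覺", "B": "〇〇", "C": "覺〇", "D": "本覺本覺"}

for glyph in ("本", "覺"):
    texts, hits, share = glyph_report(toy_corpus, glyph)
    print(f"{glyph} is found in {texts} texts with {hits} hits ({share:.0f}% of texts)")

# Coverage recomputed from the figures reported in the article.
TOTAL_TEXTS = 1_514
for glyph, texts in (("本", 1_180), ("覺", 1_182)):
    print(f"{glyph}: {100.0 * texts / TOTAL_TEXTS:.1f}% of {TOTAL_TEXTS} texts")
```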

Work Flow Analysis: The fact that these two glyphs are so widely used alerts us to the possibility that there will be a number of variables in the function and meaning of any adjacent position of the glyphs.

COMPUTATION STEP TWO:

COUNTING ALL OCCURRENCES OF THE TWO GLYPHS 本覺 STANDING ADJACENT TO ONE ANOTHER

The next search is to combine the glyphs and search for every occurrence of the two in adjacent positions. This is at the heart of the research. We need to know when these two glyphs form the compound that means “original enlightenment.” The report comes back with the statistic that the adjacent pair can be found in 763 places in the 166,000 pages. We also receive the information that the 763 occurrences of the adjacent pair of glyphs appear in 28 of the 1,514 texts. This computation can be further refined to show that the 28 texts represent about 2% of the 1,514 texts of the canon. While the hits for the adjacent glyphs number in the hundreds, this is far smaller than the numbers for the occurrence of each of the glyphs alone:

本覺 763 hits, compared to
本 by itself 71,833
覺 by itself 69,527

The adjacent occurrences amount to roughly half of 1% of the combined individual occurrences of the two glyphs that form the compound.

Work Flow Analysis: Since the compound appears in only 2% of the texts and the combination of two glyphs accounts for roughly half of 1% of the times when the single glyphs occur, it seems that the adjacent glyphs form a specialized term that has limited range in the text corpus.
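The proportions in Step Two can be checked with a few lines of arithmetic using only the figures reported above. Which denominator the comparison with "the separate individual examples" intends is not stated, so the sketch computes the ratio against the combined count and against each glyph alone; the constant names and the `percent` helper are hypothetical.

```python
PAIR_HITS = 763        # adjacent 本覺
BEN_HITS = 71_833      # 本 by itself
JUE_HITS = 69_527      # 覺 by itself
TEXTS_WITH_PAIR = 28
TOTAL_TEXTS = 1_514


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


share_of_texts = percent(TEXTS_WITH_PAIR, TOTAL_TEXTS)
pair_vs_combined = percent(PAIR_HITS, BEN_HITS + JUE_HITS)
pair_vs_ben = percent(PAIR_HITS, BEN_HITS)
pair_vs_jue = percent(PAIR_HITS, JUE_HITS)

print(f"texts containing the pair: {share_of_texts:.2f}%")
print(f"pair vs. combined single-glyph hits: {pair_vs_combined:.2f}%")
print(f"pair vs. 本 alone: {pair_vs_ben:.2f}%")
print(f"pair vs. 覺 alone: {pair_vs_jue:.2f}%")
```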

COMPUTATION STEP THREE:

COUNTING THE NUMBER OF OCCURRENCES IN EACH OF THE 28 TEXTS

The “software” displays a new feature: a read-out of the catalogue K. (Korea) number of the texts where the words occur, followed by the number of occurrences within the particular document. As we will see later, this count for each text is a crucial element in understanding the patterning of the 763 examples of 本覺. The report of the texts gives them in the sequential order of their appearances in the printing blocks at Hae-in Sa, i.e., K. 22–K. 1513. Thus we find that in K. 22 there is one hit (K. 22:1) for 本覺, and in K. 1397 there are 231 (K. 1397:231), etc.

Table 1. Occurrence count of target word listed by each text

K. 22:1, K. 186:1, K. 385:2, K. 426:9, K. 521:11, K. 616:7, K. 623:9, K. 648:1, K. 951:1, K. 1258:10, K. 1262:2, K. 1263:9, K. 1272:2, K. 1331:1, K. 1340:1, K. 1381:5, K. 1397:231, K. 1406:2, K. 1499:246, K. 1501:133, K. 1502:8, K. 1503:1, K. 1504:5, K. 1507:4, K. 1508:2, K. 1509:11, K. 1510:32, K. 1513:15.

Work Flow Analysis: The distribution of the adjacent glyphs involves a relatively small number of texts with a wide range in the number of occurrences per text.
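A read-out in the form "K. number:count" is easy to turn into structured data. The sketch below parses the entries of Table 1 and reports the number of texts and the range of counts; the regular expression and the names `READOUT` and `parse_readout` are assumptions for illustration, not a description of the “software” itself.

```python
import re
from typing import List, Tuple

# Entries copied from Table 1.
READOUT = (
    "K. 22:1, K. 186:1, K. 385:2, K. 426:9, K. 521:11, K. 616:7, K. 623:9, "
    "K. 648:1, K. 951:1, K. 1258:10, K. 1262:2, K. 1263:9, K. 1272:2, "
    "K. 1331:1, K. 1340:1, K. 1381:5, K. 1397:231, K. 1406:2, K. 1499:246, "
    "K. 1501:133, K. 1502:8, K. 1503:1, K. 1504:5, K. 1507:4, K. 1508:2, "
    "K. 1509:11, K. 1510:32, K. 1513:15."
)


def parse_readout(readout: str) -> List[Tuple[int, int]]:
    """Parse 'K. <number>:<count>' entries into (k_number, count) pairs."""
    return [(int(k), int(n)) for k, n in re.findall(r"K\.\s*(\d+):(\d+)", readout)]


pairs = parse_readout(READOUT)
fewest = min(pairs, key=lambda pair: pair[1])
most = max(pairs, key=lambda pair: pair[1])

print(f"{len(pairs)} texts listed")
print(f"fewest hits: K. {fewest[0]} with {fewest[1]}")
print(f"most hits: K. {most[0]} with {most[1]}")
```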

In order to judge the occurrences of the adjacent glyphs, we must search for characteristics of the 28 texts and determine if there are patterns that help explain the history of the adjacent glyphs 本覺.

COMPUTATION STEP FOUR:

COUNTING THE NUMBER OF OCCURRENCES OF 本覺 BASED ON TIME OF TRANSLATION OR COMPILATION OF EACH TEXT

Because the “blue dots” are not just pictures, and each carries many fields of metadata behind the image, it is possible, for example, to compute the occurrences of the 763 adjacent glyphs based on the time of translation. The ancient catalogues of China, as well as the colophons attached to texts, give us temporal information about the translation or compilation/authorship of each text. This time-stamped data can also be used to look for patterns of occurrence. The profile of the image, representing “time,” indicates that there are specific “peaks” of activity. When we compare the image arranged by the order of the canonic texts with the image arranged by time of translation, questions arise about the resulting patterns (see fig. 6).

In order to make this more meaningful, the computation that formed the basis for the imagery can be expressed as a tabulation:

(1) The text K. number in which the adjacent glyphs occur.

(2) The number of examples found in each text.

(3) The percentage of the occurrences in the text compared to the total number of 763.

(4) The time of translation or compilation.
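The four columns listed above can be assembled as in the following sketch. Only the counts for four texts (taken from Table 1) and the total of 763 come from the article; the function name `tabulate` and the `dates` parameter are hypothetical, and because no dates are given in this section the fourth column is left as "not supplied" rather than guessed.

```python
from typing import Dict, List, Optional, Tuple

TOTAL_HITS = 763  # total adjacent occurrences reported in the article

Row = Tuple[str, int, float, str]


def tabulate(counts: Dict[int, int], dates: Optional[Dict[int, str]] = None) -> List[Row]:
    """Build rows of (K. number, hits, percent of total hits, time of translation)."""
    dates = dates or {}
    rows: List[Row] = []
    for k_number in sorted(counts):
        hits = counts[k_number]
        share = round(100.0 * hits / TOTAL_HITS, 1)
        rows.append((f"K. {k_number}", hits, share, dates.get(k_number, "not supplied")))
    return rows


# A few entries from Table 1, chosen to show the spread of percentages.
sample_counts = {22: 1, 1397: 231, 1499: 246, 1501: 133}

for row in tabulate(sample_counts):
    print(row)
```

A caller with catalogue or colophon data would pass it in as `dates`, keyed by K. number, and the rows could then be re-sorted by time instead of by position in the canon.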

In table 2, we have a numerical report reflecting the information that underlies the visual pattern of occurrence as seen in the “blue ribbon” (fig. 2). Similar to the visual patterning, the tabulation indicates that the term is widely used throughout the corpus. In the tabulation, 本覺 is shown as appearing in sutras said to have been translated from the fourth century CE up to the Northern Sung dynasty and Koryŏ works of the tenth and fourteenth centuries CE (and one additional text dated to the Ming dynasty, seventeenth century CE).

The “software” now provides the next step, which displays the texts according to the number of occurrences within them. The information shows that the range of hits is from a single one in a text (e.g., K. 1503) up to an impressive 246 (e.g., K. 1499). By this method, the texts can be clustered into units based on the number of times the adjacent glyphs occur (see table 3).
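Clustering by number of occurrences can be sketched by sorting the Table 1 counts into bins. The bin boundaries below (1, 2–9, 10–99, 100 or more) are an assumption chosen for illustration; they are not taken from table 3, and the names `COUNTS`, `BINS`, and `cluster_by_count` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Counts per K. number, copied from Table 1.
COUNTS: Dict[int, int] = {
    22: 1, 186: 1, 385: 2, 426: 9, 521: 11, 616: 7, 623: 9, 648: 1, 951: 1,
    1258: 10, 1262: 2, 1263: 9, 1272: 2, 1331: 1, 1340: 1, 1381: 5,
    1397: 231, 1406: 2, 1499: 246, 1501: 133, 1502: 8, 1503: 1, 1504: 5,
    1507: 4, 1508: 2, 1509: 11, 1510: 32, 1513: 15,
}

# Assumed bins: (label, lowest count, highest count or None for open-ended).
BINS: List[Tuple[str, int, Optional[int]]] = [
    ("single", 1, 1),
    ("2-9", 2, 9),
    ("10-99", 10, 99),
    ("100+", 100, None),
]


def cluster_by_count(counts: Dict[int, int]) -> Dict[str, List[int]]:
    """Group K. numbers by the bin into which their hit count falls."""
    clusters: Dict[str, List[int]] = {label: [] for label, _, _ in BINS}
    for k_number, hits in sorted(counts.items()):
        for label, low, high in BINS:
            if hits >= low and (high is None or hits <= high):
                clusters[label].append(k_number)
                break
    return clusters


for label, k_numbers in cluster_by_count(COUNTS).items():
    print(f"{label}: {len(k_numbers)} texts -> {k_numbers}")
```

Under these assumed bins, three texts (K. 1397, K. 1499, and K. 1501) stand apart from the rest, which matches the heavy concentrations visible on the ribbon.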
