
Chapter 3 Methodology

3.1 Corpus Selection

Corpora come in many forms, and only through careful selection can we find the corpus best suited to a specific study. Much has already been written about the different types of corpora that exist and how they should be categorised, some of which is touched on briefly in this paper. Baker (1995) notes that corpora can be categorised in many ways depending on the characteristics of the texts being used. These include distinguishing between:

general language and restricted domain texts; written and spoken language; typicality in terms of ranges of sources and genres; geographical limit or scope; monolingual, bilingual or multilingual corpora.

Another commonly cited typology is Laviosa's (2002) four-level classification system for corpora used in translation studies, which covers aspects such as time span, register, presence of specialised language, number of languages and mediums of communication. This allows highly detailed profiles to be drawn up for any corpus, which in turn facilitates future comparisons between the corpora used in various studies. The specific details of the four levels are as follows:

Level 1

Coverage:

Full texts (unabridged texts); Samples (parts of texts selected based on a set of criteria); Mixed (full and sample); Monitor (full texts that are reviewed and updated regularly)

Time span:

Synchronic (texts created within a relatively limited period of time); Diachronic (texts created over a relatively long period of time)

Register:

General (texts written in non-specialised language, e.g. general news articles); Terminological (texts written in specialised language, e.g. legal statutes, medical research papers)

Number of languages:

Monolingual (texts made up of only one language); Bilingual (texts made up of two languages); Multilingual (texts made up of more than two languages)

Language:

English, Chinese, Japanese, etc.

Mediums of communication:

Written (texts written primarily to be read, e.g. newspaper articles); Spoken (texts transcribed from speech or texts written primarily to be spoken, e.g. collection of transcribed political speeches, film scripts); Mixed (written + spoken)

Level 2

Monolingual Corpus:

Single (all texts in one language only); Comparable (one translational monolingual sub-corpus and one non-translational monolingual sub-corpus, both of which should be constructed based on the same principles, e.g. one sub-corpus consisting of English texts and one sub-corpus consisting of Chinese-to-English texts, both of which should be from the same or similar genres)

Bilingual Corpus:

Parallel (texts in one language and their translations in another language, e.g. English texts and their English-to-Chinese translations); Comparable (two sets of texts in two different languages, which should be from the same or similar genres, e.g. English texts and Chinese texts)

Multilingual Corpus:

Parallel (texts in a range of different languages and their translations in other languages); Comparable (texts in a range of different languages)

Level 3

Single Corpus:

Translational (translated texts in one language, e.g. corpus consisting of English-to-Chinese texts); Non-translational (original texts in one language, e.g. corpus consisting of Chinese texts)

Bilingual Parallel Corpus:

Mono-directional (one or more texts in one language and their translations in another language); Bi-directional (one or more texts in language A and their translations in language B + one or more texts in language B and their translations in language A)

Multilingual Parallel Corpus:

Mono-Source-Language (one or more texts in one language and their translations in more than one other language, e.g. English texts and their translations in Chinese and Japanese); Bi-Source-Language (one or more texts in two languages and their translations in two other languages, e.g. English and Chinese texts and their translations in Japanese and Korean); Multi-Source-Language (one or more texts in more than two languages and their translations in more than two other languages)

Level 4

Translation Corpus:

Mono-Source-Language (texts translated from a single language, e.g. a set of texts translated from English); Bi-Source-Language (texts translated from two languages, e.g. a set of texts translated from English and Chinese); Multi-Source-Language (texts translated from more than two languages, e.g. a set of texts translated from English, Chinese and Japanese)

3.2 Breakdown of Corpus Characteristics for this Study

Extracted from the Hong Kong Judiciary's online reference system, the corpus is primarily monolingual and comparable, making it suitable for the study of explicitation. The judgments which make up the corpus can be found under the Legal Reference System (http://legalref.judiciary.gov.hk/lrs/common/ju/judgment.jsp) of the Hong Kong Judiciary's website (http://www.judiciary.gov.hk/en/index/). The Legal Reference System contains judgments written in Chinese and in English, as well as judgments translated from Chinese into English. For translated judgments, both the Chinese source text and the English target text are available for download. Although judgments translated from English into Chinese should theoretically be available on the website, none were found there.

The corpus for this study was created from one non-translational sub-corpus (English) and one translational sub-corpus (Chinese-to-English). The non-translational sub-corpus consists of 50 judgments written in English (173,910 English words) and the translational sub-corpus is made up of 60 judgments translated from Chinese into English (177,583 English words). All of these judgments come from Hong Kong's Court of Appeal of the High Court. To be more specific, they are in fact reasons for judgment. A reasons-for-judgment document explains why a particular court judgment was made. This is in contrast to an actual judgment, which is much shorter since it is the official legal order issued by the court regarding the rights and liabilities of the parties in a legal action or proceeding. For the purposes of this study, however, these reasons for judgment are referred to simply as judgments.

The following table outlines the basic profile of my corpus, based primarily on Laviosa's four-level typology.

Table 1 - Profile of corpus used in this study

Judgments written in Chinese (Traditional), translated to English: 177,583 words/tokens

Judgments written in English: 173,910 words/tokens

Level 1: Full Text; Synchronic; Terminological*; Monolingual (English); Written

Level 2: Monolingual Comparable; Non-translational (English) + Translational (Chinese-to-English)

Level 3: Single Non-translational (English) + Single Translational (Chinese-to-English)

Level 4: Partial Mono-Source-Language (applicable to the translational sub-corpus)
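The Level 1 row of Table 1 can also be written down as a small data structure, which makes profiles from different studies easier to compare programmatically. The following is only a minimal sketch: the class names, field names and the `summary` method are assumptions made for illustration and are not part of Laviosa's typology or of any existing library. The category values are those listed under Level 1 above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


# Level 1 categories as listed in the typology (names are illustrative).
class Coverage(Enum):
    FULL_TEXT = "Full Text"
    SAMPLES = "Samples"
    MIXED = "Mixed"
    MONITOR = "Monitor"


class TimeSpan(Enum):
    SYNCHRONIC = "Synchronic"
    DIACHRONIC = "Diachronic"


class Register(Enum):
    GENERAL = "General"
    TERMINOLOGICAL = "Terminological"


class LanguageCount(Enum):
    MONOLINGUAL = "Monolingual"
    BILINGUAL = "Bilingual"
    MULTILINGUAL = "Multilingual"


class Medium(Enum):
    WRITTEN = "Written"
    SPOKEN = "Spoken"
    MIXED = "Mixed"


@dataclass(frozen=True)
class Level1Profile:
    """Hypothetical container for a corpus's Level 1 characteristics."""
    coverage: Coverage
    time_span: TimeSpan
    register: Register
    language_count: LanguageCount
    languages: Tuple[str, ...]
    medium: Medium

    def summary(self) -> str:
        # Join the attributes in the same order as the Level 1 row of Table 1.
        langs = ", ".join(self.languages)
        parts = [
            self.coverage.value,
            self.time_span.value,
            self.register.value,
            f"{self.language_count.value} ({langs})",
            self.medium.value,
        ]
        return "; ".join(parts)


# Level 1 values taken from Table 1 for the corpus used in this study.
study_corpus = Level1Profile(
    coverage=Coverage.FULL_TEXT,
    time_span=TimeSpan.SYNCHRONIC,
    register=Register.TERMINOLOGICAL,
    language_count=LanguageCount.MONOLINGUAL,
    languages=("English",),
    medium=Medium.WRITTEN,
)

print(study_corpus.summary())
# Full Text; Synchronic; Terminological; Monolingual (English); Written
```

Using enumerations rather than free-text strings means that a misspelled or unsupported category fails immediately instead of silently producing a profile that cannot be compared with others.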

Level 1 Characteristics of Corpus

Coverage:

Although the texts used in this corpus do not represent the entirety of what is available on the Hong Kong Judiciary website, each individual judgment is complete and unabridged and can stand on its own. Hence I have classified the corpus as a full-text corpus.

Time span:

When trying to decide if a corpus is synchronic or diachronic, it can be difficult to determine what exactly a “long span of time” is. With respect to judgments made in the Hong Kong legal system, I take the 1997 handover of Hong Kong, when the United Kingdom transferred control of Hong Kong and its territory back to China (represented by the People's Republic of China), as the starting point, since Hong Kong's existing legal system essentially began at that point. With this in mind, the corpus is largely synchronic, since most of the judgments I have selected were published within the last five years.

Size:

These two sub-corpora, one non-translational and the other translational, are each made up of judgments ranging from 100 to 16,000 English words. The non-translational sub-corpus (judgments written in English) consists of 50 judgments and the translational sub-corpus (judgments translated from Chinese into English) consists of 60 judgments.
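As a rough check on how comparable the two sub-corpora are in size, the figures reported above (50 judgments written in English totalling 173,910 words, and 60 judgments translated from Chinese into English totalling 177,583 words) can be turned into a mean judgment length and a relative size difference. The sketch below is illustrative only; the dictionary keys and the function name are assumptions chosen for this example.

```python
# Judgment counts and word counts as reported in Section 3.2.
# The keys are descriptive labels chosen for this sketch.
sub_corpora = {
    "written in English": {"judgments": 50, "tokens": 173_910},
    "translated from Chinese into English": {"judgments": 60, "tokens": 177_583},
}


def mean_length(stats):
    """Average number of words per judgment in one sub-corpus."""
    return stats["tokens"] / stats["judgments"]


means = {name: mean_length(stats) for name, stats in sub_corpora.items()}
for name, mean in means.items():
    print(f"{name}: {mean:.1f} words per judgment")

# Difference in total size, relative to the smaller sub-corpus.
token_counts = [stats["tokens"] for stats in sub_corpora.values()]
size_gap = (max(token_counts) - min(token_counts)) / min(token_counts)
print(f"Relative difference in size: {size_gap:.1%}")
```

On these figures the judgments written in English average about 3,478 words and the translated judgments about 2,960 words, while the two sub-corpora differ in total size by roughly 2%.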

Register:

Although one might assume that legal judgments would almost certainly be terminological-type texts as opposed to general-type ones, a closer inspection of the texts reveals that this is not necessarily the case. In fact, in a typical judgment, the details of the incidents described are written in a fairly easy-to-read manner with only a moderate amount of legal jargon, not unlike a more detailed and drawn-out news report.

A more "legal" style of writing occurs in the sections where a judge explains his or her rationale for the judgment. Given this, I am more inclined to consider the corpus mixed where register is concerned, with its texts leaning towards the terminological.

Language:

The non-translational sub-corpus consists of judgments originally written in English, which is common in Hong Kong since English is one of the two official languages of its legal system. As for the translational sub-corpus, its content was originally written in Traditional Chinese and then translated into English. Such translations were undertaken primarily because the judgments involved were deemed important.

Mediums of communication:

The judgments are written primarily to be read.

Level 2/3 Characteristics of Corpus

The corpus is primarily a monolingual comparable corpus made up of a single non-translational sub-corpus (English) and a single translational sub-corpus (Chinese-to-English).

Level 4 Characteristics of Corpus

The corpus may be considered a partial mono-source-language corpus since the translational sub-corpus was translated into English from Chinese.
