As two of the most important grammatical constructions in Mandarin Chinese, the bei-construction and the ba-construction have been the subject of much research and numerous debates for decades. However, an initial survey of the existing literature did not turn up any formal syntactic account of either ba or bei based on corpus data, while, not surprisingly, there are quite a few corpus-based functionalist studies on grammaticalization and on the discoursal and pragmatic aspects of the two constructions, e.g., Wang (2005), McEnery and Xiao (2004, 2005), Jing-Schmidt (2005), Methven (2007), Yin (2005), Yang (2006), Chang (1998), and Wu (1999). Moreover, with the exception of Yin (2005), which used only 119 hits of bei-passives, and Yang (2006), which used only 5,000 randomly selected strings containing the character 被 bei, both drawn from the Taiwan-based Sinica Corpus, the other studies cited above draw their data almost exclusively from sources within Mainland China. The largest corpus used is the one-million-word LCMC Chinese corpus (http://bowland-files.lancs.ac.uk/corplang/lcmc/), which likewise contains only sources from within Mainland China.

The first objective of the proposed project is thus to construct a special-purpose subcorpus¹ of bei and ba with data extracted from the Sinica Corpus and the NCCU Corpus of Spoken Chinese. Both corpora contain materials from sources within Taiwan only and are thus representative of Taiwan Mandarin. We may also include data collected from Internet sites clearly identified as being within Taiwan. This special-purpose subcorpus will then serve as the database for our study of the grammatical structures of the bei-construction and the ba-construction in Taiwan Mandarin. Furthermore, the subcorpus will be made available online as a database for the study of these two constructions from any linguistic perspective, be it syntactic, semantic, discoursal, or pragmatic.

Our second objective is to compile a comprehensive bibliography of published works dealing with ba and bei. This bibliography will likewise be made available online. Further, the online version will contain links to all open-access articles and can thus serve as an important resource for the study of these two constructions.

Our third, and most important, objective is to provide a unified formal analysis within Lexical-Functional Grammar (LFG) of the grammatical structures of the bei-construction and the ba-construction in Taiwan Mandarin, based on evidence from corpus data as well as elicited data. This part of the project has three components: 1) to review the relevant literature and re-evaluate previous formal accounts in which the bei-construction is factored into the short passive (without an overt agent) and the long passive (with an overt agent), e.g., Her (1989, 1991), Ting (1998), Huang (1999), and Wang (2001), and to attempt a unified verbal analysis; 2) to review the relevant literature, re-examine Bender’s (2000) incomplete verbal analysis of the ba-construction, and attempt a complete verbal analysis; and 3) to formally relate the two resulting accounts so as to capture Hsueh’s (1989) insight that the two constructions mirror each other, as shown in (1) and (2).

(1) [A ba B + C]: in connection to A, B turns out to be what C describes.

(2) [A bei B + C]: in connection to B, A turns out to be what C describes.

The general idea that corpus data and introspective or elicited data are complementary, and that both should be used in the study of formal syntax, has been held by the project PI for some time (see Her 2005 and Her and Wan 2007 for discussions). The particular idea for this project, however, originated in a conversation the PI had with his colleagues, in which some of the data used in the studies on ba (e.g., Bender 2000) and bei (e.g., Her 1989, 1991, Ting 1998, Huang 1999) were questioned. It was further found that most of the formal syntactic studies on these two constructions share roughly the same range of data, which has not been verified against a large corpus. The PI thus conducted a pilot study to see whether the data on bei used in previous studies could be verified by genuine examples found through Google searches of Internet sites within Taiwan.

There have been several different proposals regarding bei’s part of speech, ranging from case marker (e.g., Li and Thompson 1974), preposition (e.g., Chao 1968, Hou 1979, Her 1985-6), co-verb (e.g., Chang 1977, Li and Thompson 1981), and functional projector (or light verb) (e.g., Xu 2007) to verb. Among generative grammarians, the current consensus seems to be that it is a verb. More specifically, Her (1989, 1991) argued within Lexical-Functional Grammar (LFG) that bei as a verb comes in two forms, the short form (without the agent, as in (3)) and the long form (with the agent, as in (4)), the former involving functional control and the latter anaphoric control. This analysis is corroborated by two influential works on the subject, Ting (1998) and Huang (1999), both of which offer an analysis within the mainstream derivational framework. Likewise, they argued that bei as a verb requires two lexical entries, one for the ‘short passive’ and the other for the ‘long passive’; the former involves A-movement, while the latter requires A’-movement and predication instead, as shown in (3) and (4) respectively.

(3) 李四被打了。
[IP [NP Lisi] [V' [V bei] [VP [NP PRO] [V' [V da-le] [NP t]]]]]
‘Lisi got hit.’

(4) 張三被李四打了。
[IP [NP Zhangsan] [V' [V bei] [IP [NP OP] [IP [NP Lisi] [V' [V da-le] [NP t]]]]]]
‘Zhangsan was hit by Lisi.’

Interestingly, these three researchers apparently arrived at similar analyses quite independently. All of these studies base the distinction between the short passive and the long passive solely on introspective data, which crucially indicate that the short passive is disallowed in the following syntactic environments.

(5) The short passive does not allow long-distance gaps. Ex. 張三被*(李四)派警察抓走_e_了。

(6) The short passive does not allow a resumptive pronoun at the gap. Ex. 張三i被*(李四)打了他i一下。

(7) The short passive does not allow the suo clitic. Ex. 張三被*(李四)所批評。

However, an initial investigation using the Internet as a corpus clearly indicates that the observations in (5)-(7), which are based on introspection, do not reflect how bei is actually used. The following are all genuine examples of short passives that contradict (5)-(7).

(8) Short passives that do have a long-distance gap.

Ex. a. 最後 Ortlieb 仍被想辦法固定住_e_了。2

b. 通道都已經被派兵把守_e_。3

c. 我還是會被企圖吃掉_e_吧?4

d. 資料被設法拷貝_e_了。5

(9) Short passives that do have a resumptive pronoun at the gap.

Ex. a. 他只是撞到學長就被找人打他。6

b. 他都和他們相處得很好,常常被要求他做這做那,他都很樂意做。7

c. 有人穿紅內褲只是想中頭彩被說成他倒扁。8

d. 比死更悲慘的,他竟然被奪走了他的死;失蹤把他驅逐於生與死。9

(10) Short passives that do have the suo clitic.

Ex. a. 過來的回教,在台灣卻不被所重視,甚至不敢當眾提起。10

b. 外面的誘惑那麼多,不知道老公會不會被所誘惑,我好怕哦!11

c. 老天爺是公平的,你能愛人,也一定會被所愛。12

d. 如果不相信緣分,是不是就要承認,自己的存在,其實是不被所愛?13

Huang (1999: 449) also observes that place adverbials may occur with the long passive, as in (11a), but not the short passive, as in (11b). Ting (1998: 350) makes the same observation, shown in (12).

(11) a. 張三被李四在學校騙走了。

b. *張三被在學校騙走了。

(12) *張三被在公司裡批評了。

Nonetheless, once again such an introspective observation is contradicted by naturally occurring texts found on Taiwan Internet sites. Four of the counterexamples collected are given here in (13).

Huang (1999: 447) further observes that agent deletion is not allowed in the general environment in which bei occurs, i.e., the V-NP-V configuration, as shown in (14). This is an important argument for distinct structures for the short passive and the long passive: if the long passive allowed agent deletion, it would be difficult to explain this exception to an otherwise general prohibition. However, again, many counterexamples to this so-called general prohibition were found, some of them, no less, precisely in a ‘short’ bei passive, as in (15).

(14) *李小姐,我逼__改嫁了。

(15) a. 我母親在我生父過逝後被逼__改嫁。18

b. 共產黨當權後,奶奶被逼__改嫁。19

c. 舅媽在大陸被共黨迫害後又被逼__改嫁。20

d. 老娘可就會被逼__改嫁ㄛ。21

Should further data confirm that bei indeed behaves alike with or without the agent phrase, then a distinct structure and analysis for the short passive is unjustified: all bei passives are in fact long passives in nature, and the agent phrase is simply optional.

If so, the question debated in Tang (2001), namely how to account for the obligatory agent phrase in the long passive, is also a non-issue. In this project we will carefully investigate these and other related issues, e.g., the so-called indirect passive and adversative passive, using both corpus data and elicited data, and will attempt a unified analysis within LFG. We will also explore a universal characterization of ‘passive’ (e.g., Huang 1999: 481, Givón 2006: 338, Keenan and Dryer 2007) and thus hope to settle the issue of whether the bei-construction is a genuine passive construction.

(16) ba  V  (↑ PRED) = ‘ba <(↑ SUBJ) (↑ OBJ) (↑ XCOMP)>’22
            (↑ OBJ) = (↑ XCOMP TOPIC)

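(17) 他把橘子剝了皮。 f-structure23

     [ PRED   ‘ba <(↑SUBJ) (↑OBJ) (↑XCOMP)>’
       SUBJ   [‘3sg’]
       OBJ    [‘orange’]
       XCOMP  [ TOPIC  [ ]        (controlled by OBJ: solid curved line)
                PRED   ‘peel <(↑SUBJ) (↑OBJ)>’
                SUBJ   [ … ]
                OBJ    [‘peel’]   (anaphorically bound by TOPIC: dotted curved line) ] ]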

An important feature of Bender’s account is that ba’s object controls the XCOMP’s TOPIC, indicated by the solid curved line. As a grammaticalized discourse function, TOPIC is subject to the Extended Coherence Condition (ECC) and thus must either be functionally identified with, or anaphorically bind, another function (Bresnan and Mchombo 1987: 8). In the f-structure of (17), the TOPIC in the complement clause, juzi ‘orange’, enters into a possessor-possessed relation with the OBJ in the local f-structure and thus anaphorically binds the latter, as indicated by the dotted curved line on the left. This topic analysis of ba’s object nicely accounts for the unbounded nature of the gap, if any, in the embedded VP complement; see (18).

(18) 警察局發現是臺灣人,結果把他派人送__回賓館了。24

In (18), the TOPIC of the embedded clause, which is controlled by ba’s object, ta ‘he’, enters into a long-distance control relation with the object of the embedded verb song ‘send’. According to Tsao (1986), an uncontroversial topic in Chinese has the following properties.

(19) Topic properties (Tsao 1986: 4)

a. Topic invariably occupies the S-initial position of the first sentence in a topic chain.

b. Topic can optionally be separated from the rest of the sentence by one of the four pause particles: a (ya), na, me, and ba.

c. Topic is always definite or generic.

d. Topic is a discourse notion; it may, and often does, extend its semantic domain to more than one sentence.

e. Topic is in control of the pronominalization or deletion of all the coreferential NPs in a topic chain.

f. Topic, except in cases where it is also subject, plays no role in such processes as reflexivization, passivization, and Equi-NP deletion.

Thus, while LFG’s ECC can be taken to be a universal constraint, the possible relations listed in (19) can be seen as Mandarin-specific restrictions on the kinds of adjuncts that may serve to incorporate the topic (Bender 2000: 128). In this project, we will further investigate this issue with corpus data and elicited data to determine whether this view is justified.
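To make the ECC requirement on (17) concrete, the following is a minimal Python sketch, not part of Bender’s (2000) account or of any LFG implementation, that encodes the f-structure as nested dictionaries. It assumes that functional control, i.e. (↑ OBJ) = (↑ XCOMP TOPIC), can be modelled as two attributes pointing to the very same object, and that anaphoric binding can be listed as explicit binder-bindee pairs; all function names (make_f_structure_17, identified_with, anaphorically_bound_by, satisfies_ecc) are hypothetical.

```python
# Toy encoding of f-structure (17) as nested dicts (illustrative assumption).
# Functional control = two attributes pointing at the *same* dict object;
# anaphoric binding = an explicit list of (binder id, bindee id) pairs.

def make_f_structure_17():
    orange = {"PRED": "orange"}   # ba's OBJ, juzi 'orange'
    peel = {"PRED": "peel"}       # embedded OBJ, pi 'peel'
    xcomp = {
        "PRED": "peel <(SUBJ) (OBJ)>",
        "TOPIC": orange,          # (↑ OBJ) = (↑ XCOMP TOPIC): shared object
        "SUBJ": {},               # controller left unspecified, as in Bender's account
        "OBJ": peel,
    }
    f = {
        "PRED": "ba <(SUBJ) (OBJ) (XCOMP)>",
        "SUBJ": {"PRED": "3sg"},
        "OBJ": orange,
        "XCOMP": xcomp,
    }
    anaphoric = [(id(orange), id(peel))]   # possessor-possessed binding
    return f, anaphoric


def get_path(f, path):
    """Follow a sequence of attributes, e.g. ("XCOMP", "TOPIC")."""
    for attr in path:
        f = f[attr]
    return f


def iter_functions(f, path=()):
    """Yield (path, value) for every f-structure-valued attribute."""
    for attr, val in f.items():
        if isinstance(val, dict):
            yield path + (attr,), val
            yield from iter_functions(val, path + (attr,))


def identified_with(f, path):
    """Other paths whose value is the very same object (functional identity)."""
    target = get_path(f, path)
    return [p for p, v in iter_functions(f) if p != path and v is target]


def anaphorically_bound_by(f, path, anaphoric):
    """Paths whose value is anaphorically bound by the value at `path`."""
    binder = id(get_path(f, path))
    return [p for p, v in iter_functions(f) if (binder, id(v)) in anaphoric]


def satisfies_ecc(f, path, anaphoric):
    """Simplified ECC check: identified with, or binding, another function."""
    return bool(identified_with(f, path) or anaphorically_bound_by(f, path, anaphoric))


if __name__ == "__main__":
    f, anaphoric = make_f_structure_17()
    print(identified_with(f, ("XCOMP", "TOPIC")))                    # [('OBJ',)]
    print(anaphorically_bound_by(f, ("XCOMP", "TOPIC"), anaphoric))  # [('XCOMP', 'OBJ')]
    print(satisfies_ecc(f, ("XCOMP", "TOPIC"), anaphoric))           # True
    print(identified_with(f, ("XCOMP", "SUBJ")))                     # []
```

The last call returns an empty list: nothing in this toy structure identifies the embedded SUBJ with any other function, which reflects the unresolved embedded subject in Bender’s analysis.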

It is interesting to note, however, that this topic analysis of the ba-NP is similar to Her’s (1989, 1991) topic analysis of the bei subject. The topic analysis is thus likewise compatible with the A’-movement analysis that Ting (1998) and Huang (1999) advocate for the bei long passive. Bender’s analysis of ba is nonetheless incomplete in that it leaves the subject of the embedded clause unaccounted for. The f-structure in (17) is repeated below as (20). An XCOMP’s SUBJ, by definition, must be functionally or anaphorically controlled. In (20), it is unclear how the SUBJ of the embedded clause is controlled. This issue is explicitly left for future research in Bender (2000: 129).

(20) 他把橘子剝了皮。 f-structure

     [ PRED   ‘ba <(↑SUBJ) (↑OBJ) (↑XCOMP)>’
       SUBJ   [‘3sg’]
       OBJ    [‘orange’]
       XCOMP  [ TOPIC  [ ]        (controlled by OBJ)
                PRED   ‘peel <(↑SUBJ) (↑OBJ)>’
                SUBJ   [ … ]  ?
                OBJ    [‘peel’]   (anaphorically bound by TOPIC) ] ]

Hsueh (1989) argued that the ba-construction should not be seen as having a disposal reading and that the bei-construction should not be interpreted as a passive construction. In this project, we expect to support Hsueh’s view that ba is not a ‘disposal’ construction, but to argue against his view that bei is not passive. We further hope to validate, and to express in concrete formal analyses, his insight that the two constructions mirror each other, as shown again in (21) and (22).

(21) [A ba B + C]: in connection to A, B turns out to be what C describes.

(22) [A bei B + C]: in connection to B, A turns out to be what C describes.


The two examples in (23), where speakers use bei and ba as two parallel lexical items that freely alternate, are interesting and suggest that Hsueh’s mirror generalizations in (21) and (22) are on the right track.

(23) a. 朋友就是被(把?)你看透了還能喜歡你的人。25

b. 我的寫作是把自己放在一個confusion裡頭,然後努力不要


In this project, we will thus further investigate the three-place predicate analysis of ba in general and the topic analysis of the ba-NP in particular. We will explore corpus data from Taiwan Mandarin and re-examine some of the questionable or controversial introspective data used in previous syntactic studies on ba in seeking a complete analysis. We hope ultimately to arrive at a unified account of the two constructions, bei and ba.


We will first construct a special-purpose subcorpus of bei and ba as used in Taiwan Mandarin. Data will be collected from the Sinica Corpus, the NCCU Corpus of Spoken Chinese, and possibly also Internet sites within Taiwan. Elicited data from grammaticality judgment experiments may also be incorporated into the corpus as a separate module. Meanwhile, we will also compile a comprehensive bibliography of published works dealing with ba and bei and enter all references into the reference management program EndNote.
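A first-pass filter for building the subcorpus might look like the Python sketch below. It assumes that sentences have already been exported from the corpora or Taiwan Internet sites as plain strings (no real corpus API is used), and it relies on a hypothetical, deliberately incomplete list of words in which 被 or 把 is not the bei or ba under study, such as 把守 ‘guard’ in (8b). Character matching of this kind can only flag candidates; every hit would still need manual checking.

```python
import re
from collections import Counter

# Hypothetical, deliberately incomplete list of words in which 被 or 把 is
# not the bei/ba under study (e.g. 被子 'quilt', 把握 'grasp', 把守 'guard').
NON_TARGET_WORDS = ("被子", "被單", "把握", "把守", "把關", "一把", "這把")

# bei immediately followed by suo, i.e. a short passive with the suo clitic
BEI_SUO = re.compile("被所")

# Sample input: the first four sentences come from or adapt examples in this
# proposal; the last one is constructed to show a non-target 被.
SAMPLE = [
    "通道都已經被派兵把守。",
    "老天爺是公平的,你能愛人,也一定會被所愛。",
    "他把橘子剝了皮。",
    "朋友就是被你看透了還能喜歡你的人。",
    "他蓋著被子睡覺。",
]


def mask_non_targets(sentence):
    """Blank out non-target words so that their 被/把 is not counted."""
    for word in NON_TARGET_WORDS:
        sentence = sentence.replace(word, "□" * len(word))
    return sentence


def classify(sentence):
    """Return a set of coarse tags: 'bei', 'bei-suo', 'ba'."""
    s = mask_non_targets(sentence)
    tags = set()
    if "被" in s:
        tags.add("bei")
        if BEI_SUO.search(s):
            tags.add("bei-suo")
    if "把" in s:
        tags.add("ba")
    return tags


def build_subcorpus(sentences):
    """Split sentences into bei and ba modules and count the tags."""
    sub = {"bei": [], "ba": []}
    counts = Counter()
    for sentence in sentences:
        tags = classify(sentence)
        counts.update(tags)
        for module in sub:
            if module in tags:
                sub[module].append(sentence)
    return sub, counts


if __name__ == "__main__":
    sub, counts = build_subcorpus(SAMPLE)
    print(dict(counts))
    print(sub["ba"])   # ['他把橘子剝了皮。']
```

Tagging 被所 separately makes it easy to pull out candidate short passives with the suo clitic, the pattern that (7) rules out but (10) attests.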

In seeking a systematic account of the syntactic structures of ba and bei in Taiwan Mandarin, we expect to use both the generative methodology and the corpus methodology. Thus, corpus data will be complemented with data elicited from native speakers of Taiwan Mandarin. In doing the pilot study on Internet data that led to this proposal, the PI came to truly appreciate Fillmore’s (1992: 35) acknowledgment that corpora allow the establishment of new facts, some of which one ‘couldn't imagine finding out about in any other way’. However, the fact remains that corpora, no matter how large, do not contain all possible sentences and, more importantly, do not provide any negative evidence in the form of ungrammatical sentences, which is crucial to generative argumentation. Therefore, this study also aims to demonstrate that the generative approach and the corpus approach can indeed complement each other, a view that Fillmore (1992) and Her and Wan (2007) advocate and that some generativists and corpus linguists also support (e.g., Smith 1999: 15, Newmeyer 2003: 687, Kennedy 1998: 8, Biber et al. 1998: 10, 271).

While idealization is necessary, it must be emphasized that idealization away from speech errors, for instance, still allows one to use performance mistakes such as slips of the tongue as evidence… All our understanding of linguistic knowledge…has to be supported by evidence, and where that evidence comes from is limited only by our imagination and ingenuity. (Smith 1999:15)

The use of both introspection and corpus-based analysis can contribute to linguistic analysis and description. Corpora cannot tell us everything [about] how a language works. For example, they cannot be used as a basis for stating what structure or processes are not possible. (Kennedy 1998:8)

The project PI’s main specialization is generative syntax; the first Co-PI, Claire Hsuen-hui Chang, teaches both corpus linguistics and field methods; and the second Co-PI, Kawai Chui, is the driving force behind the MOE’s ATU project NCCU Corpus of Spoken Chinese. This research team thus also hopes to demonstrate that the two kinds of linguists can work together.

The formal theoretical framework assumed in our formulation of syntactic analyses is LFG. As a non-derivational generative framework, LFG takes seriously the insight that some generalizations regarding the mapping between predicate argument structure and syntactic structure must be stated at an independent level of predicate valence (Levin 1987, Rosen 1989, Bresnan and Kanerva 1989, Bresnan and Zaenen 1990, Grimshaw 1990, Jackendoff 1990, Alsina 1993, 1996, Mohanan 1994, Neeleman 1994, Butt 1995, Butt and King 2000, among others), and thus posits an argument structure (a-structure), which links the lexical semantic structure and the syntactic structure of a predicator (e.g., Bresnan and Kanerva 1989, Bresnan and Zaenen 1990). The particular conception of a-structure assumed here is based on Baker (1983) and Bresnan (1996, 2001).

(24) Lexical semantics (e.g., beat <beater beatee>)

a-structure (e.g., beat <agent theme>)

syntactic structure (e.g., beat <(↑SUBJ) (↑OBJ)>)

Furthermore, to capture the Relational Grammar (RG) concept of grammatical relations, LFG posits two parallel planes of syntactic representation: constituent structure (c-structure) and functional structure (f-structure) (Kaplan and Bresnan 1982). The c-structure encodes categorial hierarchies, usually represented as tree configurations. The f-structure, formally a feature structure, is the central locus of grammatical information, such as grammatical functions (e.g., SUBJ and OBJ), tense, aspect, polarity, case, person, number, and gender. These parallel structures are linked by correspondence principles and together provide the complete syntactic description.
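Since the f-structure is formally a feature structure, the way partial f-structures contributed by individual words are combined can be illustrated with unification. The Python sketch below is a textbook simplification, not the algorithm of any LFG implementation: it ignores re-entrancy (structure sharing) and treats PRED values as unifiable whenever the strings are equal, whereas semantic forms in LFG are unique. The lexical contributions VERB, SUBJ_NP, and OBJ_NP are hypothetical and loosely based on the English example beat in (24).

```python
def unify(a, b):
    """Unify two feature structures (dicts); return None on a feature clash.

    Simplifications: no re-entrancy (structure sharing), and PRED values
    unify whenever the strings are equal (real LFG semantic forms are unique).
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)                 # shallow copy: inputs are not mutated
        for attr, b_val in b.items():
            if attr in result:
                merged = unify(result[attr], b_val)
                if merged is None:
                    return None          # clash somewhere below this attribute
                result[attr] = merged
            else:
                result[attr] = b_val
        return result
    return a if a == b else None         # atomic values must be identical


# Hypothetical lexical contributions for 'Lisi beats Zhangsan', cf. (24).
VERB = {"PRED": "beat <(SUBJ) (OBJ)>", "TENSE": "PRES",
        "SUBJ": {"NUM": "SG", "PERS": 3}}
SUBJ_NP = {"SUBJ": {"PRED": "Lisi", "NUM": "SG", "PERS": 3}}
OBJ_NP = {"OBJ": {"PRED": "Zhangsan"}}

if __name__ == "__main__":
    f = unify(unify(VERB, SUBJ_NP), OBJ_NP)
    print(f["SUBJ"])                              # {'NUM': 'SG', 'PERS': 3, 'PRED': 'Lisi'}
    print(unify(VERB, {"SUBJ": {"NUM": "PL"}}))   # None
```

Unification succeeds when the subject’s features are compatible with what the verb specifies, and fails as soon as two atomic values clash, e.g. a plural subject combined with a verb that requires a singular one.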





The First Year:

1st-4th month: to train assistants, establish infrastructure, and obtain sources of data

3rd-7th month: to compile the comprehensive bibliography of published works on bei

3rd-8th month: to construct the special purpose bei subcorpus


7th-10th month: to conduct necessary elicitation experiments and derive a formal LFG analysis for bei

9th-11th month: to review and evaluate progress and implement necessary measures

9th or 10th month: to prepare a paper presentation for an international conference

11th-12th month: to write the interim report for NSC and to have a manuscript ready for submission to an international journal.

The Second Year:

1st-4th month: to compile the comprehensive bibliography of published works on ba

3rd-7th month: to construct the special purpose ba subcorpus

3rd-8th month: to review the existing literature and examine the subcorpus

5th-8th month: to conduct necessary elicitation experiments and derive a formal LFG analysis for ba

7th-10th month: to consolidate and integrate the two analyses, the two subcorpora, and the two bibliographies

9th-11th month: to review and evaluate progress and implement necessary measures

9th or 10th month: to prepare a paper presentation for an international conference

11th-12th month: to write the interim report for NSC and to have a manuscript ready for submission to an international journal.


1) Items we anticipate accomplishing in the project include:

a. Two comprehensive (online) bibliographies of published works on bei and ba

b. Two special-purpose (online) subcorpora on bei and ba

c. A comprehensive formal LFG analysis for bei

d. A comprehensive formal LFG analysis for ba

e. An integrated formal LFG analysis of the parallelism between bei and ba

f. Two to four conference presentations

g. Two journal articles

2) We further anticipate that the team members in the project will have the following benefits in linguistic training and experience:

a. the PI will gain in-depth knowledge of corpus building, corpus methodology, and the Mandarin bei and ba constructions

b. the co-PIs will gain firsthand experience in LFG

c. the RAs will also gain valuable firsthand experience in information gathering, data collection, and documentation management

d. the RAs will witness how a research project is planned and methodically and scientifically carried out, which will be valuable for their MS theses

e. the RAs will learn how to construct grammatical hypotheses and argumentation

f. the RAs will gain experience with the non-derivational theory of LFG and its formal grammar formulations

g. the RAs will receive training in essay writing, conference presentations, and journal submissions



