Data analysis - Analyzing Homosexuality: Collocation and Social Attitudes in Taiwanese Online News

The analysis was carried out in four stages: information retrieval to build the study database; word segmentation and concordancing with the AntConc tool (version 3.4.4w); indexing of news posts with assessments; and collocational and textual content analysis, as shown in Figure 3.4.

Figure 3.4

Research Methodology Flowchart

Information retrieval to build the study database

Initially, we searched for these keywords in four online news databases (CNA NEWS, UDN NEWS, APPLE DAILY, STORM MEDIA). News was retrieved by full-text search of each online database using the five keywords, covering headlines, articles, opinion columns, feature articles, breaking news and special-topic news. Tongzhi, tonghun and tongxinglian were the main keywords, because nan tongzhi and nu tongzhi are subtypes of tongzhi. All news posts containing any of these five keywords were collected, carefully examined for relevance and time period (2016 to 2017), and deleted if they duplicated one another. The keywords occur 943 times in total, and the number of news posts is 100. Furthermore, manual content analysis was done to check that all sampled items were relevant and to identify what appeared to be recurring thematic and formal patterns for more detailed analysis. We created the corpus by selecting posts containing at least one keyword associated with homosexuality (tongzhi 同志, tonghun 同婚, tongxinglian 同性戀, nan tongzhi 男同志, nu tongzhi 女同志). We compiled 100 news posts (25 from each Internet media outlet) using the five keywords, manually scanning them and eliminating irrelevant items.
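The selection steps described above (keyword match, time period, removal of overlapping posts) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the `Post` record, the function names and the reading of "2016 to 2017" as two full calendar years are all hypothetical, and relevance was judged manually in the study rather than by code.

```python
from dataclasses import dataclass
from datetime import date

# The five keywords named in the text.
KEYWORDS = ["同志", "同婚", "同性戀", "男同志", "女同志"]


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one news post."""
    source: str       # e.g. "CNA NEWS"
    published: date
    headline: str
    body: str


def build_corpus(posts, start=date(2016, 1, 1), end=date(2017, 12, 31)):
    """Keep in-period posts that mention a keyword, dropping exact duplicates.

    The date bounds are an assumption about what "2016 to 2017" covers.
    """
    seen = set()
    corpus = []
    for post in posts:
        text = post.headline + "\n" + post.body
        if not (start <= post.published <= end):
            continue  # outside the study period
        if not any(keyword in text for keyword in KEYWORDS):
            continue  # no keyword in headline or body
        if text in seen:
            continue  # the same article collected twice
        seen.add(text)
        corpus.append(post)
    return corpus


def count_keyword_occurrences(corpus):
    """Count raw substring occurrences of each keyword in the corpus.

    Because 男同志 and 女同志 contain 同志, the count for 同志 includes them.
    """
    counts = {keyword: 0 for keyword in KEYWORDS}
    for post in corpus:
        text = post.headline + "\n" + post.body
        for keyword in KEYWORDS:
            counts[keyword] += text.count(keyword)
    return counts
```

Counting on raw text, as here, double-counts the compound keywords under tongzhi; counting on segmented tokens instead would keep the five keywords apart.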

Word segmentation by National Digital Archives Program and obtaining concordance by AntConc tool

Before the data can be loaded into the AntConc tool for corpus linguistic analysis, language processing software has to identify the words in a text so that the text can be processed. Chinese sentences contain no word boundaries, unlike the writing systems of languages such as English, which use spaces to separate words. Therefore, a segmentation system is necessary to prevent word ambiguity.

Figure 3.5

A snapshot of word segmentation

Segmentation was performed with the system of the Language and Knowledge Processing Group, Institute of Information Science, Academia Sinica (http://ckipsvr.iis.sinica.edu.tw/), which was ranked first in the traditional Chinese word segmentation evaluation at the First International Chinese Word Segmentation Bakeoff held by ACL SIGHAN. This process helps calculate the frequency of occurrence in the selected data and control the distance (30 words) from the central keywords. Figure 3.5 shows a snapshot of part of the word segmentation output, with a part-of-speech tag for each word.

The word segmentation process is based on a lexicon and on morphological rules for quantifier words and reduplicated words.
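To make the idea of lexicon-based segmentation concrete, the following sketch implements greedy forward maximum matching, a simple textbook technique. It is not the Academia Sinica system, which also applies morphological rules and part-of-speech tagging; the toy lexicon and the `max_len` parameter are assumptions made purely for illustration.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; fall back to a single character otherwise.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicon, invented for this example only.
TOY_LEXICON = {"同性戀", "同志", "男同志", "婚姻", "平權", "支持"}

print(segment("支持同志婚姻平權", TOY_LEXICON))
# prints ['支持', '同志', '婚姻', '平權']
```

Without such a step, a concordancer would treat an unsegmented Chinese sentence as one long string, so word frequencies and word-level collocates could not be counted.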

After querying the database with the keywords, we obtained the contexts in which those keywords occur. As our purpose is to find the frequency of the words used to define homosexuality in the database, all possible occurrences and collocations of the five keywords are investigated. The present study focuses on the analysis of content words such as nouns, verbs and adjectives.

However, this does not mean that other word classes, such as conjunctions or determiners, cannot carry ideological weight in the discourse. As studies in the Critical Discourse Analysis (CDA) tradition have shown, all levels of linguistic analysis are ideologically relevant, since there is no necessary connection between linguistic form and ideological stance (Fairclough 1995). Ideologies are built upon the interaction and co-presence of elements at different levels of linguistic analysis, from vocabulary and grammar to cohesion and text structure (Fairclough 1992).

The data were then investigated more systematically using the AntConc tool, a corpus analysis toolkit for concordancing and text analysis. We take advantage of its concordance tool. It is important to choose an appropriate definition of what counts as collocation. Most commonly, collocation is defined as “the occurrence of two or more words within a short space of each other within a text” (Sinclair 1991:170). Collocation refers to the characteristic co-occurrence patterns of words (McEnery et al. 2006). In a broad sense, collocations draw on Firth’s (1957:11) early definition of collocation as “the company a word keeps”. Other researchers, such as Harris (2006), see collocations as sequences of adjacent words. Hunston (2002:68) describes collocation as the tendency of words “to be biased in the ways they co-occur”.

Hunston further describes collocation more specifically as the statistical tendency of words to co-occur. Statistical significance tests help make measurements more reliable (Hunston 2002). On the other hand, researchers have pointed out that complex statistical analysis is not necessary. Stubbs (1995) justifies this point by showing the list of collocates for the node cause: accident, alarm, concern, confusion, damage, harm. There are no positive examples and only a few neutral examples. The pattern is clear without statistical manipulation. In such techniques, named collocation-via-concordance or hand-and-eye techniques (McEnery and Hardie 2012), examples and recurrent patterns are identified by a researcher who scans the individual concordance lines.

Figure 3.6

A snapshot of AntConc Tool

The present analysis follows the collocation-via-concordance method. As can be seen in Figure 3.6, the tool shows search results for keywords in context, which lets us see how words are commonly used in the corpus under investigation. The number of characters output on either side of the search term is set to 30. We color-code the keywords in context, sorting the concordance lines with three different colors. 1L words are those to the left of the target words (purple) and 1R words are those to the right of the target words (red). By doing so, we can see the concordance results more clearly. The tool allows us to investigate the frequency in the corpus of the patterns or features under analysis, to examine these items in context by viewing their concordances, and to identify significant keywords and frozen phrases by comparing their frequency and the words that precede or follow them. Collocates are defined as words occurring close to one another. Only collocates occurring within the same sentence are considered, because words in separate sentences are not so closely related.
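The keyword-in-context display described above can be approximated with a short Python sketch. It is not AntConc's implementation: the function names are hypothetical, the input is assumed to be segmented text with tokens separated by spaces, and the sentence splitter only recognizes a small set of sentence-final punctuation marks. The 30-character window and the same-sentence restriction follow the text.

```python
import re


def split_sentences(text):
    """Split on sentence-final punctuation, keeping the punctuation mark."""
    return [s for s in re.findall(r"[^。!?!?]+[。!?!?]?", text) if s.strip()]


def kwic(text, keyword, window=30):
    """Return (left, keyword, right) triples that never cross a sentence boundary."""
    lines = []
    for sentence in split_sentences(text):
        for match in re.finditer(re.escape(keyword), sentence):
            left = sentence[max(0, match.start() - window):match.start()]
            right = sentence[match.end():match.end() + window]
            lines.append((left, keyword, right))
    return lines


def sort_by_1r(lines):
    """Sort concordance lines by the first token to the right of the node (1R)."""
    return sorted(lines, key=lambda line: line[2].split()[:1])


def sort_by_1l(lines):
    """Sort concordance lines by the first token to the left of the node (1L)."""
    return sorted(lines, key=lambda line: line[0].split()[-1:])
```

Sorting by 1L or 1R groups lines that share the same neighboring word, which is what makes recurrent patterns visible to a researcher scanning the concordance by eye.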

Indexing of collocates with assessments

This stage of analysis includes two steps: first analyzing collocations, then providing evaluation. In the first step, we use two analytical concepts: semantic preference and semantic prosody. Each of the five keywords was searched in the corpus to obtain concordances, which were analyzed in order to better understand the semantic preference and semantic prosody associated with the keywords. Semantic preference is the semantic environment in which a lexical item is typically used. It refers to the distinct tendency for a node word to co-occur with a class of words which share the same feature (Stubbs 2001). Semantic prosody expresses the attitudinal, often pragmatic, meaning of a lexical item (Sinclair 2004); it involves a further level of abstraction, referring to the semantic coloring that a number of semantic preferences build up over wider stretches of text (Partington 2004). The two concepts often interact with each other. Semantic prosody is often implicit, less clear-cut and at least sometimes deniable, and it dictates the general environment which constrains the preferential choices of the node item (Stubbs 2001), while semantic preference contributes powerfully to building semantic prosody (Partington 2004). Using semantic preferences and prosodies as analytical concepts, both the social domains and the evaluative connotations associated with particular words can be identified (Mautner 2007).

These concordances were divided into groups according to semantic features.

Concordances (tables which show all of the occurrences of keywords in the immediate context in which they occur) were then used to explore collocational relationships. Often, concordance lines needed to be expanded in order to access more context, which at times involved the reading of an entire piece of news.

Collocational analysis allows us to identify situations which are frequently associated with a common set of words.
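As a rough illustration of how collocates of a node word can be counted, the sketch below tallies every token that occurs within a fixed span of the node inside the same sentence. The function name and the `span` parameter are hypothetical; the default of 30 simply echoes the distance mentioned earlier in this chapter, and the input is assumed to be a list of already segmented sentences.

```python
from collections import Counter


def collocates(sentences, node, span=30):
    """Count tokens within `span` positions of each occurrence of `node`.

    `sentences` is a list of token lists, so counts never cross a
    sentence boundary.
    """
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != node:
                continue
            lo = max(0, i - span)
            hi = min(len(tokens), i + span + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts
```

Calling `collocates(sentences, "同志").most_common(20)` would then list the twenty most frequent neighbors of the node, which can be grouped by semantic feature by hand.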

The second step focused on assessments of homosexuality. Different stances emerged in debates about this controversial issue, and across the corpus people locate themselves in particular stances. To account for the different stances toward homosexuality, we illustrate them with examples in context. By studying examples of positive, negative and neutral assessments, the research shows how the Internet media claim or deny the existence of homosexuality. Each assessment stands as a visible mark of discursive, rhetorical and performative work (Travers-Scott 2010). Collocates were divided into three categories: negative assessments, positive assessments, and neutral statements. We label collocates as negative, positive or neutral by observing them in context. For example, if one claims to dislike tongzhi, then dislike can be treated as the expression of an essentially negative attitude. If one claims to approve of homosexuals, then approve can be treated as the expression of a positive attitude. If a social movement event such as a parade is held, then parade is treated as a neutral event. In this way, words become vectors for the transmission of underlying ‘cognition’ and ‘experience’ (Attenborough and Stokoe 2012). Actions like assessments are constructed to “display an orientation and sensitivity to their intended recipients” (Sacks, Schegloff and Jefferson 1974:727). This is because the authors of the assessments know that assessments are read not just for what they say about the object being assessed (in this case, the news posts), but also for what they reveal about the subject doing the assessing (the journalists themselves). Authors and recipients constantly negotiate the ‘subject-side’ of publicly accessible assessments (Edward 2005).
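Once collocates have been labeled by hand, their frequencies can be summed per category. The sketch below is a minimal illustration of that bookkeeping; the label table reuses the three English glosses from the examples above (dislike, approve, parade), and the function name is hypothetical. The labeling itself remains a manual, context-dependent judgment that the code does not replace.

```python
from collections import Counter

# Labels assigned by the analyst after reading each collocate in context.
# Only the three examples from the text are listed here.
MANUAL_LABELS = {
    "dislike": "negative",
    "approve": "positive",
    "parade": "neutral",
}


def tally_assessments(collocate_counts, labels):
    """Sum collocate frequencies per assessment category.

    Collocates that have not been labeled yet are collected under
    "unlabeled" so that they are not silently dropped.
    """
    totals = Counter()
    for word, frequency in collocate_counts.items():
        totals[labels.get(word, "unlabeled")] += frequency
    return totals
```

Keeping an explicit "unlabeled" bucket makes it easy to see how much of the collocate list still awaits manual assessment.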

Analysis of textual content

This study follows the general analytical method of Critical Discourse Analysis (CDA). In the past two decades, CDA has made a significant contribution to illuminating the relationship between language and ideology (Fairclough 1995, Fowler 1991, van Dijk 1998a). We believe that language is a reality-creating social practice, and that “anything that is said or written about the world is articulated from a particular ideological position” (Fowler 1991:10). CDA emphasizes the need to critically examine the role of news language. Continuing from the preceding stages, and because collocation analysis alone is not sufficient to understand the whole picture, we go through the 100 news posts and their headlines in terms of their main themes and sentence construction from a linguistic perspective. By reading through the 100 news posts, the study examines evaluative descriptions and selects extracts from the data for qualitative analysis. The headlines are listed in tables. Across the 100 headlines, some major characteristics are identified. For example, the value of equality is expressed in the imperative or in direct quotes with exclamation marks. Unrelated or trivial data were excluded, and we found that over 80% of the news posts touch upon the themes of equality and sameness, the role of the family, and destruction to society, such as illness, abnormality and crime.

3.4 Summary

The present study uses corpus linguistic techniques such as coding and frequency counting, and attempts to explore, both quantitatively and qualitatively, how social attitudes toward homosexuality are encoded by linguistic resources such as lexical choices and evaluation across contemporary Internet media. We aim to investigate the occurrence of words which represent the concept of homosexuality in society. To study the specific usage of individual words in the Internet media, we made use of the AntConc tool, which helped produce concordances and lists that exhibit all of the occurrences of a particular keyword in the selected texts. In addition, it showed where the word appeared in each text and linked the sentences to the text in which they appeared. This contextualized analysis of the lexicon thus helped refine the quantitative analysis of the vocabulary in the news discourse and supported the conclusions with regard to ideology. As CDA informs us, the attitude conveyed by news in the Internet media is not always apparent but is hidden in the subtle choice of linguistic forms. Since language in the Internet media has become such an important source, examining its linguistic categories critically can unlock the ideological nature of news discourse.

CHAPTER FOUR

RESULTS AND DISCUSSION

In this chapter, we present the distribution of collocates related to homosexuality and show their inherent tendencies. After a general description of the collocates (section 4.1), we label them with different stances (positive, negative and neutral) and group them according to their semantic features (section 4.2). The analysis then moves to a broader level of context, examining the news headlines and content, where we identify the major themes and linguistic strategies applied in the text (section 4.3).
