
Research Articles

Reconstruct Partisan Support Distribution with Multiply Imputed Survey Data: A Case Study of Taiwan's 2008 Presidential Election*

Frank C. S. Liu**

ABSTRACT

Analyzing survey data is one of the most promising methods by which to predict election results. But respondents may conceal their preferences. Hence, it has been difficult for researchers to obtain true partisan support distributions with a single survey data set. Given the constraints of cost, could we possibly predict vote shares more accurately with one sample? This paper employs multiple imputation (MI) for point estimation as a way to (re)construct the distribution of partisan supporters in Taiwan's 2008 presidential election. The findings show an identifiable difference between the biased point estimate and the better one obtained using MI. Although there remain other types of errors that may influence the accuracy of a prediction, readers may find this method relatively cost efficient when formulating strategies to improve point estimation pertaining to election results.

Keywords: multiple imputation, partisanship, survey research, missing values, election prediction

* An earlier version of this paper was presented at the 9th Conference on Survey Methods and Application, September 11, 2009, Academia Sinica, Taipei, Taiwan. The author is grateful for comments from Hong Yong-Tai (洪永泰) and Shia Ben-Chang (謝邦昌).

** Associate Professor, Institute of Political Science, National Sun Yat-Sen University; e-mail: csliu@mail.nsysu.edu.tw

Note: Received: September 15, 2009; Accepted: September 17, 2010

調查研究方法與應用, No. 24

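Reconstructing the Picture of Party Support Rates with Multiple Imputation: The Case of Face-to-Face Survey Data Collected on the Eve of the 2008 Presidential Election*

劉正山 (Frank C. S. Liu)**

ABSTRACT (Chinese version, translated)

Estimating vote shares from survey samples is one of the most commonly used methods of election prediction. However, even when the sampling procedure is appropriate and the sample is representative, respondents' refusal to answer questions such as vote intention often produces missing data, which in turn biases point estimates. The problem is especially severe in surveys of vote intention conducted during election periods and in surveys on sensitive issues. This serious missing-value problem not only causes the public to lose confidence in the predictive power of survey data, but also leads scholars to doubt the descriptive statistics produced from these data. This study attempts to fill in missing data with multiple imputation and applies the method to the point estimation of party vote shares. Using face-to-face data from Taiwan's Election and Democratization Study collected before Taiwan's 2008 presidential election (Taiwan's Election and Democratization Study for the 2008 legislative elections, TEDS2008L, N = 1,240), it compares the support rates of the party candidates before and after multiple imputation. The study finds that multiple imputation helps to correct the bias in point estimates of candidate support caused by a high rate of missing values.

Keywords: multiple imputation, partisanship, survey research, point estimation, election prediction

* An earlier draft of this paper was presented at the 9th Conference on Survey Research Methods and Application, hosted by Academia Sinica on September 11, 2009. The author benefited greatly from the valuable comments and suggestions of Professor 洪永泰 (Department of Political Science, National Taiwan University), Professor 謝邦昌 (Department of Statistics and Information Science and Graduate Institute of Applied Statistics, Fu Jen Catholic University), and other senior scholars, and thanks them here.

** Associate Professor, Institute of Political Science, National Sun Yat-Sen University; e-mail: csliu@mail.nsysu.edu.tw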

I. Introduction

A commonly recognized problem among public opinion researchers and pollsters is that respondents, when asked about sensitive issues, often hesitate to disclose their vote choices and attitudes. This item non-response problem, caused by respondents' reluctance (saying "Don't Know" or "It Depends", or simply refusing to answer some of the questions), has long been recognized as a cause of biased point estimation within a survey sample. One safe method for reconstructing partisan support distribution is to collect a decent number of polls and survey data sets and then calculate the average of the polls. The problem with this method, however, is that poll companies and survey institutes may not release their data to the public, or may not release the information prior to the election. Therefore, it has been difficult for individual researchers to apply this method to improve their understanding of the preferences of the electorate.

Over the years, scholars of vote prediction have proposed a few approaches to deal with this item non-response problem, hoping to enhance the accuracy of description, or even prediction, with a single data set (the four methods are discussed in Section II). This paper joins this line of effort by exploring the usefulness of multiple imputation (MI) in advancing the accuracy of descriptions of the electorate's preferences.1 The rationale of this study is straightforward: as MI has been a sophisticated method for regression analysis, it is reasonable to use it to advance our understanding of the preference distribution of the electorate.

Note that this method is only applicable to item non-response. Methods that deal with unit non-response (i.e., respondents not being available for the survey or not being able to answer a series of questions), such as raking, are beyond the scope of this paper. Furthermore, even though this paper will show that this method has the potential to empower researchers to use a single data set to reconstruct partisan support distribution, the results of this preliminary attempt should not be overinterpreted. The method proposed here is more about advancing our understanding of the preference distribution of the population (the electorate) than pursuing a perfect match between estimated vote shares and election results, given that a good proportion of the electorate may not turn out to vote. This paper aims to show that applying MI in describing the preference distribution of the population is better than not using MI.

Sections II and III give a brief overview of current methods for dealing with item non-response and of MI techniques, respectively. Section IV describes the data set under inspection, including its pattern of missingness and the variables selected for imputing individuals' vote choices, which are the key variables used to calculate the distribution of supporters of the two political parties in Taiwan's 2008 presidential election. That section also introduces a software package, Amelia II, designed to handle these tasks. Section V presents the results of ten experiments and suggests strategies for choosing a reasonable and manageable number of variables to achieve results that are as good as those derived from methods using all variables. The paper concludes with a discussion (Section VI) of the limitations of this method and a path for future research.

1. There are certain other methods by which to deal with missing data, such as giving weights to compensate for those who were excluded due to missing values, and constructing the likelihood based on incompletely observed data. These methods are not covered here, for they do not deal directly with increasing estimation accuracy. For an overview, see Raghunathan (2004) and Särndal and Lundström (2005).

II. Methods of Dealing with Item Non-Response

Over the years, scholars have proposed a few approaches to deal with this item non-response problem. The first is replacing forced-choice questions with subjective probability scales. For example, instead of asking, "Whom would you vote for if the election was held today?", a question that usually results in a high rate of "undecided" responses, researchers could reduce the refusal rate by asking, "On a scale of 1 to 10, how likely are you to vote for each candidate on Election Day?" This method, however, has been found to be effective primarily for elections involving more than two candidates; in elections with only two candidates, predictions formulated using the adjusted question would not be more accurate than those using the original question (K. J. Flannelly, L. T. Flannelly, & McLeod, 1998; L. T. Flannelly, K. J. Flannelly, & McLeod, 2000).

The second method is to change contextual settings in order to decrease respondents' anxiety while they are revealing their opinions regarding sensitive issues and to boost their willingness to express their true preferences. This might include using a self-administered or "secret-ballot" questionnaire (Bishop & Fisher, 1995). There is also some inconclusive discussion of venues, such as online chat rooms, that might serve this purpose (for example, Ho & McLeod, 2008; McDevitt, Kiousis, & Wahl-Jorgensen, 2003). However, adoption of such techniques will inevitably increase the cost of surveys.

Thirdly, it is suggested that researchers use alternative dimensions in order to probe for hidden partisans (Coakley, 2008). Coakley, in his study of Northern Ireland, chose "nationalist" as an alternative dimension to probe for hidden Sinn Féin supporters. He found that many (approximately 30%) of the individuals whose attitudinal configurations suggested that they were likely Sinn Féin supporters claimed to support either another party or no party at all. The challenge of adopting this method, nevertheless, is that researchers need to theoretically justify and repeatedly test the validity of such alternative survey questions. Although there is no single convenient poll question by which to uncover hidden partisans, there remains room to explore various theory-based alternative questions to advance this approach.

The fourth approach, which is the one adopted in this paper, suggests that researchers deal with the problem of under-representation of extreme-right voters by multiply imputing missing values based on information drawn from voter profiles or characteristics (Durand, Blais, & Larochelle, 2004). While concerns exist regarding biased estimates, current methodological developments are making this technique cost-efficient and easy to use.

III. Multiple Imputation for Electoral Studies with Missing Values

Multiple imputation (MI), which inserts new values into data cells with missing values based on information derived from other related variables, has been identified and used as one of the techniques by which to solve the problem of missing values caused by non-response in surveys.2 Over the past few decades, a growing number of researchers have followed Rubin's (1987) recommendation to replace missing or deficient values with a number of alternative values representing a distribution of possibilities (for instance, Paul, Mason, McCaffrey, & Fox, 2008).

2. MI is a method commonly used to deal with missing data problems, including item non-response (non-response to some, but not all, survey questions) and unit non-response (non-response to all survey questions). A common and still useful alternative is list-wise deletion of observations affected by either kind of non-response in the regression analysis. However, because a significant number of observations are excluded from analysis, this method may yield biased parameter estimates. While the default procedure of most statistical packages excludes observations with missing values, list-wise deletion has been identified as a problem for most electoral studies (Gelman, King, & Liu, 1998). This concern regarding biased estimates can be minimized if the loss of cases due to missing data is less than about 5%, and if pretest variables can reasonably be included in the models as covariates (see Graham, 2009).

In contrast to single imputation or stochastic imputation, which refers to conducting one stochastic imputation on the basis of information derived from other variables, MI is a procedure whereby several data sets are created based on the original, and then the same analysis is performed separately on each complete data set. MI is preferable to single imputation because single imputation generally leads to underestimation of standard errors and overestimation of test statistics, whereas MI reflects sampling variability and other sources of uncertainty inherent in models (Rubin, 1987, pp. 11-18; Weisberg, 2005, pp. 143-150). While some scholars may think this technique is unrealistic, or have concerns about "making up" data, Stuart, Azur, Frangakis, and Leaf (2009) argue, on the contrary, that "complete-case analyses require stronger assumptions than does imputation" (p. 1134).

To be more specific, the practice of MI comprises three steps: (a) the imputation stage (creating imputed data sets), (b) the analysis stage (estimating model coefficients on the basis of each created data set), and (c) the combination stage (calculating final coefficients and standard errors from the numbers obtained in the analysis stage). After the researcher selects proper variables for imputing the variable with missing values, i.e., variables that are correlated with the targeted variable(s) of interest, the MI algorithm takes the values of the chosen variables and generates more than one new value for each missing value. The procedure then creates several new data sets (usually five or more), in which all missing values are filled. Next, the researcher runs his or her models on each imputed data set. Supposing that there are five imputed data sets, he or she will consequently have five sets of coefficients. In the third stage, called "combination", the researcher obtains the value of the coefficient of interest by averaging the five coefficients of the same variable acquired from the five imputed data sets. The standard error of the resulting coefficient is determined from the variance within each imputed data set and the variance between imputed data sets (see King, Honaker, Joseph, & Scheve, 2001; Stuart et al., 2009).
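The combination stage can be illustrated with a minimal Python sketch. It assumes the standard combining rules associated with Rubin (1987): the pooled estimate is the mean of the per-data-set estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the five coefficients and standard errors below are hypothetical and are not taken from TEDS2008L.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-data-set estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputed data sets")
    pooled = mean(estimates)        # combined point estimate
    within = mean(variances)        # average variance inside each imputed data set
    between = variance(estimates)   # variance across data sets (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical coefficients and standard errors from five imputed data sets.
coefs = [0.52, 0.48, 0.55, 0.50, 0.45]
ses = [0.10, 0.11, 0.10, 0.12, 0.10]
pooled_coef, pooled_se = pool_rubin(coefs, [se ** 2 for se in ses])
print(round(pooled_coef, 3), round(pooled_se, 3))
```

The pooled standard error is larger than the typical single-data-set standard error because the between-imputation term carries the extra uncertainty due to the missing values.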

Most research using MI, and most discussion of the method, focuses on enhancing the accuracy of estimated coefficients in regression models. For example, Penn (2007) employed MI to estimate missing income data and update a recent study examining the influence of parents' standards of living on subjective well-being. He used data from the 1998 General Social Survey and compared the results of two ordered probit models: one using complete cases only, and the other replacing missing income data with multiple imputation estimates. Consistent with earlier studies using MI, he confirmed that MI allows researchers to make use of more of the available data and decreases possible biases.

Although enhancing regression analysis is the original purpose of MI, there has been little research on using multiply imputed data sets for descriptive purposes. As yet, the closest exception is a study that compares descriptive statistics derived from multiply imputed data sets to determine whether two or more methods generate similar results (Bernaards et al., 2003). Still, scholars have not taken the further step of applying this method to the field of vote prediction. If the imputed data sets that are used to run regressions yield results that are better or more informative than analyses based on observed data, the descriptive statistics of the variables of interest in these multiply imputed data sets are certainly worth exploring.

Given the above review, I suspect that multiply imputed data sets could be equally useful for inspecting the distribution of the variables of interest, not only for coefficient estimation.

IV. Data Set: TEDS2008L

1. Data Description

Taiwan's Election and Democratization Study for the 2008 legislative elections (TEDS2008L, N = 1,240) has been chosen to examine the applicability of the MI method.3 The TEDS2008L data were collected between mid-January and early March of 2008. Compared to other TEDS and other large-scale surveys in Taiwan that were carried out during winter or summer vacations (the periods in which the executive board recruits and trains student interviewers), TEDS2008L is the only data set collected prior to a presidential election (March 22, 2008). In this survey, the respondents were asked about their voting choices in the upcoming presidential election. Thus, information about partisan support distribution is available in the original data and can be used for comparison against partisan support distributions based on multiply imputed data sets.

3. Data analyzed in this paper were from Taiwan's Election and Democratization Studies, 2008: Legislative Election (TEDS 2008L) (NSC96-2420-H-002-025). The coordinator of the multi-year project TEDS is Professor Yun-Han Chu (National Taiwan University). TEDS2008L is a yearly project on the legislative election in 2008. The principal investigator is Professor Chi Huang. More information is on the TEDS website (http://www.tedsnet.org). The author(s) appreciate the assistance in providing data by the institute and individual(s) aforementioned. The author(s) are alone responsible for views expressed herein.

2. Software Package for the MI Analysis

Amelia II is a free package that runs across operating systems and is designed to run EMis (Expectation Maximization with importance resampling), one of the suggested algorithms using Markov Chain Monte Carlo (MCMC) methods to calculate imputed values (Honaker, King, & Blackwell, 2009; Horton & Kleinman, 2007; Imai, King, & Lau, 2009).4 It is compatible with R, a widely used open-source environment for statistical programming, and has a handy graphical user interface (GUI) that allows users to set the characteristics of the variables intuitively, for example, specifying whether a variable is ordinal or nominal by pointing and clicking.

4. Expectation Maximization (EM) and Imputation Posterior (IP) are two primary algorithms by which to generate imputed data sets. While the IP algorithm is based on the Markov Chain Monte Carlo (MCMC) method, EM, which yields only the maximum values, is a faster and less complex alternative to IP. When using IP, a researcher needs to frequently draw an estimated mean and variance from the disputed data sets created from entire multivariate models of observed data posterior. In order to obtain an exact result as expected, a researcher is hence required to spend a substantial amount of time drawing repeatedly before convergence occurs. As King et al. (2001) discuss, the trouble with EM is that it ignores estimation uncertainty and treats the estimated imputed parameters as though they were obtained from complete data sets without missing data, and, therefore, could result in biased coefficients and standard errors. Hence, EMs (EM with sampling) and EMis (EM with importance resampling) are proposed to solve the uncertainty problem in EM. While EMs is used for studies of large-N data sets and those composed of continuous variables, EMis is especially designed for samples with multiple parameters and categorical variables. King et al. recommend EMis over other algorithms because EMis possesses both the precision of IP and the speed of EM. Additionally, EMis has the capability to deal with data sets with many variables and takes the concern about the uncertainty of imputed data into account. For a discussion of the strengths and limitations of alternative algorithms, such as multiple imputation by chained equations (MICE), see Stuart et al. (2009) and He and Raghunathan (2009).

A conventional difficulty in dealing with MI involves transforming categorical variables that have more than three levels into dummy variables and then analyzing them by performing a separate EM analysis with the two-level version of the variable (for example, 0 versus other) (Graham, 2009, p. 563). Amelia II simplifies this process considerably; the researcher simply selects the key variables and designates them as nominal variables, and Amelia II transforms them into dummy variables and treats them as categorical variables during the imputation process.5 In this study, I designate two key variables as nominal, namely vote choice (Kuomintang, KMT, or Democratic Progressive Party, DPP) and party identification (the two major parties and the four small parties); the former is the target variable for imputing, while the latter is the primary variable that provides information for imputing the missing values of vote choice.6

5. For researchers following the three-step procedure of conducting MI, Zelig, another package compatible with R, is suggested for the combination stage (see Imai, King, & Lau, 2009). Since hypothesis testing is not the goal of the present study, the analysis below will concentrate on using Amelia II for the first two stages of MI.

6. In TEDS2008L, respondents were asked about their preferred candidate for president in the upcoming presidential election and about their partisan orientation. The question regarding vote choice is, "Who did you vote for?" This question is preceded by the voter turnout question, "In this presidential election, many people went to vote, while others, for various reasons, did not go to vote. Did you vote?" The wording of the party identification question is as follows: "Among the main political parties in our country, including the KMT, DPP, NP, PFP, and TSU, do you regard yourself as leaning toward any particular party?"

3. Variables for Imputing Vote Choice

Variables that will be used in the "analysis stage", such as a logistic regression, are the best choice for the MI procedure. In other words, the auxiliary variables for imputing vote choice should be those theoretically associated with it, such as the dependent variables, independent variables, and control variables (Abayomi, Gelman, & Levy, 2008; King et al., 2001; Wood, White, & Royston, 2008). Self-claimed party identification is the first to be included in the list of variables for MI, because it has been well acknowledged by political scientists since Campbell, Converse, Miller, and Stokes (1960) as the most important variable for explaining vote choice.

Table 1 demonstrates a high level of missingness for the two key variables, vote choice and party identification. The proportions of missingness for these two variables are 43.0% and 37.7%, respectively. If partisan support distribution is calculated by ignoring those who did not respond, the support for KMT's Ma Ying-jeou and Vincent Siew amounts to 70.3% (497/707) and that for DPP's Frank Hsieh and Su Tseng-chang comes to 29.7% (210/707). Although this "poll" predicts the victory of KMT's Ma and Siew, which is consistent with the result of the election, the point estimate is far from satisfactory and nowhere near KMT's real vote share of 58.45% and DPP's 41.55%. The partisan support distribution of KMT is overestimated, while that of DPP is underestimated, implying that TEDS2008L fails to accurately describe the partisan support distribution of each party.

Table 1
Descriptive Statistics for TEDS2008L

Targeted Variable           Response Item                                      N (%)
Vote Choice for President   KMT (Ma Ying-jeou & Vincent Siew)                  497 (40.1)
                            DPP (Frank Hsieh & Su Tseng-chang)                 210 (16.9)
                            Missing (refuse to answer, don't know & skip)      533 (43.0)
Party Identification        KMT                                                446 (57.8)
                            DPP                                                290 (29.0)
                            NP                                                  16 ( 2.1)
                            PFP                                                  1 ( 0.1)
                            TSU                                                 11 ( 1.4)
                            Others                                               8 ( 1.0)
                            Missing (refuse to answer, depends, don't know)    468 (37.7)

Note. Source: TEDS2008L. N = 1,240. KMT = Kuomintang; DPP = Democratic Progressive Party; NP = New Party; PFP = People First Party; TSU = Taiwan Solidarity Union.
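The complete-case figures quoted above can be checked with a few lines of Python. The counts are those reported in Table 1; the variable names are illustrative only.

```python
# Counts reported in Table 1 (TEDS2008L, N = 1,240).
kmt, dpp, missing = 497, 210, 533

answered = kmt + dpp                       # respondents who named a ticket
kmt_share = 100 * kmt / answered           # share after dropping non-respondents
dpp_share = 100 * dpp / answered
missing_rate = 100 * missing / (answered + missing)

print(f"KMT {kmt_share:.1f}%, DPP {dpp_share:.1f}%, missing {missing_rate:.1f}%")
```

Dropping the 533 non-respondents treats them as if they split between the two tickets in the same proportion as those who answered, which is the assumption that MI is meant to relax.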

Table 2 lists the variables drawn from TEDS2008L for MI. These variables were put together into a new data set and loaded into Amelia II.7 They were selected based on their theoretical association with vote choice, including party identification mentioned above (χ² = 469.8, df = 4, p < .001) and variables of evaluations of incumbents, political parties, and candidates (e.g., Erikson, MacKuen, & Stimson, 2002); habits of watching political news (Lazarsfeld, Berelson, & Gaudet, 1968); evaluation of the economy (Alvarez, Nagler, & Bowler, 2000) and democracy (Sullivan & Transue, 1999); and demographics, such as ethnicity (Kam, 2007) and education level (Zuckerman, Kotler-Berkowitz, & Swaine, 1998). These variables are chosen based on the causal relationships (or at least correlations) identified in the literature. The conventional wisdom suggests that vote choices are influenced by voters' ethnicity or race and by their level of political knowledge. The above literature, taken together, suggests that voters' choices will be influenced by (or correlated with) their liking of particular parties or candidates, their favorite news sources, which are potentially biased against certain issues, parties, or candidates, and their satisfaction with the current or past economy. Their satisfaction with democracy is also related to their partisanship and vote choices, since the competition between parties in a two-party system will turn into trust and distrust of electoral processes. Indeed, more variables are indicated in the literature. As no data set provides the whole battery of variables for imputing vote choices, I chose those above that are theoretically and logically related to the target variable for MI.

7. Because the variables chosen here are ordinal and nominal, I conducted the chi-square test of independence. If a chosen variable is numeric, it is suggested to show the correlation between the cause(s) of the missingness or auxiliary variables, Z, and the model variable containing missingness, Y. The auxiliary variables that yield correlation rZY = 0.[illegible], or at least .50, will have a major impact on reducing the biasing effects of attrition (Graham, 2009). "One or two auxiliary variables with rZY = 0.60 are better than 20 auxiliary variables whose correlations with Y are all less than rZY = 0.[illegible]" (570).

Table 2
A List of All Variables for Imputing Vote Choice (TEDS2008L)

Variable     Code         Description                                                                       Missing      Note
vote08       S05          Vote Choice for President in 2008                                                 533          dichotomous
partyID      M01B         Party ID                                                                          468          multinomial; χ² = 930.7, df = 5
likeKMT      M02A         The degree of liking KMT                                                          [illegible]  11-point scale; χ² = 306.9, df = 10
likeDPP      M02B         The degree of liking DPP                                                          [illegible]  11-point scale; χ² = 336.3, df = 10
MaScale      G05          Evaluation of Ma Ying-jeou                                                        [illegible]  11-point scale; χ² = 389.5, df = 10
HsiaoScale   G06          Evaluation of Vincent Siew                                                        [illegible]  11-point scale; χ² = 170.8, df = 10
HsiehScale   G07          Evaluation of Frank Hsieh                                                         [illegible]  11-point scale; χ² = 348.4, df = 10
SuScale      G03          Evaluation of Su Tseng-chang                                                      [illegible]  11-point scale; χ² = 246.2, df = 10
tvNews       A02A         Most frequently watched TV news channel (in general)                              152          multinomial; χ² = 242.1, df = 11
talkShows    B02A         Most frequently watched TV news channel (for political talk shows in particular)  123          multinomial; χ² = 161.9, df = 5
prosEcon     [illegible]  Prospective Economy Condition                                                     271          3-option multinomial; χ² = 6.7, df = 2, p = 0.03
retroEcon    H05          Retrospective Economy Condition                                                   35           3-option multinomial; χ² = 22.4, df = 2
ChenScale    G02          Evaluation of Chen Shui-bian                                                      130          11-point scale; χ² = 350.2, df = 10
incScore     V01          Satisfaction with Chen's administration                                           119          11-point scale; χ² = 299.4, df = 10
demScore     U08          Satisfaction with Taiwan's Democracy                                              155          4-point ordinal; χ² = 21.0, df = 3
eth          X02          Father's ethnicity                                                                20           multinomial; χ² = 43.6, df = 4
edu          X06          Education level                                                                   20           13-level scale; χ² = 30.0, df = 12, p = .003

Note. Source: TEDS2008L. N = 1,240; chi-square test of independence against "vote08"; p values lower than .001 are not reported. In tvNews, the top ten most watched TV news channels are coded 1 to 10, respectively; other channels are coded 11; 0 is used for those saying they never watch TV news. The coding of talkShows is based on a rule that re-categorizes the mentioned talk show programs into the corresponding news channels: 0 = never watch such programs; 1 = TVBS; 2 = SET TV; 3 = Formosa TV; 4 = CtiTV; 5 = others (less than 1%).
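The chi-square test of independence used to screen candidate variables against vote choice can be sketched in plain Python as follows. This is only an illustration of the statistic and its degrees of freedom: the function name is hypothetical, the toy records are invented rather than drawn from TEDS2008L, and cases with a missing value on either variable are simply skipped. No p value is computed here.

```python
from collections import Counter


def chi_square_independence(rows, cols):
    """Return (statistic, degrees of freedom) for two categorical sequences."""
    # Keep only the cases observed on both variables.
    pairs = [(r, c) for r, c in zip(rows, cols) if r is not None and c is not None]
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical toy records: vote choice (None = refused) and party ID.
vote = ["KMT"] * 40 + ["DPP"] * 10 + ["KMT"] * 5 + ["DPP"] * 45 + [None] * 20
party = ["KMT"] * 50 + ["DPP"] * 50 + ["KMT"] * 20
stat, df = chi_square_independence(vote, party)
print(round(stat, 2), df)
```

A large statistic relative to its degrees of freedom indicates that the candidate variable carries information about vote choice and is therefore a plausible input to the imputation model.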

4. Experiment Design

I utilized the partisan support distributions of the two political parties in the 2008 presidential election, indicated by their vote shares, as the baseline of comparison. These figures are contrasted against the partisan support distributions derived from TEDS2008L without MI and those derived from the same data set after MI is applied.

Three strategies of variable selection and nine experiments, each of which corresponds to a specific variable selection strategy, were consequently conducted. The partisan support distributions of KMT and DPP were recorded and are compared in Table 3. The purpose of comparing the results is to identify an efficient combination of variables that may later be used in a telephone survey, which is usually constrained in the number of questions.

The first strategy, as shown in Figure 1, includes all of the variables listed in Table 2. Note that I specified three variables, namely partyID, vote08, and eth, as nominal variables. These specifications force Amelia II to impute these variables with integers in the sense of categories rather than in a continuous or ordinal fashion.

Figure 1 Strategy 1: All variables are used


Figure 2 Strategy 2: Some variables are used

The second strategy entails the use of only some of the variables in the procedure. As illustrated in Figure 2, the 10 variables chosen are the respondent's party identification, feelings about the two parties, evaluations of the two presidential candidates and the incumbent, evaluations of the economy, ethnicity, and education. Note that those variables that are not to be used in the MI procedures should be labeled as ID variables.

The third strategy, similar to the second, is an attempt to make use of even fewer variables. As Figure 3 shows, the 8 variables chosen in this strategy are party identification, the habit of watching political news and talk shows, the evaluation of the incumbent president, a retrospective view of the economy, and education.8

Figure 3 Strategy 3: Minimum variables are used

8. Indeed, other strategies for combining variables and determining the number of variables that should be chosen also exist. These two topics exceed the scope of this paper and await future inspection.

V. Experiment Results

Table 3 lists the MI results of the three strategies of variable selection. The same seed (123) was used in each run so that the results can be reproduced, and the number of data sets to be created was set to 10. The proportions shown in the table are the averages of the partisan support distributions derived separately from the 10 imputed data sets. The number 10 is not a fixed requirement; for this study, I assumed that the mean over 10 imputed data sets would be more stable than the mean over fewer data sets.
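The averaging step described above can be sketched in Python under strong simplifying assumptions. The imputation here is a plain hot-deck draw within party-identification groups, used only as a stand-in; it is not the EMis algorithm that Amelia II runs, and because it ignores uncertainty in the donor pool it will tend to understate between-imputation variation. The toy records, the function names, and the helper `kmt_share` are hypothetical.

```python
import random
from statistics import mean, stdev


def impute_once(records, rng):
    """Fill each missing vote with a draw from observed votes sharing the party ID."""
    donors = {}
    for party, vote in records:
        if vote is not None:
            donors.setdefault(party, []).append(vote)
    all_votes = [v for votes in donors.values() for v in votes]
    completed = []
    for party, vote in records:
        if vote is None:
            # Fall back to the pooled donors if a group has no observed votes.
            vote = rng.choice(donors.get(party, all_votes))
        completed.append(vote)
    return completed


def kmt_share(votes):
    """Percentage of votes going to the KMT ticket."""
    return 100 * votes.count("KMT") / len(votes)


# Hypothetical toy sample of (party ID, vote choice); None marks a refusal.
records = (
    [("KMT", "KMT")] * 45 + [("KMT", "DPP")] * 5 + [("KMT", None)] * 10
    + [("DPP", "DPP")] * 18 + [("DPP", "KMT")] * 2 + [("DPP", None)] * 20
)

rng = random.Random(123)  # fixed seed so that the run can be repeated
shares = [kmt_share(impute_once(records, rng)) for _ in range(10)]
raw = kmt_share([v for _, v in records if v is not None])
print(f"raw {raw:.1f}%  imputed mean {mean(shares):.1f}% (SD {stdev(shares):.2f})")
```

In this toy sample the refusals are concentrated among DPP identifiers, so the share computed from respondents alone overstates KMT support, and filling in the refusals from party identification pulls the estimate back down.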

Table 3
Variable Selection Strategies and Their Imputation Results (TEDS2008L)

                                                  Partisan Support Distribution (%)
Strategy   Description                            KMT              DPP
           2008 Presidential Election Results     58.45            41.55
1          All variables used                     62.59 (0.57)     37.41 (0.57)
2          Some variables used                    62.94 (0.96)     37.06 (0.96)
3          Minimum variables used                 63.35 (1.17)     36.65 (1.17)
           Raw data (before using MI)             70.30            29.70

Note: Standard deviations are in parentheses. In the three experiments, the same seed (123) was used and the number of data sets to create was set to 10.

The first finding is that the more variables are used, the better and more stable the results are, as evidenced by a comparison of the standard deviations. The second finding is that, although the strategies vary, a consistent pattern of partisan support emerges for the two parties: around 62% to 63% for the KMT and around 37% for the DPP. As a point estimate, this result is much closer to the actual election result than the estimate obtained before applying MI: the gap shrinks from about 12 percentage points to between 4 and 5 percentage points for each party. Note that the narrowing of the gap does not imply that MI is the best tool for predicting election results. Instead, it suggests that MI provides a better way to represent the uncovered (true) distribution of support for the two pairs of presidential candidates. In other words, the MI method provides an indirect way to predict election results. Readers are reminded again not to take the figures generated by this method as empirically valid vote shares.
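The size of the gap discussed above can be checked directly from the figures in Table 3. The short Python sketch below only re-does that arithmetic; the dictionary layout and the helper name `gap` are illustrative choices, not part of the original analysis.

```python
# Figures copied from Table 3 (percentages of the two-party distribution).
actual = {"KMT": 58.45, "DPP": 41.55}
estimates = {
    "Strategy 1 (all variables)": {"KMT": 62.59, "DPP": 37.41},
    "Strategy 2 (some variables)": {"KMT": 62.94, "DPP": 37.06},
    "Strategy 3 (minimum variables)": {"KMT": 63.35, "DPP": 36.65},
    "Raw data (before MI)": {"KMT": 70.30, "DPP": 29.70},
}

def gap(estimate, party="KMT"):
    """Absolute difference, in percentage points, from the election result."""
    return round(abs(estimate[party] - actual[party]), 2)

for label, est in estimates.items():
    print(f"{label}: off by {gap(est)} percentage points")
```

With only two parties, the gap is the same for the KMT and the DPP: 11.85 points for the raw data, and 4.14, 4.49, and 4.90 points for Strategies 1, 2, and 3.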


VI. Conclusion and Discussion

Analyzing survey data is one of the most promising methods by which to predict election results. A commonly perceived problem with this method is that a significant proportion of respondents conceal their preferences; this is a particular problem in Taiwan when survey questions are sensitive or when the survey is about political preferences during a campaign season. Therefore, it is difficult to employ a data set, even when the observations are well sampled, to inspect the unknown distribution of respondents' vote choices.

This study argues that multiple imputation (MI) can be a tool for advancing point estimation, particularly when the cost of polling is a concern. The method proposed in this paper helps to increase the predictive power of a single survey data set. The preliminary finding derived from pre-election face-to-face survey data collected during Taiwan's 2008 presidential electoral campaign confirms this perspective, suggesting that the averaged partisan support distributions based on MI data sets provide a better guess for election results than those simply derived from the original data set.

MI is a method commonly used to deal with problems of missing data. The other types of errors that lead to biased point estimation, such as systematic non-response error, measurement error, sampling error, and frame error (also called frame imperfection), are beyond the scope of this paper. These types of errors are worth mentioning here because they account for the inaccuracy of adjusted point estimates and should be dealt with in future studies.


Frame error is arguably the major cause of inaccuracy in survey data. It refers to situations in which a sample collected at a specific time fails to encompass certain elements of the target population. Coverage error is the most important form of frame error. It occurs when the elements in the sampling frame do not correspond correctly to the target population about which the researcher wants to make inferences. Coverage error, which occurs most commonly in telephone surveys, specifically refers to “the mathematical difference between a statistic calculated for the target population studied and the same statistic calculated for the target population.” It occurs when there is bias due to the omission of non-covered units, such as omitting people who do not have phones in a telephone survey (Weisberg, 2005). Coverage error includes undercoverage, overcoverage, and duplicate listings (for example, a target population element that is listed more than once in the frame). Frame error occurs in almost every survey and poll, and it can be solved only as sampling techniques advance.

“Although imperfect frames are a reality in most large surveys, the remedies employed in statistical agencies and elsewhere vary considerably. In the absence of ‘firmly established methodology’, the procedures in use may seem ad hoc” (Särndal & Lundström, 2005, p. 179; original emphasis).

Furthermore, this method is based on the assumption that every voter has a true partisan preference, which he or she would reveal if required to state it honestly. This assumption has its theoretical roots in most American studies of voter behavior, such as the concept of the belief system. While this assumption remains valid on a theoretical level, it has not been empirically tested, so readers should be cautious about applying the method to situations where respondents may not have true preferences.

Besides the above factors that influence the quality of sampling, a few other topics associated with the MI method are worthy of inspection. First, this study compares three strategies of variable choice (as shown in Table 3) by their standard deviations. Selection by standard deviation alone, however, is insufficient for identifying the best strategy, and future studies should investigate this strategy-selection issue. Second, Amelia II is one of a few software packages available for MI. A project that compares results derived from other software packages and/or other algorithms, such as nearest neighbor imputation, would be a welcome addition for scholars using MI for election prediction. Third, the entire method of using MI for election prediction is built upon the assumptions that the respondents in the imputed data sets turn out to vote and that their answers to auxiliary variables are safe for predicting their voting choices. Future studies should take a closer look at each of these assumptions and evaluate the extent to which violating them affects the results.

The fourth issue is far beyond the scope of this paper, but it is an important constraint that applies to all studies using MI, including the present one. That is, scholars tend to assume that the pattern of item non-response lying behind the data is "missing at random" (MAR) or "conditionally missing at random" (King et al., 2001). Specifically, suppose non-response to a party identification question (Var1) is associated with one's party orientation (Var2). The missingness on Var1 is then conditioned on Var2 and is thus MAR.⁹ This is an assumption held in most MI studies, but one that is very hard to verify, because doing so would require knowledge of the missing values themselves (Little & Rubin, 1987). In particular, as G. David Garson states, "For purposes of univariate analysis (e.g., understanding the frequency distribution of how subjects respond to an opinion item) imputation can reduce bias and often is used for this purpose if data are missing at random."¹⁰ As MAR is an important assumption embedded in the design of Amelia II, the analytical tool used in this paper, and because this problem awaits a solution, readers should follow the current development of methods to improve the robustness of analysis, such as identifying the patterns of missing data visually (e.g., using the R package "Visualization and Imputation of Missing Values", VIM). One method worth considering is applying sensitivity analysis to multiple imputation. Simply put, one can use importance sampling to re-weight the parameter estimates obtained from the imputed data sets so that the MI results represent the distribution of imputations under an NMAR mechanism. Researchers can then judge whether the results are likely to be sensitive to the MAR assumption, as well as the likely direction and magnitude of the effect (Carpenter, Kenward, & White, 2007).

9. In contrast to the other two assumptions about item-missing data, which are regarded as unrealistic, i.e., missing completely at random (MCAR) and not missing at random (NMAR), MAR is the one most commonly assumed for large-scale data sets. MCAR means that the missingness is unrelated to the variable under study; the missing data are therefore considered "ignorable". NMAR, also called nonignorable (NI), means that the probability of missingness depends on both observed and unobserved values. "MAR, while empirically unverifiable, is often a reasonable assumption to make unless substantive knowledge about the data or data collection process indicates that the missingness may depend on unobserved values. . . . The MAR assumption is also sometimes made more reasonable by including ‘auxiliary variables’ that are related to the missingness but may not be of interest in the analyses themselves; in fact, this strategy can greatly improve the imputations" (Stuart et al., 2009, p. 1134). MCAR and MAR result in unbiased parameter estimates, while NMAR missingness is considered a problem because it yields biased parameter estimates (Graham, 2009). It is very difficult to identify the pattern and the mechanism of missing data: "since the MAR condition cannot be tested empirically, the analyst must decide on theoretical grounds whether to use imputation techniques appropriate for MAR missing data or to model the missing data as NI" (Weisberg, 2005, p. 151). In other words, NMAR would be classified as MAR if the variables were correlated, and NMAR would be classified as MCAR if the variables were uncorrelated (see Templ & Filzmoser, 2008, for clear descriptions using visualization).

10. See http://faculty
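The re-weighting idea mentioned in the discussion of sensitivity analysis can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used in this paper and not a faithful reproduction of Carpenter, Kenward, & White (2007): the exponential form of the weights, the function name `reweighted_estimate`, the sensitivity parameter `delta`, and all numbers are hypothetical, and the original article should be consulted for the exact method.

```python
import math

def reweighted_estimate(estimates, imputed_sums, delta):
    """Re-weight per-imputation estimates toward an assumed NMAR mechanism.

    estimates    : quantity of interest computed in each imputed data set
    imputed_sums : sum of the imputed (originally missing) values in each set
    delta        : assumed change in the log-odds of responding per unit of
                   the variable; delta = 0 reproduces the plain MAR average
    """
    # Assumed weight form: proportional to exp(-delta * sum of imputed values).
    exponents = [-delta * s for s in imputed_sums]
    top = max(exponents)  # subtract the maximum for numerical stability
    raw = [math.exp(e - top) for e in exponents]
    total = sum(raw)
    weights = [r / total for r in raw]
    pooled = sum(w * est for w, est in zip(weights, estimates))
    return pooled, weights

# Hypothetical inputs: KMT share (%) in three imputed data sets, and the
# number of imputed respondents assigned to the KMT (coded 1) in each.
kmt_shares = [62.0, 63.0, 64.0]
imputed_kmt_counts = [50, 55, 60]

pooled_mar, _ = reweighted_estimate(kmt_shares, imputed_kmt_counts, delta=0.0)
pooled_nmar, w = reweighted_estimate(kmt_shares, imputed_kmt_counts, delta=0.1)
print(f"MAR average: {pooled_mar:.2f}%  re-weighted: {pooled_nmar:.2f}%")
```

In this toy setting a positive `delta` stands for the assumption that KMT supporters are more likely to answer, so imputations that assign many non-respondents to the KMT receive less weight and the pooled share moves downward. Repeating the calculation over a range of `delta` values gives a rough sense of how sensitive the estimate is to departures from MAR.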

REFERENCES

Abayomi, Kobi, Andrew Gelman, & Marc Levy
2008 Diagnostics for Multivariate Imputations. Journal of the Royal Statistical Society: Series C (Applied Statistics), 57, 273-291.

Alvarez, R. Michael, Jonathan Nagler, & Shaun Bowler
2000 Issues, Economics, and the Dynamics of Multiparty Elections: The British 1987 General Election. American Political Science Review, 94(1), 131-149.

Bernaards, Coen A., Melissa M. Farmer, Karen Qi, Gareth S. Dulai, Patricia A. Ganz, & Katherine L. Kahn
2003 Comparison of Two Multiple Imputation Procedures in a Cancer Screening Survey. Journal of Data Science, 1(3), 293-312.

Bishop, George F., & Bonnie S. Fisher
1995 ‘Secret Ballots' and Self-reports in an Exit-Poll Experiment. Public Opinion Quarterly, 59(4), 568-588.

Campbell, Angus, Philip Converse, Warren Miller, & Donald Stokes
1960 The American Voter. New York: Wiley.


Carpenter, James R., Michael G. Kenward, & Ian R. White
2007 Sensitivity Analysis after Multiple Imputation under Missing at Random: A Weighting Approach. Statistical Methods in Medical Research, 16(3), 259-275.

Coakley, J.
2008 Militant Nationalist Electoral Support: A Measurement Dilemma. International Journal of Public Opinion Research, 20(2), 224-236.

Durand, Claire, André Blais, & Mylène Larochelle
2004 The Polls in the 2002 French Presidential Election: An Autopsy. Public Opinion Quarterly, 68(4), 602-622.

Erikson, Robert S., Michael B. Mackuen, & James A. Stimson
2002 The Macro Polity. New York: Cambridge University Press.

Flannelly, Keven J., Laura T. Flannelly, & Malcolm S. McLeod, Jr.
1998 Comparison of Election Predictions, Voter Certainty and Candidate Choice on Political Polls. Journal of the Market Research Society, 40(4), 337-346.

Flannelly, Laura T., Keven J. Flannelly, & Malcolm S. McLeod, Jr.
2000 Comparison of Forced-Choice and Subjective Probability Scales Measuring Behavioral Intentions. Psychological Reports, 86(1), 321-332.

Gelman, Andrew, Gary King, & Chuan-Hai Liu
1998 Not Asked and Not Answered: Multiple Imputation for Multiple Surveys. Journal of the American Statistical Association, 93(443), 846-857.

Graham, John W.
2009 Missing Data Analysis: Making It Work in the Real World. Annual Review of Psychology, 60, 549-576.

He, Yulei, & Trivellore E. Raghunathan
2009 On the Performance of Sequential Regression Multiple Imputation Methods with Non Normal Error Distributions. Communications in Statistics - Simulation and Computation, 38(4), 856-883.

Ho, Shirley S., & Douglas M. McLeod
2008 Social-psychological Influences on Opinion Expression in Face-to-Face and Computer-Mediated Communication. Communication Research, 35(2), 190-207.

Honaker, James, Gary King, & Matthew Blackwell
2009 Amelia II: A Program for Missing Data. Retrieved April 24, 2009, from Amelia Software Web Site: http://gking.harvard.edu/amelia/.

Horton, Nicholas J., & Ken P. Kleinman
2007 Much Ado about Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models. American Statistician, 61(1), 79-90.

Imai, Kosuke, Gary King, & Olivia Lau
2009 Zelig: Everyone's Statistical Software. Retrieved May 5, 2009, from http://gking.harvard.edu/zelig/.

Kam, Cindy D.
2007 Implicit Attitudes, Explicit Choices: When Subliminal Priming Predicts Candidate Preference. Political Behavior, 29(3), 343-367.

King, Gary, James Honaker, Anne Joseph, & Kenneth Scheve
2001 Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation. American Political Science Review, 95(1), 49-69.

Lazarsfeld, Paul Felix, Bernard Berelson, & Hazel Gaudet
1968 The People's Choice: How the Voter Makes up His Mind in a Presidential Campaign. New York: Columbia University Press.

Little, Roderick J. A., & Donald B. Rubin
2002 Statistical Analysis with Missing Data (2nd ed.). Hoboken, N.J.: Wiley.

McDevitt, Michael, Spiro Kiousis, & Karin Wahl-Jorgensen
2003 Spiral of Moderation: Opinion Expression in Computer-Mediated Discussion. International Journal of Public Opinion Research, 15(4), 454-470.

Paul, Christopher, William M. Mason, Daniel McCaffrey, & Sarah A. Fox
2008 A Cautionary Case Study of Approaches to the Treatment of Missing Data. Statistical Methods and Applications, 17(3), 351-372.

Penn, David A.
2007 Estimating Missing Values from the General Social Survey: An Application of Multiple Imputation. Social Science Quarterly, 88(2), 573-584.

Raghunathan, Trivellore E.
2004 What Do We Do with Missing Data? Some Options for Analysis of Incomplete Data. Annual Review of Public Health, 25, 99-117.

Rubin, Donald B.
1987 Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Särndal, Carl-Erik, & Sixten Lundström
2005 Estimation in Surveys with Nonresponse. Hoboken, N.J.: Wiley.

Stuart, Elizabeth A., Melissa Azur, Constantine Frangakis, & Philip Leaf
2009 Multiple Imputation with Large Data Sets: A Case Study of the Children's Mental Health Initiative. American Journal of Epidemiology, 169(9), 1133-1139.

Sullivan, J. L., & Transue, J. E.
1999 The Psychological Underpinnings of Democracy: A Selective Review of Research on Political Tolerance, Interpersonal Trust, and Social Capital. Annual Review of Psychology, 50, 625-650.

Templ, M., & P. Filzmoser
2008 Visualization of Missing Values Using the R-package VIM. Retrieved April 2, 2010, from http://www.statistik.tuwien.ac.at/forschung/CS/CS-2008-1complete.pdf

Weisberg, Herbert F.
2005 The Total Survey Error Approach: A Guide to the New Science of Survey Research. Chicago: University of Chicago Press.

Wood, Angela M., Ian R. White, & Patrick Royston
2008 How Should Variable Selection be Performed with Multiply Imputed Data? Statistics in Medicine, 27(17), 3227-3246.

Zuckerman, Alan S., Laurence A. Kotler-Berkowitz, & Lucas A. Swaine
1998 Anchoring Political Preferences: The Structural Bases of Stable Electoral Decisions and Political Attitudes in Britain. European Journal of Political Research, 33(3), 285-321.
