Note: Standardized discriminations are estimated from initial 2PL calibrations with junior- and senior-high EFL learners’ responses

As these 23 items were intended for use with university undergraduate EFL learners in the current study, they were then presented to a panel of university sophomores concurrently with the face validity review of the 21 BELLA indicators mentioned previously. The undergraduate students did not express dissatisfaction with the Chinese wording of the items.

Inventory of Strategic Competence in EFL Listening (ISCEL)

It is not suggested that a single scale of general strategy use can capture the full unobserved variance influencing listening outcomes. The reader may recall that a salient theme running through the review of listening strategy research pointed to a qualitative aspect of strategy use that must be accounted for. This follows logically from the observation, found in numerous effect studies, that listening strategy use does not guarantee successful listening processing. L2 learners must have a trait strategic competence that supports effective implementation of the strategies in their personal repertoires; that is, they must match strategies to appropriate situations such that each strategy can be used successfully. It is hypothesized that the relation between strategic competence and strategy use must be manifest in interaction effects between these traits. To measure this trait variable of strategic competence, the present study proposed construction of a scale called the Inventory of Strategic Competence in EFL Listening (ISCEL).
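One conventional way to write down an interaction hypothesis of this kind is a moderated regression. The following is only an illustrative sketch: the symbols are assumptions introduced here (L for a listening outcome, U for strategy use as measured by ELLSI, C for strategic competence as measured by ISCEL), and the model actually estimated in the study may differ in form.

```latex
\[
L_i = \beta_0 + \beta_1 U_i + \beta_2 C_i + \beta_3 \,(U_i \times C_i) + \varepsilon_i
\]
```

Under this sketch, a nonzero \(\beta_3\) would indicate that the contribution of strategy use to listening outcomes depends on the level of strategic competence, which is the sense of interaction described above.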

The purpose of ISCEL is to gauge the qualitative aspect of the strategies that are endorsed in ELLSI by querying respondents’ self-assessed confidence in adept implementation of those strategies, thus necessitating indicators identical to those of ELLSI. Accordingly, the ISCEL instrumentation consists of a separate scale for each of the ELLSI strategic knowledge indicators. While ELLSI queries proclivities to apply strategic mental operations to listening processing, the corresponding ISCEL scale subsequently queries the strength of confidence in using the strategy. For example, every ELLSI item, such as “I use knowledge of English stress and intonation to help me figure out words spoken unclearly,” is followed by a subsequent statement: “I use this strategy well.” The former taps trait use of extant listening strategies, while the latter measures how well one uses the strategy, and each part of these compound strategic variable indicators has a separate array of response options. While the strategic knowledge component has a frequency scale from zero to four with extremities marked “never” and “always,” the strategic competence component has an incremented option array from zero to four with extremities marked “very much disagree” and “very much agree.” When taken together, the ELLSI and ISCEL instruments are intended to describe variance in both the quantitative and qualitative aspects of trait listening strategy use.
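The compound indicator format described above can be illustrated with a small data structure. This is a minimal sketch, not the study's actual data handling: the names `PairedResponse` and `split_scales` are hypothetical, and only the two zero-to-four response arrays and their anchor labels come from the text.

```python
from dataclasses import dataclass

# Both response arrays run from 0 to 4, as described in the text.
SCALE_MIN, SCALE_MAX = 0, 4
FREQUENCY_ANCHORS = {0: "never", 4: "always"}                        # ELLSI part
AGREEMENT_ANCHORS = {0: "very much disagree", 4: "very much agree"}  # ISCEL part


@dataclass(frozen=True)
class PairedResponse:
    """One answer to a compound ELLSI/ISCEL indicator (hypothetical layout)."""
    item_id: int
    use: int         # ELLSI: how often the strategy is used
    competence: int  # ISCEL: agreement with "I use this strategy well"

    def __post_init__(self):
        # Reject values outside the 0-4 option arrays.
        for name in ("use", "competence"):
            value = getattr(self, name)
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(
                    f"{name} must be between {SCALE_MIN} and {SCALE_MAX}, got {value}"
                )


def split_scales(responses):
    """Return separate ELLSI and ISCEL score lists, ordered by item_id."""
    ordered = sorted(responses, key=lambda r: r.item_id)
    return [r.use for r in ordered], [r.competence for r in ordered]
```

Keeping the two parts of each indicator in one record while scoring them separately mirrors the idea that ELLSI and ISCEL share items but yield distinct scales.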

Listening Test

Maintaining a professional and consistent quality of stimulus audio for the listening comprehension test is the prime concern when designing the measure of EFL listening ability. First, it is difficult and costly to hire trained voice talent for recording item stems, and renting recording studio time further compounds the cost. At the same time, a high-fidelity recording of clearly enunciated stimuli with consistent speech rate and volume is required in order to reduce confounding during test delivery. Commercially produced practice test recordings were selected as an expedient method to overcome recording quality issues. However, the use of off-the-shelf exams introduced a new threat to the validity of the instrument, namely, test effects derived from participants’ possible familiarity with the content of the source material. At the tertiary level of EFL education in Taiwan, it is anticipated that subjects would have garnered between six and 10 years of English study experience and may have sat for official English certification exams. The solution was to utilize the stimulus recordings in random order and re-formulate the item stems and option arrays into original questions and answers.

Wavepad Sound Editor, a free waveform editing program, was utilized for editing and compiling all elements of the listening test items. Use of digital sound editing enabled the creation of two counter-balanced test forms for each round of item piloting and calibration to counteract order and fatigue effects. Wavepad functions for sound normalization and noise removal provided seamless, professional-grade recordings in MP3 format, which were used for the duration of the study.

Test content. As the measurement of listening proficiency was meant to imitate the experience of standardized English testing in Taiwan, a format similar to the Taiwanese General English Proficiency Test (GEPT) was devised. In this case, the second and third parts of the GEPT listening examination, composed of stimulus-response items and short conversation items, respectively, were adopted. The total length of the finalized version of the measurement instrument was 40 items, based on the outcome of calibration studies and validity reviews of item pools.

Separate item pools for the stimulus-response and conversation items were constructed for calibration studies. After an item pool was composed, it was sent to three other experienced English instructors at two universities in Taiwan for review of content validity and vetting of linguistically biased or ill-worded items, including possible multiple correct answers in the option arrays. Item pools were finalized based on comments from the expert reviewers. The item pool for stimulus-response items consisted of 35 items, while the item pool for conversation items consisted of 28 items. The item pools were formulated into test format by adding a note explaining the research purpose of the exam (in Chinese) and an example question at the beginning of the test. These calibration tests were made into counter-balanced forms A and B.
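The construction of counter-balanced forms from a randomly ordered item pool can be sketched as follows. The text does not state how forms A and B differ, so the scheme here (form B presents the two halves of form A in swapped order) is an assumption, as are the function name `make_counterbalanced_forms` and the `seed` parameter. Only the pool sizes of 35 and 28 items come from the text.

```python
import random


def make_counterbalanced_forms(item_ids, seed=0):
    """Return (form_a, form_b).

    Form A is a seeded random ordering of the pool. Form B swaps the two
    halves of form A, so items heard late on one form are heard early on
    the other (an assumed counterbalancing scheme).
    """
    rng = random.Random(seed)
    form_a = list(item_ids)
    rng.shuffle(form_a)
    half = len(form_a) // 2
    form_b = form_a[half:] + form_a[:half]
    return form_a, form_b


# Pool sizes reported in the text; the item identifiers are placeholders.
stimulus_response_pool = list(range(1, 36))  # 35 items
conversation_pool = list(range(1, 29))       # 28 items

sr_form_a, sr_form_b = make_counterbalanced_forms(stimulus_response_pool, seed=1)
conv_form_a, conv_form_b = make_counterbalanced_forms(conversation_pool, seed=2)
```

Both forms contain every item exactly once, so any difference in item statistics between forms can be attributed to position rather than content.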

Calibration. The researcher visited classrooms of university students, both English majors and non-majors, and explained the purpose of the study and that participation was voluntary. Participants received extra credit from the teachers of the respective courses. Pilot versions of the BELLA and ELLSI questionnaires were administered prior to invigilation of the exams; examination commenced after questionnaire forms were collected. The calibration form of the stimulus-response exam took 10 minutes to complete, while the calibration form of the short conversation exam took 17 minutes. Results of calibration studies and implications for test construction are discussed under the Pilot Studies section.

Pilot Studies

Previously mentioned research regarding the construction of initial item pools for belief and strategic knowledge indicators and reduction of said pools to questionnaire-ready inventories described findings with respect to junior- and senior-high EFL learners. At the piloting stage for the present study, the belief inventory was assumed to be two-dimensional and valid for a separate population of learners based on findings in Nix and Tseng (2014). Meanwhile, the most reliable indicators for the strategic knowledge inventory had been determined, while the dimensional structure remained unspecified. With respect to the listening exams, the item pools had undergone the first expert content review and had been formatted into counter-balanced test question forms and MP3 stimulus recordings. Accordingly, piloting of the BELLA instrument involved cross-validation with a sample of university undergraduates (the population under study herein) to ascertain structural, metric, and scale invariance. Piloting of the ELLSI instrument involved specification and identification of the dimensional structure, and gender DIF tests of metric and scale invariance, with samples of university undergraduates. Piloting of the listening exam items involved item calibration, gender DIF analysis, and removal of items with poor characteristics. These pilot analyses were conducted in the fall of 2013 and spring of 2014 at a single university in eastern Taiwan.

Validating Measurement Model of Beliefs

In this section, descriptions of piloting procedures and data analysis specific to the application of the BELLA inventory to university undergraduates are provided. As shown in previous analyses, the EFL listening learning beliefs of young adult learners fall into a two-dimensional structure of Axiomatic and Praxis beliefs, with 11 Axiomatic indicators and 10 Praxis indicators. It was anticipated that the Axiomatic and Praxis dimensions would remain distinct, yet the identical structure of 11 and 10 indicators, respectively, could not be assumed valid when applied to a relatively mature population. In portions of the Chapter 2 literature review, it was noted that beliefs, though stable as trait variables, are still malleable over time. Hence, it was imperative to cross-validate the BELLA questionnaire with a representative university undergraduate sample. The sample in this pilot study comprised 315 university undergraduates (175 females, 140 males) of mixed majors, including English majors, at a single public university in eastern Taiwan.

Results of BELLA Pilot

To accomplish cross-validation of the BELLA inventory, responses from a sample of 315 university undergraduates of mixed majors were collected. EMIRT analyses compared unidimensional, two-dimensional, and three-dimensional variants. EMIRT analysis disconfirmed the presence of a third dimension, as the model was unidentified without constraints on two items loading on the second and third dimensions, and seven items significantly cross-loaded on all three dimensions, yielding no interpretable solutions. In contrast, the two-dimensional structure exhibited only four significantly cross-loading items, which were still in line with the structure identified in previous research with young EFL learners. Therefore, the results of this EMIRT suggested that two dimensions were present as latent traits in the subjects, yet the structure of the loadings had altered such that more items reflected Axiomatic beliefs and fewer items reflected Praxis beliefs.

EMIRT results showed that the Axiomatic dimension of listening learning beliefs may be reflected by items one to 13, or possibly one to 14, as items 13 and 14 both cross-load between dimensions. Item 13, “Listening to any English materials that I like helps me learn English,” significantly cross-loads on both dimensions, rendering the empirically based structure ambiguous, as it had previously described a Praxis belief concerning study or practice habits. As seen in Table 3, item 14, “Listening to material produced by native speakers is best for learning English,” which apparently describes Praxis beliefs, also loads on the Axiomatic dimension. A case could be made for partitioning the observed variables as 14 indicators of Axiomatic beliefs versus the remaining seven Praxis indicators, which are completely focused on strategy use. In this configuration, the Praxis dimension would exclusively reflect individuals’ beliefs on listening learning strategy implementation. Therefore, the objective of the follow-up confirmatory MIRT was to compare the original structure, model 2Do, against both the novel structures and the unidimensional model as alternatives.

Figures 4 to 7 provide a graphical summary of the current hypotheses: model 1D represents a default, parsimonious unidimensional model; model 2Do represents the two-dimensional structure identified in previous research with EFL learners at the secondary level of education; model 2D1 represents the first two-dimensional structure hypothesized on the basis of the current EMIRT analysis with tertiary-level EFL learners; and model 2D2 represents the second two-dimensional structure hypothesized on the basis of the current EMIRT analysis with tertiary-level EFL learners. The default unidimensional model is a parsimonious structure where listening learning beliefs lie along a single continuum of variation. The various two-dimensional structures posit listening learning beliefs as measurable along two distinct but related continua, Axiomatic beliefs and Praxis beliefs, with the latter contingent on the existence of the former.
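The competing structures can be written out as item-by-dimension loading patterns, which makes the differences among the hypotheses explicit. This is a sketch under stated assumptions: items are assumed to be numbered so that Axiomatic indicators come first; the 11/10 split for model 2Do and the 14/7 split come from the text, while assigning the 13/8 split to 2D1 and the 14/7 split to 2D2 is an inference from the order in which the alternatives are discussed. The names `two_dim_pattern` and `indicator_counts` are hypothetical.

```python
N_ITEMS = 21  # number of BELLA indicators


def two_dim_pattern(n_axiomatic, n_items=N_ITEMS):
    """Simple-structure pattern: one row per item, columns (Axiomatic, Praxis).

    A 1 marks the dimension on which the item is specified to load.
    """
    return [(1, 0) if i < n_axiomatic else (0, 1) for i in range(n_items)]


MODELS = {
    "1D": [(1,)] * N_ITEMS,        # all items on a single dimension
    "2Do": two_dim_pattern(11),    # original structure: 11 Axiomatic, 10 Praxis
    "2D1": two_dim_pattern(13),    # assumed: items 1-13 Axiomatic
    "2D2": two_dim_pattern(14),    # assumed: items 1-14 Axiomatic
}


def indicator_counts(pattern):
    """Number of indicators assigned to each dimension."""
    return tuple(sum(column) for column in zip(*pattern))


for name, pattern in MODELS.items():
    print(name, indicator_counts(pattern))
```

Under these assumptions the three two-dimensional models differ only in where items 12 to 14 are placed, which is the question the confirmatory comparison addresses.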

Results of confirmatory MIRT suggest that the novel two-dimensional structures, 2D1 and 2D2, fit the data significantly better than the original structure. Results of EMIRT are shown in Table 3, and model fit indices in confirmatory MIRT are shown in Table 4. It was found that both of the novel BELLA structures reflected the underlying constructs better than the original structure identified in prior research, with the 2D2 model exhibiting optimum values across all the model fit indices.

Figure 4. The unidimensional default Model 1D.

Figure 5. The two-dimensional model previously identified, Model 2Do.

Figure 6. Hypothetical alternative two-dimensional structure, Model 2D1.

Figure 7. Hypothetical alternative two-dimensional structure, Model 2D2.

Table 3.

Two-Dimensional BELLA Structure in EMIRT with Undergraduates (n = 315)

Axiomatic Praxis
