As these 23 items were intended for use with university undergraduate EFL
learners in the current study, they were then presented to a panel of university
sophomores concurrent with the face validity review of 21 BELLA indicators,
mentioned previously. The undergraduate students did not express dissatisfaction
with the Chinese wording of the items.
Inventory of Strategic Competence in EFL Listening (ISCEL)
It is not suggested that a single scale of general strategy use can capture the full
unobserved variance influencing listening outcomes. The reader may recall that a
salient theme running through the review on listening strategy research pointed to a
qualitative aspect of strategy use that must be accounted for. This logically follows
from the observations found in numerous effect studies that listening strategy use does
not guarantee successful listening processing. Learners of an L2 must have a trait
strategic competence that supports effective implementation of the strategies in their
personal repertoires; that is, they must match strategies to appropriate situations so
that each strategy can be used successfully. It is hypothesized that the relation
between strategic competence and strategy use is manifest in interaction effects between
these traits. To measure this trait variable of strategic competence, the present study
proposed construction of a scale called the Inventory of Strategic Competence in EFL
Listening (ISCEL).
The purpose of ISCEL is to gauge the qualitative aspect of the strategies that are
endorsed in ELLSI by querying the respondents’ self-assessed confidence in adept
implementation of those strategies, thus necessitating indicators identical to those of ELLSI.
Accordingly, the ISCEL instrumentation consists of a separate scale for each of the
ELLSI strategic knowledge indicators. While ELLSI queries proclivities to apply
strategic mental operations to listening processing, the corresponding ISCEL scale
subsequently queries the strength of confidence in using the strategy. For
example, each ELLSI item, such as “I use knowledge of English stress and
intonation to help me figure out words spoken unclearly,” is followed by the
statement “I use this strategy well.” The former taps trait use of extant listening
strategies, while the latter measures how well one uses the strategy, and each part of
these compound strategic variable indicators has a separate array of response options.
While the strategic knowledge component has a frequency scale from zero to four
with extremities marked “never” and “always”, the strategic competence component
has an incremented option array from zero to four with extremities marked “very
much disagree” and “very much agree.” When taken together, the ELLSI and
ISCEL instruments are intended to describe variance in both the quantitative and
qualitative aspects of trait listening strategy use.
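The hypothesized interaction between strategy use and strategic competence can be illustrated with observed scores. The following is a minimal sketch only, assuming that each scale is scored as the mean of its 0 to 4 responses and that the interaction is represented by the product of mean-centred scores; the function names and the response data are hypothetical and are not taken from the study, which may model the interaction at the latent level instead.

```python
from statistics import mean

def scale_score(responses):
    """Mean of one respondent's 0-4 item responses (assumed scoring rule)."""
    if any(r not in range(5) for r in responses):
        raise ValueError("responses must be integers from 0 to 4")
    return mean(responses)

def interaction_terms(use_scores, competence_scores):
    """Product of mean-centred use and competence scores, one per respondent."""
    use_mean = mean(use_scores)
    comp_mean = mean(competence_scores)
    return [(u - use_mean) * (c - comp_mean)
            for u, c in zip(use_scores, competence_scores)]

# Hypothetical responses of three respondents to three compound indicators.
ellsi = [[4, 3, 4], [1, 2, 0], [2, 2, 2]]  # frequency: 0 = never, 4 = always
iscel = [[4, 4, 3], [0, 1, 1], [3, 2, 1]]  # agreement: 0 = very much disagree, 4 = very much agree

use = [scale_score(r) for r in ellsi]
competence = [scale_score(r) for r in iscel]
products = interaction_terms(use, competence)

for u, c, p in zip(use, competence, products):
    print(f"use={u:.2f}  competence={c:.2f}  use x competence={p:.3f}")
```

Centring the two scores before multiplying keeps the product term from being strongly collinear with the main effects, which is a common reason for preferring it over the raw product.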
Listening Test
Maintaining a professional and consistent quality of stimulus audio for the
listening comprehension test was the prime concern when designing the measure of EFL
listening ability. On the one hand, it is difficult and costly to hire trained voice talent for
recording item stems, and renting recording studio time further compounds the cost.
At the same time, a high-fidelity recording of clearly enunciated stimuli with
consistent speech rate and volume is required in order to reduce
confounding during test delivery. Commercially produced practice test recordings
were selected as an expedient method to overcome recording quality issues.
However, the use of off-the-shelf exams introduced a new danger to the validity of the
instrument, namely, test effects derived from participants’ possible familiarity with the
content of the source material. At the tertiary level of EFL education in Taiwan, it is
anticipated that subjects would have accumulated between six and 10 years of English
study experience, and may have sat for official English certification exams. The
solution was to utilize the stimuli recordings in random order and re-formulate the
item stems and option arrays into original questions and answers.
Wavepad Sound Editor, a free waveform editing program, was utilized for
editing and compiling all elements of the listening test items. Use of digital sound
editing enabled the creation of two counter-balanced test forms for each round of
item piloting and calibration to counteract order and fatigue effects. Wavepad's
sound normalization and noise removal functions produced seamless,
professional-grade MP3 recordings, which were used for the duration of the study.
Test content. As the measurement of listening proficiency was meant to
imitate the experience of standardized English testing in Taiwan, a format similar to
the Taiwanese General English Proficiency Examination (GEPT) was devised. In
this case, the second and third parts of the GEPT listening examination, comprised of
stimulus-response items and short conversation items, respectively, were adopted.
The total length of the finalized version of the measurement instrument was 40 items,
based on the outcome of calibration studies and validity reviews of item pools.
Separate item pools for the stimulus-response and conversation items were
constructed for calibration studies. After an item pool was composed, it was sent to
three other experienced English instructors at two universities in Taiwan for review of
content validity and vetting of linguistically biased or ill-worded items, including
possible multiple correct answers in the option arrays. Item pools were finalized
based on comments from the expert reviewers. The item pool for stimulus-response
items consisted of 35 items, while the item pool for conversation items consisted of
28 items. The item pools were formulated into test format by adding a note
explaining the research purpose of the exam (in Chinese) and an example question at
the beginning of the test. These calibration tests were made into counter-balanced
forms A and B.
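The construction of two counter-balanced forms from a randomly ordered item pool can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the item identifiers, the seed, and the rule that Form B presents the two halves of Form A in swapped order are all hypothetical, since the text does not specify how the orders of forms A and B differed.

```python
import random

def make_counterbalanced_forms(item_ids, seed=0):
    """Return two orderings of the same items.

    Form A is a seeded random order of the pool. Form B swaps the two
    halves of Form A (an assumed counter-balancing rule), so items heard
    late in one form are heard early in the other.
    """
    rng = random.Random(seed)
    form_a = list(item_ids)
    rng.shuffle(form_a)
    half = len(form_a) // 2
    form_b = form_a[half:] + form_a[:half]
    return form_a, form_b

# Hypothetical identifiers for the 35 stimulus-response items in the pool.
pool = [f"SR{n:02d}" for n in range(1, 36)]
form_a, form_b = make_counterbalanced_forms(pool, seed=2013)

print("Form A starts with:", form_a[:5])
print("Form B starts with:", form_b[:5])
```

Because both forms contain exactly the same items, any difference in item performance between forms can be attributed to position (order or fatigue) rather than to content.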
Calibration. The researcher visited classrooms of university students, both
English-majors and non-majors, and explained the purpose of the study and that
participation was voluntary. Participants received extra credit from the teachers of
the respective courses. Pilot versions of the BELLA and ELLSI questionnaires were
administered prior to invigilation of the exams; examination commenced after
questionnaire forms were collected. The calibration form of the stimulus-response
exam took 10 minutes for completion, while the calibration form of the short
conversation exam took 17 minutes. Results of calibration studies and implications
for test construction are discussed under the Pilot Studies section.
Pilot Studies
Previously mentioned research regarding the construction of initial item pools for
belief and strategic knowledge indicators and reduction of said pools to
questionnaire-ready inventories described findings with respect to junior and senior
high school EFL learners. At the piloting stage for the present study, the belief inventory
was assumed to be two-dimensional and valid for a separate population of learners based on
findings in Nix and Tseng (2014). Meanwhile, the most reliable indicators for the
strategic knowledge inventory had been determined, while the dimensional structure
remained unspecified. With respect to the listening exams, the item pools had
undergone the first expert content review and had been formatted into
counter-balanced test question forms and MP3 stimuli recordings. Accordingly,
piloting of the BELLA instrument involved cross-validation with a sample of
university undergraduates (the population under study herein) to ascertain structural,
metric and scale invariance. Piloting of the ELLSI instrument involved specification
and identification of the dimensional structure and gender DIF tests of metric and
scale invariance with samples of university undergraduates. Piloting of the listening
exam items involved item calibration, gender DIF analysis, and removal of items with
poor characteristics. These pilot analyses were conducted in the fall of 2013 and
spring of 2014 at a single university in eastern Taiwan.
Validating Measurement Model of Beliefs
In this section, descriptions of piloting procedures and data analysis specific to
the application of the BELLA inventory to university undergraduates are provided. As
shown in previous analyses, the EFL listening learning beliefs of young adult learners
fall into a two-dimensional structure of Axiomatic and Praxis beliefs with 11
Axiomatic indicators and 10 Praxis indicators. It was anticipated that the Axiomatic
and Praxis dimensions would remain distinct, yet the identical structure of 11 and 10
indicators, respectively, could not be assumed valid when applied to a relatively
mature population. In the Chapter 2 literature review it was noted that
beliefs, though stable as trait variables, are still malleable over time. Hence, it was
imperative to cross-validate the BELLA questionnaire with a representative university
undergraduate sample. The sample in this pilot study comprised 315 (175
females, 140 males) university undergraduates of mixed majors, including
English majors, at a single public university in eastern Taiwan.
Results of BELLA Pilot
To accomplish cross-validation of the BELLA inventory, responses from a
sample of 315 university undergraduates of mixed major of study were collected.
EMIRT analyses compared unidimensional, two-dimensional and three-dimensional
variants. EMIRT analysis disconfirmed the presence of a third dimension as the
model was unidentified without constraints on two items loading on the second and
third dimensions, and seven items significantly cross-loaded on all three dimensions,
yielding no interpretable solutions. In contrast, the two-dimensional structure
exhibited only four significantly cross-loading items, which were still in line with the
structure identified in previous research with young EFL learners. Therefore, the
results of this EMIRT suggested that two dimensions were present as latent traits in
the subjects, yet the structure of the loadings had altered such that more items
reflected Axiomatic beliefs and fewer items reflected Praxis beliefs.
EMIRT results showed that the Axiomatic dimension of listening learning beliefs
may be reflected by items one to 13, or possibly one to 14, as items 13 and 14 both
cross-load between dimensions. Item 13, “Listening to any English materials that I like helps
me learn English,” significantly cross-loads on both dimensions, rendering the
empirically based structure ambiguous, as it had previously described a Praxis belief
concerning study or practice habits. As seen in Table 3, item 14, “Listening to
material produced by native speakers is best for learning English,” which apparently
describes Praxis beliefs, also loads on the Axiomatic dimension. A case could be
made for partitioning the observed variables as 14 indicators of Axiomatic beliefs
versus the remaining seven Praxis indicators, which are completely focused on
strategy use. In this configuration, the Praxis dimension would exclusively reflect
individuals’ beliefs on listening learning strategy implementation. Therefore, the
objective of follow-up confirmatory MIRT was to compare the original structure,
model 2Do, against both novel structures and the unidimensional model as
alternatives.
Figures 4 to 7 provide a graphical summary of the current hypotheses: Model 1D
represents a default, parsimonious unidimensional model, model 2Do represents the
two-dimensional structure identified in previous research with EFL learners at the
secondary level of education, model 2D1 represents the first two-dimensional
structure hypothesized from the current EMIRT analysis with tertiary level EFL learners,
and model 2D2 represents the second two-dimensional structure hypothesized from the
current EMIRT analysis with tertiary level EFL learners. The default unidimensional
model is a parsimonious structure where listening learning beliefs lie along a single
continuum of variation. The various two-dimensional structures posit listening
learning beliefs as measurable along two independent, but related continua, Axiomatic
beliefs and Praxis beliefs, with the latter contingent on the existence of the former.
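The competing structures described above can be written down as item-by-dimension loading patterns of the kind a confirmatory MIRT specification requires. The sketch below is illustrative only. It assumes that the 21 items are numbered so that Axiomatic indicators precede Praxis indicators, and that model 2D1 corresponds to the 13/8 partition and model 2D2 to the 14/7 partition; the text does not state which partition is "first" and which is "second", so that mapping is an assumption, as are the function names.

```python
N_ITEMS = 21  # number of BELLA indicators

def two_dim_pattern(n_axiomatic, n_items=N_ITEMS):
    """One (axiomatic, praxis) 0/1 pair per item, in questionnaire order.

    The first n_axiomatic items load on the Axiomatic dimension only and
    the remaining items load on the Praxis dimension only.
    """
    return [(1, 0) if i < n_axiomatic else (0, 1) for i in range(n_items)]

models = {
    "1D": [(1,)] * N_ITEMS,        # all items on a single dimension
    "2Do": two_dim_pattern(11),    # original structure: 11 Axiomatic, 10 Praxis
    "2D1": two_dim_pattern(13),    # assumed mapping: items 1-13 Axiomatic
    "2D2": two_dim_pattern(14),    # assumed mapping: items 1-14 Axiomatic
}

def dimension_counts(pattern):
    """Number of items assigned to each dimension."""
    return tuple(sum(column) for column in zip(*pattern))

for name, pattern in models.items():
    print(name, dimension_counts(pattern))
# 1D (21,)
# 2Do (11, 10)
# 2D1 (13, 8)
# 2D2 (14, 7)
```

Writing the patterns out this way makes clear that the three two-dimensional models differ only in the assignment of items 12 to 14, which is where the cross-loadings were observed.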
Results of confirmatory MIRT suggest that the novel two-dimensional structures, 2D1
and 2D2, fit the data significantly better than the original structure. Results of EMIRT
are shown in Table 3 and model fit indices in confirmatory MIRT are shown in Table 4. It was found that
both of the novel BELLA structures reflected the underlying constructs better than the
original structure identified in prior research, with the 2D2 model exhibiting optimal
values across all the model fit indices.
Figure 4. The unidimensional default Model 1D.
Figure 5. The two-dimensional model previously identified, Model 2Do.
Figure 6. Hypothetical alternative two-dimensional structure, Model 2D1.
Figure 7. Hypothetical alternative two-dimensional structure, Model 2D2.
Table 3.
Two-Dimensional BELLA Structure in EMIRT with Undergraduates (n = 315)
Axiomatic Praxis