Methods - Experiment 2: The Role of Personal Word Frequency in Lexical Decision

Chapter 4 Experiments on the Individual Differences of Lexical Behaviors

4.2 Experiment 2: The Role of Personal Word Frequency in Lexical Decision

4.2.1 Methods

Before collecting the occurrence frequencies of the LDT stimuli in each subject’s Facebook data, it was necessary to determine which stimuli actually appeared in those data. Therefore, the Facebook data of all 16 participants were first merged into a single file. An LDT word stimulus was selected for examination in this experiment if it appeared at least once in that file.
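The selection criterion can be sketched in a few lines of Python. This is only an illustration: the function name `select_stimuli`, the dictionary layout of the participant data, and the toy tokens are assumptions, not the scripts used in the study.

```python
from collections import Counter

def select_stimuli(ldt_words, participant_tokens):
    """Return the LDT words that occur at least once in the pooled data.

    ldt_words: candidate LDT word stimuli.
    participant_tokens: dict mapping a participant ID to that participant's
        list of segmented tokens (hypothetical layout).
    """
    pooled = Counter()
    for tokens in participant_tokens.values():
        pooled.update(tokens)  # merge every participant into one count table
    return [word for word in ldt_words if pooled[word] >= 1]

# Toy data, invented for illustration only.
corpora = {
    "S01": ["老師", "時間", "時間"],
    "S02": ["眼睛", "老師"],
}
candidates = ["老師", "時間", "眼睛", "科技"]
selected = select_stimuli(candidates, corpora)
print(selected)  # ['老師', '時間', '眼睛']
```

In the toy data, 科技 never occurs in any participant's tokens, so it is not selected.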

In total, 218 stimuli met this criterion. The selected words are marked with asterisks at the left of their rows in Appendix B. For a quick overview of their characteristics, Tables 4.6 - 4.9 show their distributions across the word variables: frequency type, sense number, character number, and neighborhood size.

Table 4.6: Counts of Experiment 2 stimuli with different frequency types

Frequency Type    High    Mid    Low
Count             131     26     61

Table 4.7: Counts of Experiment 2 stimuli with different sense numbers

Sense Number    1     2     3     4     5    6    8
Count           81    83    23    20    7    2    2

Table 4.8: Counts of Experiment 2 stimuli with different character numbers

Character Number    2      3
Count               200    12

Table 4.9: Counts of Experiment 2 stimuli with different neighborhood sizes

Neighborhood size    1-20    21-40    41-60    61-80    81-100    101-120
Count                30      43       32       23       22        12

Neighborhood size    121-140    141-160    161-180    181-200    201-900
Count                25         7          6          3          9

After stimulus selection, the personal word frequencies of the stimuli were counted automatically. The counts of a few stimuli are exemplified in Table 4.10. Note that the figures in different columns could not yet be compared directly. For instance, according to the table, Subject 1’s word frequency for 眼睛 (yian3-ching1) ‘eyes’ was 1, and this was the same for Subject 2. Because the two frequency counts came from two different corpora (i.e., the personal corpora of Subject 1 and Subject 2), it could not be claimed that the two subjects used the word 眼睛 (yian3-ching1) ‘eyes’ to the same extent. In particular, as reported in Section 3.2, the total token numbers of the participants’ Facebook data varied remarkably.

This experiment therefore adopted two distinct methods to normalize the frequency counts, in order to see whether one was more effective than the other. The first method is denoted “the ratio approach,” in which each subject’s word frequencies were divided by his/her own total token number (Formula 2). In the formula, 𝐹_𝑖𝑗 was participant j’s frequency count of the ith lexical-decision stimulus; i ranged from 1 to 218, since only 218 words in the lexical decision task were selected as stimuli in this experiment.

Note, however, that the i in the denominator was not limited to this range but ran up to n, where n was the number of word types in a participant’s Facebook data. In other words, the denominator added up the frequency counts of all word types and thus represented the participant’s total token number. Consequently, the output of the formula, 𝑅_𝑖𝑗, was participant j’s frequency ratio of the ith stimulus. Examples of 𝑅_𝑖𝑗 for the frequency counts in Table 4.10 are shown in Table 4.11.

Formula 2: Personal word frequency ratio

$$R_{ij} = \frac{F_{ij}}{\sum_{i=1}^{n} F_{ij}}$$
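As a minimal sketch of the ratio approach, the following Python function divides each stimulus count by the participant's total token number. The function name `frequency_ratios` and the toy tokens are hypothetical and serve only to illustrate the computation described above.

```python
from collections import Counter

def frequency_ratios(tokens, stimuli):
    """Frequency ratio of each stimulus for one participant.

    tokens:  the participant's full list of segmented tokens (all word types).
    stimuli: the selected LDT words (218 in the experiment).
    """
    counts = Counter(tokens)
    # Denominator: counts summed over all word types, i.e. the total token number.
    total_tokens = sum(counts.values())
    if total_tokens == 0:
        raise ValueError("participant corpus is empty")
    # Counter returns 0 for unseen words, so absent stimuli get a ratio of 0.
    return {word: counts[word] / total_tokens for word in stimuli}

# Toy corpus, invented for illustration only.
s01_tokens = ["老師", "時間", "時間", "今天"]
ratios = frequency_ratios(s01_tokens, ["老師", "時間", "眼睛"])
print(ratios)  # {'老師': 0.25, '時間': 0.5, '眼睛': 0.0}
```

Because the denominator counts every token the segmenter produced, any segmentation error changes the ratio of every stimulus for that participant.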

A potential problem with Formula 2 was that the normalized figures were affected by each participant’s token number. The token number was calculated from the results of automatic segmentation, so it was inevitably contaminated by segmentation errors. For instance, the CKIP Segmentation System often mistakenly grouped a long string into a single word, such as “說我變瘦了”, “忘了帶家裡的”, or “大反黑是.”¹⁸ To diminish the influence of segmentation errors on normalization, a second method was adopted. This method is called “the z-score approach,” in which personal frequency counts were transformed into z-scores based solely on each participant’s frequency counts of the 218 stimuli, as shown in Formula 3. As in the previous formula, 𝐹_𝑖𝑗 was participant j’s frequency count of the ith lexical-decision stimulus. 𝐹̄_𝑖 was the mean of the participant’s 218 word frequency counts, and 𝑆_𝐹_𝑖 was the standard deviation of those frequency counts. Examples of 𝑍_𝑖𝑗 for the frequency counts in Table 4.10 are given in Table 4.12.

18 As elucidated in Section 3.2, we did not manually check and correct the automatic segmentation results because the present study aims to develop a methodology that is not time-consuming and is feasible for future research to compute and control the IDs of lexical behaviors.

Formula 3: Personal word frequency z-score

$$Z_{ij} = \frac{F_{ij} - \bar{F}_{i}}{S_{F_{i}}}$$
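The z-score approach can be sketched as follows. The function name `frequency_z_scores` and the toy counts are hypothetical. The text does not say whether the sample or the population standard deviation was used, so the choice of `statistics.stdev` (sample standard deviation) here is an assumption.

```python
from statistics import mean, stdev

def frequency_z_scores(stimulus_counts):
    """Z-score of each stimulus count for one participant.

    stimulus_counts: dict mapping each selected stimulus to the participant's
        frequency count (zeros included), so only stimulus counts are used.
    """
    values = list(stimulus_counts.values())
    center = mean(values)
    spread = stdev(values)  # assumption: sample SD (n - 1 in the denominator)
    if spread == 0:
        raise ValueError("all counts are identical; z-scores are undefined")
    return {word: (count - center) / spread
            for word, count in stimulus_counts.items()}

# Toy counts, invented for illustration only.
s01_counts = {"老師": 4, "時間": 2, "眼睛": 0}
z_scores = frequency_z_scores(s01_counts)
print(z_scores)
```

Unlike the ratio approach, this computation never touches the participant's total token number, so mis-segmented strings elsewhere in the corpus do not enter the normalization.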

Finally, the two types of normalized personal frequencies were aligned with each participant’s LDT response data in a table and then analyzed together with the participants’ log-transformed response latencies in mixed-effects models.

Table 4.10: Frequency counts of several lexical-decision stimuli in each participant’s Facebook data

Note. “S” in the column names represents “Subject.”

Table 4.11: Frequency ratios of several lexical-decision stimuli in each participant’s Facebook data

Word Pinyin S01 S02 S03 S04 S05 S06 S07 S08

老師 lao3-shi1 0.003198 0.000723 0 0.000557 0.000242 0.000291 0.000699 0.000338

內容 nei3-rong2 0.00016 0 0 0 0 0.000291 0.000466 0

速度 su4-du4 8.00e-05 0.000362 0 0 0 0 0 0

語言 yu3-yan2 0.00016 0 0 0 0.000121 0.000291 0 0

時間 shi2-jian1 0.00096 0.000723 0.000933 0.001671 0.00097 0.000291 0.002096 0.001014

過程 guo4-cheng2 8.00e-05 0.000362 0 0.000557 0 0 0.000466 0

Word Pinyin S09 S10 S11 S12 S13 S14 S15 S16

老師 lao3-shi1 0.000267 0.0013 0.001269 0.002734 0.000339 0.001663 0.003046 0.000442

內容 nei3-rong2 0.000134 0.00013 0 0.000228 0 0.000215 0 0

速度 su4-du4 0.000401 0.00013 0.000423 0 0 5.40e-05 0.00203 0

語言 yu3-yan2 0 0.00013 0 0 0 0.000268 0 0.000885

時間 shi2-jian1 0.001202 0.00052 0.001904 0.001139 0.001185 0.001663 0.00203 0.00354

過程 guo4-cheng2 0 0 0.000423 0 0 0.000161 0 0

Note. “S” in the column names represents “Subject.”

Table 4.12: Frequency z-scores of several lexical-decision stimuli in each participant’s Facebook data

Word Pinyin S01 S02 S03 S04 S05 S06 S07 S08

老師 lao3-shi1 12.38185 2.807824 -0.24629 1.409829 0.965365 1.061834 1.809352 1.430885

內容 nei3-rong2 0.333072 -0.29931 -0.24629 -0.43055 -0.46627 1.061834 1.074829 -0.32155

速度 su4-du4 0.015999 1.254256 -0.24629 -0.43055 -0.46627 -0.33262 -0.39422 -0.32155

語言 yu3-yan2 0.333072 -0.29931 -0.24629 -0.43055 0.24955 1.061834 -0.39422 -0.32155

時間 shi2-jian1 3.503803 2.807824 4.228019 5.090581 5.260255 1.061834 6.21649 4.935751

過程 guo4-cheng2 0.015999 1.254256 -0.24629 1.409829 -0.46627 -0.33262 1.074829 -0.32155

醫院 yi1-yuan4 0.650145 -0.29931 -0.24629 0.489641 -0.46627 -0.33262 -0.39422 1.430885

電話 dian4-hua4 1.601365 4.361391 -0.24629 -0.43055 3.11281 -0.33262 1.074829 -0.32155

地震 di4-zheng4 0.333072 -0.29931 -0.24629 -0.43055 0.24955 1.061834 -0.39422 1.430885

眼睛 yian3-jing1 0.015999 1.254256 1.990864 0.489641 4.54444 -0.33262 0.340307 -0.32155

情緒 ching2-xu4 0.015999 -0.29931 -0.24629 -0.43055 4.54444 -0.33262 0.340307 -0.32155

圖書館 tu2-shu1-guan3 1.284291 -0.29931 -0.24629 -0.43055 0.24955 -0.33262 1.074829 -0.32155

科技 ke1-ji4 0.015999 -0.29931 -0.24629 0.489641 0.24955 -0.33262 -0.39422 -0.32155

Word Pinyin S09 S10 S11 S12 S13 S14 S15 S16

老師 lao3-shi1 0.616947 5.152416 5.697449 7.8784 1.554071 6.05542 5.905595 0.623419

內容 nei3-rong2 0.145037 0.172593 -0.35147 0.275823 -0.38741 0.390734 -0.27402 -0.30111

速度 su4-du4 1.088857 0.172593 1.664839 -0.41532 -0.38741 -0.23868 3.845724 -0.30111

語言 yu3-yan2 -0.32687 0.172593 -0.35147 -0.41532 -0.38741 0.600538 -0.27402 1.547944

時間 shi2-jian1 3.920318 1.832534 8.721907 3.040397 6.40776 6.05542 3.845724 7.095097

過程 guo4-cheng2 -0.32687 -0.38072 1.664839 -0.41532 -0.38741 0.180931 -0.27402 -0.30111

醫院 yi1-yuan4 0.145037 -0.38072 -0.35147 -0.41532 -0.38741 -0.44848 -0.27402 -0.30111

電話 dian4-hua4 0.616947 0.725907 0.656687 -0.41532 7.378498 0.600538 1.785852 -0.30111

地震 di4-zheng4 -0.32687 -0.38072 -0.35147 1.65811 -0.38741 0.600538 -0.27402 -0.30111

眼睛 yian3-jing1 1.560767 1.832534 0.656687 0.275823 1.554071 1.43975 -0.27402 -0.30111

情緒 ching2-xu4 0.145037 -0.38072 -0.35147 -0.41532 0.583333 -0.44848 -0.27402 6.170572

圖書館 tu2-shu1-guan3 -0.32687 2.385848 -0.35147 0.275823 -0.38741 -0.02887 -0.27402 -0.30111

科技 ke1-ji4 0.145037 -0.38072 0.656687 -0.41532 0.583333 -0.23868 -0.27402 -0.30111

Note. “S” in the column names represents “Subject.”

4.2.2 Results and Discussion

Response errors in the lexical decision task (approximately 0.06% of the data set) were screened out first. Due to the high accuracy, we did not investigate the relationship between personal word frequency and response accuracy, but examined only its relationship with response latency. The two types of normalized personal word frequency (i.e., ratio and z-score) were each analyzed together with the latencies in a mixed-effects model. Both models also included two random factors and six covariates. The random factors were experimental stimuli and participants. The covariates were procedure variables (i.e., block number and trial number) and word variables (i.e., frequency type, sense number, character number, and neighborhood size). The covariates were included to avoid misattributing the variance they caused to the effect of personal word frequency. If a covariate did not reach significance, meaning that it did not statistically affect participants’ lexical-decision responses, it was removed from the analysis and the models were refitted with the remaining variables.

The residuals of the two models, however, showed marked non-normality, especially at the long end of the response latencies (see the upper right panel of Figure 4.5).¹⁹ To reduce the misfit, outliers with standardized residuals outside the interval (-2.5, 2.5) were removed.

19 Figure 4.5 shows the residuals of the model fitted with the personal word frequency ratios. The residuals of the z-score model are the same as those of the ratio model, so its residual plot is not given here.

The removed data amounted to 2.48% of the data set in both the ratio and z-score models. After trimming the outliers, we refitted the models. The residuals of the trimmed models were close to normal, as shown in the lower right panel of Figure 4.5. In addition, the final models excluded one covariate, neighborhood size, since it was not even marginally significant (Ratio: p = .674; Z-score: p = .682).
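The trimming step can be illustrated with a short Python sketch. It assumes the residuals have already been extracted from a fitted model, and it standardizes them by their own mean and sample standard deviation, which may differ from how the statistical software used in the study standardizes residuals. The function name `trim_outliers` and the toy residuals are hypothetical.

```python
from statistics import mean, stdev

def trim_outliers(trials, residuals, limit=2.5):
    """Keep trials whose standardized residual lies strictly inside (-limit, limit)."""
    if len(trials) != len(residuals):
        raise ValueError("trials and residuals must have the same length")
    center = mean(residuals)
    spread = stdev(residuals)  # assumption: sample SD of the residuals
    kept = []
    for trial, residual in zip(trials, residuals):
        z = (residual - center) / spread
        if -limit < z < limit:
            kept.append(trial)
    return kept

# Toy residuals, invented for illustration: 20 small values and one long latency.
residuals = [0.01, -0.01] * 10 + [0.5]
trials = list(range(len(residuals)))
kept = trim_outliers(trials, residuals)
removed_share = 1 - len(kept) / len(trials)
print(len(kept), round(removed_share, 4))
```

After trimming, the model would be refitted on the kept trials only, as described above.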

Figure 4.5: Residual diagnostics for the model of personal word frequency ratios before (upper panels) and after (lower panels) removal of outliers

Statistical results of the final models are provided in Table 4.13 (ratios) and Table 4.14 (z-scores). The two tables show that nearly all covariates in the final models contributed to the response latencies in visual lexical decision (p < .001), except for character number, which was marginally significant.²⁰ The tables present statistics for the mid and low frequency types but not for the high frequency type, because the latter served as the reference level against which the former two were tested. The ‘estimate’ in the tables indicates the relationship between response time and each covariate. For easier interpretation, the estimates of the variables that passed the statistical tests are plotted in Figure 4.6.²¹ The upper left and upper middle panels show that participants’ responses slowed as block and trial numbers increased. Given that the lexical decision task lasted around one hour, this fatigue effect was anticipated. The effect, however, did not harm this experiment, since its impact was taken into account and partialled out in our data analysis. With respect to frequency type, the upper right panel illustrates that response latencies were shortest for the high type and longest for the low type. The estimate for sense number is shown in the lower left panel, which reveals that higher sense numbers were associated with faster responses.

20 As explained in Footnote 16, we report p-values calculated by Markov chain Monte Carlo (MCMC) sampling. The sampling can deal with both small and large data sets and is thus more robust than the t-test.

21 Note that the estimates for the covariates in the ratio and z-score analyses were very similar, so the figure shows only those from the ratio analysis.

Table 4.13: Statistical results of the mixed-effects models analyzing personal word frequency ratio and covariates (Response latencies)

Estimate Standard error t-value MCMC p-value

Block Number 9.502e-03 2.158e-03 4.40 0.0001

Trial Number 1.692e-04 3.743e-05 4.52 0.0001

Word frequency Mid 5.700e-02 9.581e-03 5.95 0.0001

Word frequency Low 1.517e-01 1.328e-02 11.42 0.0001

Character Number -2.700e-02 1.499e-02 -1.80 0.0530

Sense Number -1.125e-02 3.300e-03 -3.41 0.0004

Personal word frequency (ratio) -2.364e+01 9.455e+00 -2.50 0.0096

Table 4.14: Statistical results of the mixed-effects models analyzing personal word frequency z-score and covariates (Response latencies)

Estimate Standard error t-value MCMC p-value

Block Number 9.564e-03 2.158e-03 4.43 0.0001

Trial Number 1.699e-04 3.745e-05 4.54 0.0001

Word frequency Mid 5.708e-02 9.594e-03 5.95 0.0001

Word frequency Low 1.518e-01 1.330e-02 11.42 0.0001

Character Number -2.706e-02 1.500e-02 -1.80 0.0544

Sense Number -1.121e-02 3.302e-03 -3.40 0.0004

Personal word frequency (z-score) -6.163e-03 2.776e-03 -2.22 0.0210

Figure 4.6: Partial effects of block number, trial number, frequency type, character number, sense number, and personal word frequency (ratio and z-score) in the analysis of Experiment 2

The grey-shaded rows in Tables 4.13 and 4.14, which report personal word frequency, are the focus of this experiment. The statistics show that personal word frequency significantly accounted for response latencies in both the frequency-ratio analysis (p < .01) and the z-score analysis (p < .05). The estimates in these rows were negative, as visualized in the lower middle and lower right panels of Figure 4.6. The negative estimates indicate that participants responded faster to stimuli with higher personal word frequencies. The experimental results thus reveal that IDs in the frequencies of stimuli could explain variance between participants in lexical decision.
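To make the size of these estimates more concrete, the sketch below recomputes the t-values of the two personal word frequency rows from their estimates and standard errors, and converts the estimates into multiplicative changes in response latency. The conversion assumes that the latencies were transformed with the natural logarithm, which the text does not state. The step sizes (a ratio increase of 0.001 and a z-score increase of 1) are likewise chosen only for illustration.

```python
from math import exp

# Estimates and standard errors copied from the personal word frequency rows
# of Tables 4.13 (ratio) and 4.14 (z-score).
rows = {
    "ratio": {"estimate": -2.364e+01, "se": 9.455e+00},
    "z-score": {"estimate": -6.163e-03, "se": 2.776e-03},
}

# A t-value is the estimate divided by its standard error.
recomputed_t = {name: row["estimate"] / row["se"] for name, row in rows.items()}
for name, t in recomputed_t.items():
    print(name, round(t, 2))

# Assumption: natural-log latencies, so exp(estimate * step) is the factor by
# which the predicted latency is multiplied for the given step in the predictor.
ratio_factor = exp(rows["ratio"]["estimate"] * 0.001)  # ratio rises by 0.001
z_factor = exp(rows["z-score"]["estimate"] * 1.0)      # z-score rises by 1
print(round(ratio_factor, 4), round(z_factor, 4))
```

Under that assumption, a word whose personal frequency ratio is higher by 0.001 would be answered roughly 2.3% faster, and a word one standard deviation higher in the z-score analysis roughly 0.6% faster.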

Words that frequently occurred in one’s Facebook data revealed the things or issues he/she paid closer attention to, the words he/she was accustomed to using without being aware of it, or his/her daily-life surroundings. Therefore, the effect of personal word frequency in this experiment was considered to result from people’s conscious or subconscious familiarity with words or concepts. Familiarity with word form and meaning facilitated access to the corresponding lexical representations in the participants’ mental lexicon.

Gernsbacher (1984) showed that word frequency was not perfectly consistent with word familiarity, especially for low-frequency words. Nonetheless, our results do not conflict with Gernsbacher’s findings. In her study, word frequency referred to frequency in standard corpora, and word familiarity was rated subjectively by participants. In the present study, however, a participant’s word familiarity was associated with the frequency of the words he/she used himself/herself.

Another issue raised by this experiment is a methodological one concerning the computation of personal lexical behaviors. Of the two types of normalization of personal word frequency counts, the ratio method was assumed to be potentially problematic because it involves segmentation errors, and the z-score method was hypothesized to be the better one. Nevertheless, the analyses of word frequency ratio and z-score both reached significance.

This indicated that normalizing frequency counts by the token number in each personal corpus is feasible even though there are segmentation errors and noise among the tokens.

Supporting evidence can be found by comparing each participant’s total token number, which includes segmentation errors, with his/her token number summed over the 218 stimuli in Experiment 2, which includes no such errors. The two kinds of token numbers are highly correlated (r = .95), as visualized in Figure 4.7. The correlation suggests that although segmentation errors make the total token numbers of the Facebook data imprecise, the numbers still generally reflect the relative differences between participants’ genuine token numbers.

Figure 4.7: The correlation plot of participants’ total token numbers and their token numbers summed from the 218 stimuli in Experiment 2 (r = .95)
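The comparison behind Figure 4.7 is a Pearson correlation between two numbers per participant. A minimal sketch is given below. The function name `pearson_r` and the toy token numbers are invented for illustration and are not the participants' actual figures.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Toy numbers, invented for illustration: one pair per participant.
total_tokens = [1000, 2000, 3000, 4000]  # all tokens, segmentation errors included
stimulus_tokens = [12, 18, 33, 41]       # tokens summed over the selected stimuli
r = pearson_r(total_tokens, stimulus_tokens)
print(round(r, 3))
```

A value close to 1 means that participants with larger total token numbers also have larger stimulus-based token numbers, which is the pattern the text reports.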

Chapter 5 Conclusion

In document: Measuring Individual Differences in Word Recognition: An Exploration of the Role of Personal Lexical Behaviors (pp. 77-92)