
A Mathematical Approach to Investigate the Relationship between Association Memory and Latent Semantic Analysis in English and Chinese


Academic year: 2021


Ming-Liang Wei¹ (n26011623@mail.ncku.edu.tw), Chung-Ching Wang¹ (n26011623@mail.ncku.edu.tw),

Yen-Cheng Chen² (jeff790320@gmail.com), Yu-Lin Chang³ (gtyulin@gmail.com),

Hsueh-Chih Chen³ (chcjyh@ntnu.edu.tw), Jon-Fan Hu² (jfhu@mail.ncku.edu.tw)

1 Department of Electrical Engineering, National Cheng Kung University, Tainan, 701 Taiwan

2 Department of Educational Psychology, Tainan, 701 Taiwan

3 Ministry of Education, National Taiwan Normal University, Taipei, 106 Taiwan

Abstract

Previous research has attempted to characterize how association memory works for items. Latent semantic analysis (LSA) is usually shown to be highly related to forward association memory. A naïve postulation would assume that this relationship is mainly due to semantic similarity. The present work proposes that the linkage between LSA and association memory could instead be built on classical conditioning. The assumption is supported by analyzing the divergence degree of association networks in English and Chinese word databases. In English, the association memory of low-frequency words is not correlated with LSA but is tied to specific scenarios or proverbs. The results also show that the reciprocal of the divergence degree indicates the correlation between LSA and association memory. Finally, the mechanism of classical conditioning can explain how association memory is formed and why LSA captures the strength with which conditioned responses are constructed from context.

Keywords: latent semantic analysis (LSA), association memory, classical conditioning, semantic memory, word database

Introduction

When a sound or picture stimulates our visual and auditory senses, prior language experience induces association memory (Nelson & McEvoy, 2007). Association memory is thus a process induced by preexisting experiences or concepts. Past work proposed that association memory should be defined by characterizing the trace formed from available structures, based on the descriptive properties of the associated items. For example, a statistical approach based on neurodynamics was proposed (Amari, 1988, 1989), and a statistical bi-correlation system was applied to model recall. However, that recall system describes only the association between similar concepts. Beyond neurodynamics and the recall system, the Small World property was proposed to explain the processing of association memory (Davey, Calcraft, & Adams, 2006; Bohland & Minai, 2001). Under Small World principles, word pairs with higher association strength are linked by shorter paths, but word pairs with lower association strength do not follow these principles. One major problem of Small World analysis resides in the rigid network of the Small World structure. Given the limited capacity of this rigid-network analysis, the semantic similarity property of association memory was proposed (Steyvers, Shiffrin, & Nelson, 2004). Having similar semantic features is one property of association memory, but association memory also depends on situational stimulation. Related accounts claimed that many associative processes are induced by the familiarity of concepts, paired images, and cause-effect pairs. However, these accounts cannot sufficiently describe the mental processes that form association memory. Hence, identifying the mental processes of association memory remains an open issue.

LSA has been proposed to predict the outcomes of association memory, but the relationship has not yet been satisfactorily detailed (Nelson, McEvoy, & Schreiber, 2004). In the present work, LSA is used to predict association strength for items with high and low word frequency. How LSA relates to association memory in terms of classical conditioning is also discussed. Previous work used only English materials for analysis. In contrast, the present work collected a Chinese association memory dataset and provides comprehensive comparisons of the correlation between LSA and association memory for Chinese. Finally, the differences between English and Chinese are examined to reveal the similarities and dissimilarities between the two languages.

LSA

Latent semantic analysis (LSA) extracts the latent semantic information of words or articles from their contexts. LSA calculates the degree to which two words with similar meanings appear together. In other words, LSA measures the similarity between two words based on their co-occurrence in context (Naptali, Tsuchiya, & Nakagawa, 2009).
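As a rough illustration of how an LSA similarity score can arise from co-occurrence, the sketch below applies a truncated singular value decomposition to a toy term-by-context count matrix and compares words by cosine similarity in the reduced space. The matrix, the word list, and the function names are hypothetical, and the term weighting that LSA implementations usually apply before the decomposition is omitted; this is not the procedure of the web servers used in this study.

```python
import numpy as np

def lsa_word_vectors(term_context, k):
    """Project each word (row) into a k-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(term_context, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity between two word vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical counts: rows are words, columns are contexts (e.g. passages).
words = ["needle", "thread", "city", "town"]
term_context = np.array([
    [3, 1, 0, 0],
    [1, 3, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

vectors = lsa_word_vectors(term_context, k=2)
sim_shared = cosine(vectors[0], vectors[1])    # words that share contexts
sim_disjoint = cosine(vectors[0], vectors[2])  # words that never co-occur
```

In this toy matrix, words that share contexts end up with nearly identical latent vectors, while words from disjoint contexts end up nearly orthogonal.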

Free Association

Various types of measurements for memory structure have been proposed, such as knowledge structure, concept cluster, word net structure, and word similarity. However, these measurements cannot provide direct evidence to probe association memory. Free association is recognized as a direct measurement of association memory. In a typical test, participants are asked to write down the target word that comes to mind after a cue is presented. Because responses are produced freely in reaction to a cue, performance can be used to detect differences in association memory between cultures. Free association is also directional: the association strength measured from cue to target usually differs from the strength measured from target to cue.

Direct evidence. Free association provides direct evidence of the strength of association memory: it represents the probability that a cue produces a given target. In a free association test, association memory is reflected in the words that participants report for a given cue. Free association strength is obtained by calculating the proportion of participants who report each word in response to that cue.
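The strength calculation described above can be sketched as a simple proportion. The function name and the response list are hypothetical and serve only to illustrate the definition.

```python
from collections import Counter

def forward_strength(responses):
    """Return, for one cue, the proportion of participants who produced each target."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    total = len(responses)
    return {target: n / total for target, n in counts.items()}

# Hypothetical responses of five participants to the cue "haystack".
responses_to_haystack = ["needle", "needle", "hay", "needle", "farm"]
strengths = forward_strength(responses_to_haystack)
```

Backward strength would be computed the same way after swapping the roles of cue and target, which is why the two directions generally differ.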

Cultural differences. Free association also reflects differences between cultures and languages. The connections between two words are constructed through life experience and language use. By asking participants to write down the words they associate with a cue, cultural differences and grammatical specificity can be reflected in free association measures.

Directional association differences. As stated above, association memory is directional. Forward association strength from cue to target is not the same as backward association strength from target to cue. It is therefore necessary to distinguish forward from backward association strengths.

Assumptions of how LSA relates to free association

This section explains why LSA can predict free association, based on the mechanism of classical conditioning. When a single word, word-A (US, unconditioned stimulus), is presented, the concept of that word is induced as well (UR, unconditioned response). Word-B (CS, conditioned stimulus) is then presented together with word-A, so that it accompanies the response, concept-A. After repeated presentation of both words, the CS (word-B) alone comes to elicit concept-A as a conditioned response (CR). In other words, the cue (word-B) becomes associated with concept-A. This classical conditioning assumption on LSA is shown in Figure 1.

Repeated co-presentation of word-A and word-B enhances the forward association from word-B to concept-A. Therefore, the co-occurrence of word-A and word-B, which LSA characterizes, describes how the two words are used together in real contexts.

Word A (US) → Concept-A (UR)
Word A (US) + Word B (CS) → Concept-A (UR)
Word B (CS) → Concept-A (CR)

Figure 1 illustrates the simple case of the association rules. In real-life language use, a cue can be conditioned with many targets. A single cue may be connected to more than one target, depending on the language user's current mental state and conceptual processing. Likewise, a target can be associated with more than one cue:

CueA + CueC → Concept-A
CueB + CueC → Concept-B
CueC → Concept-A or Concept-B

If the conditioning scenario is disturbed by the presentation of other words, the construction of association memory is weakened. Moreover, the correlation between LSA and association memory decreases as the cues diverge toward multiple targets. The present study assumes that classical conditioning plays an important role in the relationship between LSA and association memory. The following sections investigate the possible common ground between the two from a mathematical perspective, using related databases.

Method

The claim that LSA reflects association through classical conditioning is examined by analyzing differences in correlation between English and Chinese. The Method section is separated into three parts. The first part analyzes the scatter of samples in the LSA-association memory plane and fits a linear distribution boundary to find the boundary condition of the cluster. The second part compares the correlation between LSA and association memory for high and low cue frequency in English and Chinese. The third part calculates the divergence degree of the association network.

Linear distribution boundary

The distribution boundary marks the edge of the association memory cluster in the LSA-association memory plane. This work adopted a linear distribution boundary. The simplified formula for the slope α of the boundary line is shown below:

$$\delta_i = \alpha^{-1} y_i - x_i$$

$$F(\alpha) = \sum_i \delta_i \exp\left(-\delta_i^2/\sigma^2\right), \qquad \alpha = \arg\max_{\alpha} F(\alpha)$$

Here x is the LSA score and y is the association strength. The parameter σ is a resolution parameter for detecting the boundary, and σ² depends on the density of the samples. The intercept of the boundary line is assumed to be 0. The cluster boundary serves as the classification condition for the predictor. The distribution boundary shows only the boundary condition, however; the linear predictability of LSA still needs to be established by correlation.
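One possible way to evaluate the boundary formula numerically is a grid search over candidate slopes, sketched below. The grid, the value of σ, and the single sample are hypothetical choices for illustration; the paper does not state how the maximization was actually carried out.

```python
import numpy as np

def boundary_objective(alpha, x, y, sigma):
    """F(alpha) = sum_i delta_i * exp(-delta_i^2 / sigma^2), with delta_i = y_i / alpha - x_i."""
    delta = y / alpha - x
    return float(np.sum(delta * np.exp(-(delta ** 2) / sigma ** 2)))

def fit_boundary_slope(x, y, sigma, alphas):
    """Return the candidate slope that maximizes F (simple grid search, an assumption)."""
    scores = [boundary_objective(a, x, y, sigma) for a in alphas]
    return float(alphas[int(np.argmax(scores))])

# Hypothetical single sample: LSA score 1.0, association strength 1.0.
x = np.array([1.0])
y = np.array([1.0])
alphas = np.linspace(0.1, 2.0, 1901)  # candidate slopes in steps of 0.001
best_alpha = fit_boundary_slope(x, y, sigma=1.0, alphas=alphas)
```

For a single sample each term is largest when δ = σ/√2, so the selected line passes slightly away from the sample rather than through it; this is consistent with reading the fitted line as an edge of the cluster rather than a regression line.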


Correlation between LSA and association memory

The second part compares Pearson's correlation between LSA and association memory strength across four conditions, formed by combining high or low frequency of the cue with high or low frequency of the target. Each condition was examined in Chinese and in English. The dependent variable is the Pearson correlation, defined below:

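$$R_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right)\left(n\sum y_i^2 - \left(\sum y_i\right)^2\right)}}$$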

Here x is the LSA score and y is the association memory strength. Differences in correlation are attributed to divergence. Hence the divergence of cues that are associated with multiple targets needs to be analyzed further.
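The Pearson correlation can be computed directly from the sums in its standard definition. The sketch below is a plain-Python illustration with hypothetical values, not the analysis script used in the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from raw sums over paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when x or y is constant")
    return (n * sum_xy - sum_x * sum_y) / denominator

# Hypothetical LSA scores and association strengths for three word pairs.
lsa_scores = [1.0, 2.0, 3.0]
association = [2.0, 4.0, 6.0]
r = pearson_r(lsa_scores, association)
```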

Divergence degree of association network

To analyze how often a cue is associated with multiple targets, the divergence of cues in the association network is compared. The association network is constructed from cues and targets: cues and targets are the nodes, and each edge is directed from a cue to a target. The divergence degree describes how frequently cues are associated with multiple targets. It is the average number of targets associated with each cue, that is, the average number of edges emitted from each cue. The definition of the divergence degree is shown below:

$$D_d = \frac{\sum E}{\sum T}$$

E is the number of edges emitted from the cues, and T is the number of cue nodes.
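Under this definition, the divergence degree can be computed from a list of cue-target pairs. The sketch below is a minimal illustration; the pair list is hypothetical, and counting each distinct cue-target pair as a single edge is an assumption.

```python
def divergence_degree(pairs):
    """Average number of distinct targets per cue in a directed cue-to-target network."""
    edges = set(pairs)                 # each distinct (cue, target) pair is one edge
    cues = {cue for cue, _ in edges}   # cue nodes that emit at least one edge
    if not cues:
        raise ValueError("at least one cue-target pair is required")
    return len(edges) / len(cues)

# Hypothetical network: "city" has two targets, "haystack" has one.
pairs = [("city", "town"), ("city", "street"), ("haystack", "needle")]
dd = divergence_degree(pairs)
reciprocal_score = 0.5 / dd  # the 0.5/divergence degree quantity compared with the correlations
```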

Materials

Both the free association data and the LSA data were collected from existing databases. The Chinese association memory database is adopted from Chen (1999) and contains 1200 cues. The cues were separated into 6 groups, which are combinations of high/mid/low concreteness and high/low frequency. Each cue entry consists of the associated words, counts of associated words, sample size, word frequency, emotion quotient, concreteness degree, percent of common usage, and percent of individual usage. The sample size of the Chinese database is 200. The English free association database is from Nelson, McEvoy, & Schreiber (2004) and contains forward and backward strength, concreteness degree, and sample size. The English association memory database is composed of 126793 pairs, with a sample size of around 150.

In both English and Chinese, only items with a concreteness of at least 50 were selected. High frequency is defined as more than 200 counts in the article database, and low frequency as fewer than 100 counts in the article database.

The LSA scores for English and Chinese were calculated using web servers. The English LSA was proposed by Landauer, Foltz, & Laham (1998); the reference article database selected was "General reading up to 1st year reading". The Chinese LSA was proposed by Chen, Wang, & Ko (2009); its reference context is based on the Academia Sinica Balanced Corpus of Modern Chinese.

Results

The first part of the results presents scatter plots of LSA and association memory as a preliminary analysis of their relationship. The second part analyzes the correlations in each condition and compares them with the structure of the association network.

Boundary analysis

The scatter of LSA and association memory is shown in Figure 1. The scatter plots are separated into three conditions: all pairs, high cue frequency, and low cue frequency.

All samples. These distributions show the scatter of all samples. In Figure 1 (top left and right), the circles mark samples with word frequency > 60. Because classical conditioning depends on frequent co-occurrence in context, samples with frequency > 60 are labeled and used as a filter. Samples below the boundary are assumed to reflect multi-cue stimulation, which is more realistic and leads to weaker association memory strength. The results show that samples below the boundary line have a higher density than those above it; association memory strength tends to fall below the boundary line. Compared with English, association strength in Chinese falls below the boundary line more consistently. This suggests that association memory in Chinese is a conditioned response constructed from context and involves multiple targets.

High frequency cues. These samples are cues with frequency > 200. The results show that the samples without a circle label, the high-frequency samples, are not related to the LSA score. Theoretically, the LSA score is low when two high-frequency words each co-occur frequently with many other words. For example, "city" and "town" are both high-frequency words but have a low LSA score: the top 20 LSA words for "city" do not contain "town", and those for "town" do not contain "city". However, two high-frequency words with similar properties are more likely to build similar situational stimulation. Hence two high-frequency words can have a low LSA score, and their association strength will not be related to the LSA score.

Low frequency cues. These samples are cues with frequency < 60. In English, the association strength of samples without a circle label is not related to LSA. In contrast, the association strengths of the labeled samples, whose targets have high frequency, fall below the boundary. The divergence of low-frequency cues is low, and low divergence of a cue indicates high specificity of its target. With high specificity from cue to target, co-stimulation is the major basis for constructing the conditioned response. In other words, the association memory of low-frequency cues is constructed from the co-occurrence of the two words in context.


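Figure 2: LSA correlation and divergence degree. The chart compares the correlation with 0.5/divergence degree for Chinese and English in four conditions (Cue:H Target:H, Cue:H Target:L, Cue:L Target:H, Cue:L Target:L); the vertical axis (correlation and scores) runs from 0 to 0.7.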

Divergence degree

The co-occurrence of cue and target in context corresponds to simple conditioning, in which the target is conditioned by only one cue. Figure 1 shows that the association strength of the samples lies below the boundary line. The correlation with LSA decreases as the divergence degree of the cues increases. Hence, the divergence degree of the association network was analyzed to examine this relationship. The divergence degree is the average number of edges emitted from the cues. Figure 2 shows that the reciprocal of the divergence degree closely tracks the correlation between LSA and association memory in each word-frequency condition.

Language difference. When the cue frequency is low and the target frequency is high, the predictability of LSA is high in Chinese (Figure 2). In contrast, the correlation is low in English. Low-frequency words in English are conditioned not by co-occurrence in general context but by specific situations. For example, the cue "haystack" has high forward association strength with "needle", although the two words are not otherwise related to each other; "needle in a haystack" is an English proverb. In such specific situations, a low-frequency cue is associated with a target that has a low LSA score.

Language similarity. In both Chinese and English, the association between high-frequency cues and low-frequency targets is not related to LSA (Figure 2). This suggests that low-frequency words almost always accompany high-frequency words, whereas high-frequency words also co-occur with many other words.

Divergence degree is related to correlation. Figure 2 shows that the correlation between LSA and association memory strength is related to 0.5/divergence degree in the four word-frequency conditions, which combine high or low frequency of cue and target. The divergence degree characterizes the predictability rather than the association strength itself. In other words, how well LSA predicts association strength depends on the divergence degree of the association network.

Conclusion

LSA describes forward association memory, but the mechanisms responsible for how LSA relates to association memory were not discussed in previous work. This study proposed the assumption that association memory is constructed by classical conditioning. The unconditioned stimulation is the co-occurrence of two words, cue and target, and the concept of the target becomes a conditioned response to the cue. LSA is the score that describes the status of this co-occurrence: the higher the LSA score, the higher the co-occurrence, and the higher the co-occurrence of cue and target, the stronger the association memory built between them. However, LSA can describe only the simple stimulation case of classical conditioning. Under multi-cue stimulation, the co-occurrence measure, that is, LSA, is no longer related to forward association strength. To show that multi-cue stimulation decreases the predictability of LSA, this work analyzed the divergence of cues and compared 0.5/divergence degree with the correlation between LSA and association memory. The results showed that the reciprocal of the divergence degree is proportional to the correlations. That is to say, the divergence degree indicates the correlation between LSA and association memory.

Acknowledgments

This research is partially supported by the "Aim for the Top University Project" of National Taiwan Normal University (NTNU), sponsored by the Ministry of Education, Taiwan, R.O.C., and by the "International Research-Intensive Center of Excellence Program" of NTNU and the National Science Council, Taiwan, R.O.C., under Grant no. NSC 103-2911-I-003-301. NTNU also provided the Chinese association memory database for the study. Finally, cordial thanks go to all staff of the "Language Acquisition and Cognitive Development Laboratory" for their academic support and research resources.

References

Amari, S. I. (1988, July). Statistical neurodynamics of various versions of correlation associative memory. In Neural Networks, 1988., IEEE International Conference on (pp. 633-640). IEEE.

Amari, S. (1989). Statistical Neurodynamics: Associative Memory and Self-Organization. In Cooperative Dynamics in Complex Physical Systems (pp. 239-248). Springer Berlin Heidelberg.

Bohland, J. W., & Minai, A. A. (2001). Efficient associative memory using small-world architecture. Neurocomputing, 38, 489-496.

Chen, S. Z. (1999). Creating and analyzing Chinese word association norms. Paper presented at the 38th Annual Meeting of the Chinese Psychological Society, Department of Psychology, National Taiwan University, Taipei, September 17, 1999.

Chen, M. L., Wang, H. C., & Ko, H. W. (2009). The construction and validation of Chinese semantic space by using Latent Semantic Analysis. Chinese Journal of Psychology, 51(4), 415-435.

Davey, N., Calcraft, L., & Adams, R. (2006). High capacity, small world associative memory models. Connection Science, 18(3), 247-264.

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284.

Naptali, W., Tsuchiya, M., & Nakagawa, S. (2009). Word co-occurrence matrix and context dependent class in LSA based language model for speech recognition. International Journal of Computers, (1).

Nelson, D. L., & McEvoy, C. (2007). Entangled Associative Structures and Context. In AAAI Spring Symposium: Quantum Interaction (pp. 98-105).

Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36(3), 402-407.

Mitchell, A. A. (1982). Models of memory: Implications for measuring knowledge structures. Advances in Consumer Research, 9(1), 45-51.

Pavlov, I. P. (1927). Conditioned Reflexes. An Investigation of the Physiological Activity of the Cerebral Cortex... Translated and Edited by GV Anrep. London.

Steyvers, M., Shiffrin, R. M., & Nelson, D. L. (2004). Word association spaces for predicting semantic similarity effects in episodic memory. Experimental cognitive psychology and its applications: Festschrift in honor of Lyle Bourne, Walter Kintsch, and Thomas Landauer, 237-249.

Figure 1: LSA-association memory strength scatter plot
