An intelligent questionnaire analysis expert system


Yian-Shu Chu (a), Shian-Shyong Tseng (a, b, *), Yu-Jie Tsai (a), Ren-Jei Luo (a)

(a) Department of Computer Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 300, ROC
(b) Department of Information Science and Applications, Asia University, 500 Liufeng Road, Wufeng, Taichung, Taiwan 413, ROC

Abstract

In questionnaire analysis, finding a statistically significant difference between two or more groups on a continuous measure is one of the major problems researchers face. Finding such differences is nevertheless difficult for researchers, for two reasons. First, the process of finding statistically significant differences depends heavily on the researcher's intuition and experience. Second, the original questionnaire data may be too coarse-grained to reveal them.

In this paper, we build a data warehouse and a forward-chaining rule-based expert system with three kinds of indicators, Increase, StepDown, and Dice, for drilling down the data warehouse. The system helps researchers explore the data and select appropriate statistical methods to find possible significant differences. A prototype of the expert system has been implemented. The results of a satisfaction survey showed that finding significant differences became easier and that users were interested in the idea of the system. © 2008 Elsevier Ltd. All rights reserved.

Keywords: Statistically significant difference; Expert system; Data warehousing; On-line analytic processing (OLAP)

1. Introduction

Social science research uses scientific methods to investigate human behavior and social phenomena (Black, 1999; Punch, 2005; Schutt, 2004). Researchers cannot thoroughly observe the huge population that exhibits a behavior or phenomenon. They therefore usually conduct a questionnaire survey instead of investigating the whole population.

A questionnaire survey is usually conducted by selecting representative samples from the population according to a sampling method. To analyze the questionnaire data, researchers can use descriptive statistics to describe the basic distribution of the samples. They can also use inferential statistics to draw conclusions about actual human behavior and social phenomena.

Significance of group differences is one of the four types of research questions (Tabachnick & Fidell, 1996). For example, a questionnaire survey of elementary school students might ask: “Is there a significant difference between boys and girls in the number of hours they access the Internet every week?” Researchers want to know the answer to such a question. In questionnaire analysis, finding a statistically significant difference between two or more groups on a continuous measure is one of the major research issues.

In traditional quantitative research, researchers first propose several hypotheses about the subject based on their experience. They then determine, by trial and error, whether any of the hypotheses is statistically significant. The quality of the result therefore depends on the quality of the hypotheses, and the process of finding statistically significant differences depends heavily on the researchers' intuition and experience. For example, in a questionnaire survey of elementary school students' Internet usage behavior, a researcher might hypothesize: “There is a statistically significant difference between genders in the number of hours students access the Internet every week.” The researcher would then use an inferential statistics method, chosen according to his or her knowledge of statistics, to test this hypothesis. Without the hypothesis, the difference cannot be found even if it really exists. Acquiring the knowledge and experience of senior researchers might therefore help junior researchers.

* Corresponding author. Address: Department of Computer Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 300, ROC.
E-mail address: sstseng@cis.nctu.edu.tw (S.-S. Tseng).

www.elsevier.com/locate/eswa Expert Systems with Applications 36 (2009) 2699–2710

In addition, the original questionnaire data may be too coarse-grained to reveal statistically significant differences. Consider again a questionnaire survey of elementary school students' Internet usage behavior. Suppose there is no significant difference between resident regions in the number of hours students access the Internet every week. The researcher might then conclude that resident region makes no difference. However, each region may contain several counties. By drilling down the students' resident dimension to a lower level of granularity, it is still possible to find a significant difference between counties.

When researchers analyze questionnaire data, they have to form hypotheses about possible statistically significant differences in the data. They then apply rule-like knowledge to select appropriate statistical methods for testing the chosen hypotheses, based on facts about the characteristics of the data, for example the data scale of attributes. We therefore acquire the knowledge and experience of senior researchers to construct a forward-chaining rule-based expert system. The system helps researchers form hypotheses about possible statistically significant differences and select inferential statistical methods to test those hypotheses.

The granularity of the original data warehouse may not be fine enough to help researchers find possible statistically significant differences. We therefore designed three indicators, Increase, StepDown, and Dice, and included metarules to determine the degree of the indicators. The expert system contains a significant difference viewer, built on these indicators and metarules, to help junior researchers explore data. To support the selection of inferential statistical methods, the expert system also suggests appropriate methods for testing the hypotheses. Researchers may want to understand why these methods are suggested and how to use them. The expert system therefore explains why the methods are appropriate for the data, and it also provides a learning platform for researchers who want to study the methods in more detail. Finally, a satisfaction survey was conducted to measure users' satisfaction with the expert system. The results showed that finding significant differences became easier and that users were interested in the idea of the system.

2. Preliminary

2.1. Quantitative research

Quantitative research techniques (Black, 1999; Punch, 2005; Schutt, 2004; Zhang, Padmanabhan, & Tuzhilin, 2004) are part of primary research, and the quantified data can be collected through structured interviews, experiments, or surveys. Quantitative research is about quantifying relationships between variables. Variables are things like weight, performance, time, and treatment. Researchers measure variables on a sample of subjects, which can be tissues, cells, animals, or humans. They express the relationship between variables using effect statistics, such as correlations, relative frequencies, or differences between means.

Hypothesis testing is the most popular statistical method for analyzing the relationship between variables. Researchers are most concerned with differences between means, because such differences directly show how groups differ on the measure of interest. Suppose one variable, such as student gender, shows a statistically significant difference in another variable, such as the number of hours students access the Internet per week, at a given confidence level of 1 − α. We then say that there is a significant difference between the genders. More generally, quantitative research aims to determine the relationship between one thing and another in a population. Quantitative research designs are either descriptive (subjects usually measured once) or experimental (subjects measured before and after a treatment). Only an experiment establishes causality. For an accurate estimate of the relationship between variables, a descriptive study usually needs a sample of hundreds or even thousands of subjects. An experiment, especially a crossover, may need only tens of subjects.
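The sketch below illustrates testing a difference between two means. It implements Welch's two-sample t-test in plain Python. This is our example, not the statistics engine used by the system. Choosing Welch's variant, which does not assume equal variances, is also our assumption. The two-sided p value uses the standard identity p = I_{ν/(ν+t²)}(ν/2, 1/2). Here the regularized incomplete beta function I is evaluated with a continued fraction. The weekly-hours data in the usage lines are hypothetical.

```python
import math
from statistics import mean, variance


def _beta_cf(a: float, b: float, x: float, max_iter: int = 200, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        num = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        num = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges fastest,
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def welch_t_test(group1, group2):
    """Return (t, degrees of freedom, two-sided p value) for Welch's t-test."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1) / n1, variance(group2) / n2  # squared standard errors
    t = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Welch-Satterthwaite
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


if __name__ == "__main__":
    boys = [10.0, 12.0, 9.0, 14.0, 11.0]   # hypothetical weekly Internet hours
    girls = [7.0, 8.0, 6.0, 9.0, 8.0]
    t, df, p = welch_t_test(boys, girls)
    print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.4f}")  # significant at alpha = 0.05 if p < 0.05
```

With α = 0.05, a p value below 0.05 corresponds to a significant difference at the 1 − α = 95% confidence level described above.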

2.2. Data warehouse and indicator

Data warehouses and OLAP (Berndt, Hevner, & Studnicki, 2003; Chang, 2006; Chaudhuri & Dayal, 1997; Inmon, 1996) model data as a multidimensional database structure (Agarwal, Agrawal, & Deshpande, 1996; Sarawagi, 2000). This structure views data as a data cube (Chen, 2003; Datta & Thomas, 1999; Gray, Chaudhuri, Bosworth, et al., 1997; Palpanas, Koudas, & Mendelzon, 2005). A data cube is generally represented as a star schema, a snowflake schema, or a fact constellation schema (Han & Kamber, 2001). The star schema is the most basic one, and the other two can be derived from it. To simplify the discussion, in this paper a data cube is represented as a star schema.

The data warehouse supports OLAP as its analysis tool, and users can apply OLAP operations such as roll-up, drill-down, and dice to explore data cubes. However, the exploration process is not automated, and users still need to explore the data cube using their own intuition and experience. Sarawagi, Agarwal, and Megiddo (1998) proposed a discovery-driven exploration approach for OLAP data cubes. It provides the following three kinds of pre-computed indicators to help users explore data cubes.

SelfExp: This indicates the degree of surprise of the cell value, relative to other cells at the same level of aggregation.


InExp: This indicates the degree of surprise somewhere beneath the cell, if we drill down from it.

PathExp: This indicates the degree of surprise for each drill-down path from the cell.

However, these indicators focus on finding exceptions in data cubes, not statistically significant differences.
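The sketch below makes the star schema and the roll-up, drill-down, and dice operations concrete on a toy fact table. The fact table has one location dimension with a county-to-region hierarchy. All table contents and function names are hypothetical. The plain-Python functions stand in for what an OLAP server would do and are not the SQL Server API used by the system. The data mirror the resident-region example from the Introduction.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical location dimension table: each county rolls up to one region.
LOCATION_DIM: Dict[str, str] = {
    "county_a": "north", "county_b": "north",
    "county_c": "south", "county_d": "south",
}

# Hypothetical fact table: one row per respondent, holding a foreign key into
# the location dimension, a gender attribute, and the measure (weekly hours).
FACTS: List[dict] = [
    {"county": "county_a", "gender": "F", "hours": 2.0},
    {"county": "county_a", "gender": "M", "hours": 4.0},
    {"county": "county_b", "gender": "F", "hours": 8.0},
    {"county": "county_b", "gender": "M", "hours": 10.0},
    {"county": "county_c", "gender": "F", "hours": 5.0},
    {"county": "county_c", "gender": "M", "hours": 7.0},
    {"county": "county_d", "gender": "F", "hours": 5.0},
    {"county": "county_d", "gender": "M", "hours": 7.0},
]


def mean_hours_by(rows: List[dict], level: str) -> Dict[str, float]:
    """Average the measure at the 'county' level or at its parent 'region' level."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        county = row["county"]
        key = county if level == "county" else LOCATION_DIM[county]
        groups[key].append(row["hours"])
    return {key: mean(values) for key, values in groups.items()}


def roll_up(rows: List[dict]) -> Dict[str, float]:
    """Aggregate to the coarser region level of the location hierarchy."""
    return mean_hours_by(rows, "region")


def drill_down(rows: List[dict]) -> Dict[str, float]:
    """Aggregate at the finer county level of the location hierarchy."""
    return mean_hours_by(rows, "county")


def dice(rows: List[dict], **conditions: str) -> List[dict]:
    """Keep the sub-cube whose attributes match every given value."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


if __name__ == "__main__":
    print(roll_up(FACTS))                      # region means are identical
    print(drill_down(FACTS))                   # county means differ
    print(drill_down(dice(FACTS, gender="F"))) # same view restricted to one gender
```

At the region level both means are equal. Drilling down to counties separates county_a from county_b, which is the kind of difference a region-level view hides.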

2.3. NORM

In a traditional forward-chaining rule-based expert system, the rule base consists of all rules and facts. The system must go through every matching rule when inferring the proper result, which can become inefficient as the number of rules and facts grows. Therefore, many studies aim to improve the maintenance of rule-based expert systems by incorporating the object-oriented approach.

We apply the knowledge model NORM to represent knowledge in an object-oriented manner. NORM is based on principles of how people think and learn when acquiring knowledge. It contains knowledge classes and the relations between them, including reference, extension-of, trigger, and acquire. For example, in a trigger relationship, a knowledge class triggers another knowledge class and transfers the current facts to it. The remaining knowledge in the original knowledge class is no longer considered. During inference, the current inference process over the fact collection aborts, and a new inference process starts with the triggered knowledge class.
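The trigger semantics just described can be sketched as a small forward-chaining loop that runs inside one rule class at a time. When a rule with a trigger fires, the rest of the current class is skipped and inference restarts in the triggered class with the facts gathered so far. This is a hypothetical illustration of the idea, not NORM's or DRAMA's actual interface. The `Rule` structure, `infer`, and the example classes are invented for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Facts = Dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Facts], bool]
    conclusion: Facts = field(default_factory=dict)
    trigger: Optional[str] = None  # rule class to hand control to, if any


def infer(classes: Dict[str, List[Rule]], start: str, facts: Facts,
          max_hops: int = 20) -> Tuple[Facts, List[str]]:
    """Forward-chain inside one rule class at a time.

    When a rule with a trigger fires, the current class is abandoned (its
    remaining rules are not considered) and inference restarts in the
    triggered class with the facts collected so far.
    """
    facts = dict(facts)
    fired: List[str] = []
    current: Optional[str] = start
    for _ in range(max_hops):
        if current is None:
            break
        next_class: Optional[str] = None
        changed = True
        while changed and next_class is None:
            changed = False
            for rule in classes[current]:
                if rule.name in fired or not rule.condition(facts):
                    continue
                facts.update(rule.conclusion)
                fired.append(rule.name)
                changed = True
                if rule.trigger is not None:
                    next_class = rule.trigger
                    break  # abort the current class: knowledge transfer
        current = next_class
    return facts, fired


# Hypothetical two-class knowledge base used only to show the trigger semantics.
EXAMPLE = {
    "Start": [
        Rule("r1", lambda f: "x" in f, {"y": 1}),
        Rule("r2", lambda f: f.get("y") == 1, trigger="Next"),
        Rule("r3", lambda f: True, {"never": True}),  # skipped after r2 triggers
    ],
    "Next": [
        Rule("r4", lambda f: f.get("y") == 1, {"done": True}),
    ],
}

if __name__ == "__main__":
    print(infer(EXAMPLE, "Start", {"x": 0}))
```

Only the active class's rules are matched at any moment, which is the efficiency argument made above for partitioning the rule base.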

3. Idea

There are two main issues in helping users analyze questionnaire data. The first arises at the start of an analysis, when users explore the data. The original questionnaire data may be too coarse-grained for this exploration. For instance, there may be no significant difference between values at higher levels of granularity, while a significant difference can still be found at lower levels. The first issue is therefore how to easily explore the full data to form hypotheses about possible statistically significant differences. The second arises after users have formed such hypotheses: they pick one and test it with an inferential statistics method. Since several methods can be used to test hypotheses, the method a user chooses may not be appropriate. The second issue is therefore how to select appropriate inferential statistics methods for testing hypotheses, based on the data. An expert system that assists with both issues can make it easier for users to find statistically significant differences in their questionnaire data.

For the first issue, we interviewed data analysis experts about how they find possible statistically significant differences in the original questionnaire data. The experts may use a data warehouse and OLAP to explore the data. However, these two tools only provide general views of the data. The experts still have to operate the tools themselves and find possible statistically significant differences through experimentation and experience. Based on the experts' experience in exploring data with a data warehouse and OLAP, we therefore designed three kinds of indicators, Increase, StepDown, and Dice, which help compute the degree of statistically significant difference. We also defined metarules for these indicators to determine which indicator most likely shows the higher degree of statistically significant difference. On top of these indicators, we designed a significant difference viewer. It helps users explore the data cubes in the data warehouse and form hypotheses about possible statistically significant differences.

An ontology can describe concepts and the relations between them. For the second issue, we interviewed data analysis experts to construct an ontology that represents their knowledge of selecting statistical methods.

From the ontology, the knowledge components are further acquired and transformed into NORM-based rule classes of the knowledge base. A basic course scheme is also derived from the domain ontology using the Ontology-based Learning Sequences Construction Scheme proposed by Chen (Chen, Tseng, Liu, Chang, & Chen, 2005). Adaptive learning sequences are generated from the domain ontology, students' learning portfolios, and inference chains. This uses the Ontology-based adaptive learning sequences construction algorithm proposed by Chang (2006). These two algorithms are used to generate the same learning courses that master teachers would design.

4. The knowledge in intelligent questionnaire analysis expert system

4.1. Statistically significant difference indicator

Nowadays, researchers generally first collect the required data. They then find and learn the appropriate functions in databases, Excel, or SPSS to explore and analyze the questionnaire data, which is a laborious process. We therefore apply data warehousing technology and use OLAP to explore the data online from various views. Although OLAP makes data easy to explore, the previous indicators are not suitable for finding statistically significant differences. Based on the interviews, we propose three kinds of indicators to help researchers explore the data cubes in the data warehouse and find possible statistically significant differences.

Increase: This shows the degree of statistically significant difference obtained by changing the view to include an additional dimension.

StepDown: This shows the degree of statistically significant difference obtained by changing the view to step a dimension down to a lower level.

Dice: This shows the degree of statistically significant difference obtained by changing the view to dice on some value.

Thresholds for these indicators, acquired from the experts, are used to determine whether the data possibly contains statistically significant differences. If the degree of an indicator reaches its threshold, possible statistically significant differences exist. The experts also defined rules to determine which indicator most likely shows a statistically significant difference when two or more indicators reach their thresholds. Two examples are given below.

IF the degree of statistically significant difference in Increase >= 0.8 THEN the Increase indicator signals a statistically significant difference.

IF an Increase indicator signals and a Dice indicator signals THEN the Dice indicator cancels the signal.
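The two example rules can be read as a threshold step followed by a metarule step. The sketch below applies both to indicator degrees that are already computed. The text does not specify how the degrees themselves are computed, so they are plain inputs here. Only the Increase threshold of 0.8 and the Increase/Dice metarule come from the text. The StepDown and Dice thresholds are placeholder assumptions. Ranking the survivors by degree is our reading of "which indicator should be the most possible".

```python
from typing import Dict, List, Tuple

# Only the Increase threshold (0.8) is given in the text; the StepDown and
# Dice thresholds below are placeholder assumptions.
THRESHOLDS: Dict[str, float] = {"Increase": 0.8, "StepDown": 0.8, "Dice": 0.8}

# Metarules as (dominant, cancelled) pairs: if both indicators signal, the
# second one withdraws its signal. Only the Increase/Dice pair is from the text.
METARULES: List[Tuple[str, str]] = [("Increase", "Dice")]


def signalling_indicators(degrees: Dict[str, float],
                          thresholds: Dict[str, float] = THRESHOLDS,
                          metarules: List[Tuple[str, str]] = METARULES) -> List[str]:
    """Return the indicators that still signal, strongest degree first."""
    # Threshold rules: an indicator signals when its degree reaches its threshold.
    active = {name for name, degree in degrees.items() if degree >= thresholds[name]}
    # Metarules: resolve conflicts between indicators that signal together.
    for dominant, cancelled in metarules:
        if dominant in active and cancelled in active:
            active.discard(cancelled)
    return sorted(active, key=lambda name: degrees[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical degrees for one cell of the data cube.
    print(signalling_indicators({"Increase": 0.85, "StepDown": 0.5, "Dice": 0.9}))
```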

4.2. Domain ontology

In the system, a domain ontology is utilized to represent domain experts’ knowledge.

As shown in Fig. 1, the example ontology describes three parts of statistical concepts. The upper part describes the basic concepts and relations of statistics. The middle part gives the rules for selecting appropriate statistical methods to analyze significance of group differences. The bottom part contains learning materials about each statistical method. The meanings of the relations are as follows:

A Kind of: Concept class A is a kind of concept class B means that A is a kind of B. For example, data with one continuous dependent variable is a kind of significance of group differences.

A Part of: Concept class A is a part of concept class B means that A is a part of B. For example, Descriptive Statistics is a part of Statistics.

A Strategy of: Concept class A is a strategy of concept class B means that strategy A is appropriate for analyzing data with B. For example, One-way ANOVA and T-Test are appropriate for analyzing data with one discrete independent variable.

Contents of: Concept class A is contents of concept class B means that A holds the contents of B. For example, Contents of One-way ANOVA holds learning materials about the statistical method One-way ANOVA.

Pre-requisite constraint: If concept class A is a prerequisite to concept class B, then A should be learned before B. For example, Descriptive Statistics has a Pre-requisite relation to Probability, so a learner should study Descriptive Statistics before studying Probability.

As shown in Fig. 1, the circumscribed part of the ontology describes how experts use factorial ANOVA to analyze data with multiple discrete independent variables.
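One way to picture the relations above is as subject-relation-object triples that can be queried. The sketch below uses only the example facts named in this section. Storing the ontology as Python triples is an assumption for illustration; the prototype builds the ontology in Protégé. Filing the factorial ANOVA example under "A Strategy of" is also our reading.

```python
from typing import List, Set, Tuple

# Each triple reads "A <relation> B", following the relation definitions above.
Triple = Tuple[str, str, str]

ONTOLOGY: Set[Triple] = {
    ("Descriptive Statistics", "A Part of", "Statistics"),
    ("Data with one continuous dependent variable", "A Kind of",
     "Significance of group differences"),
    ("One-way ANOVA", "A Strategy of", "Data with one discrete independent variable"),
    ("T-Test", "A Strategy of", "Data with one discrete independent variable"),
    ("Factorial ANOVA", "A Strategy of",
     "Data with multiple discrete independent variables"),
    ("Contents of One-way ANOVA", "Contents of", "One-way ANOVA"),
    ("Descriptive Statistics", "Pre-requisite", "Probability"),
}


def subjects(relation: str, obj: str, ontology: Set[Triple] = ONTOLOGY) -> List[str]:
    """Return every A such that (A, relation, obj) is in the ontology, sorted."""
    return sorted(a for a, rel, b in ontology if rel == relation and b == obj)


def strategies_for(data_kind: str) -> List[str]:
    """Statistical methods the ontology marks as appropriate for a kind of data."""
    return subjects("A Strategy of", data_kind)


def prerequisites_of(concept: str) -> List[str]:
    """Concepts that should be learned before `concept`."""
    return subjects("Pre-requisite", concept)


if __name__ == "__main__":
    print(strategies_for("Data with one discrete independent variable"))
    print(prerequisites_of("Probability"))
```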

4.3. Rules transformation

The middle part of the ontology shown in Fig. 1 is transformed into the seven rule classes shown in Fig. 2, which are used to infer appropriate methods. Each rule class verifies some conditions and triggers another rule class. The conditions used for inferring appropriate methods are:
- the number of dimensions in the data, which corresponds to the dependent variable in the middle part of the ontology in Fig. 1;
- the number of values in each dimension, which corresponds to the independent variable in the middle part of the ontology in Fig. 1;
- the data type of each dimension.

Some example rules are listed below:

Rule 1: IF dimension number = 4 THEN data dimension = Three or above.

Rule 2: IF data dimension = Three or above THEN class trigger = Class Three Dimensions or above.

Rule 3: IF data scale = Nominal or data scale = Ordinal THEN strategy = T-Test.
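The three example rules can be written down as data and fired with a minimal forward-chaining loop inside a single rule class. The rule contents and the class names ("Statistics Methods" and "Two Values", as stated below) come from the text. The tuple encoding and `run_rule_class` are hypothetical and are not DRAMA's rule syntax. As in Rule 2, the class trigger is recorded as a fact rather than executed here.

```python
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, object]
Rule = Tuple[str, Callable[[Facts], bool], Facts]

# Rules 1-3 from the text, grouped by the rule class they belong to.
RULE_CLASSES: Dict[str, List[Rule]] = {
    "Statistics Methods": [
        # Rule 1: IF dimension number = 4 THEN data dimension = Three or above.
        ("Rule 1", lambda f: f.get("dimension number") == 4,
         {"data dimension": "Three or above"}),
        # Rule 2: IF data dimension = Three or above
        #         THEN class trigger = Class Three Dimensions or above.
        ("Rule 2", lambda f: f.get("data dimension") == "Three or above",
         {"class trigger": "Class Three Dimensions or above"}),
    ],
    "Two Values": [
        # Rule 3: IF data scale = Nominal or data scale = Ordinal THEN strategy = T-Test.
        ("Rule 3", lambda f: f.get("data scale") in ("Nominal", "Ordinal"),
         {"strategy": "T-Test"}),
    ],
}


def run_rule_class(name: str, facts: Facts) -> Tuple[Facts, List[str]]:
    """Fire the rules of one class repeatedly until no new rule fires."""
    facts = dict(facts)
    fired: List[str] = []
    changed = True
    while changed:
        changed = False
        for rule_name, condition, conclusion in RULE_CLASSES[name]:
            if rule_name not in fired and condition(facts):
                facts.update(conclusion)
                fired.append(rule_name)
                changed = True
    return facts, fired


if __name__ == "__main__":
    print(run_rule_class("Statistics Methods", {"dimension number": 4}))
    print(run_rule_class("Two Values", {"data scale": "Ordinal"}))
```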

Rule 1 and Rule 2 are rules in the Statistics Methods rule class, and Rule 3 is a rule in the Two Values rule class.

5. Architecture of intelligent questionnaire analysis expert system

5.1. Data preprocessing

Before the questionnaire data are imported, related legacy databases, such as geographical and population data, must first be integrated into a data warehouse.

Fig. 3 shows the two processes that integrate the imported data. First, the system imports the user's data based on the metadata the user supplies with the questionnaire raw data. The metadata describes the format of the questionnaire data, for example, the data type, scale, and hierarchy of attributes. Second, the system integrates the imported data with the other legacy data. This process is carried out with Microsoft SQL Server 2005. Finally, the integrated data is stored in the data warehouse.

5.2. Knowledge transformation

To build the experts' knowledge into our expert system, the ontology is transformed and stored in the knowledge base and the content package repository through three processes. As shown in Fig. 4, the Knowledge Acquisition process transforms the ontology into rule classes. Sections 4.2 and 4.3 give an example, in which the example ontology in Fig. 1 is transformed into the rule classes in Fig. 2.

To provide learning sequences for each statistical method in the system, we refined the OALSC algorithm (Chang, 2006). The Ontology-based adaptive learning sequences construction (OALSC) algorithm generates adaptive learning sequences. Its inputs are the domain ontology, students' learning portfolios, and inference chains. Its outputs are adaptive learning sequences. The algorithm integrates inference chains, which contain the suggested statistical methods, with a specific ontology of statistical method nodes and related learning concept nodes. It then uses the students' learning portfolios to make the learning sequences more adaptive.

The refined OALSC process generates adaptive learning sequences for each statistical method from the ontology of statistical method nodes and related learning concept nodes. These learning sequences are stored in the content package repository. As shown in Fig. 5, an adaptive learning sequence for One-way ANOVA can be generated from the upper part of the example ontology in Fig. 1.
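The internals of OALSC are not given here. The sketch below therefore shows only a simplified stand-in. It collects every prerequisite concept of a target method, orders them so that prerequisites come first, and drops concepts a learner's portfolio already marks as learned. The prerequisite edges follow the sequence in Fig. 5, plus the Descriptive Statistics to Probability example from Section 4.2. The portfolio filter is our assumption about what "adaptive" could mean and is not the published algorithm. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import AbstractSet, Dict, List, Set

# concept -> concepts that must be learned before it (assumed edges, see lead-in)
PREREQUISITES: Dict[str, Set[str]] = {
    "Descriptive Statistics": {"Statistics"},
    "Inferential Statistics": {"Descriptive Statistics"},
    "Analysis Of Variance": {"Inferential Statistics"},
    "One-way ANOVA": {"Analysis Of Variance"},
    "Probability": {"Descriptive Statistics"},
}


def ancestors(target: str, prereqs: Dict[str, Set[str]]) -> Set[str]:
    """The target plus every concept it transitively depends on."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(prereqs.get(node, ()))
    return seen


def learning_sequence(target: str, prereqs: Dict[str, Set[str]],
                      learned: AbstractSet[str] = frozenset()) -> List[str]:
    """Prerequisite-first order of the concepts needed for `target`,
    skipping concepts the learner has already mastered."""
    needed = ancestors(target, prereqs)
    graph = {c: prereqs.get(c, set()) & needed for c in needed}
    order = list(TopologicalSorter(graph).static_order())
    return [c for c in order if c not in learned]


if __name__ == "__main__":
    print(learning_sequence("One-way ANOVA", PREREQUISITES))
    print(learning_sequence("One-way ANOVA", PREREQUISITES, learned={"Statistics"}))
```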

We also retrieve the contents of each statistical method in the ontology to obtain the learning course materials, and the contents are stored in the content package repository. As shown in Fig. 6, the contents of One-way ANOVA are retrieved from a rectangle node in the ontology in Fig. 1.

5.3. System architecture

The architecture of the expert system, shown in Fig. 7, is described in detail below.

Fig. 2. Rule classes in the knowledge base of the system.

[Fig. 3 (data preprocessing) components: Questionnaire Raw Data, Metadata, Import Data, Questionnaire Data, Legacy Data, Data Integration, Integrated Data, Data Warehouse.]

To help users explore their questionnaire data, the data warehouse API provides descriptive statistics of the data, such as the mean and standard deviation. OLAP is also provided to help users view the distributions of the data and obtain analysis reports.

The Significant Difference Viewer helps users find possible statistically significant differences. It generates the three kinds of indicators based on the metarules and uses different color intensities to represent different degrees of possible statistically significant difference. It also provides functions corresponding to the three kinds of indicators for changing the view of the data.

Users need expert suggestions on which statistical methods are appropriate for testing their hypotheses. An inference engine therefore uses the knowledge in the knowledge base to infer the appropriate statistical methods as experts would. Explanations of these results are also provided, based on the rules used in the inference process.

Since users may want to understand how to use the suggested methods, a learning platform is also provided with learning courses. To adapt the courses to each user, the OALSC algorithm generates adaptive learning sequences based on the student's profile, the ontology, and the inference results.

One-Way ANOVA
A One-Way Analysis of Variance is a way to test the equality of three or more means at one time by using variances.
Assumptions
The populations from which the samples were obtained must be normally or approximately normally distributed.
The samples must be independent.
The variances of the populations must be equal.
Hypotheses
The null hypothesis is that all population means are equal; the alternative hypothesis is that at least one mean is different.
In the following, lower case letters apply to the individual samples and capital letters apply to the entire set collectively. That is, n is one of many sample sizes, but N is the total sample size.

Fig. 6. The learning course about One-way ANOVA.
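To make the course content in Fig. 6 concrete, the sketch below computes the one-way ANOVA F statistic. The statistic is the ratio of the between-group mean square to the within-group mean square, with k − 1 and N − k degrees of freedom. It follows the figure's notation: n is a single sample's size and N is the total sample size. This is our illustration, not code from the system or its learning materials. It stops at F; the p value would come from the F distribution with those degrees of freedom.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for k independent samples."""
    k = len(groups)
    N = sum(len(g) for g in groups)              # total sample size
    grand_mean = sum(sum(g) for g in groups) / N
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: n * (sample mean - grand mean)^2, summed.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared deviations from each sample's own mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, N - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical scores for three groups.
    print(one_way_anova_f([3.0, 4.0, 5.0], [4.0, 5.0, 6.0], [6.0, 7.0, 8.0]))
```

A large F means the sample means vary much more than the within-sample noise would explain, which is evidence against the null hypothesis that all population means are equal.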

Fig. 4. Knowledge transformation module of system. (Figure components: Ontology of Data Analysis Experts, Knowledge Acquisition, Rule Classes, Knowledge Base (NORM), OALSC Algorithm, Adaptive Learning Sequences, Construct, Courses, Content Package Repository.)

Fig. 7. System architecture. (Figure components: User Interface Module, Significant Difference Viewer, Metarules, Data Warehouse, Data Warehouse API, Inference Engine (DRAMA), Knowledge Base (NORM), Rule Base, Facts, Inference Results, Learning Platform (SCORM RTE), OALSC Algorithm, Adaptive Learning Sequences, Content Package Repository, Student Learning Profiles, Ontology of Data Analysis, Learning Users.)

[Fig. 5 content: Statistics -> Descriptive Statistics -> Inferential Statistics -> Analysis Of Variance -> One-way ANOVA.]

6. Implementation

We have implemented a prototype of the system using the following tools:

Data warehouse and OLAP

Microsoft SQL Server 2005 is used to construct the data warehouse and OLAP because it provides a friendly user interface for building them.

Ontology

Protégé is used to build the ontology for the system because it provides complete functions for building ontologies and can generate an XML file of the ontology.

Knowledge base and inference engine

DRAMA is used to construct the knowledge base and inference engine of the system. DRAMA is a rule-based, client-server tool/environment for KBS development that helps knowledge engineers build expert systems. Briefly, DRAMA incorporates many innovative techniques, including object-oriented technology and knowledge inheritance. It also contains useful tools, such as a rule verification tool, a knowledge acquisition assistant tool, and the inference server.

Learning platform

SCORM is the most popular standard for learning materials, so the learning platform is constructed with SCORM RTE 2004.

User interface

Microsoft Visual Studio 2005 ASP.NET with C# is used to construct user interface.

The following figures illustrate the operation of the system.

After users log in, the system shows the main page in Fig. 8. Users can access functions by choosing from the menu on the left side of the page.

First, we choose the Descriptive Statistic node to get descriptive statistics of our data. We use the drop-down list at the top to select the Health improve life style and health education course requirements questionnaire survey for analysis. We then select a basic data attribute, Health Situation, and a question, Stress Management. We press the Observe button to get the results shown in Fig. 9. Second, we choose the Explore Questionnaire Data node to get statistical tables or diagrams of our data. As with the Descriptive Statistic function, we first select the questionnaire. We then add an attribute, Health Situation, to the table and a question, Stress Management, to the diagram to get the analysis results shown in Fig. 10.

Third, as shown in Fig. 11, we choose the Significant Difference node to get help analyzing significant differences in our data. As with the two functions above, we first choose our questionnaire and the question to analyze, Stress Management, and then press the Observe button to begin the analysis. The system displays the analyzed results using different colors. We compare the color of each cell in the table to see where significant differences might exist, and we can click a cell for further analysis. Finally, we press the Suggestion for Appropriate Analysis Methods button to get the system's suggestions for analyzing these data.

Fig. 12 shows the suggested statistical methods. Pressing the Detail button shows why the system suggests these methods, and pressing the Learn button shows learning materials about a method. Figs. 13 and 14 show the explanation and learning materials about One-way ANOVA, respectively.

7. Experiments

Two experiments were designed to evaluate whether the system really helps users analyze questionnaires. The first experiment is a case study that verifies whether the analysis results and suggestions from the system are correct and appropriate. The second is a satisfaction questionnaire survey that measures user satisfaction.

7.1. Case study

The case study uses the Health improve life style and health education course requirements questionnaire survey. A Sanitary and Health Care Center proposed this survey to analyze students' life styles when planning future sanitary and health activities. The questionnaire has two parts, a basic data part and a life style part. The basic data part has 19 attributes, e.g., gender, department, and score, and the life style part has 24 questions. Each question belongs to one of the items designed to be analyzed: self-actualization, health responsibility, sports, nutrition, interpersonal relationship, and Stress Management. There are 1203 records in the database.

Table 1 shows that, according to the system's analysis, schoolwork pressure, health situation, and comparison with same ages may have significant differences in the self-actualization item. To analyze the attribute schoolwork pressure, the system suggested One-way ANOVA. It also provided junior researchers with a learning sequence and a learning course about One-way ANOVA, shown in Fig. 5 and Fig. 6, respectively. The learning sequence is Statistics -> Descriptive Statistics -> Inferential Statistics -> Analysis Of Variance -> One-way ANOVA.

Fig. 9. Some descriptive statistic of data computed by the expert system.

Fig. 11. Possible significant differences analyzed by the expert system.

Fig. 12. Suggestion of appropriate methods from the expert system.

We compared these results with the results of T-tests and One-way ANOVA, and most of the attributes do have significant differences. Some attributes in Table 1 showed no significant difference when verified by T-test or One-way ANOVA; these are marked with an asterisk. Analyzing the data showed that the results differ because the number of records differs significantly between groups. This can be addressed by remodeling the indicators in the future. The researchers in the Sanitary and Health Care Center agreed that the suggested methods are appropriate for analyzing these data.

7.2. Satisfaction survey

A questionnaire survey was designed to evaluate user satisfaction with the expert system. The questionnaire contains 11 questions in three parts: 5 questions on system usage (Q01–Q05), 4 questions on expected functions (Q06–Q09), and 2 questions on overall evaluation (Q10–Q11). Each question was measured on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree).

Two groups of people took part in the experiment. The first group consisted of senior quantitative researchers: 2 teachers of a quantitative research class and 3 senior social science researchers. The second group consisted of 17 students taking the quantitative research class. The experiment had two steps. First, the participants used the expert system on a given data warehouse, described below, to find all the significant differences in it. Second, after using the system, they filled out the questionnaire.

A data warehouse was built for this project from the data of the first experiment: the questionnaire data and some legacy data were integrated into a data cube. All 22 participants were given this data warehouse and asked to find significant differences in it. After using the system, they completed a questionnaire measuring their satisfaction with it.
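The paper does not specify how the data cube is computed. As a minimal sketch under stated assumptions (each record is a mapping of hypothetical dimension attributes such as `resident` and `school_pressure` plus one numeric measure `score`; the function `cube_cells` is our own), the code below materializes every group-by of the CUBE operator described by Gray et al. (1997) and stores the count and mean of the measure for each cell. Keeping the count beside the mean makes unequal group sizes, which the case study identified as a source of mismatches, visible while drilling down.

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping, Sequence
from itertools import combinations
from statistics import fmean
from typing import Any

CellKey = tuple[tuple[str, ...], tuple[Any, ...]]


def cube_cells(
    records: Iterable[Mapping[str, Any]],
    dimensions: Sequence[str],
    measure: str,
) -> dict[CellKey, tuple[int, float]]:
    """Compute (count, mean of `measure`) for every cell of every group-by.

    A cell key is (dimension names, their values); the key ((), ()) is the
    apex cell that aggregates all records.
    """
    buckets: dict[CellKey, list[float]] = defaultdict(list)
    for rec in records:
        # Each record falls into one cell of each of the 2**len(dimensions) group-bys.
        for r in range(len(dimensions) + 1):
            for dims in combinations(dimensions, r):
                key = (dims, tuple(rec[d] for d in dims))
                buckets[key].append(float(rec[measure]))
    return {key: (len(vals), fmean(vals)) for key, vals in buckets.items()}


# Hypothetical records for illustration only.
records = [
    {"resident": "dormitory", "school_pressure": "high", "score": 3},
    {"resident": "dormitory", "school_pressure": "low", "score": 5},
    {"resident": "home", "school_pressure": "high", "score": 2},
]
cells = cube_cells(records, ("resident", "school_pressure"), "score")
print(cells[(("resident",), ("dormitory",))])  # (2, 4.0)
```

Materializing all 2^d group-bys this way is practical only for a handful of dimensions; Agarwal et al. (1996) discuss more efficient ways to compute such multidimensional aggregates.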

Fig. 14. Adaptive learning materials provided by the expert system.

Table 1

The analysis results of the case study

| Item | Attributes with possible significant differences |
|---|---|
| Self-actualization | School pressure, Health situation, Compare with same ages, Health responsibility, Resident*, Native place* |
| Sports | Resident, Compare with same ages, Health situation |
| Nutrition | Resident*, Health situation, School pressure, Native place* |

Note. * Not confirmed as significant by the T-test or one-way ANOVA.

The experiment results are shown in Table 2. For every question except Q03 (speed), the mean score in the first group is lower than that in the second group. In the first group, the questions with low mean values (below 3.000) are Q01 (easy to use), Q02 (friendly user interface), and Q05 (efficiency); conversely, the questions with high mean values (above 4.000) are Q03 (speed) and Q07 (hope to automate). In the second group, the questions with low mean values (below 3.700) are Q01 (easy to use) and Q02 (friendly user interface), and those with high mean values (above 4.400) are Q05 (efficiency), Q06 (hope to suggest), and Q07 (hope to automate). Comparing the two groups, there are statistically significant differences (p < .05) in Q02 (friendly user interface), Q04 (correctness), Q05 (efficiency), Q06 (hope to suggest), and Q10 (will use this system again).

Because the first group's means are lower on nearly every question, the acceptability of the system to senior quantitative researchers was lower than its acceptability to junior quantitative researchers. Since the system is a prototype, Q01 (easy to use) and Q02 (friendly user interface) received relatively low scores in both groups; this can be improved in the future. The significant differences in Q04 (correctness), Q05 (efficiency), and Q10 (will use this system again), with the p value of Q05 below 0.01, likewise indicate that senior researchers accepted the system less readily than junior researchers. Q06 (hope to suggest) received high scores in the second group, and Q07 (hope to automate) received high scores in both groups, which suggests that participants are interested in the idea of this system and hope that more useful functions will be added. Generally speaking, they are satisfied with the system.
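Table 2 does not state which form of the independent-samples t-test produced its t values. As a hedged cross-check, the sketch below recomputes t from the reported means and SDs, taking n1 = 5 and n2 = 17 from the description of the two groups; the function names are our own. By our arithmetic, the pooled-variance (Student's) statistic gives about 1.563 for Q01 and 3.258 for Q05 (Table 2: 1.563 and 3.257), while Welch's unequal-variance statistic gives about 2.853 for Q02 (Table 2: 2.855). That pattern would fit choosing the variant after an equal-variance check such as Levene's test, but the paper does not say so, and this remains an assumption.

```python
import math


def pooled_t(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> tuple[float, int]:
    """Student's two-sample t (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def welch_t(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> tuple[float, float]:
    """Welch's two-sample t (unequal variances) with Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m2 - m1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Group sizes from the text: 5 senior researchers, 17 students.
N1, N2 = 5, 17
# (mean, SD) of the first and second group for three rows of Table 2.
ROWS = {
    "Q01": ((2.800, 0.837), (3.647, 1.115)),
    "Q02": ((2.400, 0.548), (3.529, 1.281)),
    "Q05": ((2.800, 0.447), (4.412, 1.064)),
}

for item, ((m1, s1), (m2, s2)) in ROWS.items():
    t_pooled, _ = pooled_t(m1, s1, N1, m2, s2, N2)
    t_welch, df_welch = welch_t(m1, s1, N1, m2, s2, N2)
    print(f"{item}: pooled t = {t_pooled:.3f}, Welch t = {t_welch:.3f} (df = {df_welch:.1f})")
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=...)` returns the same t together with its two-sided p-value.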

8. Conclusion

In questionnaire analysis, finding a significant difference between two or more groups on a measure is one of the major problems that concern social science researchers. However, finding possible significant differences is difficult for them. To assist junior researchers in finding possible significant differences, in this paper we build an expert system that helps users find possible significant differences in the data cube. The expert system provides the Significant Difference Viewer to assist users in exploring the data. The viewer uses three kinds of indicators (Increase, StepDown, and Dice), designed according to the experts' experiments, together with metarules defined by experts. The expert system also suggests appropriate statistical methods for testing users' hypotheses. It not only explains its suggestions but also provides learning courses for these methods.

The prototype of the proposed expert system was implemented, and two experiments were conducted: a case study and a survey of user satisfaction with the system. The case study shows that the system's analysis results are almost correct and its suggestions are appropriate. The satisfaction survey shows that the acceptability of the system to senior quantitative researchers was lower than its acceptability to junior quantitative researchers. Participants were interested in the idea of this system and hoped that more useful functions would be added. Generally speaking, they were satisfied with the system.

Several directions remain for future work. First, the indicators proposed in this paper can be defined in more depth to increase their precision. Second, beyond significant differences, the expert system could support other kinds of questionnaire analysis, such as regression and correlation, to help social science researchers analyze questionnaires more easily.

Acknowledgments

This research was partially supported by the National Science Council of the Republic of China under grants NSC95-2520-S009-007-MY3 and NSC95-2520-S009-008-MY3.

Table 2

The results of the questionnaire

| No. | Question | First group mean | First group SD | Second group mean | Second group SD | t | Sig. (p) |
|---|---|---|---|---|---|---|---|
| Q01 | This system is easy to use | 2.800 | 0.837 | 3.647 | 1.115 | 1.563 | 0.134 |
| Q02 | The user interface of this system is friendly | 2.400 | 0.548 | 3.529 | 1.281 | 2.855* | 0.011 |
| Q03 | The speed of this system is fast | 4.200 | 0.447 | 4.059 | 0.827 | 0.362 | 0.721 |
| Q04 | This system can help me find significant differences correctly | 3.000 | 0.707 | 4.177 | 1.185 | 2.753* | 0.018 |
| Q05 | This system can help me find significant differences efficiently | 2.800 | 0.447 | 4.412 | 1.064 | 3.257** | 0.004 |
| Q06 | I hope this system can suggest to me the appropriate inferential statistical methods to test significant differences | 3.400 | 0.548 | 4.529 | 0.943 | 2.527* | 0.020 |
| Q07 | I hope this system can test significant differences automatically | 4.200 | 0.837 | 4.647 | 0.606 | 1.334 | 0.197 |
| Q08 | I hope this system can provide teaching materials about inferential statistical methods | 3.200 | 0.837 | 4.294 | 1.160 | 1.950 | 0.065 |
| Q09 | I hope this system can add different inferential statistical methods | 3.600 | 0.548 | 4.118 | 0.928 | 1.556 | 0.147 |
| Q10 | I will use this system again in the future | 3.000 | 1.000 | 4.059 | 0.966 | 2.139* | 0.045 |
| Q11 | Generally speaking, I am satisfied with this system | 3.400 | 0.548 | 4.000 | 0.791 | 1.576 | 0.131 |

Note. First group: senior quantitative researchers (n = 5); second group: students (n = 17). * p < .05. ** p < .01.

References

Agarwal, S., Agrawal, R., Deshpande, P. M., et al. (1996). On the computation of multidimensional aggregates. In Proceedings of the 22nd international conference on VLDB (pp. 506–521), Mumbai, India.

Berndt, D. J., Hevner, A. R., & Studnicki, J. (2003). The Catch data warehouse: Support for community health care decision-making. Decision Support Systems, 35(3), 367–384.

Black, T. R. (1999). Doing quantitative research in the social sciences: An integrated approach to research design, measurement and statistics. Sage Publications Ltd.

Chang, C. H. (2006). Building a learning-by-doing remedial tutoring system for DNS management. Master Thesis, National Chiao Tung University, Taiwan.

Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1), 65–74.

Chen, C. S. (2003). Design and implementation of a unifying intelligent DNS management system. PhD Dissertation, National Chiao Tung University, Taiwan.

Chen, R. Y., Tseng, S. S., Liu, C. L., Chang, C. S., & Chen, C. S. (2005). Learning sequences construction using ontology and rules. In Proceedings of ICCE2005 (13th international conference on computers in education), November 28–December 2, 2005.

Datta, A., & Thomas, H. (1999). The cube data model: A conceptual model and algebra for on-line analytical processing in data warehouses. Decision Support Systems, 27(3), 289–301.

Gray, J., Chaudhuri, S., Bosworth, A., et al. (1997). Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1(1), 29–53.

Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. Morgan Kaufmann.

Inmon, W. H. (1996). Building the data warehouse (2nd ed.). John Wiley & Sons Inc.

Palpanas, T., Koudas, N., & Mendelzon, A. (2005). Using datacube aggregates for approximate querying and deviation detection. IEEE Transactions on Knowledge and Data Engineering, 17(11), 1465–1477.

Punch, K. F. (2005). Introduction to social research (2nd ed.). Sage Publications Ltd.

Sarawagi, S. (2000). User-adaptive exploration of multidimensional data. In Proceedings of the 26th international conference on VLDB (pp. 307–316), Cairo, Egypt.

Sarawagi, S., Agrawal, R., & Megiddo, N. (1998). Discovery-driven exploration of OLAP data cubes. Research Report, IBM Almaden Research Center.

Schutt, R. K. (2004). Investigating the social world: The process and practice of research (4th ed.). Pine Forge Press.

Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). HarperCollins Publishers.

Zhang, H., Padmanabhan, B., & Tuzhilin, A. (2004). On the discovery of significant statistical quantitative rules. In Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 374–383).