
Chapter 5. Discovering the Project-Based Knowledge Map

5.2. Predefining the important agreements

5.2.2. The definition of association names

Next, association names enrich the conceptual classification of topic names and make the transformation of associations into rule statements more meaningful. The template of a binary association is R(X, Y), where X is the antecedent topic name, Y is the subsequent topic name, and R is an abstract relationship that is later refined into a concrete association name. Based on the opinions of experts and senior project managers, four pre-defined association names are proposed in this work. They are explained below and summarized in Table 7.

(1) use: It is meaningful when X is a person and Y is a tool.

(2) engage_in: It is meaningful when X is a person and Y is a standard, activity or goal.

(3) work_with: It is meaningful when X and Y are in the same category.

(4) assist_in: It is meaningful when X and Y belong to different categories within the set of Standard, Tool, Activity and Goal, where X is a standard, tool or activity.

Table 7. The pre-defined association names

Category of antecedent topic | Association name | Category of subsequent topic
Member | use | Tool
Member | engage_in | Standard, Activity, Goal
Member | … | …
Standard | assist_in | Goal, Tool, Activity
Tool | assist_in | Goal, Standard, Activity
Activity | assist_in | Goal, Standard, Tool
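The four definitions above can be read as a lookup from the categories of X and Y to an association name. The following is a minimal sketch under that reading; the function name `association_name` and the string labels for the categories are illustrative assumptions, not part of the original work.

```python
from typing import Optional

# Category labels follow Table 7; the set names are illustrative.
ASSISTING = {"Standard", "Tool", "Activity"}   # allowed categories for X in assist_in
ASSISTED = ASSISTING | {"Goal"}                # allowed categories for Y in assist_in


def association_name(cat_x: str, cat_y: str) -> Optional[str]:
    """Return the pre-defined association name for R(X, Y), or None if none applies."""
    if cat_x == "Member" and cat_y == "Tool":
        return "use"
    if cat_x == "Member" and cat_y in {"Standard", "Activity", "Goal"}:
        return "engage_in"
    if cat_x == cat_y:
        return "work_with"
    if cat_x in ASSISTING and cat_y in ASSISTED:
        # cat_x != cat_y is guaranteed here by the previous check.
        return "assist_in"
    return None
```

For example, `association_name("Member", "Tool")` yields `use`, while a pair such as (Goal, Tool) matches none of the four definitions and yields `None`.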

5.3. Discovering knowledge with data mining approach

A two-phase data mining method is employed for knowledge discovery.

The first phase uses clustering to group project attributes into small clusters. The second phase discovers the internal association relationships within each cluster. The vector model is the basic data representation for these knowledge discovery operations. Furthermore, because project knowledge grows rapidly, we employ constraint-based data mining to determine the part of the project-based knowledge map that is relevant to users from various context perspectives. This facilitates knowledge support and the exploitation of the project-based knowledge map.

5.3.1. Vector model

A project attribute Oj is associated with a multi-dimensional vector (w1j, w2j, …, wkj), where a weight value wij ≥ 0 denotes the importance of attribute i on project j. For simplicity, this work uses the weight values 1 and 0 to indicate whether the attribute is important (present) or not important (absent) to the project, respectively. Each dimension in the vector stands for a distinct term in the defined-term space of project attributes.

Based on the consistent project attributes, one hundred and fifty-seven attribute dimensions are used to represent each project object. Vector models are intrinsically suitable for representing multi-dimensional data in a computational format [24]. Table 8 shows an excerpt of the vector model, which describes the important attributes of each project object. The subsequent clustering and association mining operations both depend on the vector model.

Table 8. An excerpt of the vector model for representing project attributes

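Project | Project object | John | OLAP | DBMS | ISO | Mary | UML | CRM | KM
DW007 | spec_dw.pdf | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0
 | service_sys.doc | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0
 | server_m1.sql | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0
 | manual_sys.doc | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0
 | long_sa.dpf | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0
KK008 | spec_user.pdf | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1
 | spec_sys.doc | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1
 | transfer.sql | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1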
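As an illustration of the binary vector model, the sketch below encodes two rows of Table 8 and computes the Euclidean distance between them. It assumes the eight-attribute excerpt stands in for the full 157-dimensional space; the helper names `to_vector` and `euclidean` are hypothetical.

```python
import math

# Attribute order taken from the header of Table 8 (an excerpt of the full space).
ATTRIBUTES = ["John", "OLAP", "DBMS", "ISO", "Mary", "UML", "CRM", "KM"]


def to_vector(important, attributes=ATTRIBUTES):
    """Encode a set of important attributes as a 0/1 weight vector."""
    important = set(important)
    return [1 if a in important else 0 for a in attributes]


def euclidean(u, v):
    """Euclidean distance between two equally long weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Rows spec_dw.pdf and transfer.sql of Table 8.
spec_dw = to_vector({"John", "OLAP", "DBMS"})
transfer = to_vector({"John", "DBMS", "KM"})
distance = euclidean(spec_dw, transfer)  # the vectors differ only in OLAP and KM
```

With binary weights, the squared distance is simply the number of attributes on which the two vectors disagree.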

5.3.2. Clustering process

Clustering groups 'close' data together based on a similarity measurement. Clusters are formed from the dissimilarity matrix. The threshold value is an important parameter that decides how 'close' project attributes must be to form a cluster: project attributes whose distance is less than the threshold value are grouped into the same cluster. To save computation cost in the high-dimensional vector model, the Euclidean distance metric and the single-link agglomerative clustering method are used for grouping the collection of project attributes.

Table 9. A dissimilarity matrix using the measure of Euclidean distance

      | KA001 | KA002 | KA011 | KA100 | KA101 | KA109
KA001 | 0 | 4.9 | 5.6 | 3.4 | 2.2 | 5.7
KA002 |   | 0 | 5.5 | 4.5 | 4.6 | 5.8
KA011 |   |   | 0 | 5.5 | 5.6 | 2.3
KA100 |   |   |   | 0 | 3.2 | 5.9
KA101 |   |   |   |   | 0 | 5.6
KA109 |   |   |   |   |   | 0

According to the example dissimilarity matrix in Table 9, with a threshold value of 5.0 the project attributes KA001, KA101, KA100 and KA002 form one cluster, and the project attributes KA011 and KA109 form another. The algorithm first places each object in its own cluster and then gradually merges these atomic clusters into larger and larger clusters. If the threshold value is greater than 6.0, all objects are grouped into a single cluster. The clustering dendrogram for different threshold values is shown in Figure 8.

Figure 8. Agglomerative clustering forms clusters in various thresholds
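The effect of the threshold can be reproduced on the Table 9 values with a short sketch. Cutting a single-link dendrogram at a threshold gives the same groups as linking every pair whose distance is below the threshold and taking the connected components, so the sketch uses a union-find structure rather than building the full dendrogram. The function name `threshold_clusters` is an assumption for illustration.

```python
NAMES = ["KA001", "KA002", "KA011", "KA100", "KA101", "KA109"]

# Upper triangle of the dissimilarity matrix in Table 9.
DIST = {
    ("KA001", "KA002"): 4.9, ("KA001", "KA011"): 5.6, ("KA001", "KA100"): 3.4,
    ("KA001", "KA101"): 2.2, ("KA001", "KA109"): 5.7, ("KA002", "KA011"): 5.5,
    ("KA002", "KA100"): 4.5, ("KA002", "KA101"): 4.6, ("KA002", "KA109"): 5.8,
    ("KA011", "KA100"): 5.5, ("KA011", "KA101"): 5.6, ("KA011", "KA109"): 2.3,
    ("KA100", "KA101"): 3.2, ("KA100", "KA109"): 5.9, ("KA101", "KA109"): 5.6,
}


def threshold_clusters(names, dist, threshold):
    """Single-link clusters: merge any two objects closer than the threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), d in dist.items():
        if d < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


clusters_at_5 = threshold_clusters(NAMES, DIST, 5.0)
clusters_at_6 = threshold_clusters(NAMES, DIST, 6.0)
```

At a threshold of 5.0 this yields the two clusters described in the text, and at 6.0 all six attributes fall into one cluster.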

As a result, one hundred and fifty-seven topic names are grouped into three clusters. The clustering outcome is shown in Table 10. Each cluster of topic names is then used for discovering association patterns, which form the foundation of the project-based knowledge map.

Table 10. The result of the clustering mining operation

Cluster No. | Content

#1 (83) | John Smith, Mary Brown, Joyce English, Robert King, Stephen Adams, David Campbell, DBMS, OLAP, Java, Clustering, Neural Net, Fuzzy set, Active Service Page, Visual Basic, Flash, Genetic Analysis, ANOVA, Strength Weakness Opportunities Threats, Call center, Certificate Authorities, Conference, Marketing, Contest, Questionnaire, Sampling, Knowledge Management, Tax, Customer Relationship Management, Enterprise Resource Planning, Electronic Commerce, e-Learning, Point of sale, Logistics, Enterprise Information Portal, ISO, Unified Modeling Language, XML, Synchronized Multimedia Integration Language, Simple Object Access Protocol, Health Level 7, MP3, Document Object Model, Secure Sockets Layer, Scalable Vector Graphics, ebXML, Standard Generalized Markup Language, XML Access Control Language, Extensible User Interface Language, Geography Markup Language, Concurrent Versions System, Extensible HyperText Markup Language, Microsoft Extensible Application Markup Language, Agent System, Text Categorization, Text Mining, Business Process Reengineering, Web clipping, Web-based system, Frank Hale, Sharon Regan, Lisa Taylor, Bobby Kao, Steven Thomas, Karen Lee, Nancy Rice, Virtual Reality Modeling Language (VRML), Electronic Business, ActiveX, Distributed Component Object Model (DCOM), Enterprise Application Integration (EAI), Universal Description Discovery and Integration (UDDI), Java Server Page (JSP), Quality management, Executive information system (EIS), Materials requirements planning (MRP), Object Linking and Embedding (OLE), Supply chain management (SCM), Electronic mail, Computer-aided software engineering (CASE), index server, OSP (online service provider), Very large database (VLDB), Robert Green, Peter Martin

#2 (55) | John Smith, Mary Brown, Frank Hale, Warren Chen, Jennifer Liu, Authentication, Consumer Behavior, Decision Support Systems, Electronic Document Delivery, Fuzzy reasoning, Information Acquisition, Instructional Strategy, Learning Outcome, Recency Frequency Monetary (RFM) model, Learning Process, Learning Context, Longest Detour Problem, Material Requirement Planning, Proxy server, Political Behavior, Mobile Communication Networks, Spanning Sub-graph, Real-Time Model Selection, Regular Graphs, Encryption scheme, Conflict Theory, Social Exchange, Problem-solving Technique, Template Recommendation, Decision Model, Digital Library, Mobile Commerce, Graph Sandwich, Home Location Register, Overflow Control, Homogeneous Set, Idea-Generation Support System, Peer-to-Peer, Just-in-time, Message Recovery, Shop Floor Control Information System, Mobility Management, Perfect Replacement, Credit Card Payment, Organizational Politics, Simulation Analysis, Strategy Contingency Theory, Structuration Theory, Task Technology Fit, Petri Net, Stephen Adams, David Campbell, Nancy Hopkins, June Matthews, Robert Liu

#3 (19) | Congestion Control, Technology Valuation, Active Delay Control, Automated Negotiation, Game Theory, Groupware Development, Trust Dynamics, Virtual Group, TCP flows, Multi-issue Negotiation, Technology Transfer, Information Systems Strategic Planning (ISSP), Interpretive Scheme, Structural Equation Modeling (SEM), Bobby Kao, Steven Thomas, David Campbell, Joyce English, Karen Lee

5.3.3. Context information service

While interacting with users, the context information service summarizes the important context conditions from the project context, covering the context types of role, location and organization. The descriptions are shown in Table 11.

Table 11. The context conditions in the user selection

Context type | Description | Conditions
Role | The normal or customary title of a person in the project team. | database administrator, systems analyst, project manager, sales manager, commercial designer, senior programmer, junior programmer, consultant, software engineer, auditor
Location | The site or place of a certain object in the project, such as a person or an activity. |
Organization | The company or organization that supports the projects or activities. | Bank

The context types of role, location and organization are important for describing the background of the project developers. The user can therefore choose the relevant context conditions to indicate his or her interests explicitly.

Instead of requiring users to enter keywords, the context information service offers a simple yet useful selection menu for capturing their information needs.

Notably, the result of the user selection is used as the essential criterion for determining the relevant part of the project-based knowledge map in the following association mining operations.

5.3.4. Association rule mining

Within a cluster of similar project attributes, the internal associations are valuable for developing the project-based knowledge map. To extract the associations that meet the user's current information needs, the user context is applied as an important criterion in the association mining operation. According to the context conditions selected by users, various constraints are established for extracting useful associations in different situations.

Figure 9. Context in constraint-based association mining

Since the user selection is intended to improve the understanding of user needs, this work provides three-level association rule mining for extracting useful association patterns from various context perspectives. As shown in Figure 9, different constraint-based association mining methods are used for extracting context-independent associations, context-relevant associations and context-specific associations, supporting various information needs from different context perspectives. The details are explained as follows.

• The first phase applies the Apriori algorithm for mining context-independent associations. The outcome offers overall associations to users without differentiating the user situation [32].

• The second phase employs a rule-constraint mining operation for selecting context-relevant associations. The outcome provides generally workable rules based on the user selection, which saves users the cost of separating the rules manually.

• The third phase employs a data-constraint mining operation for discovering context-specific associations. The outcome consists of specific associations extracted from different context perspectives, so that users can learn the pertinent rules directly and efficiently when carrying out further project development.

5.3.5. Context-independent association mining

In the first phase, we aim to discover general association patterns without considering the user selection. Each project object in the occurrence layer of the project-based knowledge map is linked to some appropriate topic names and is regarded as a transaction of topic names. Collecting these transactions and mining the topic names that frequently appear together in project objects helps to discover useful associations. Therefore, the Apriori algorithm, a well-known association mining method, is applied for extracting context-independent associations [1].

Figure 10. Apriori association rule mining

Let D be the collection of the transactions of topic names in the cluster. The Apriori association mining operation generates CIA, the fundamental set of context-independent associations. The procedure is shown in Figure 10, and its result is a set of fundamental associations for the subsequent constraint-based association mining operations. With a minimum support of 20%, more than twenty association rules are extracted from cluster 1, as shown in Table 12.

Table 12. The set of extracted context-independent associations

OLAP ⇒ DBMS | KM ⇒ XML | Sharon ⇒ Lisa
ISO ⇒ ERP | Karen Lee ⇒ JSP | SOAP ⇒ DOM
DBMS ⇒ Call center | EIS ⇒ KM | XACL ⇒ XUL
Call center ⇒ CRM | GML ⇒ GIS | ActiveX ⇒ OLE
David ⇒ Flash | ebXML ⇒ EB | SCM ⇒ CASE
John ⇒ Java | Web clipping ⇒ Agent | MP3 ⇒ SMIL
John ⇒ Mary | MRP ⇒ SCM | SGML ⇒ HTML
GA ⇒ Frank | DCOM ⇒ OLE | Marketing ⇒ SWOT

Procedure Apriori association mining

Input: D (the collection of transactions of topic names), min_support (the minimum support threshold)

Output: CIA (the collection of context-independent associations in D)

  C_k: candidate itemsets of size k
  L_k: frequent itemsets of size k

  L_1 = {frequent items}
  for (k = 1; L_k ≠ ∅; k++) do
    C_k+1 = candidates generated from L_k
    for each transaction t ∈ D do
      increment the count of all candidates in C_k+1 that are contained in t
    end for
    L_k+1 = candidates in C_k+1 with min_support
  end for
  return CIA = ∪_k L_k
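The level-wise procedure of Figure 10 can be sketched in Python as follows. This is a minimal, unoptimized illustration of the standard Apriori frequent-itemset search, not the implementation used in this work; the function name `apriori` and the choice to express support as a fraction of transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support (a fraction)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate over all transactions and keep the frequent ones.
        result = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                result[c] = support
        return result

    # L_1: frequent single items.
    level = frequent({frozenset([item]) for t in transactions for item in t})
    all_frequent = dict(level)
    k = 1
    while level:
        # Join step: unite frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent
```

Called as `apriori(D, 0.2)`, where each element of `D` is the set of topic names linked to one project object, the itemsets of size two in the result correspond to candidate binary associations such as those listed in Table 12.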

The benefit of this phase is that it covers the overall associations as completely as possible, but without further separation and classification the outcome easily confuses users. When the project schedule is tight, the user has no idea which association rule is the most productive or practicable. Some research works order the association rules by support or confidence to help users make a proper choice.

However, such ordering alone does not support users efficiently. Therefore, two context-oriented solutions, context-relevant association mining and context-specific association mining, are proposed for further selection and differentiation based on various context perspectives.

5.3.6. Context-relevant association mining

In the second phase, the result of the user selection, which indicates the user's information needs, is applied as a set of filtering constraints. For each association in the set of context-independent associations, if the context of the involved topics satisfies the context conditions of the user selection, the association is added to the set of context-relevant associations. The procedure is shown in Figure 11.

Figure 11. The procedure of context-relevant association mining

Procedure Context_relevant association mining

Input: US (the set of context conditions of user selection), CIA (the set of context-independent associations)

Output: CRA (the set of context-relevant associations)

  CRA = {}
  for each context-independent association Rk of CIA do
    if Rk satisfies a context condition contained in US then Rk is added to CRA
  end for
  return CRA

Let CIA be the set of context-independent associations extracted in the previous phase and US be the set of context conditions specified by the user. For example, the user selection is DBA, Taipei and Government, indicating the context conditions of role, location and organization, respectively. The operation of context-relevant association mining generates CRA, the set of context-relevant associations. Its advantage is that it separates the relevant associations for users according to the user selection. The conceptual outcome of the context-relevant association mining operation is given in Figure 12. From the collection of context-independent associations extracted in the previous phase, the selected context conditions of DBA, Taipei and Government are applied to determine the context-relevant associations that satisfy them.
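The filtering procedure of Figure 11 can be sketched as a simple rule constraint over the already extracted associations. The sketch assumes that each topic name carries a set of context conditions and that an association qualifies when either of its topics carries at least one selected condition; the mapping `topic_context` and the context values shown in it are hypothetical and serve only as an illustration.

```python
def context_relevant(cia, user_selection, topic_context):
    """Keep associations whose topics carry at least one selected context condition."""
    selected = set(user_selection)
    cra = []
    for x, y in cia:
        # Contexts attached to either topic of the association (assumed data model).
        contexts = topic_context.get(x, set()) | topic_context.get(y, set())
        if contexts & selected:
            cra.append((x, y))
    return cra


# Hypothetical context assignments, for illustration only.
topic_context = {
    "OLAP": {"DBA"},
    "DBMS": {"DBA", "Taipei"},
    "MP3": {"Sales Manager"},
}
cia = [("OLAP", "DBMS"), ("MP3", "SMIL")]
cra = context_relevant(cia, {"DBA", "Taipei", "Government"}, topic_context)
```

Because this phase only filters existing rules, no transaction data needs to be scanned again.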

Figure 12. The conceptual outcome of context-relevant associations

The result of the context-relevant mining procedure is shown in Table 13.

Based on the outcome in Table 12, the selected context conditions are the key criterion for separating the relevant associations. As a result, the context-relevant associations contain (OLAP, DBMS), (ISO, ERP), (DBMS, Call center), (Call center, CRM), (John, Java), (John, Mary) and (KM, XML).

Table 13. The result of the context-relevant association mining procedure

Context-independent associations | Project context (context type)
(OLAP, DBMS) | …

Context type | User selection
role | DBA
location | Taipei
organization type | Government

Context-relevant associations: (OLAP, DBMS), (ISO, ERP), (DBMS, Call center), (Call center, CRM), (John, Java), (John, Mary), (KM, XML)

5.3.7. Context-specific association mining

In the third phase, context-specific association mining applies data constraints as the main guidance for extracting internal associations from context-specific topic names. The first operation selects, from all topic names in the project-based knowledge map, those that satisfy the context conditions of the user selection, which yields the set of context-specific topic names. The Apriori algorithm is then applied again to extract the context-specific associations from this set. The procedure is shown in Figure 13.

Figure 13. The procedure of context-specific association mining

Procedure Context_specific association mining

Input: US (the set of context conditions of user selection), D (the collection of transactions of topic names)

Output: CSA (the set of context-specific associations)

  CSD: the collection of context-specific transactions of topic names
  CSD = {}
  for each transaction Tk of D do
    if Tk satisfies a context condition contained in US then Tk is added to CSD
  end for
  Call the Apriori procedure to extract CSA, the set of context-specific association patterns, from CSD
  return CSA

Let US be the set of context conditions specified by the user and D be the collection of transactions of topic names. For example, the user selection is DBA, Taipei and Government, indicating the context conditions of role, location and organization type, respectively. The operation of context-specific association mining first generates CSD, the collection of context-specific transactions of topic names, and then generates CSA, the set of context-specific associations. Its advantage is that it purposefully discovers the associations that are directly related to the user selection, providing efficient user-dependent knowledge support. The conceptual outcome is shown in Figure 14. Based on the context-specific topic names that satisfy the selected context conditions, association mining is applied again to mine the hidden associations.
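The data-constraint step of Figure 13 can be sketched as filtering the transactions first and mining afterwards. To stay short and self-contained, the sketch replaces the call to the Apriori procedure with direct counting of frequent pairs, which covers only the size-two case relevant to binary associations. The representation of a transaction as a `(topics, contexts)` pair, the function name and the sample data are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def context_specific_pairs(transactions, user_selection, min_support):
    """Filter transactions by context (CSD), then return frequent topic pairs."""
    selected = set(user_selection)
    # Data constraint: keep transactions tagged with at least one selected condition.
    csd = [set(topics) for topics, contexts in transactions
           if set(contexts) & selected]
    if not csd:
        return {}
    # Count every unordered topic pair once per transaction.
    counts = Counter(pair for t in csd for pair in combinations(sorted(t), 2))
    return {pair: c / len(csd) for pair, c in counts.items()
            if c / len(csd) >= min_support}


# Hypothetical transactions: (topic names, context conditions of the project object).
D = [
    ({"ISO", "DBMS"}, {"DBA"}),
    ({"ISO", "DBMS", "OLAP"}, {"DBA", "Taipei"}),
    ({"MP3", "SMIL"}, {"Sales Manager"}),
]
csa = context_specific_pairs(D, {"DBA"}, min_support=1.0)
```

Because support is computed over the filtered collection only, a pair that is rare in the whole data set can still be frequent within one context, which is how new associations can surface in this phase.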

Figure 14. The conceptual outcome of context-specific associations (a)

For cluster 1, the topic names whose project context satisfies the selected context conditions are collected to form the set of context-specific topic names. The association rule mining method is then applied to extract the internal association patterns. The result is given in Table 14.

Table 14. The process result of context-specific associations (a)

Topics | Project context (context type)
… | Sales Manager (role), Taipei (location), …

The context-specific topic names are those in the shaded area of the table above.

Associations extracted by the Apriori algorithm: (UML, Mary), (ISO, KM), (Tax, ISO), (Tax, CA), (DBMS, Call center), (OLAP, DBMS), (John, Mary)

Furthermore, each single context condition in the user selection can serve as a criterion for selecting precisely the transactions of interest for Apriori association mining. Therefore, the data constraint is applied again to perform context-specific association mining with a smaller set of context conditions. The first operation selects, from the set of context-specific transactions of topic names, the topic names that satisfy a single context condition of the user selection. The second operation applies the Apriori algorithm again to extract the context-specific associations. These two operations are repeated for every context condition in the user selection.

The conceptual outcome is shown in Figure 15. We apply the role of DBA as the single criterion for separating the DBA-specific topics, which include ISO, DBMS, OLAP, Mary and UML. As a result, the new association (ISO, DBMS) is found, which reminds users of the specific relationship between ISO and DBMS. The process is similar for the context conditions of Taipei and Government.

Figure 15. The conceptual outcome of context-specific associations (b)

For cluster 1, the context-specific associations are given in Table 15.

The sets of DBA-specific topics, Taipei-specific topics and Government-specific topics are selected separately. Association rule mining is then applied to each set to extract the context-specific associations for the context conditions of DBA, Taipei and Government. Compared with the result of context-relevant association mining, several new associations are extracted from the various perspectives, such as (ISO, DBMS), (CA, Call center), (ISO, Mary) and (Tax, ISO).

The advantage is that users can focus on the associations that belong to one particular condition. For example, a user who cares about the condition of being a DBA can pay more attention to the association between ISO and DBMS.

Locating such associations precisely helps users learn the special solutions for a special condition rapidly.

Table 15. The process result of context-specific associations (b)

User selection | Context-specific topics | Context-specific associations
Taipei (location) | DBMS, John, Mary, CA, Call center |

5.4. Transforming binary associations into rule statements

The associations in the framework of the project-based knowledge map are defined as binary associations rather than complicated n-ary associations. The formal definition of binary associations is given in Figure 16. The irreflexive, non-transitive and strictly anti-symmetric properties preserve the consistency of the binary associations and of the rule transformation in this work.

Figure 16. The proposed definition for binary associations

For example, (XML, HTML) is an extracted association, and the properties and rule transformation are explained below:

• irreflexive: The association of (XML, HTML) does not imply that the associations of (XML, XML) and (HTML, HTML) exist. The advantage reduces the redundant rules.

• non-transitive: If the association of (HTML, Flash) is also extracted
