Thesis Organization - Integrating Erasable Itemsets from Multiple Data Sources

CHAPTER 1 Introduction

1.2 Thesis Organization

The rest of this thesis is organized as follows. Chapter 2 reviews the background and related work, including different algorithms for erasable-itemset mining and an example of the META erasable-itemset mining algorithm. Chapter 3 describes the proposed multi-source mining algorithm for erasable itemsets, together with an example of the proposed algorithm. Chapter 4 covers the experimental environment, experimental settings, and experimental results. Finally, Chapter 5 gives the conclusions and discusses future work.

CHAPTER 2 Review of Related Work

In this chapter, we briefly review related research on erasable-itemset mining.

2.1 Erasable-Itemset Mining

Deng et al. defined the problem of erasable itemset (EI) mining in 2009. The problem originates from production planning in a company or factory that produces many different types of products. Each product is made from several components (items), and in order to produce all the products, the factory has to purchase and store these items. In an economic or financial crisis, the factory may not be able to keep purchasing all the items needed for current production; the managers should therefore reconsider their production plans to ensure the stability of the factory's operation. The problem is to find the itemsets that can be eliminated while the factory's profit still meets the expectation, which is set as a threshold when the itemsets are mined. The information found by erasable-itemset mining supports managers in deciding how to revise the production plan.

In another case, suppose the management team wants to renew or expand the factory's product line, but the factory cannot support the additional requirements. The team then needs to find out which products should be removed from the current product list. In this situation, the managers can use EI mining to find EIs and replace the affected products with new ones, while keeping the factory running and its profit under control. With erasable-itemset mining, the managers can introduce a new product list that the factory is able to support.

2.2 The Erasable-Itemset Mining Algorithm

Deng et al. defined the problem of erasable itemset mining in 2009 and also proposed the META algorithm, an iterative approach that uses a level-wise search for EI mining, similar to the one adopted by the Apriori algorithm in frequent pattern mining. To reduce the search space, this approach uses the following property: 'if itemset A is non-erasable and B is a superset of A, then B must also be non-erasable'. The level-wise iterative approach finds erasable (k+1)-itemsets by making use of erasable k-itemsets. The details of the level-wise iterative approach are as follows.

First, the set of erasable level 1-itemsets, E1, is found. E1 is then used to find the set of erasable level 2-itemsets, E2, which is used to find E3, and so on, until no more erasable level k-itemsets can be found. Finding each Ek requires one scan of the dataset.
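The pruning property quoted above follows from how the gain value of an itemset is computed. A short derivation is sketched below. It assumes that all product values are non-negative (an assumption not stated explicitly in the text) and writes gv(s) for the sum of the values of the products that contain at least one item of s:

```latex
A \subseteq B
\;\Rightarrow\;
\{ d_i \mid A \cap d_i.Items \neq \emptyset \} \subseteq \{ d_i \mid B \cap d_i.Items \neq \emptyset \}
\;\Rightarrow\;
gv(A) \le gv(B)
```

Hence, if gv(A) already exceeds the threshold, gv(B) exceeds it as well for every superset B, so no superset of a non-erasable itemset needs to be examined.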

The META algorithm is stated as follows.

The META algorithm:

Input: A product database FPT (Factory Product Table) with products and their gain values, an erasable ratio threshold t, and the set of all items I.

Output: The set of erasable itemsets for the database FPT.

Step 1: Calculate the total gain value GV of the product database FPT as follows:

$$GV = \sum_{d_i \in D} d_i.Val$$

where $d_i$ is one of the products in the database $D$ (FPT).

Step 2: List the items appearing in the product database FPT as the candidate erasable level 1-itemsets L1.

Step 3: Set k = 1, where k records the number of items in the itemsets currently being processed.

Step 4: For each k-itemset s, calculate its gain value as follows:

$$gv(s) = \sum_{\{d_i \mid s \cap d_i.Items \neq \emptyset\}} d_i.Val$$

Step 5: Put a candidate erasable k-itemset s in Lk into the set Ek of erasable k-itemsets if its gain value is smaller than or equal to the threshold GV × t.

Step 6: Form the candidate erasable level (k+1)-itemsets Lk+1 from the k-itemsets in Ek in a way similar to that in the Apriori algorithm.

Step 7: Set k = k+1.

Step 8: Repeat Steps 4 to 7 until no new candidate erasable itemsets are generated.

Step 9: Output all the updated erasable itemsets generated so far.

After Step 9, the final set of erasable itemsets for the product database is obtained.
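The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function names `gain` and `meta` and the four-product database `toy_fpt` are made up for this sketch, and the candidate generation is a simple Apriori-style join with subset pruning.

```python
from itertools import combinations

def gain(itemset, fpt):
    """Sum the values of all products that use at least one item of itemset."""
    return sum(value for items, value in fpt if itemset & items)

def meta(fpt, t):
    """Level-wise erasable-itemset mining; returns {itemset: gain value}."""
    total = sum(value for _, value in fpt)              # Step 1: GV
    limit = total * t                                   # threshold GV x t
    candidates = {frozenset([i]) for items, _ in fpt for i in items}  # Step 2
    erasable = {}
    k = 1                                               # Step 3
    while candidates:                                   # Step 8
        level = {}
        for s in candidates:                            # Steps 4 and 5
            g = gain(s, fpt)
            if g <= limit:
                level[s] = g
        erasable.update(level)
        candidates = set()                              # Step 6: join E_k with itself
        for a, b in combinations(level, 2):
            u = a | b
            if len(u) == k + 1 and all(
                frozenset(c) in level for c in combinations(u, k)
            ):
                candidates.add(u)
        k += 1                                          # Step 7
    return erasable                                     # Step 9

# Hypothetical toy database (not Table 2.1): (items of the product, gain value)
toy_fpt = [
    (frozenset("AB"), 100),
    (frozenset("AC"), 50),
    (frozenset("CD"), 20),
    (frozenset("D"), 30),
]
result = meta(toy_fpt, 0.5)
for s, g in sorted(result.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print("".join(sorted(s)), g)
```

With this toy database the total gain is 200 and the threshold is 100, so {A} (gain 150) is pruned at level 1 and none of its supersets is ever generated.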

2.3 An Example for Erasable-Itemset Mining

An example is given below to illustrate the erasable-itemset mining algorithm described above. Consider the product database FPT in Table 2.1. It consists of eight products and eight items, {A, B, C, D, E, F, G, H}.

Assume the maximum gain ratio threshold t is set at 35%. The algorithm then proceeds as follows.

Table 2.1 The FPT database

Step 2: The items from A to H appearing in the product database FPT are collected as the candidate level 1-itemsets, L1.

Step 3: The variable k is set to 1, where k records the number of items in the itemsets currently being processed.

Step 4: The gain value of each 1-itemset in L1 is calculated. Take item {A} as an example. It appears in d1 and d2 in the product database. Its gain value is thus 200+1000, which is 1200. The gain values of the other items can be found in the same way. The results are shown in Table 2.2.

Table 2.2: The gain values of the level 1-itemsets in L1

Step 5: These candidate erasable level 1-itemsets are then checked for whether their gain values are smaller than or equal to the maximum gain threshold GV × t, which is 2420 × 0.35 = 847. In this example, the level 1-itemsets {C}, {D}, {E}, {F} and {H} satisfy the condition and are put into the set of erasable level 1-itemsets E1. The results are shown in Table 2.3.

Table 2.3: The erasable level 1-itemsets with their gain values in E1

Itemset   Gain Value
C         250
D         600
E         600
F         350
H         470

Step 6: The candidate erasable level 2-itemsets are then formed from the erasable level 1-itemsets in E1. In this example, the following ten candidate level 2-itemsets are generated: {CD}, {CE}, {CF}, {CH}, {DE}, {DF}, {DH}, {EF}, {EH} and {FH}.

Step 7: The variable k is then increased to 2.

Step 8: In this example, since ten candidate erasable itemsets are generated in Step 6, Steps 4 to 7 are repeated as follows. The gain values of the ten candidate level 2-itemsets are calculated and compared with the maximum gain threshold in Steps 4 and 5. Take the 2-itemset {CD} as an example. Since the products d3 to d8 all contain at least one of C and D, the gain values of these products are added up as the gain of {CD}, which is 750. The value is smaller than the maximum gain threshold 847, and thus {CD} is an erasable level 2-itemset. The other candidate 2-itemsets are processed in a similar way. Finally, the erasable 2-itemsets, whose gain values do not exceed the threshold, are shown with their gain values in Table 2.4.

Table 2.4: The gain values of the mined erasable level 2-itemsets in E2

Itemset   Gain Value
CD        750
CE        700
CF        600
CH        720
DF        600
EF        700
FH        820

Then in Step 6, the only candidate erasable level 3-itemset, {CDF}, is generated and forms L3. The variable k then becomes 3 and Steps 4 to 7 are executed again. The gain value of {CDF} is 750, smaller than the maximum gain threshold. Thus, it is an erasable itemset.

Step 9: All the erasable itemsets generated so far are output. The results are shown in Table 2.5.

Table 2.5: The final erasable itemsets in this example

Itemset   Gain Value
C         250
D         600
E         600
F         350
H         470
CD        750
CE        700
CF        600
CH        720
DF        600
EF        700
FH        820
CDF       750
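The figures of this example can be cross-checked with a short script. The sketch below uses only the numbers quoted in this section (total gain 2420, ratio 35%, and the gain values of Table 2.5); the variable names are illustrative. It recomputes the threshold, regenerates the ten candidate level 2-itemsets of Step 6, and checks that no listed itemset has a smaller gain value than any of its listed subsets, as the pruning property requires.

```python
from itertools import combinations

threshold = 2420 * 35 / 100          # GV x t = 847.0

# Gain values copied from Table 2.5
table_2_5 = {
    "C": 250, "D": 600, "E": 600, "F": 350, "H": 470,
    "CD": 750, "CE": 700, "CF": 600, "CH": 720,
    "DF": 600, "EF": 700, "FH": 820, "CDF": 750,
}

# Every final itemset must stay within the threshold.
all_within = all(g <= threshold for g in table_2_5.values())

# Step 6 at k = 1: pair up the erasable level 1-itemsets.
e1 = sorted(s for s in table_2_5 if len(s) == 1)
candidates_2 = ["".join(p) for p in combinations(e1, 2)]

# A superset can never have a smaller gain value than one of its subsets.
monotone = all(
    table_2_5[s] >= table_2_5[sub]
    for s in table_2_5
    for sub in table_2_5
    if set(sub) < set(s)
)

print(threshold, all_within, monotone)
print(len(candidates_2), candidates_2)
```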

CHAPTER 3 A Multiple Sources Data Mining Algorithm for Mining Erasable Itemsets

In this chapter, we present the proposed approach for mining erasable itemsets from multiple data sources. This chapter is organized as follows. The main idea of the proposed algorithm is described in Section 3.1. The terms used are introduced in Section 3.2. The proposed two-factory erasable-itemset merging algorithm (TEIMA) is stated in Section 3.3. Finally, an example of the proposed TEIMA algorithm is given in Section 3.4 to explain the execution process.

3.1 Main Idea

When companies or factories merge, the decision makers need to decide which factories to close or which products to stop producing in some factories. We first consider the merging of two factories. Given the erasable itemsets (EI) mined individually from each of the two factories, we split them into three sets, MEI, Q and R, and handle them in three cases:

Case 1: The MEI set includes all the itemsets that are erasable in the product databases of both factories. Since they are erasable for both factories, all the itemsets in MEI remain erasable for the merged factory.

Case 2: The Q set includes the erasable itemsets that appear only for the first factory (FPT1) but not for the second factory (FPT2). For each itemset in Q, its gain value in the first erasable-itemset table (EI1) is already its gain value in FPT1, so FPT1 need not be rescanned. We only need to scan FPT2 to get its actual gain value there, add it to the gain value of the same itemset in EI1 to obtain the merged gain value, and compare the result with the merged threshold value to see whether the itemset is erasable. As in the META algorithm, an itemset in Q that is found to be non-erasable can be removed early: its supersets are also non-erasable, so they can be skipped when scanning FPT2, which further reduces the time spent.

Case 3: Symmetrically to Q, the R set includes the erasable itemsets that appear only for the second factory (FPT2) but not for the first factory (FPT1). For each itemset in R, its gain value in the second erasable-itemset table (EI2) is already its gain value in FPT2, so FPT2 need not be rescanned. We only need to scan FPT1 to get its actual gain value there, add it to the gain value of the same itemset in EI2 to obtain the merged gain value, and compare the result with the merged threshold value to see whether the itemset is erasable. As in the META algorithm, an itemset in R that is found to be non-erasable can be removed early: its supersets are also non-erasable, so they can be skipped when scanning FPT1, which further reduces the time spent.
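The reason Case 1 needs no rescan can be written out in one line. The sketch assumes that the merged product table is simply the union of the rows of the two factory tables, so that gain values add up. The symbols $gv_1(e)$ and $gv_2(e)$ (gain values of itemset $e$ in FPT1 and FPT2), $TP_1$ and $TP_2$ (total values of the two factories) and $t$ (threshold ratio) are shorthand introduced here:

```latex
gv_1(e) \le t \cdot TP_1 \ \text{ and } \ gv_2(e) \le t \cdot TP_2
\;\Rightarrow\;
gv(e) = gv_1(e) + gv_2(e) \le t \cdot (TP_1 + TP_2)
```

For an itemset in Q or R only one of the two inequalities is known to hold, which is why the other factory's table has to be scanned before the itemset can be accepted or rejected.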

3.2 The Terms

In this section, we introduce some concepts and terms that are used in this thesis.

Term 1: Factory Products Table (FPT):

A manufacturing company may own many production factories located in different countries, areas or cities. They produce different products with different costs and gain values, which can be listed as the Factory Products Table (FPT) for each production site or factory as follows:

Table 3.1: Factory Products Table (FPT) from Factory 1 as FPT1.

Each product is made of different components. For example, in this case, product (ABE) is made of components A, B and E, and its PID is D01. For Factory 1, the factory ID (01) is added to the PID of product (ABE), so it is listed as D0101 in Factory 1’s FPT.

The same product may be produced in different factories at different costs and profits, so its gain value may differ between factories.

For example, the gain value of product (ABE) is 200 in Factory 1 and 300 in Factory 2.

Term 2: Erasable Item sets (EI):

For each production factory, an erasable itemset table can be mined. It lists the erasable itemsets, whose gain values do not exceed the given threshold, for the business analysis of the individual site.

An EI means that its components, and the products that use them, can be removed from the production line to meet the production down-sizing target while the operation remains profitable. The EI table below, EI1, was mined from Factory 1 based on FPT1 with a threshold of 35%.

Table 3.2: Erasable itemset table from Factory 1 as EI1.

Term 3: Component item list (I) and non-appearing component item list (NI):

The manufacturing factories use different component lists for production, so some components differ between factories. For Factory 1 and Factory 2, the component item lists (I) are as follows:

Table 3.3: Component item list I1 of Factory 1.

Table 3.4: Component item list I2 of Factory 2.

In some cases, production factories have to be merged because of downsizing of the production scale, reduction of product classes, or the trading of factories between companies. The problem is how to obtain the merged erasable itemsets efficiently, in order to work out how to arrange production among the operating sites to meet the business target. Here, we propose an efficient algorithm to obtain the merged EI and FPT for business decision making. The execution details of the proposed algorithm are described in the next section.

3.3 The Proposed Two-factory Erasable-itemset Merging Algorithm (TEIMA)

In this section, the proposed two-factory erasable-itemset merging algorithm (TEIMA) is described in detail. The whole algorithm is stated below:

The Two-factory Erasable-itemset Merging Algorithm (TEIMA):

INPUT: A company with two factories, each of which has its own factory product table (FPT) and total profit value (TP), a threshold ratio t for erasable itemsets, and the erasable itemsets (EI) in each factory.

OUTPUT: The merged erasable itemsets from the two factories.

Step 1: Initially set the final merged erasable set MEI as ∅.

Step 2: Calculate the merged total profit value threshold (MT) as (TP1+TP2)*t, where TP1 and TP2 are the total profits of the two factories and t is the threshold ratio.

Step 3: Find NI1 = I2 – I1, and NI2 = I1 – I2, where I1 and I2 are the sets of components (items) appearing in Factories 1 and 2 respectively, NI1 is the set of components (items) not appearing in Factory 1 but appearing in Factory 2, and NI2 is the set of components (items) not appearing in Factory 2 but appearing in Factory 1.

Step 4: Divide the erasable itemsets in both EI1 and EI2 into the following three cases:

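Case 1: Set MEI = (EI1 ∩ EI2) ∪ (EI1 ∩ NI2) ∪ (EI2 ∩ NI1), where each itemset in MEI either exists in both EI1 and EI2, or is erasable in one factory and consists of an item that does not appear in the other factory, and is thus certainly a final erasable itemset.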

Case 2: Set Q = (EI1 ∪ NI1) – EI2, where each itemset in Q exists in EI1 or NI1 but not in EI2, and may or may not be a final erasable itemset.

Case 3: Set R = (EI2 ∪ NI2) – EI1, where each itemset in R exists in EI2 or NI2 but not in EI1, and may or may not be a final erasable itemset.

Step 5: (for Case 1) Calculate the merged gain value of each itemset e ∈ MEI, which is a final erasable itemset, as follows:

e.GainValue = e.GainValue1 + e.GainValue2

where e.GainValue1 and e.GainValue2 are the gain values of the erasable itemset e in EI1 and EI2, respectively.

Step 6: Set k = 1, where k records the number of items in the itemsets currently being processed.

Step 7: (for Case 2) For each k-itemset e ∈ Q, which is an erasable k-itemset in EI1 but not in EI2, do the following sub-steps:

Step 7-1: Set its reduced itemset e’ = e – NI2.

Step 7-2: If e’ is ∅, set the gain value of e’ in the second factory product table (FPT2) to 0; otherwise, scan FPT2 to get the gain value of e’ in FPT2.

Step 7-3: Calculate the merged gain value of e as:

e.GainValue = e.GainValue1 + e’.GainValue2.

Step 7-4: Add e to MEI if e.GainValue is less than or equal to the merged total profit value threshold (MT).

Step 8: (for Case 3) For each k-itemset e ∈ R, which is an erasable k-itemset in EI2 but not in EI1, do the following sub-steps:

Step 8-1: Set its reduced itemset e’ = e – NI1.

Step 8-2: If e’ is ∅, set the gain value of e’ in the first factory product table (FPT1) to 0; otherwise, scan FPT1 to get the gain value of e’ in FPT1.

Step 8-3: Calculate the merged gain value of e as:

e.GainValue = e’.GainValue1 + e.GainValue2.

Step 8-4: Add e to MEI if e.GainValue is less than or equal to the merged total profit value threshold (MT).

Step 9: If k = 1, set NI1 = NI1 ∩ MEI¹ and NI2 = NI2 ∩ MEI¹, where MEI¹ is the set of 1-itemsets in MEI.

Step 10: Set k = k + 1.

Step 11: Generate the candidate k-itemsets C^k from MEI^(k-1), where MEI^(k-1) is the set of (k-1)-itemsets in MEI.

Step 12: Remove from Q the k-itemsets that are not in C^k, and add to Q each k-itemset in C^k that contains at least one item in NI1 and whose other items, not in NI1, form a (k-1)-itemset in Q. Set the gain value of each added k-itemset to the gain value of the corresponding (k-1)-itemset in Q.

Step 13: Remove from R the k-itemsets that are not in C^k, and add to R each k-itemset in C^k that contains at least one item in NI2 and whose other items, not in NI2, form a (k-1)-itemset in R. Set the gain value of each added k-itemset to the gain value of the corresponding (k-1)-itemset in R.

Step 14: If there are no k-itemsets in either Q or R, output MEI as the final result; otherwise, repeat Steps 7 to 14.
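The sub-steps of Step 7 for a single itemset can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis implementation: the helper names `gain` and `merge_case2` are hypothetical, and the small table `fpt2` is made-up data that does not come from the thesis tables. Step 8 is the mirror image, with the roles of the two factories swapped.

```python
def gain(itemset, fpt):
    """Sum the values of the products that use at least one item of itemset."""
    return sum(value for items, value in fpt if itemset & items)

def merge_case2(e, gain1, fpt2, ni2, mt):
    """Steps 7-1 to 7-4 for one itemset e of Q.

    gain1 is the gain value of e already known from EI1.
    Returns (merged gain value, whether e is erasable after the merge).
    """
    reduced = frozenset(e) - ni2                    # Step 7-1: e' = e - NI2
    gain2 = gain(reduced, fpt2) if reduced else 0   # Step 7-2: scan FPT2 only if needed
    merged = gain1 + gain2                          # Step 7-3
    return merged, merged <= mt                     # Step 7-4

# Hypothetical data for illustration only.
fpt2 = [(frozenset("AB"), 500), (frozenset("CE"), 300), (frozenset("E"), 200)]
ni2 = {"H"}          # items that do not appear in Factory 2
mt = 1914.5          # merged threshold MT

result_ch = merge_case2("CH", 720, fpt2, ni2, mt)   # H is dropped, C is scanned
result_h = merge_case2("H", 470, fpt2, ni2, mt)     # e' is empty, no scan needed
print(result_ch, result_h)
```

Removing the items of NI2 before the scan changes nothing in the result, since those items match no product in FPT2; it only allows the scan to be skipped altogether when the reduced itemset is empty.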

3.4 An Example of the Proposed Algorithm (TEIMA)

In this section, a simple example is given to show how the proposed algorithm can be used to find the merged erasable itemsets in a two-factory environment. Assume there is one company which owns two manufacturing factories, denoted as Factory 1 and Factory 2. Each factory has its own factory product table (FPT), which records the products it produces, the items (components) for manufacturing the products, and the gain values of the products. Assume the factory product tables for the two factories are shown in Table 3.5 and Table 3.6, respectively.

Table 3.5: The factory product table for Factory 1

Table 3.6: The factory product table for Factory 2 (FPT2)

Assume the threshold ratio of erasable itemsets is 0.35 for both factories.

For Factory 1, the total value of the products is 2420. The value threshold for erasable itemsets is 2420*0.35, which is 847. The erasable itemsets for Factory 1 can thus be obtained by the batch approach, and the results are shown in Table 3.7.

Table 3.7: The erasable itemsets obtained for Factory 1 (EI1)

Similarly, the total value of the products for Factory 2 is 3050. The value threshold for erasable itemsets is thus 3050*0.35, which is 1067.5. The erasable itemsets for Factory 2 are thus obtained as shown in Table 3.8.

Table 3.8: The erasable itemsets obtained for Factory 2

With these individual data for the two factories, the decision makers at the headquarters may want to query the final erasable itemsets when considering Factory 1 and Factory 2 together. For this case, the proposed two-factory erasable-itemset merging algorithm (TEIMA) proceeds as follows.

Step 1: The final merged erasable set MEI is initially set as ∅.

Step 2: Since the threshold ratio is 35%, the merged total profit value threshold (MT) is calculated as:

MT = (2420+3050)*0.35 = 5470*0.35 = 1914.5.

Step 3: The set of components (items) not appearing in Factory 1 but appearing in Factory 2, and the set of components (items) not appearing in Factory 2 but appearing in Factory 1 are found. In this example, the sets of components in the two factories are I1 = {A, B, C, D, E, F, G, H} and I2 = {A, B, C, D, E, F, G}, respectively. Thus, NI1 = I2 – I1 = ∅, and NI2 = I1 – I2 = {H}.

Step 4: The erasable itemsets in both EI1 and EI2 are divided into the three cases – MEI, Q and R as follows.

Case 1: MEI = (EI1 ∩ EI2) ∪ (EI1 ∩ NI2) ∪ (EI2 ∩ NI1). The result is {{C}, {D}, {F}, {CD}} ∪ {{H}} ∪ ∅, which is {{C}, {D}, {F}, {H}, {CD}}. The result with the gain values for the example is shown in Table 3.9. All these itemsets are certainly erasable after the merge.

Table 3.9: The itemsets in Case 1 (MEI)

Itemset   Gain Value in FPT1   Gain Value in FPT2
C         250                  700
D         600                  750
F         350                  1050
H         470                  0
CD        750                  750

Case 2: Q = (EI1 ∪ NI1) – EI2, where each itemset in Q exists in EI1 or NI1 but not in EI2. The result for the example is shown in Table 3.10. These itemsets are not necessarily erasable after the merge.

Table 3.10: The itemsets in Case 2

Case 3: R = (EI2 ∪ NI2) – EI1, where each itemset in R exists in EI2 or NI2 but not in EI1. The result for the example is shown in Table 3.11. These itemsets are not necessarily erasable after the merge.
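The arithmetic of Steps 2 and 3 and the merged gain values of the Case 1 itemsets can be reproduced with a few lines of Python. The sketch below uses only the totals, item sets and Table 3.9 rows quoted in this section; the variable names are illustrative. It applies the Step 5 rule of adding the two gain values to every row of Table 3.9 and compares the sum with MT.

```python
tp1, tp2 = 2420, 3050                  # total values of Factory 1 and Factory 2
mt = (tp1 + tp2) * 35 / 100            # Step 2: merged threshold MT for t = 35%

i1 = set("ABCDEFGH")                   # components used in Factory 1
i2 = set("ABCDEFG")                    # components used in Factory 2
ni1 = i2 - i1                          # Step 3: items not appearing in Factory 1
ni2 = i1 - i2                          # Step 3: items not appearing in Factory 2

# Rows of Table 3.9: itemset -> (gain value in FPT1, gain value in FPT2)
table_3_9 = {
    "C": (250, 700),
    "D": (600, 750),
    "F": (350, 1050),
    "H": (470, 0),
    "CD": (750, 750),
}

# Step 5: the merged gain value is the sum of the two per-factory gain values.
merged = {e: g1 + g2 for e, (g1, g2) in table_3_9.items()}

print(mt, ni1, ni2)
for e, g in merged.items():
    print(e, g, g <= mt)
```

Every merged gain value stays below MT = 1914.5, in line with the statement that the Case 1 itemsets are certainly erasable after the merge.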
