• 沒有找到結果。

A Partial View Materialization Approach in Data Warehouse Environment

N/A
N/A
Protected

Academic year: 2021

Share "A Partial View Materialization Approach in Data Warehouse Environment"

Copied!
20
0
0

加載中.... (立即查看全文)

全文

(1)1. Workshop on Databases and Software Engineering. A Partial View Materialization Approach in Data Warehouse Environment Huei-Huang Chen. Ting-Yi Chen. hhchen@cse.ttu.edu.tw cti@it01.cse.ttu.edu.tw Department of Computer Science and Engineering, Tatung University No. 40, Chung-Shan N. Rd., Sec. 3, Taipei, 104, Taiwan, R.O.C. Tel: 886-2-25925252 ext.3295 Fax: 886-2-25925252 ext.2288. Abstract In a data warehouse environment (DWE), users access very large databases to carry out strategic analysis for maintaining business competitiveness by executing OLAP queries. Therefore, efficient query processing becomes a critical issue. The storage space of the dedicated OLAP server is smaller than the total size of all data cubes usually. That is, only parts of the data cubes can be stored. Materialized views selection is the most important decisions in building a Data Warehouse. At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. In this paper, we address an approach that is combined materialized views selection and computing the views for implementation of materialized views. The Partial View Materialization Approach includes two phases of views selection and views materialization. It also includes the advantages of Extended Reverse Progressive View Materialization Algorithm (ERPVMA) and Overlap Method, so the approach can effectively improve the data warehouse performance. Keywords: On-Line Analytical Processing; Data Cube; Data Cube Lattice; Materialized View; Partially-Materialized Lattice; Overlap Method..

(2) 2. 1. Introduction In many enterprises, Decision Support Systems (DSS) play an important role for businesses. The main reason for DSS’s popularity is that DSS are the key to gaining competitive advantage for business. Many enterprises have built or are building unified decision-support databases called data warehouses on which users can carry out their analysis. Since queries to data warehouse tend to be queried which identify trends in large multidimensional data, these queries typically make large use of aggregations. This leads to a necessity of multidimensional data analysis: On-Line Analytical Processing. OLAP is a technique that guarantees extremely fast response time for multidimensional queries in data warehouse. Two commonly used techniques to speed up OLAP queries are materialized views and indices. But the storage space of the dedicated OLAP server may be smaller than the total size of all data cubes. That is, only parts of the data cubes can be stored, and so we need to select suitable data cubes that can minimize the query cost and can used to answer more queries. Materialized views selection is the most important decisions in building a Data Warehouse. At the heart of all OLAP is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for OLAP. So how to improve the question to increase the OLAP performance is one of the most significant studies of the subject. Hence, in this paper, we address an approach that is combined materialized views selection and computing the views for views materialization. The Partial View Materialization Approach includes the advantages of ERPVMA and Overlap Method, so we expecting the approach can improve the data warehouse performance, and let the performance optimizations..

(3) 3. We consider a warehouse of retail information, with point-of-sale (pos) data from one thousand of stores. The point of sale data is stored in the warehouse in a large pos table, called a fact table. It contains a tuple for each item sold in a sales transaction. Each tuple has the format: pos (storeID, itemID, quarter, qty, price). Let the stores and items tables contain store information and item information, respectively. The key of stores is storeID, and the key of items is itemID. stores(storeID, city, region). items (itemID, name, category, cost). The rest of the thesis is organized as follows. In Chapter 2, we survey the related works that is used by the Approach that we address. In Chapter 3, we detail explain the Partial View Materialization Approach that we proposed. In Chapter 4, we experiment on Extended Progressive View Materialization Algorithm and Extended Reverse Progressive View Materialization Algorithm. Then, analysis and explain what are advantages of the approach that we present. In Chapter 5, we conclude this thesis and point out the future work that has arisen from this area.. 2. Related Works 2.1. The Lattice Framework . Lattice Model [4]: We can aggregate any combinations of the three dimensions, stores, items, quarter, and obtain totally eight possible views (group-by or cubes). The relation of these eight views can be modeled as a cube-lattice shown in Figure 2.1.. . Dimension Hierarchies and Lattices [4]: The various dimensions represented by the group-by attributes of a fact table often are organized into dimension hierarchies. The hierarchies are very important to OLAP queries, because they.

(4) 4 (storeID, itemID, quarter). (storeID,itemID). (storeID,quarter). (itemID, quarter). (storeID). (itemID). (quarter). (). Figure 2.1: Data cube lattice of pos. (itemID, storeID). storeID. city. itemID. region. category. none. none. (a) stores. (b) items. Figure 2.2: Hierarchies for the stores and items. (category, storeID). (itemID, city). (storeID). (category, city). (itemID, region). (city). (category, region). (itemID). (region). (category). none. Figure 2.3: Combining two hierarchical. realize the “drill-down” and “roll-up” operations. With the presence of hierarchies, the lattice diagram becomes more complex. For example, consider two dimensions: store and item, whose hierarchies are shown in Figure 2.2. The resulting diagram of combining these two dimensions with hierarchies is shown in Figure 2.3. . Generalized Cube Views and Partially Materialized Lattices [6]: Some views may do aggregation on columns used as dimension attributes in other views. We will call these views generalized cube views. A partially materialized lattice is obtained by removing some nodes of the lattice, to represent the fact that the corresponding views are not being materialized. When a node n is removed, all.

(5) 5. incoming and outgoing edges from node n are also removed, and new edges are added between nodes above and below node n .. 2.2. Progressive View Materialization Algorithm (PVMA) The benefit and cost metrics that discussed in the paper [8] use in selecting views to materialize are base on the frequency of updates and accesses on each view and the view size. The PVMA assumes that the selection of each materialized view is done independently. It also assumes that there is no space constraint in the warehouse and that the OLAP uses relational database systems (ROLAP). The benefit benefit k (v ) of selecting a view v is considered for all views v that are not in the set of selected views in iteration k . The profit benefit k (v) of a view v is the subtraction of benefit and cost of view v . The view that yields the maximum positive profit is selected to materialize. The search terminates when it is not possible to increase the profit further. . Data Cube Lattices: As discussed in the paper [4], a data cube can be presented by using a lattice. For example, consider that there are 2 dimensions: Product and Region as a sample. The representation of these two dimensions as a lattice is shown in Figure 2.4. The box in Figure 2.4, a number at lower left represents a view size and a number at lower right is the frequency of queries on the view. P1.R1 1,000,000. 1. P2.R1 600,000. 2. P3.R1 100,000. 5. P1.R2 700,000. 2. P2.R2 4,00,000. 3. P3.R2 80,000. 10. P1.R3 1,00,000. 4. P2.R3 70,000. 1. P3.R3 2,000. 20. P1.R4 30,000. 7. P2.R4 5,000. 30. P3.R4 100. Figure 2.4: Lattice of data cube [8].. 1.

(6) 6. . Nearest Materialized Parent View (NMPV): NMPVk (v) = min( R(v s ), R(v)). * ∀v s ∈ S , where v s  → v. * where v s  → v means that view v s is one of the parents of view v , and S. represents a collection of views which the PVMA algorithm decides to materialize. . Benefit Calculation: The benefit benefit k (v) of view v in iteration k is: ( R( NMPV (v) − R(v)) benefit k (v) = ( ∑ fp )Trba bf p∈child ( v )∪ v where Trba is the time for one random block access, and bf is the blocking factor of the view. The view child (v) represents child views, whose NMPV is v.. . Cost Calculation: Each change to the fact table in OLAP involves update to each view - all dimensions and all levels of each dimension are affected. We recommend that when aggregated views are created, their primary key indexes are created as well. We assume that their primary key indexes are implemented as B + trees. We derive the time estimation for each update operation as follows, where f represents the frequency of each operation on a view. . Insert and Delete: (i) read a block of a view, (ii) rewrite a block of the view, (iii) read a leaf node (block), and (iv) rewrite the leaf node. Thus total cost for the insertion of one row is 4 f (rba ) .. . Update: (i) read a leaf node, (ii) read a block of a view, and (iii) rewrite the block of a view. Thus total update cost for one update is 3 f (rba ) .. We can derive a formula of the cost of update operations on view v as cos t = (∑ 4 N i f + ∑ 4 N i f + ∑ 3 N u f u )Trba . Trba is the time for one random i∈I. block access.. d ∈D. u∈U.

(7) 7. . Profit Calculation: PVMA calculates benefit and cost for each view in each iteration. The function profit k (v) that is the profit of view v , and is defined as profit k (v) = benefit k (v) − cos t (v) .. 2.3. Overlap Method . Choosing a Parent to Compute a View [1]: Each view in the view DAG (Directed Acyclic Graph) has more than one parent from which it could be computed. We need to choose one of these parents thus converting the DAG to a rooted tree. The root of the tree is the base view and each view’s parent is the view to be used for computing it. For example, one possible tree for computing the DAG in Figure 2.5 is as shown in Figure 2.6.. . Choosing a Set of Views for Overlapped Computation: The next step is to choose a set of views that can be computed concurrently within the memory constraints. To compute a view in memory, we need memory equal to the size of its partition. We assume that we have estimates of sizes of the views. We have shown that finding an overall optimal allocation scheme for our view tree is NP-hard. So, instead of trying to find the optimal allocation we do the allocation by using the heuristic of traversing the tree in a breadth first (BF) search order: . Views to the left have smaller partition sizes, and require less memory. So consider these before considering views to the right.. . Views at a higher level tend to be bigger. Thus, these should be given higher priority for allocation than views at a lower level in the tree.. . Example Computation of a CUBE: Consider the CUBE to be computed on {A, B, C, D}. The tree of views and the estimates of the partition sizes of the views are shown in. If the memory available is 25 pages, BF allocation will generate three subtrees, each of which is computed in one pass. These subtrees are shown.

(8) 8 (A, B, C, D). (A, B, C, D). (A, B, C). (A, B). (A, B, D). (A, C, D). (B, C, D). (A, C). (A, D). (B, C). (B, D). (A). (B). (C). (D). (A, B, C) [1]. (A, B) [1]. (C,D). (A, B, D) [1]. (A, C) [1]. (A, D). (A) [1]. (B) [1]. (A, C, D) [10]. [1] (B, C). [5]. (C) [1]. (B, C, D) [10]. (B, D) [1]. (C,D) [40]. (D) [5]. (). (). [...] indicates estimated partition size in number of pages. Figure 2.5: Sort orders enforced on the views.. Figure 2.6: Estimates of partition sizes. (A, B, C, D). (A, B, C) [1]. (A, B) [1]. (A, B, D) [1]. (A, C, D) [10]. (B, C, D) [10]. (A, C) [1]. (A, D) [5]. (A) [1] (C, D) [10]. (B, C, D) [10]. (B, C) [1]. (B, D) [1]. (C, D) [40]. (C) [1]. (B) [1]. (D) [5]. (). Figure 2.7: Steps of the algorithm [1].. in Figure 2.7. In the second and third steps the views (B, C, D) and (C, D) are allocated 10 Pages as there are 9 sorted runs to merge.. 3. Partial View Materialization Approach In order to combine the views selection and views materialization to improve the performance of OLAP or multidimensional data analysis applications in data warehouse environment. We address an approach that combines materialized views.

(9) 9. selection and computing the views for implementation of materialized views. Figure 3.1 illustrated the Partial View Materialization Approach that we proposed in this paper. First, we build the EGCVL from the base tables. Second, through ERPVMA we select the appropriate set of materialized views. Third, according to the set, we can build Partially-Materialized Lattice as the input in the Overlap Method. Then we utilize Overlap Method to compute these views.. 3.1. View Selection 3.1.1. Extended Generalized Cube Views Lattices (EGCVL) Base on the notion as mentioned in the Section 2.1, we can get the generalized cube views lattices (GCVL). Then we let GCVL combine with the data cube lattice that is MV1. MV2. M V3. M V 4 ..... ......... ........ ......... ......... ........ ......... ........ ..... ..M V m M a te r ia liz e d V ie w s. V ie w M a te r ia liz a t io n : O v e r la p M e t h o d. P a r ti a lly -M a t e r ia liz e d L a t tic e. V1. V2. V3. V 4 ... ......... ........ ......... ........ ......... ......... ........ ......... ..... Vm. T h e v ie w s to b e m a t e r i a li z e d. V ie w S e le c ti o n : ERPVMA. E G C V L a t tic e. BT1. BT2. BT3. B T 4 ....... ........ ......... ........ ......... ........ ......... ......... .... B a s e T a b le s. Figure 3.1: The Partial View Materialization approach.. BTn.

(10) 10. mentioned in Section 2.2. The new data cube lattice that is produced by us to be called Extended Generalized Cube Views Lattices (EGCVL). Figure 3.2 shows the EGCVL for pos. A number at upper middle of the box in Figure 3.2 represents the index of a view. A number at lower left of the box in Figure 3.2 represents the size of a view. A number at lower right of the box in Figure 3.2 is the frequency of queries on the view. The top view (storeID, itemID, quarter) corresponds to the fact table. EGCVL is our main framework to analysis and selection materialized views.. 3.1.2. Cost Model The cost model in our approach is base on that is proposed by [8]. It has described in Section 2.2. But we modify the cost model a little bit. It is described as follows. . Benefit Calculation: Sometimes, the child views of view v are not all have the same Nearest Materialized Parent View with v in a data cube lattice when materialized view selection every time. Before we discuss the new benefit formula in the EPVMA, let us introduce a notion Different Nearest Materialized Parent View Child Views (DNMPVC). If v was materialized, we say that the view vc is one of the child views of v , but NMPV (vc ) ≠ NMPV (v) . The DNMPVC( v ) represents the set of vc . Contrary, the Same Nearest Materialized Parent View Child Views (SNMPVC) means if v was materialized, the view vc is one of the child views of v , and NMPV (vc ) = NMPV (v) . The SNMPVC( v ) represents the set of vc . Hence, the new benefit calculation is: benefit k (v ) = ((. ( R ( NMPV (vc )) − R (v )) bf vc ∈DNMPVC ( v ). ∑. ∑ fp ) + (. p∈child ( vc ) ∪vc. ( R ( NMPV (v )) − R (v )) bf. ∑ fp. p∈SNMPVC ( v ) ∪v. ))Trba. (3.1).

(11) 11 0 store ID , ite m ID , qu arter 1,000,000 1. 1 storeID , item ID 250,000 10. 4 store ID , ca te gory 1 00,000 25. 9 storeID 100 0 6. 5 city, ite m ID 62,500 3. 10 city, category 25 ,0 00 45. 15 city 250. 4. 2 storeID , category,qua rter 400 ,00 0 30. 6 store ID , qu arter 4,000 20. 11 regio n, item ID 12,500 1. 16 regio n, category 5 ,0 00 30. 18 regio n, quarter 20 0 3. 1. 21 cate go ry 100. 5. 8 re gion, item ID , qua rter 50,000 2. 13 region , category, quarter 2 0,000 50. 17 item ID. 20 region 50. 7 city, catego ry, qu arter 1 00,000 30. 12 city, qua rter 1,000 1. 250. 3 city, item ID , quarter 25 0,000 1. 14 item ID , quarte r 1,000 2. 19 category, quarter 400 10. 22 quarte r 40. 4. 50. 23 all( ). Figure 3.2: Extended Generalized cube views lattices for pos.. . Cost Calculation: Due to the source changes are loaded into the warehouse at regular intervals, usually once a day. So we consider the size of views and frequency of queries on views is happen between two update operations. So we remove the factor which frequency of update operations on each view. Hence, the new cost calculation is: cos t = (∑ 4 N i + ∑ 4 N i + ∑ 3 N u )Trba i∈I. . d ∈D. (3.2). u∈U. Profit Calculation: The profit calculation is not change, it is like Eq. (3.3). profit k (v) = benefit k (v) − cos t (v). (3.3). 3.1.3. Extended Progressive View Materialization Algorithm (EPVMA) The algorithm of PVMA [8] assumes that (1) the selection of each materialized view.

(12) 12. is done independently; (2) there is no space constraint in the warehouse and (3) that the OLAP uses relational database systems (ROLAP). In our algorithm, we still retain the assumption (1) and (3). But we modify the assumption (2), we add the storage space constraint in our algorithm. Due to at the actual situation, storage space is limited. Thus, the space condition is needed to consider. In this paper, the first algorithm that we propose is called Extended Progressive View Materialization Algorithm (EPVMA). It is described in Figure 3.3. EPVMA takes the tactic that selects the largest profit view to materialize every time. It assumes that only has the top view (fact table) to be materialized initial. In EPVMA, calculating the views size is represented by C (x) . The parameter x is represent a view or the set of the views. Initial:. MVs = {v0} /* MVs :views which should be materialized ; v1 :The top view NR = φ /* NR :views which should not be materialized. cos t = (∑i∈I 4 N i + ∑ d∈D 4 N d + ∑u∈U 3 N u )Trba. S=. Total storage space. CandidateViews = φ /* CandidateV iews :views which can be selection to materialized) Algorithm: 01 WHILE C ( MVs) < S 02 BEGIN FOR all views v 03 BEGIN 04 IF ( (v ∉ MVs) & (v ∉ NR) & (C ( MVs) + C (v) < S ) ) THEN 05 06 07 08. Add. v. into CandidateV iews. 09. END IF End FOR IF CandidateViews. 10 11 12. Exit the while loop END IF FOR all views v in CandidateV iews. 13. BEGIN. = φ THEN. ∑. ( R ( NMPV (vc )) − R (v )) bf. 14. benefit k (v ) = ((. 15. profit k (v) = benefit k (v) − cos t (v). 16 17 18 19. vc ∈DNMPVC ( v ). IF ( benefit k (v) < 0 ) THEN Add v into NR Remove v from CandidateV iews END IF. ∑ fp ) + (. p∈child ( vc )∪vc. ( R ( NMPV (v)) − R (v)) bf. ∑ fp. p∈SNMPVC ( v )∪v. ))Trba.

(13) 13 END FOR Bview = φ. 20 21. /* Bview :a view with maximum profit. Bvalue = 0 /* Bvalue :profit of Bview 23 FOR all views v in CandidateV iews BEGIN 24 IF ( Bview = φ ) THEN 25 26 Bview = v Bvalue = profit k (v) 27 END IF 28 IF ( Bvalue < profit k (v) ) THEN 29 Bview = v 30 Bvalue = profit k (v) 31 END IF 32 END FOR 33 Add Bview to MVs 34 CandidateViews = φ 35 36 END of While Loop 22. Figure 3.3: EPVMA Algorithm.. 3.1.4. Reverse Progressive View Materialization Algorithm (RPVMA) In EPVMA, it calculate the benefit consider the sum of query frequencies on the views that may be materialized and its child views. This may be generated puffiness issue when we calculate the profit in order to selection the materialized view. In order to solve this issue, we address the Reverse Progressive View Materialization Algorithm (RPVMA). It is described in Figure 3.4. The difference between RPVMA and EPVMA is RPVMA assumes that all of the views are materialized initially. A heuristic is to select the smallest profit view to be removed every time. In RPVMA, NPV (MVs ) represent whether MVs has any view’s profit smaller than zero or not. The algorithm has two stages. First, it removes the smallest profit every time until NPV ( MVs ) ≠ True . Second, When MVs has not any view’s profit smaller than zero, but C ( MVs ) > S , it is continue remove the view that has the smallest profit until C ( MVs ) < S . Initinal:. MVs = {all views} /* MVs :views which should be materialized NR = φ /* NR :views which should not be materialized. cos t = (∑i∈I 4 N i + ∑d ∈D 4 N d + ∑u∈U 3N u )Trba. S = Total storage space.

(14) 14 CandidateViews = φ /* CandidateV iews :views which can be selection to remove from MVs ) Algorithm: WHILE NPV ( MVs) = True 01 BEGIN 02 FOR all views v 03 04 05 06 07 08 09 10. BEGIN IF ( (v ∈ MVs) & (v ∉ NR) ) THEN Add. Exit the while loop END IF. 12 13 14 15. Bview = φ. 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46. into CandidateViews. END IF End FOR IF MVs = {top view} THEN. 11. 16. v. /* Bview :a view with minimum profit Bvalue = ∞ /* Bvalue :profit of Bview FOR all views v in CandidateV iews BEGIN. ( R( NMPV (v)) − R(v)) ∑ p∈child (v)∪v f p )Trba bf profit k (v) = benefitk (v) − cos t (v). benefit k (v) = (. IF ( Bview = φ ) THEN. Bview = v ,. Bvalue = profit k (v) END IF IF ( Bvalue > profit k (v ) ) THEN. Bview = v. Bvalue = profit k (v) END IF END FOR Remove Bview from MVs Add Bview into NR. CandidateViews = φ. END of while loop WHILE C ( MVs) > S BEGIN FOR all views v BEGIN IF ( (v ∈ MVs) & (v ∉ NR) ) THEN Add v into CandidateViews END IF End FOR IF MVs = {top view} THEN Exit the while loop END IF Bview = φ /* Bview :a view with minimum profit. Bvalue = ∞ /* Bvalue :profit of Bview FOR all views v in CandidateV iews BEGIN ( R ( NMPV (v)) − R (v)) benefitk (v) = ( ∑ p∈child (v)∪v f p )Trba bf. 47. profit k (v) = benefitk (v) − cos t (v). 48. IF ( Bview = φ ) THEN.

(15) 15 49 50 51 52 53 54 55 56 57 58 59 60. Bview = v ,. Bvalue = profit k (v) END IF IF ( Bvalue > profit k (v ) ) THEN. Bview = v. Bvalue = profit k (v) END IF END FOR Remove Bview from MVs Add Bview into NR. CandidateViews = φ. END of while loop. Figure 3.4: Reverse EPVMA Algorithm. 3.1.5. Extended RPVMA (ERPVMA) Although RPVMA solves the puffiness issue, but its utility rate of storage space is not good. Because the algorithm is doing the while loop until there is not any view’s profit smaller than zero in MVs , the surplus storage space maybe enough to add another views that good for query. In order to make up for the drawback, we provide the algorithm that combines the RPVMA and EPVMA is called Extended Reverse Progressive View Materialization Algorithm (ERPVMA). It is described in Figure 3.7. The algorithm runs RPVMA first, and then utilizes the EPVMA.. 3.2. View Materialization 3.2.1. Building a Partially-Materialized Lattice When we process the ERPVMA after, we can get the partial views that will be materialized. Then we build the Partially-Materialized Lattice according to the result. About the notion of Partially-Materialized Lattice was described in Section 2.1. 3.2.2. View Materialization with Overlap method At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. The.

(16) 16. Overlap Method [1] is a fast algorithm for computing a collection of group-bys. The method in [1] focus on a special case of the aggregation problem-computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. In this paper, we select the Overlap Method to compute the Partially-Materialized Lattice. We use the partially materialized lattice as the input of the overlap method. About the steps that are how to run the Overlap Method was described in Section 2.3.. 4. Experiments and Results Analysis 4.1. Experiments for EPVMA and ERPVMA 4.1.1. Experiments Data The data of these experiments are base on motivation example. In our experiments, we used conditions shown as follows: Table 4.1: The execution conditions of experiments. Lattice Update operations Block size Record size (fact table) Trba. Figure 3.3 Table 4.2 4096 bytes 100 bytes 16(msec). Table 4.2: Description of update operation for experiments. Operation Insert I 1. Number of updated rows 3500. Delete D1. 2500. Update U 1. 4000. Running these algorithms before, we calculate the blocking factor (bf), cost, and the range of total storage space:.

(17) 17. bf = 4096 / 100 = 40 cos t = (4 × 3500 + 4 × 2500 + 3 × 4000) × 16 = 576000 0 ≤ total storage space ≤ 1283254. 4.1.2. Experiments Result The graph in Figure 4.1 shows the total query cost of the two algorithms. We can see the total query cost in ERPVMA is almost smaller than in EPVMA. But has an exception in increasing rate of storage space at 20%. Although, in the experiments result, it still has some exceptions. We can’t sure that ERPVMA is the best. But it is true that provide the better performance. So, in our opinion that is using the algorithm that we propose to select materialized views is better than others. And it provides user a new selection for materialized views selection. Table 4.3 makes some comparison with PVMA[8], EPVMA and ERPVMA.. Total query cost. Comparison of total query cost 160000000 150000000 140000000 130000000 120000000 110000000 100000000 90000000 80000000 70000000 60000000 50000000 40000000 30000000 20000000 10000000 0. ERPVMA EPVMA. 5. 10. 15. 20. 25. 30. 35. 40. 45. 50. 55. 60. 65. 70. 75. 80. 85. 90. 95 100. Size of storage space (% of total views size). Figure 4.1: Make a comparison between the size of storage space and total query cost..

(18) 18. Table 4.3: makes some comparison with PVMA, EPVMA and ERPVMA. PVMA. EPVMA. ERPVMA. Frequency of selection queries. Yes. Yes. Yes. Cost of update operations. Yes. Yes. Yes. Storage space condition. No. Yes. Yes. Puffiness question. Yes. Yes. No. Size of storage space ( % of total views size) ≥ 30%. ---. Total query cost Total query cost is is bigger than smaller than ERPVMA. EPVMA.. 4.2. Analysis of Partial View Materialization Approach The approach includes two phases. In the views selection phase, we use the ERPVMA algorithm, the advantages in ERPVMA as the mention in Section 4.2. In the views materialization phase, we use Overlap Method. Its advantages are described as follows. The Overlap Method is proposed in [1] is a sort-base overlap method. It includes the optimizations share-sorts, smallest-parent, cache-results, and amortize-scans. The Overlap Method computes the view that try to minimize the number of disk accesses by overlapping the computation of the various views. They make use of partially matching sort orders to reduce the number of sorting steps required. And the Overlap Method is a multi-pass method. In each pass, a set of views is selected for computing under memory constraints. These views are computed in an overlapped manner. The tuples generated for a view are used to compute its descendents in the DAG. This pipelining reduces the number of scans needed. The process is repeated until all views get computed.. 5. Conclusions In conclusions, the Extended Reverse Progressive View Materialization Algorithm.

(19) 19. (ERPVMA) considers the frequency of selection queries and update cost. It is far better in situations that involve storage space condition. And we also solve the puffiness issue that PVMA maybe happens. That let our algorithm is more perfect. And in the experiments result, it proves the ERPVMA has the better performance than other algorithms for materialized views selection. In the views materialization phase of the Partial View Materialization Approach, we use the Partially-Materialized Lattice as the input in the overlap method. Letting views selection and views materialization to combine to generate the Partial view materialization approach. The Overlap Method includes the optimizations Smallest-parent, Cache-results, Amortize-scans, and Share-sorts. It has presented one particular sorting based scheme called Overlap. This scheme overlaps the computation of different views and minimum the number of scans needed. The Partial View Materialization Approach includes the advantages of ERPVMA and Overlap Method, so the approach can let the performance of OLAP is optimization. It maybe selects queries that do not require scanning of views by instead using indexes, and in our paper, we do not consider the index question. In order to group by attributes further along the dimension hierarchy, the fact table must be joined with the dimension tables before doing the aggregation. But our approach does not deal with the factor. So, they are the factors that have to be added in our future work.. References [1] S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi, “On the Computation of Multidimensional Aggregates”, Proceedings of the 22nd VLDB Conference, pp.506 - 521 Mumbai (Bombay), India, 1996..

(20) 20. [2] P. M. Deshpande, S. Agarwal, J. F. Naughton, and R. Ramakrishnan, “Computation of Multidimensional Aggregates”, Technical Reprot-1314, University of Wisconsin-Madison, 1996. [3] J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, 2001. [4] V. Harinarayan, A. Raharaman, and J. Ullman, “Implementing Data Cubes Efficiently”, Proceedings of ACM SIGMOD 1996 International Conference on. Management of Data, pp.205 - 216, Montreal, Canada, June 1996. [5] P. Hass, J. F. Naughton, S. Seshadri, L. Stokes, “Sampling-Based Estimation of the Number of Distinct Values of an Attribute”, In Proceedings of the 21st VLDB. Conference, pp.311 – 322, 1995. [6] I. S. Mumick, D. Quass, and B. S. Mumick, “Maintenance of Data Cubes and Summary tables in a Warehouse”, Proceedings of ACM SIGMOD 1997. International Conference on Management of Data, pp.100 - 111, Arizona, United States, 1997. [7] T. Teorey, “Data Modeling and Design”, Third edition, Morgan Kaufmann Publishers, Inc., 1999. [8] H. Uchiyama, K. Runapongsa, and T. J. Teorey, “A Progressive View Materialization Algorithm”, Proceedings of the second ACM international. workshop on Data warehouse and OLAP, pp.36 - 41, Kansas City, Missouri, United States, 1999..

(21)

數據

Figure 2.2: Hierarchies for the stores and items  Figure 2.3: Combining two hierarchical
Figure 2.4: Lattice of data cube [8].
Figure 2.5: Sort orders enforced on the views. Figure 2.6: Estimates of partition sizes
Figure 3.1 illustrated the Partial View Materialization Approach that we  proposed in this paper
+6

參考文獻

相關文件

In response to the changing needs of society, the rapid development of science and technology, the views of stakeholders collected through various surveys and

Teachers may consider the school’s aims and conditions or even the language environment to select the most appropriate approach according to students’ need and ability; or develop

Now, nearly all of the current flows through wire S since it has a much lower resistance than the light bulb. The light bulb does not glow because the current flowing through it

• Is the school able to make reference to different sources of assessment data and provide timely and effective feedback to students according to their performance in order

In practice, ρ is usually of order 10 for partial pivot

(3) Juelang Daosheng’s view of Zhuangzi as an esoteric Confucian and Confucius as the great systematizer of the three traditions stands in sharp contrast to the views of

• To consider the purpose of the task-based approach and the inductive approach in the learning and teaching of grammar at the secondary level.. • To take part in demonstrations

• In the present work, we confine our discussions to mass spectro metry-based proteomics, and to study design and data resources, tools and analysis in a research