Model-Mapping Schema Design for XML-Relational Database Systems


(1)XPred: A New Model-Mapping-Schema-Based Approach for Real-Time Access to XML Data. Advisor: Prof. Jun Wu By: Shang-Yi Huang. A thesis submitted to the Department of Information Management of the National Pingtung Institute of Commerce in partial fulfillment of the requirements for the Degree of Master of Information Management. Pingtung, Taiwan, R.O.C. January 2008.

(2) Acknowledgements

During the development of this thesis, I received a great deal of assistance and encouragement from many people. First of all, I must thank my supervisor, Prof. Jun Wu. During the composition of the thesis, he provided me with valuable suggestions and enlightening viewpoints. Although I met with many setbacks that left me dejected while writing this thesis, he always encouraged me to complete my work. Without his patient guidance, timely correction, and excellent training, I could not have finished the thesis. In particular, I wish to thank my committee members, Dr. Chin-Fu Kuo and Dr. YiMing Tai. Their professional corrections and suggestions made the thesis better. Additionally, I would like to express my sincere gratitude to my classmates and juniors at the National Pingtung Institute of Commerce. They always helped me whenever I had problems or difficulties. Finally, I wish to give my deep appreciation to my family for their endless support and encouragement during these two years.

Shang-Yi Huang
Taiwan
January 2008

(3) Abstract

Extensible Markup Language (XML) has become an active research topic in recent years. Many model-mapping-schema-based approaches have been proposed to translate and manipulate XML documents in relational databases. However, most previous work has a potential performance problem when retrieving XML data from a relational database, because a large number of join operations are needed. In this thesis, we propose a new model-mapping-schema-based approach, called XPred, to significantly reduce the join cost of processing various types of user queries. The basic idea is to distribute the structural information among the nodes so that fewer join operations are needed when processing user queries. In particular, for every node in a given XML document, we store its predecessor’s information within the node itself. This eliminates the join operations for parent-child traversal and thereby improves query-processing performance. The capability of the proposed approach was verified by a series of simulation experiments based on XMark [1, 2], with encouraging results.

Keywords: XML, Relational Databases, Model Mapping Schema, Query Processing.

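(4) Abstract (Chinese version)

In recent years, Extensible Markup Language (XML) has become a very popular research topic, and most document formats adopt XML as the standard for electronic data exchange. For example, the electronic official documents and electronic document delivery that the government has strongly promoted in recent years both use XML as the exchange standard. Storing and accessing XML documents efficiently is therefore becoming increasingly important. Because XML documents are semi-structured, early users who queried an XML document had to open it and then perform a full-text search over the entire document, which often cost a great deal of time. To meet the need to manage large numbers of XML documents, many studies have proposed effective methods for storing and accessing XML data, such as Edge [10], Monet [13], XRel [12], and XParent [11, 14]. However, all of these existing methods perform join operations too frequently when processing queries on XML data. Because the high cost of join operations easily degrades overall query performance, this thesis proposes a new model-mapping approach (called XPred) to reduce the number of joins at query time. This thesis also uses XMark [1, 2] to evaluate the performance of the proposed approach, and the experimental results show that it outperforms the other model-mapping approaches.

Keywords: XML, relational databases, model mapping, query processing.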

(5) Contents

1 Introduction
2 Related Work and Motivation
  2.1 XML Data Graph
  2.2 Model-Mapping Approaches
  2.3 Motivation
3 XPred - Translation and Manipulation of XML Documents
  3.1 XPred Schema
  3.2 XML Data Translation
  3.3 XML Data Manipulations
    3.3.1 XML Query Processing
    3.3.2 XML Data Insertion and Deletion
4 Performance Evaluation
  4.1 Performance Metrics
  4.2 Data Sets
  4.3 Experimental Results
    4.3.1 Data Bulk Load
    4.3.2 Query Processing

(6)
5 Conclusion
A Raw Data of XML Data Loading
B Raw Data of XML Data Querying
C SQL Commands of Query Templates

(7) List of Figures

1.1 An example XML document.
2.1 The corresponding XML data graph of the XML document in Figure 1.1.
2.2 The four tables of the XML data graph in Figure 2.1 under the XParent schema.
2.3 An example XQuery command.
2.4 The corresponding SQL command of the XQuery command in Figure 2.3 under the XParent schema.
3.1 The three tables of the XML data graph in Figure 2.1 under the XPred schema.
3.2 The corresponding SQL command of the XQuery command in Figure 2.3 under the XPred schema.
3.3 An example SQL command under the XParent and XPred schema.
3.4 An example XQuery command in Figure 2.1 and its corresponding SQL command.
4.1 The database description of an XML document generated by the XMark.
4.2 (a) The average ResponseTime, (b) the average CPUTime, (c) the average I/OTime, and (d) the detailed results of translating and storing data under the XParent schema and the XPred schema, when DataSize were 1MB, 5MB, 10MB, 15MB, 20MB, and 25MB.
4.3 (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 1MB.

(8)
4.4 (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 5MB.
4.5 (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 10MB.
4.6 (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 15MB.
4.7 (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 20MB.
4.8 (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 25MB.
A.1 The result of experiments detail for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 1MB.
A.2 The result of experiments detail for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 5MB.
A.3 The result of experiments detail for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 10MB.
A.4 The result of experiments detail for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 15MB.
A.5 The result of experiments detail for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 20MB.

(9)
A.6 The result of experiments detail for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 25MB.
B.1 The result of experiments detail for query template Q1 under the XParent schema.
B.2 The result of experiments detail for query template Q1 under the XPred schema.
B.3 The result of experiments detail for query template Q2 under the XParent schema.
B.4 The result of experiments detail for query template Q2 under the XPred schema.
B.5 The result of experiments detail for query template Q3 under the XParent schema.
B.6 The result of experiments detail for query template Q3 under the XPred schema.
B.7 The result of experiments detail for query template Q4 under the XParent schema.
B.8 The result of experiments detail for query template Q4 under the XPred schema.
B.9 The result of experiments detail for query template Q5 under the XParent schema.
B.10 The result of experiments detail for query template Q5 under the XPred schema.
B.11 The result of experiments detail for query template Q6 under the XParent schema.
B.12 The result of experiments detail for query template Q6 under the XPred schema.
B.13 The result of experiments detail for query template Q7 under the XParent schema.
B.14 The result of experiments detail for query template Q7 under the XPred schema.
C.1 (a) The SQL command of query template Q1 under the XParent schema. (b) The SQL command of query template Q1 under the XPred schema.

(10)
C.2 (a) The SQL command of query template Q2 under the XParent schema. (b) The SQL command of query template Q2 under the XPred schema.
C.3 (a) The SQL command of query template Q3 under the XParent schema. (b) The SQL command of query template Q3 under the XPred schema.
C.4 (a) The SQL command of query template Q4 under the XParent schema.
C.5 (b) The SQL command of query template Q4 under the XPred schema.
C.6 (a) The SQL command of query template Q5 under the XParent schema. (b) The SQL command of query template Q5 under the XPred schema.
C.7 (a) The SQL command of query template Q6 under the XParent schema. (b) The SQL command of query template Q6 under the XPred schema.
C.8 (a) The SQL command of query template Q7 under the XParent schema. (b) The SQL command of query template Q7 under the XPred schema.

(11) Chapter 1 Introduction

Extensible Markup Language (XML) [3] is widely accepted as a universal standard for information exchange. Figure 1.1 shows an example XML document, a bibliography of videos and movie stars. As more and more data are represented as XML documents, the need to store them persistently in a database has grown rapidly. In recent years, with the popularity of relational database systems (RDBS), many researchers and vendors (including Oracle, Sybase, IBM, and Microsoft) have proposed RDBS-based approaches [4, 5, 6, 7, 8, 9, 10, 11, 12] that store and manipulate XML data as relational tables.

These RDBS-based approaches need schema definitions to translate XML documents into relational tables. Such schema definitions can be classified into two categories [12]: structure-mapping schemas and model-mapping schemas. Under a structure-mapping schema [4, 5, 6, 7, 8, 9], an XML document is translated into relational tables based on its structure (i.e., its document type definition (DTD)). As a result, different XML documents might have different relational schemas, which creates additional complexity in managing differently structured XML documents with different logical and physical RDBS designs. In contrast, a model-mapping schema [10, 11, 12] provides a single set of schema definitions for translating differently structured XML documents into relational tables. This approach can support any sophisticated application and any well-formed XML document, even one without a DTD. Edge [10], Monet [13], XRel [12], and XParent [11, 14] are examples of model-mapping schemas. In particular, Jiang et al. [11, 14] show that the XParent schema performs better than the other model-mapping schemas.

(12)
<Bib>
  <Video Year="2007">
    <Title>I Am Legend</Title>
    <Cast_ID>1</Cast_ID>
    <Director>Francis Lawrence</Director>
    <Length>1hrs. 40min.</Length>
  </Video>
  <Video Year="2007">
    <Title>Alvin and the Chipmunks</Title>
    <Cast_ID>2</Cast_ID>
    <Director>Tim Hill</Director>
    <Length>1hrs. 32min.</Length>
  </Video>
  <Video Year="2007">
    <Title>The Perfect Holiday</Title>
    <Cast_ID>3</Cast_ID>
    <Director>Lance Rivera</Director>
    <Length>1hrs. 36min.</Length>
  </Video>
  <Movie_star SID="1">
    <Name>Will Smith</Name>
    <Gender>male</Gender>
    <Birthday>25 September 1968</Birthday>
    <Awards>31 wins, 58 nominations</Awards>
  </Movie_star>
  <Movie_star SID="2">
    <Name>Jason Lee</Name>
    <Gender>male</Gender>
    <Birthday>25 April 1970</Birthday>
    <Awards>2 wins, 11 nominations</Awards>
  </Movie_star>
  <Movie_star SID="3">
    <Name>Gabrielle Union</Name>
    <Gender>female</Gender>
    <Birthday>29 October 1972</Birthday>
    <Awards>5 wins, 11 nominations</Awards>
  </Movie_star>
</Bib>

Figure 1.1: An example XML document.

(13) This thesis explores the performance issues of XML data access over relational databases. It is motivated by a potential performance problem of model-mapping-schema-based approaches: a large number of join operations are needed to process user queries. In this thesis, we propose a new model-mapping-schema-based approach, called XPred, to significantly reduce the join cost of processing various types of user queries. The basic idea is to distribute the structural information among the nodes so that fewer join operations are needed when processing user queries. In particular, for every node in a given XML document, we store its predecessor’s information within the node itself. This eliminates the join operations for parent-child traversal and thereby improves query-processing performance. The major contributions of this thesis are twofold: (1) Generalization: XPred can support any sophisticated application, and all XML documents share the same set of relational schemas. (2) Efficiency: XPred greatly reduces the number of join operations needed to process user queries. Furthermore, it also reduces the number of logical and physical I/O activities. The capability of the proposed approach was verified by a series of simulation experiments based on XMark [1, 2], with encouraging results.

The rest of this thesis is organized as follows. Chapter 2 summarizes related work on model-mapping schemas and discusses the potential performance problem, caused by significant join costs, of approaches based on them. This problem motivates our research. Chapter 3 presents a new model-mapping schema, called XPred, together with methods for manipulating XML data stored in relational databases. Chapter 4 reports the results of a series of simulation experiments. Chapter 5 presents the conclusion and future work.

(14) Chapter 2 Related Work and Motivation

In this chapter, we present a common representation of the structure of an XML document and the related model-mapping schemas. We also provide examples to illustrate the major drawback of model-mapping schemas, namely that a large number of join operations are needed to process user queries. This drawback motivates our research.

2.1. XML Data Graph.

A graph representation is the most common way to express and analyze the structure of an XML document. Based on the well-known XPath [15] data model, Jiang et al. [11] proposed a graph representation, called the XML data graph, to represent the hierarchical structure of elements and attributes in an XML document. In particular, the XML data graph models an XML document as an ordered tree. It has four types of nodes: element, attribute, text, and IDREF. Figure 2.1 shows the XML data graph of the XML document in Figure 1.1. Element nodes, attribute nodes, text nodes, and IDREF nodes are drawn as circles, triangles, rectangles, and octagons, respectively. In Figure 2.1, the root node is a virtual node pointing to the root element of the XML document. Each node represents either an element or an attribute, and each edge represents the relationship between two nodes in the structure of the XML document. The label of an edge is the name of the node that the edge points to. An element node represents an element in the XML document; it can have an arbitrary number of element nodes or text nodes as its child nodes.

(15) [Figure: XML data graph with a virtual root node 0 and nodes 1 to 37. Edge labels: Bib, Video, Year, Title, Cast_ID, Director, Length, Movie_star, SID, Name, Gender, Birthday, Awards. Legend: Element (E), Attribute (A), Text (T), IDREF (R).]

Figure 2.1: The corresponding XML data graph of the XML document in Figure 1.1.
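The node numbering in Figure 2.1 can be reproduced with a short traversal. The following is a minimal Python sketch, not taken from the thesis: the function name build_data_graph is hypothetical, a leaf element and its text are folded into a single text node (an assumption based on how Figure 2.1 numbers its nodes), and IDREF nodes are not modeled.

```python
import xml.etree.ElementTree as ET

def build_data_graph(xml_text):
    """Return (nodes, edges) for a simplified XML data graph.

    nodes maps node number -> (kind, value); edges is a list of
    (parent number, child number, edge label) in document order.
    """
    nodes = {0: ("root", None)}  # node 0 is the virtual root
    edges = []

    def add(parent, label, kind, value):
        number = len(nodes)  # next unused node number
        nodes[number] = (kind, value)
        edges.append((parent, number, label))
        return number

    def visit(elem, parent):
        text = (elem.text or "").strip()
        # Assumption: a leaf element carrying text becomes one text node.
        kind = "text" if text and len(elem) == 0 else "element"
        me = add(parent, elem.tag, kind, text or None)
        for name, value in elem.attrib.items():  # attributes come first
            add(me, name, "attribute", value)
        for child in elem:  # then child elements, in document order
            visit(child, me)

    visit(ET.fromstring(xml_text), 0)
    return nodes, edges

doc = '<Bib><Video Year="2007"><Title>I Am Legend</Title></Video></Bib>'
nodes, edges = build_data_graph(doc)
for edge in edges:
    print(edge)
```

On this fragment the numbering agrees with the first few nodes of Figure 2.1: Bib is node 1, the first Video is node 2, its Year attribute is node 3, and its Title is node 4.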

(16) A text node or an element node can have only one parent node. Text nodes and attribute nodes are leaf nodes with values. IDREF nodes are used to represent intra-document references.

2.2. Model-Mapping Approaches.

A model-mapping schema is a relational database schema that maps the XML data model to the relational data model. It can support any sophisticated application and differently structured XML data with the same set of schema definitions. Many researchers have proposed model-mapping-schema-based approaches [10, 8, 11, 14] for providing efficient access to XML data. Three well-known model-mapping schemas are often referenced in research on XML storage systems: Edge [10], Monet [8], and XParent [11, 14]. Furthermore, Jiang et al. [11] show that the XParent schema outperforms the other model-mapping schemas (including the Edge schema and the Monet schema). Therefore, we introduce only the XParent schema. The XParent schema stores the node information of the XML data graph of an XML document in four tables:

LabelPath(ID, Len, Path)
DataPath(Pid, Cid)
Element(PathID, Did, Ordinal)
Data(PathID, Did, Ordinal, Value)

Table LabelPath stores the label-paths of an XML data graph. The attributes ID and Len denote the unique ID and the length (the number of edges in the label-path) of each label-path, respectively. The attribute Path denotes the name of the label-path, which is the sequence of node names in it. Table DataPath stores the parent-child relationships of an XML data graph. The attributes Pid and Cid denote the node numbers of the parent node and the child node of an edge, respectively. Table Data stores the nodes of the XML data graph whose corresponding elements or attributes in the XML document have a value. The attributes PathID and Did denote the ID of the label-path (i.e., a foreign key referencing ID in Table LabelPath) and the node number of the node, respectively. The attribute Ordinal denotes the ordinal number of the node among its sibling nodes with the same name.

(17) The attribute Value denotes the value of the node, where the value of a node is the value of the corresponding element or attribute in the XML document. Table Element is very similar to Table Data, except that Table Element stores the information of all nodes in an XML data graph and does not store the value of each node. Figure 2.2 shows a database instance of Figure 2.1 under the XParent schema. In the next section, we use an example to illustrate the major drawback of the XParent schema, which motivates this research.
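As a minimal sketch of the four XParent tables described above, the Python snippet below declares them in an in-memory SQLite database and loads the Figure 2.2 rows for the first Video and its Year and Title. The use of SQLite, the column types, and the names load_first_video and TITLE_BY_YEAR are assumptions for illustration; the thesis specifies only the table and column names.

```python
import sqlite3

# Column types are assumptions; the thesis gives only the column names.
XPARENT_DDL = """
CREATE TABLE LabelPath (ID INTEGER PRIMARY KEY, Len INTEGER, Path TEXT);
CREATE TABLE DataPath  (Pid INTEGER, Cid INTEGER);
CREATE TABLE Element   (PathID INTEGER, Did INTEGER, Ordinal INTEGER);
CREATE TABLE Data      (PathID INTEGER, Did INTEGER, Ordinal INTEGER, Value TEXT);
"""

def load_first_video(conn):
    """Insert the Figure 2.2 rows for nodes 2 (Video), 3 (@Year), 4 (Title)."""
    conn.executescript(XPARENT_DDL)
    conn.executemany("INSERT INTO LabelPath VALUES (?, ?, ?)", [
        (2, 2, "./Bib/Video"),
        (3, 3, "./Bib/Video/@Year"),
        (4, 3, "./Bib/Video/Title"),
    ])
    conn.executemany("INSERT INTO DataPath VALUES (?, ?)",
                     [(1, 2), (2, 3), (2, 4)])
    conn.executemany("INSERT INTO Element VALUES (?, ?, ?)",
                     [(2, 2, 1), (3, 3, 1), (4, 4, 1)])
    conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)", [
        (3, 3, 1, "2007"),
        (4, 4, 1, "I Am Legend"),
    ])

# Titles of videos whose @Year is 2007. The two DataPath aliases connect
# the attribute node and the title node through their shared parent.
TITLE_BY_YEAR = """
SELECT D2.Value
FROM LabelPath LP1, Data D1, DataPath DP1, DataPath DP2, Data D2, LabelPath LP2
WHERE LP1.ID = D1.PathID AND LP1.Path = './Bib/Video/@Year' AND D1.Value = '2007'
  AND D1.Did = DP1.Cid AND DP1.Pid = DP2.Pid AND DP2.Cid = D2.Did
  AND LP2.ID = D2.PathID AND LP2.Path = './Bib/Video/Title'
"""

conn = sqlite3.connect(":memory:")
load_first_video(conn)
titles = conn.execute(TITLE_BY_YEAR).fetchall()
print(titles)
```

Even this two-node lookup needs five equijoins, three of which exist only to walk from one child to its sibling through Table DataPath.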

(18)
Table LabelPath
ID  Len  Path
1   1    ./Bib
2   2    ./Bib/Video
3   3    ./Bib/Video/@Year
4   3    ./Bib/Video/Title
5   3    ./Bib/Video/Cast_ID
6   3    ./Bib/Video/Director
7   3    ./Bib/Video/Length
8   2    ./Bib/Movie_star
9   3    ./Bib/Movie_star/@SID
10  3    ./Bib/Movie_star/Name
11  3    ./Bib/Movie_star/Gender
12  3    ./Bib/Movie_star/Birthday
13  3    ./Bib/Movie_star/Awards

Table Data
Did  PathID  Ordinal  Value
3    3       1        2007
4    4       1        I Am Legend
5    5       1        1
6    6       1        Francis Lawrence
7    7       1        1hrs. 40min.
9    3       1        2007
10   4       1        Alvin and the Chipmunks
11   5       1        2
12   6       1        Tim Hill
13   7       1        1hrs. 32min.
15   3       1        2007
16   4       1        The Perfect Holiday
17   5       1        3
18   6       1        Lance Rivera
19   7       1        1hrs. 36min.
21   9       1        1
22   10      1        Will Smith
23   11      1        male
24   12      1        25-Sep-68
25   13      1        31 wins,58 nominations
27   9       1        2
28   10      1        Jason Lee
29   11      1        male
30   12      1        25-Apr-70
31   13      1        2 wins,11 nominations
33   9       1        3
34   10      1        Gabrielle Union
35   11      1        female
36   12      1        29-Oct-72
37   13      1        5 wins,11 nominations

Table Element
Did  PathID  Ordinal
1    1       1
2    2       1
3    3       1
4    4       1
5    5       1
6    6       1
7    7       1
8    2       2
9    3       1
10   4       1
11   5       1
12   6       1
13   7       1
14   2       3
15   3       1
16   4       1
17   5       1
18   6       1
19   7       1
20   8       1
21   9       1
22   10      1
23   11      1
24   12      1
25   13      1
26   8       2
27   9       1
28   10      1
29   11      1
30   12      1
31   13      1
32   8       3
33   9       1
34   10      1
35   11      1
36   12      1
37   13      1

Table DataPath
Pid  Cid
1    2
2    3
2    4
2    5
2    6
2    7
1    8
8    9
8    10
8    11
8    12
8    13
1    14
14   15
14   16
14   17
14   18
14   19
1    20
20   21
20   22
20   23
20   24
20   25
1    26
26   27
26   28
26   29
26   30
26   31
1    32
32   33
32   34
32   35
32   36
32   37

Figure 2.2: The four tables of the XML data graph in Figure 2.1 under the XParent schema.

(19) 2.3. Motivation.

In this thesis, a user query for manipulating XML data is expressed as a FLWOR expression, one of the best-known XQuery [16] constructs. FLWOR expressions are popular because they resemble SQL commands, and they are often useful for joining two or more XML documents and for restructuring the result of a query. A FLWOR expression consists of five clauses: FOR, LET, WHERE, ORDER BY, and RETURN. The FOR clause introduces a variable (with a $ prefix) together with its path and the filename of the XML document. The LET clause can introduce additional variables (also with a $ prefix). A variable introduced by a FOR or LET clause denotes the set of nodes with the specified path in the XML document. In other words, the FOR and LET clauses in a FLWOR expression generate a sequence of tuples of bound variables, called the tuple stream. The WHERE clause specifies a conditional expression that is evaluated once for each tuple in the tuple stream. The ORDER BY clause sorts the tuple stream, and the RETURN clause specifies the format of the results. When a model-mapping schema (e.g., the XParent schema) is adopted to store XML data, the FLWOR expressions in XQuery commands must be translated into SQL commands to manipulate data in the relational database. The following example shows a user query in XQuery and its corresponding SQL command when the XParent schema is adopted.

XQuery
    FOR $x IN document("video.xml")/Bib
    LET $y=$x/Movie_star
    WHERE $y/Name="Will Smith"
      AND $y/@SID=$x/Video/Cast_ID
      AND $x/Video/@Year="2007"
    ORDER BY $x/Video/Title
    RETURN $x/Video/Title AND $x/Video/Director

Figure 2.3: An example XQuery command.

Example 1 An example XQuery command and its corresponding SQL command under the XParent schema. Figure 2.3 shows an XQuery command for retrieving data from the XML document shown in Figure 1.1. The XQuery command finds the title and director of the videos that star Will Smith and were released in 2007.

(20) Figure 2.4 shows the corresponding SQL command under the XParent schema. This SQL command has 17 equijoins and 8 selections, which makes it an expensive query.

XParent-SQL
    SELECT D4.Value, D5.Value
    FROM LabelPath LP1, Data D1, DataPath DP1, DataPath DP2,
         LabelPath LP2, Data D2, LabelPath LP3, Data D3,
         DataPath DP3, DataPath DP4, LabelPath LP4, Data D4,
         DataPath DP5, LabelPath LP5, Data D5,
         DataPath DP6, LabelPath LP6, Data D6
    WHERE LP1.ID=D1.PathID
      AND LP1.Path='./Bib/Movie_star/Name'
      AND D1.Value='Will Smith'
      AND D1.Did=DP1.Cid
      AND DP1.Pid=DP2.Pid
      AND DP2.Cid=D2.Did
      AND LP2.ID=D2.PathID
      AND LP2.Path='./Bib/Movie_star/@SID'
      AND LP3.ID=D3.PathID
      AND LP3.Path='./Bib/Video/@Year'
      AND D3.Value='2007'
      AND D3.Did=DP3.Cid
      AND DP3.Pid=DP4.Pid
      AND DP4.Cid=D4.Did
      AND LP4.ID=D4.PathID
      AND LP4.Path='./Bib/Video/Title'
      AND DP4.Pid=DP5.Pid
      AND DP5.Cid=D5.Did
      AND LP5.ID=D5.PathID
      AND LP5.Path='./Bib/Video/Director'
      AND DP5.Pid=DP6.Pid
      AND DP6.Cid=D6.Did
      AND LP6.ID=D6.PathID
      AND LP6.Path='./Bib/Video/Cast_ID'
      AND D2.Value=D6.Value
    ORDER BY D4.Value ASC

Figure 2.4: The corresponding SQL command of the XQuery command in Figure 2.3 under the XParent schema.
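The counts of 17 equijoins and 8 selections quoted for Figure 2.4 can be checked mechanically. The following Python sketch is not part of the thesis: the helper name count_predicates is hypothetical, and its classification rule (a predicate that compares against a quoted literal is a selection, any other equality is an equijoin) is an assumption that happens to fit this query.

```python
import re

# WHERE clause of the XParent SQL command in Figure 2.4.
XPARENT_WHERE = """
LP1.ID=D1.PathID AND LP1.Path='./Bib/Movie_star/Name' AND D1.Value='Will Smith'
AND D1.Did=DP1.Cid AND DP1.Pid=DP2.Pid AND DP2.Cid=D2.Did
AND LP2.ID=D2.PathID AND LP2.Path='./Bib/Movie_star/@SID'
AND LP3.ID=D3.PathID AND LP3.Path='./Bib/Video/@Year' AND D3.Value='2007'
AND D3.Did=DP3.Cid AND DP3.Pid=DP4.Pid AND DP4.Cid=D4.Did
AND LP4.ID=D4.PathID AND LP4.Path='./Bib/Video/Title'
AND DP4.Pid=DP5.Pid AND DP5.Cid=D5.Did
AND LP5.ID=D5.PathID AND LP5.Path='./Bib/Video/Director'
AND DP5.Pid=DP6.Pid AND DP6.Cid=D6.Did
AND LP6.ID=D6.PathID AND LP6.Path='./Bib/Video/Cast_ID'
AND D2.Value=D6.Value
"""

def count_predicates(where_clause):
    """Return (equijoins, selections) for a conjunction of '=' predicates."""
    predicates = re.split(r"\s+AND\s+", where_clause.strip())
    # Assumption: a quoted literal on either side marks a selection.
    selections = sum(1 for p in predicates if "'" in p)
    return len(predicates) - selections, selections

print(count_predicates(XPARENT_WHERE))
```

The same helper can be pointed at any other conjunctive WHERE clause in the thesis, as long as no string literal contains the word AND surrounded by spaces.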

(21) It is clear that the XParent schema is very simple and easy to implement. However, it needs a large number of join operations to complete a user query. In particular, it needs to locate all of the nodes that correspond to the elements or attributes appearing in a query, and doing so requires many join operations to verify the relationships (e.g., the parent-child relationships) between nodes. Notice that when user queries become more complex than the one in Example 1, even more join operations are needed to complete them. Because join operations are very costly, the processing time of a query grows with the number of joins it requires. This observation motivates our research. In this thesis, we propose a new model-mapping schema based on a trade-space-for-time strategy to significantly reduce the join cost of processing various types of user queries, so that better performance can be achieved.

(22) Chapter 3 XPred - Translation and Manipulation of XML Documents

In this chapter, we propose a new model-mapping schema, called XPred, to reduce the potentially large number of join operations needed when processing XML queries. Algorithms for translating and manipulating XML documents over relational database systems are also provided.

3.1. XPred Schema.

The XPred schema is a model-mapping schema that can support any sophisticated application. The rationale behind it is to distribute the structural information among the nodes so that fewer join operations are needed when processing user queries. In particular, for every node in a given XML document, we store its predecessor’s information within the node itself. This eliminates the join operations for parent-child traversal and thereby improves query-processing performance. In other words, the XPred schema reduces the potentially large number of join operations needed to process user queries. The database schema of XPred is as follows:

Path(PathID, Length, LabelPath)
Node(NodeID, PathID, Ordinal, PredID)
Data(NodeID, PathID, Ordinal, PredID, Value)
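To make the role of PredID concrete, the following minimal Python sketch declares the three XPred tables in an in-memory SQLite database and looks up a sibling node without any parent-child table. SQLite, the column types, the NULL PredID for the root node, and the names XPRED_DDL and SIBLING_TITLE are assumptions for illustration; the sample rows describe the first Video of the example document.

```python
import sqlite3

# Column types are assumptions; the thesis gives only the column names.
XPRED_DDL = """
CREATE TABLE Path (PathID INTEGER PRIMARY KEY, Length INTEGER, LabelPath TEXT);
CREATE TABLE Node (NodeID INTEGER PRIMARY KEY, PathID INTEGER,
                   Ordinal INTEGER, PredID INTEGER);
CREATE TABLE Data (NodeID INTEGER PRIMARY KEY, PathID INTEGER,
                   Ordinal INTEGER, PredID INTEGER, Value TEXT);
"""

xpred = sqlite3.connect(":memory:")
xpred.executescript(XPRED_DDL)
xpred.executemany("INSERT INTO Path VALUES (?, ?, ?)", [
    (1, 1, "./Bib"),
    (2, 2, "./Bib/Video"),
    (3, 3, "./Bib/Video/@Year"),
    (4, 3, "./Bib/Video/Title"),
])
# Assumption: the root node has no predecessor, stored here as NULL.
xpred.executemany("INSERT INTO Node VALUES (?, ?, ?, ?)",
                  [(1, 1, 1, None), (2, 2, 1, 1)])
xpred.executemany("INSERT INTO Data VALUES (?, ?, ?, ?, ?)", [
    (3, 3, 1, 2, "2007"),
    (4, 4, 1, 2, "I Am Legend"),
])

# Titles of videos whose @Year is 2007: siblings are matched directly
# on PredID, so only three equijoins are needed.
SIBLING_TITLE = """
SELECT D2.Value
FROM Path P1, Data D1, Data D2, Path P2
WHERE P1.PathID = D1.PathID AND P1.LabelPath = './Bib/Video/@Year'
  AND D1.Value = '2007'
  AND D1.PredID = D2.PredID
  AND P2.PathID = D2.PathID AND P2.LabelPath = './Bib/Video/Title'
"""

sibling_titles = xpred.execute(SIBLING_TITLE).fetchall()
print(sibling_titles)
```

The single condition D1.PredID = D2.PredID does the work that a chain through a separate parent-child table would otherwise do, which is the space-for-time trade the schema makes.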

(23)
Table Path
PathID  Length  LabelPath
1       1       ./Bib
2       2       ./Bib/Video
3       3       ./Bib/Video/@Year
4       3       ./Bib/Video/Title
5       3       ./Bib/Video/Cast_ID
6       3       ./Bib/Video/Director
7       3       ./Bib/Video/Length
8       2       ./Bib/Movie_star
9       3       ./Bib/Movie_star/@SID
10      3       ./Bib/Movie_star/Name
11      3       ./Bib/Movie_star/Gender
12      3       ./Bib/Movie_star/Birthday
13      3       ./Bib/Movie_star/Awards

Table Node
NodeID  PathID  Ordinal  PredID
1       1       1
2       2       1        1
8       2       2        1
14      2       3        1
20      8       1        1
26      8       2        1
32      8       3        1

Table Data
NodeID  PathID  Ordinal  PredID  Value
3       3       1        2       2007
4       4       1        2       I Am Legend
5       5       1        2       1
6       6       1        2       Francis Lawrence
7       7       1        2       1hrs. 40min.
9       3       1        8       2007
10      4       1        8       Alvin and the Chipmunks
11      5       1        8       2
12      6       1        8       Tim Hill
13      7       1        8       1hrs. 32min.
15      3       1        14      2007
16      4       1        14      The Perfect Holiday
17      5       1        14      3
18      6       1        14      Lance Rivera
19      7       1        14      1hrs. 36min.
21      9       1        20      1
22      10      1        20      Will Smith
23      11      1        20      male
24      12      1        20      25-Sep-68
25      13      1        20      31 wins,58 nominations
27      9       1        26      2
28      10      1        26      Jason Lee
29      11      1        26      male
30      12      1        26      25-Apr-70
31      13      1        26      2 wins,11 nominations
33      9       1        32      3
34      10      1        32      Gabrielle Union
35      11      1        32      female
36      12      1        32      29-Oct-72
37      13      1        32      5 wins,11 nominations

Figure 3.1: The three tables of the XML data graph in Figure 2.1 under the XPred schema.

Table Path stores the label-paths of an XML data graph, where the label-path of a node is the sequence of node names from the root to that node. The attribute PathID denotes the unique ID of each label-path. The attribute LabelPath denotes the name of the label-path, i.e., the sequence of node names in it. The attribute Length denotes the length of the label-path, counted from the root to the node. Table Node stores the information of elements and attributes of an XML document. The attribute NodeID denotes a unique ID of each node. The attribute PathID denotes the ID of the node’s label-path (i.e., a foreign key referencing PathID in Table Path). The attribute Ordinal denotes the ordinal number of the node among the nodes that have the same name and are connected to the same source node. Finally, the attribute PredID is the NodeID of the node’s predecessor (i.e., its parent node). Table Data is the same as Table Node, except that it has an additional attribute Value to store the value of a node, where the value of a node is the value of the corresponding element or attribute in the XML document.

(24) Figure 3.1 shows the corresponding three tables of the XML data graph in Figure 2.1 under the XPred schema. The design of the XPred schema is motivated by a potential performance problem of approaches based on model-mapping schemas. Typically, a user query must perform many join operations to locate parent-child relationships within an XML document stored in a relational database, as shown in Example 1. This has a significant impact on query-processing performance, because join operations are costly. In this thesis, a trade-space-for-time strategy is adopted to reduce the number of join operations needed to process a user query. In particular, we add an attribute PredID to Tables Node and Data to provide a direct reference to each node’s immediate predecessor (i.e., its parent node). This makes it easy to find the elements or attributes that share the same predecessor node in an XML document. Because join operations are costly, reducing their number can significantly improve the performance of an XML storage system. Example 2 shows that the number of join operations is reduced when the XPred schema is adopted.
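The three tables described above can be filled by a single pass over the document. The following is a minimal Python sketch under stated assumptions, not the translation algorithm of the thesis: it uses xml.etree.ElementTree rather than a DOM parser, the name xml_to_xpred is hypothetical, nodes with a value go to Table Data and all others to Table Node (as in Figure 3.1), and the root’s PredID is left as None.

```python
import xml.etree.ElementTree as ET

def xml_to_xpred(xml_text):
    """Return (path_rows, node_rows, data_rows) in the XPred column order."""
    path_ids = {}  # label-path -> PathID
    path_rows, node_rows, data_rows = [], [], []
    next_node_id = 0

    def path_id(label_path):
        # Register a label-path the first time it is seen.
        if label_path not in path_ids:
            path_ids[label_path] = len(path_ids) + 1
            length = label_path.count("/")  # number of edges from the root
            path_rows.append((path_ids[label_path], length, label_path))
        return path_ids[label_path]

    def emit(label_path, ordinal, pred_id, value):
        nonlocal next_node_id
        next_node_id += 1
        pid = path_id(label_path)
        if value is None:
            node_rows.append((next_node_id, pid, ordinal, pred_id))
        else:
            data_rows.append((next_node_id, pid, ordinal, pred_id, value))
        return next_node_id

    def visit(elem, parent_path, ordinal, pred_id):
        label_path = parent_path + "/" + elem.tag
        text = (elem.text or "").strip()
        node_id = emit(label_path, ordinal, pred_id, text or None)
        for name, value in elem.attrib.items():
            emit(label_path + "/@" + name, 1, node_id, value)
        seen = {}  # tag -> number of same-named siblings so far
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            visit(child, label_path, seen[child.tag], node_id)

    visit(ET.fromstring(xml_text), ".", 1, None)
    return path_rows, node_rows, data_rows

sample = ('<Bib><Video Year="2007"><Title>I Am Legend</Title></Video>'
          '<Video Year="2007"><Title>Alvin and the Chipmunks</Title></Video></Bib>')
paths, node_table, data_table = xml_to_xpred(sample)
for row in data_table:
    print(row)
```

Each emitted row carries the NodeID of its parent in the PredID column, so no separate parent-child table has to be written during loading.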

(25) XPred-SQL
    SELECT D4.Value, D5.Value
    FROM Path P1, Data D1, Data D2, Path P2,
         Path P3, Data D3, Data D4, Path P4,
         Data D5, Path P5, Data D6, Path P6
    WHERE P1.PathID=D1.PathID
      AND P1.LabelPath='./Bib/Movie_star/Name'
      AND D1.Value='Will Smith'
      AND D1.PredID=D2.PredID
      AND P2.PathID=D2.PathID
      AND P2.LabelPath='./Bib/Movie_star/@SID'
      AND P3.PathID=D3.PathID
      AND P3.LabelPath='./Bib/Video/@Year'
      AND D3.Value='2007'
      AND D3.PredID=D4.PredID
      AND P4.PathID=D4.PathID
      AND P4.LabelPath='./Bib/Video/Title'
      AND D4.PredID=D5.PredID
      AND P5.PathID=D5.PathID
      AND P5.LabelPath='./Bib/Video/Director'
      AND D5.PredID=D6.PredID
      AND P6.PathID=D6.PathID
      AND P6.LabelPath='./Bib/Video/Cast_ID'
      AND D2.Value=D6.Value
    ORDER BY D4.Value ASC

Figure 3.2: The corresponding SQL command of the XQuery command in Figure 2.3 under the XPred schema.

Example 2 A corresponding SQL command of the XQuery command in Figure 2.3 under the XPred schema. Figure 3.2 shows the SQL command corresponding to the XQuery command in Figure 2.3 under the XPred schema. When the XPred schema is adopted, there are 11 equijoins and 8 selections. Compared with Example 1, the number of join operations drops from 17 to 11 (a reduction of about 35.3%). The XPred schema can thus greatly reduce the number of join operations when processing user queries, because it stores one additional piece of information in each node: the ID of its predecessor node. This lowers the cost of searching for a node’s predecessor node, an operation that is common in many model-mapping-schema-based approaches.

(26) In the following sections, we provide an algorithm for translating XML documents into a relational database under the XPred schema. Related algorithms for manipulating XML data are also provided.

3.2. XML Data Translation.

To translate XML documents into a relational database (with the XPred schema as the database schema), our approach adopts an ordinary DOM parser (e.g., Xerces) to parse XML documents and translates each node (i.e., element or attribute) of the DOM tree into records of tables in the relational database. Because DOM provides an object model that can represent any XML document (regardless of how it is structured), it gives us an easy way to access the content of XML documents. We must point out that a similar translation could be done with a SAX parser. The proposed translation algorithm is called XML2RDB. For a given XML document, we construct its DOM tree with an ordinary DOM parser. We then visit each node of the DOM tree to generate and execute a sequence of SQL commands (e.g., CREATE TABLE and INSERT commands) that store the data in a relational database. The proposed translation algorithm XML2RDB is as follows:

Algorithm XML2RDB
Input: an XML document
Output: generate and execute a series of SQL commands
begin
1:  PathID := 0
2:  NodeID := 0
3:  DOMTree := the DOM tree of the input XML document
4:  generate and execute SQL commands that create Table Path, Table Node, and Table Data
5:  for each node_i ∈ DOMTree do
6:    NodeType_i := GetNodeType(node_i)
7:    NodeName_i := the name of node_i

(27) 8:. LabelP athi := LabelP athP arent(nodei ) + ′ /′ +NodeNamei. 9:. Lengthi := count the number of the symbols of oblique line in the LabelP athi. 10: 11:. if labelP athi not exists in the Table Path then generate and execute a SQL command that insert a tuple(P athID++,Lengthi ,LabelP athi ) into the Table Path. 12:. end if. 13:. P athIDi := GetP athID(LableP athi ). 14:. NodeIDi := NodeID++. 15:. P redIDi := GetP arentID(nodei ). 16:. Ordinal(nodei ) :=1. 17:. if P athIDi is corresponding with PathID of SiblingNodei then. 18:. Ordinal(nodei ) := Ordinal(SiblingNodei ) +1. 19:. end if. 20:. if NodeT ype is an element type then. 21:. generate and execute a SQL command that insert a tuple (NodeIDi , P athIDi ,Ordinal(nodei ),P redIDi) into the Table Node. 22:. end if. 23:. if NodeT ype is an Text type then. 24:. NodeV aluei := the value of nodei. 25:. generate and execute a SQL command that insert a tuple (NodeIDi , P athIDi ,Ordinal(nodei ),P redIDi,NodeV aluei ) into the Table Data. 26:. end if. 27:. for each attributej ∈ nodei do. 28:. AttributeNamej := the name of attributej. 29:. LabelP athj := LabelP athP arent(nodei ) + ′ /@′ + AttributeNamej. 30:. Lengthj := count the number of the symbols of oblique line in the LabelP athi. 31:. if LabelP athj not exists in the Table Path then. 32:. generate and execute a SQL command that insert a tuple (P athID++ ,Lengthj , LabelP athj ) into the Table P ath. 33:. end if. 34:. P athIDj := GetP athID( LabelP athj ). 35:. NodeIDj := NodeID++ 17.

(28) 36:. P redID(j) := GetP arentID(nodej ). 37:. Ordinal(nodej ) :=1. 38:. if P athIDj is corresponding with PathID of SiblingAttributeNodej then Ordinal(nodej ) := Ordinal(SiblingAttributeNodej ) +1. 39: 40:. end if. 41:. NodeV aluej := the value of attributej. 42:. generate and execute a SQL command that insert a tuple (NodeIDj ,P athIDj ,Ordinal(nodej ),P redIDj ,NodeV aluej ) into the Table Data end for. 43: 44:. end for. end. Let n be the number of nodes (i.e., element, attribute, text and IDREF) in an XML document. The time complexity of XML2RDB is O(n). Figure 3.1 shows a case for the storage of the XML data graph, shown in Figure 1.1. As shown in the algorithm of XML2RDB, a DOM tree is constructed through DOM parser processing the XML document first after we input it. Then we generate and execute SQL commands that create the Table Path, the Table Node and the Table Data in RDBS. A method of visiting an all DOM tree that we use is called depth first search (DFS). New path found should be inserted into the Table Path in the process of visiting an all DOM tree. We must get a type of the node while visiting each nodes in the DOM tree. If a type of the node that we visited is an element type, we must insert a tuple that included NodeID, PathID, Ordinal and PredID into the Table Node. If this type of the node visited is a attribute type or text type, this tuple which included NodeID, PathID, Ordinal, PredID and Value is inserted into the Table Data. With the proposed algorithm, XPred stores additionally an ID of the predecessor node for each node. The additional storage will take an advantage for searching the sibling nodes that have the same predecessor node. As shown in Figure 2.4 and Figure 3.2, XParent needs at least two join operations for searching the sibling nodes but XPred only needs one join operations. For example we want to search the Value of the sibling node of the node whose NodeID is equalled six in Figure 2.1. 
According to Figure 3.3(a) and Figure 3.3(b), we know that XPred need less join operations than XParent for searching the sibling nodes.. 18.
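The translation described above can be sketched in a few lines of Python. This is not the thesis implementation (which is written in C++ with Xerces and Oracle 10g): it swaps in xml.etree.ElementTree for the DOM parser and sqlite3 for the RDBS. It also assumes that the text of an element is stored in the Table Data under the element's own NodeID, label-path and PredID, a convention inferred from the label-paths used in the example queries rather than stated in the algorithm. The function and helper names are hypothetical, and PathID and NodeID start at 1 here.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE Path (PathID INTEGER PRIMARY KEY, Length INTEGER, LabelPath TEXT UNIQUE);
CREATE TABLE Node (NodeID INTEGER PRIMARY KEY, PathID INTEGER, Ordinal INTEGER, PredID INTEGER);
CREATE TABLE Data (NodeID INTEGER PRIMARY KEY, PathID INTEGER, Ordinal INTEGER, PredID INTEGER, Value TEXT);
"""

def xml2rdb(xml_text, conn):
    """Load one XML document into an empty database by depth-first traversal."""
    conn.executescript(SCHEMA)
    last_id = [0]  # NodeID counter shared by the Node and Data tables

    def new_node_id():
        last_id[0] += 1
        return last_id[0]

    def get_path_id(label_path):
        # Insert the label-path the first time it is seen, then reuse its PathID.
        row = conn.execute("SELECT PathID FROM Path WHERE LabelPath = ?",
                           (label_path,)).fetchone()
        if row is not None:
            return row[0]
        cur = conn.execute("INSERT INTO Path (Length, LabelPath) VALUES (?, ?)",
                           (label_path.count("/"), label_path))
        return cur.lastrowid

    def visit(elem, parent_path, pred_id, ordinal):
        label_path = parent_path + "/" + elem.tag
        path_id = get_path_id(label_path)
        node_id = new_node_id()
        conn.execute("INSERT INTO Node VALUES (?, ?, ?, ?)",
                     (node_id, path_id, ordinal, pred_id))
        text = (elem.text or "").strip()
        if text:
            # Assumption: text is kept under the element's NodeID, path and PredID.
            conn.execute("INSERT INTO Data VALUES (?, ?, ?, ?, ?)",
                         (node_id, path_id, ordinal, pred_id, text))
        for name, value in elem.attrib.items():
            # An attribute's predecessor is the element that owns it.
            attr_path_id = get_path_id(label_path + "/@" + name)
            conn.execute("INSERT INTO Data VALUES (?, ?, ?, ?, ?)",
                         (new_node_id(), attr_path_id, 1, node_id, value))
        seen = {}  # ordinal among siblings that share the same label-path
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            visit(child, label_path, node_id, seen[child.tag])

    visit(ET.fromstring(xml_text), ".", None, 1)
    conn.commit()

conn = sqlite3.connect(":memory:")
xml2rdb('<Bib><Video Year="2007"><Title>Some Title</Title></Video></Bib>', conn)
for (label_path,) in conn.execute("SELECT LabelPath FROM Path ORDER BY PathID"):
    print(label_path)
```

Each node is visited once, in line with the linear behaviour claimed for XML2RDB. The per-node path lookup relies on the UNIQUE index on LabelPath.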

XParent-SQL

SELECT D1.Value
FROM DataPath DP1, DataPath DP2, Data D1
WHERE DP1.Cid=6 AND DP1.Pid=DP2.Pid AND DP2.Cid=D1.Did

(a)

XPred-SQL

SELECT D2.Value
FROM Data D1, Data D2
WHERE D1.NodeID=6 AND D1.PredID=D2.PredID

(b)

Figure 3.3: An example SQL command under the XParent and XPred schemas.

3.3. XML Data Manipulations

Typically, data stored in database systems are manipulated by users through insertion, deletion, updating and querying. In this section, we discuss how to perform XML data manipulations under the XPred schema. Several query languages can be used to manipulate XML data, such as XQuery [16] and XPath [15]. We focus only on the parsing and processing of XQuery commands because of XQuery's popularity. Because the XML data are stored in a relational database, an XQuery command requested by a user must be translated into corresponding SQL commands.

3.3.1. XML Query Processing

We use the following example to explain how an XQuery command is translated into corresponding SQL commands.

XQuery

FOR $x IN document("video.xml")/Bib/Video
LET $y=$x/Director
WHERE $y="Lance Rivera"
ORDER BY $x/Title
RETURN $x/Title

(a)

XPred-SQL

SELECT D2.Value
FROM Path P1, Data D1, Data D2, Path P2
WHERE D1.PathID=P1.PathID
AND P1.LabelPath='./Bib/Video/Director'
AND D1.Value='Lance Rivera'
AND D1.PredID=D2.PredID
AND D2.PathID=P2.PathID
AND P2.LabelPath='./Bib/Video/Title'
ORDER BY D2.Value ASC

(b)

Figure 3.4: An example XQuery command on the XML document in Figure 2.1 and its corresponding SQL command.
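The correspondence between the two halves of Figure 3.4 can be sketched as a small generator. It covers only this one query shape (a condition on one label-path and a result from a sibling label-path) and is not a general XQuery translator. The function name translate_sibling_query and the sample rows are hypothetical, and the label-path column is assumed to be named LabelPath as in Figure 3.2.

```python
import sqlite3

def translate_sibling_query(cond_path, cond_value, return_path):
    """Build a parameterized XPred-SQL statement of the Figure 3.4(b) shape."""
    sql = (
        "SELECT D2.Value "                                  # RETURN clause
        "FROM Path P1, Data D1, Data D2, Path P2 "          # two label-paths, two Path/Data pairs
        "WHERE D1.PathID = P1.PathID AND P1.LabelPath = ? "  # label-path of the condition
        "AND D1.Value = ? "                                  # XQuery WHERE becomes a selection
        "AND D1.PredID = D2.PredID "                         # sibling relationship: one equijoin
        "AND D2.PathID = P2.PathID AND P2.LabelPath = ? "    # label-path of the result
        "ORDER BY D2.Value ASC"                              # ORDER BY clause
    )
    return sql, (cond_path, cond_value, return_path)

# Hypothetical data: two videos (predecessor IDs 5 and 6), each with a Title and a Director.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Path (PathID INTEGER PRIMARY KEY, Length INTEGER, LabelPath TEXT);
CREATE TABLE Data (NodeID INTEGER PRIMARY KEY, PathID INTEGER, Ordinal INTEGER,
                   PredID INTEGER, Value TEXT);
INSERT INTO Path VALUES (1, 3, './Bib/Video/Title'), (2, 3, './Bib/Video/Director');
INSERT INTO Data VALUES (10, 1, 1, 5, 'Title A'), (11, 2, 1, 5, 'Lance Rivera'),
                        (12, 1, 1, 6, 'Title B'), (13, 2, 1, 6, 'Someone Else');
""")
sql, params = translate_sibling_query("./Bib/Video/Director", "Lance Rivera",
                                      "./Bib/Video/Title")
titles = [row[0] for row in conn.execute(sql, params)]
print(titles)  # prints ['Title A']
```

Passing the label-paths and the value as bound parameters, rather than splicing them into the SQL text, is a choice of this sketch and not something the thesis prescribes.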

Example 3 Query Translation (from XQuery to SQL commands)

Consider the XML document in Figure 1.1. Figure 3.4(a) is the XQuery command that gets the titles of the videos directed by Lance Rivera. To process this query in a relational database, we must translate the XQuery command into a corresponding SQL command. Two label-paths are involved in this example, "./Bib/Video/Director" and "./Bib/Video/Title". The condition is on the former label-path, and the results come from the latter. Because all data are stored in tables in an RDBS, we must search all (Director, Title) pairs and select the titles from those pairs in which both values belong to the same video and the director is Lance Rivera. Figure 3.4(b) shows the corresponding SQL command under the XPred schema.

An XQuery command can be translated into a corresponding SQL command as follows. First, we determine which tables need to be joined. Each label-path requires a join of two tables: the Table Path with either the Table Node or the Table Data. If a label-path leads to an element node, the Tables Path and Node are joined. If it leads to a text or an attribute node, the Tables Path and Data are joined. When several label-paths are involved, the tables to be joined are decided by the relationships between these label-paths. If two label-paths are siblings, only one equijoin on the attribute PredID between the two Tables Data or Node is needed. The relationships between the label-paths are translated into the FROM clause of the SQL command.

Next, the WHERE clause of the XQuery command is translated into the WHERE clause of the SQL command. In Example 3, the third row (i.e., WHERE $y="Lance Rivera") in Figure 3.4(a) is translated into a selection (i.e., D1.Value='Lance Rivera') in the WHERE clause in Figure 3.4(b). The ORDER BY clause of the XQuery command is translated into the ORDER BY clause of the SQL command. In Example 3, the fourth row (i.e., ORDER BY $x/Title) in Figure 3.4(a) is translated into the ninth row (i.e., ORDER BY D2.Value ASC) in Figure 3.4(b). Finally, the RETURN clause of the XQuery command is translated into the SELECT clause of the SQL command. In Example 3, the fifth row (i.e., RETURN $x/Title) in Figure 3.4(a) is translated into the first row (i.e., SELECT D2.Value) in Figure 3.4(b).

3.3.2. XML Data Insertion and Deletion

In this section, we explain how insertion and deletion of XML data are performed in a relational database under the XPred schema. The two operations can be represented as follows:

InsertNode(Path(Node_i), PredID(Node_i), Value(Node_i))
DeleteNode(Path(Node_i), NodeID(Node_i))

Note that deleting a node Node_i must also delete all of its descendant nodes. Insertion and deletion of XML data in a relational database under the XPred schema proceed as follows.

XPred Insertion Approach:

Step 1: Find the ID of the path (i.e., PathID_i) of the node Node_i in the Table Path according to the given Path(Node_i). If it does not exist in the Table Path, insert a record for the new path into the Table Path.

Step 2: Determine whether Node_i is of element type or text type. If Node_i is of element type, a new tuple is inserted into the Table Node. We assign a new sequence number NodeID_i to Node_i and set PredID_i, the ID of its predecessor node Node_j, according to the given PredID(Node_i). We then find the maximum ordinal MaxOrdinal_i among all tuples in the Table Node whose PredID and PathID equal PredID_i and PathID_i, and set Ordinal_i to MaxOrdinal_i plus one. Finally, we insert the tuple (NodeID_i, PathID_i, Ordinal_i, PredID_i) into the Table Node. If Node_i is of text type, a new tuple is inserted into the Table Data. Inserting into the Table Data is very similar to inserting into the Table Node, except that the tuple also carries the value of Node_i.
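The two insertion steps can be sketched as follows in Python with sqlite3. The function name insert_node, the column types, and the way a new NodeID is drawn (one counter shared by the Tables Node and Data) are assumptions of this sketch. The thesis only states that a new sequence number is assigned.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Path (PathID INTEGER PRIMARY KEY, Length INTEGER, LabelPath TEXT UNIQUE);
CREATE TABLE Node (NodeID INTEGER PRIMARY KEY, PathID INTEGER, Ordinal INTEGER, PredID INTEGER);
CREATE TABLE Data (NodeID INTEGER PRIMARY KEY, PathID INTEGER, Ordinal INTEGER, PredID INTEGER, Value TEXT);
"""

def insert_node(conn, label_path, pred_id, value=None):
    """InsertNode sketch: value=None inserts an element row, otherwise a Data row."""
    # Step 1: look up the PathID, adding the label-path if it is new.
    row = conn.execute("SELECT PathID FROM Path WHERE LabelPath = ?",
                       (label_path,)).fetchone()
    if row is None:
        cur = conn.execute("INSERT INTO Path (Length, LabelPath) VALUES (?, ?)",
                           (label_path.count("/"), label_path))
        path_id = cur.lastrowid
    else:
        path_id = row[0]
    # Step 2: assign a new NodeID (assumed: next number over both tables).
    node_id = conn.execute(
        "SELECT COALESCE(MAX(NodeID), 0) + 1 FROM "
        "(SELECT NodeID FROM Node UNION ALL SELECT NodeID FROM Data)").fetchone()[0]
    table = "Node" if value is None else "Data"
    # Ordinal = MaxOrdinal + 1 among rows with the same PredID and PathID.
    ordinal = conn.execute(
        f"SELECT COALESCE(MAX(Ordinal), 0) + 1 FROM {table} "
        "WHERE PredID = ? AND PathID = ?", (pred_id, path_id)).fetchone()[0]
    if value is None:
        conn.execute("INSERT INTO Node VALUES (?, ?, ?, ?)",
                     (node_id, path_id, ordinal, pred_id))
    else:
        conn.execute("INSERT INTO Data VALUES (?, ?, ?, ?, ?)",
                     (node_id, path_id, ordinal, pred_id, value))
    return node_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
bib = insert_node(conn, "./Bib", None)              # root element, no predecessor
v1 = insert_node(conn, "./Bib/Video", bib)          # first Video, Ordinal 1
v2 = insert_node(conn, "./Bib/Video", bib)          # second Video, Ordinal 2
title = insert_node(conn, "./Bib/Video/Title", v1, "Some Title")
```

Because PredID is supplied by the caller, no join is needed to place the new node under its parent. Only the ordinal lookup reads existing rows.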

XPred Deletion Approach:

Step 1: If the node Node_i to be deleted is a text node, select and delete from the Table Path the label-path that is used only by Node_i. If Node_i is an element node, select and delete from the Table Path the label-paths that lead only to Node_i and its descendant nodes Node_j. These label-paths are identified from the given Path(Node_i).

Step 2: If Node_i is a text node, find the tuple in the Table Data whose PathID equals PathID_i and whose NodeID equals NodeID_i, and delete it. If Node_i is an element node, find all tuples of Node_i and its descendant nodes Node_j in the Table Node or the Table Data, where PathID equals PathID_i or PathID_j and NodeID equals NodeID_i or PredID_i, and delete them from the Table Node or the Table Data. These tuples are identified from the given NodeID(Node_i).
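A deletion along these lines can be sketched as follows in Python with sqlite3. The sketch collects the subtree by following PredID links downwards, deletes the tuples, and only then removes label-paths that are no longer used. This reverses the order of the two steps above so that the unused-path check can be a single statement. The function name delete_node and the sample rows are hypothetical, and the IN lists make it suitable for small trees only.

```python
import sqlite3

def delete_node(conn, node_id):
    """DeleteNode sketch: remove node_id, its descendants, and unused label-paths."""
    doomed, frontier = [node_id], [node_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        # Children are the rows whose PredID points at a node of the current level.
        rows = conn.execute(
            f"SELECT NodeID FROM Node WHERE PredID IN ({marks}) "
            f"UNION SELECT NodeID FROM Data WHERE PredID IN ({marks})",
            frontier * 2).fetchall()
        frontier = [r[0] for r in rows]
        doomed.extend(frontier)
    ids = [(i,) for i in doomed]
    conn.executemany("DELETE FROM Data WHERE NodeID = ?", ids)
    conn.executemany("DELETE FROM Node WHERE NodeID = ?", ids)
    # Drop label-paths that no remaining tuple refers to.
    conn.execute("DELETE FROM Path WHERE PathID NOT IN "
                 "(SELECT PathID FROM Node UNION SELECT PathID FROM Data)")
    return doomed

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Path (PathID INTEGER PRIMARY KEY, Length INTEGER, LabelPath TEXT);
CREATE TABLE Node (NodeID INTEGER PRIMARY KEY, PathID INTEGER, Ordinal INTEGER, PredID INTEGER);
CREATE TABLE Data (NodeID INTEGER PRIMARY KEY, PathID INTEGER, Ordinal INTEGER,
                   PredID INTEGER, Value TEXT);
INSERT INTO Path VALUES (1, 1, './Bib'), (2, 2, './Bib/Video'),
                        (3, 3, './Bib/Video/Title'), (4, 2, './Bib/Movie_star');
INSERT INTO Node VALUES (1, 1, 1, NULL), (2, 2, 1, 1), (4, 4, 1, 1);
INSERT INTO Data VALUES (3, 3, 1, 2, 'Some Title');
""")
removed = delete_node(conn, 2)  # delete the Video element and its Title
```

After the call, only the root and the Movie_star element remain, together with their two label-paths.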

Chapter 4 Performance Evaluation

The experiments described in this chapter assess the capability of the proposed approach in storing and manipulating XML documents in an XML storage system. The performance of the XPred schema was compared with that of the XParent schema in a relational database management system through simulation experiments. The experiments were conducted on a PC with an Intel(R) Core(TM)2 Quad 2.40GHz processor and 1024MB of DDR2 memory running at 800MHz, and the relational database management system was Oracle 10g. All algorithms of the XPred schema were implemented in C++, and Xerces for C++ was used to parse XML documents. The experimental results were obtained with two performance diagnostic tools in Oracle 10g, SQL trace and TKPROF. The rest of this chapter describes the performance metrics, the data sets, and the experimental results.

4.1. Performance Metrics

The primary performance metric is the response time of an operation, referred to as ResponseTime. The response time of an operation O_i is the time interval from its start to its finish. Let StartTime_i and FinishTime_i be the times at which O_i starts and finishes, respectively. The ResponseTime of O_i is calculated as FinishTime_i − StartTime_i. Other metrics of interest are CPUTime, I/OTime, NumPhysicalIO and NumLogicalIO. CPUTime and I/OTime are the CPU computation time and the I/O execution time of an operation, respectively. NumPhysicalIO and NumLogicalIO are the total number of data blocks physically read from disk and the total number of buffers retrieved from memory, respectively.

[Tree diagram of the XMark document structure, rooted at the element site.]

Figure 4.1: The database description of an XML document generated by the XMark.

4.2. Data Sets

The test data were generated by an XML document generator called XMLgen, which is provided by the XML benchmark project (XMark) [1, 2]. We also made some modifications to XMLgen to generate XML documents with different content. Figure 4.1 shows a database description corresponding to the XML documents generated by XMLgen. The sizes of the XML documents generated by XMLgen were 1MB, 5MB, 10MB, 15MB, 20MB and 25MB. The size of an XML document under evaluation is denoted as DataSize. In our experiments, we used seven benchmark query templates provided by XMark, summarized as follows:

• Q1: Return the name of the item with ID 'itemX' registered in North America.
• Q2: How many items are listed on all continents?
• Q3: List the names of persons and the number of items they bought.
• Q4: List the names of persons and the names of the items they bought in Europe.
• Q5: Return the names of all items whose description contains the word 'good'.
• Q6: Print the keywords in emphasis in annotations of closed auctions.
• Q7: Which persons don't have a homepage?

Although researchers have proposed various model-mapping schemas (e.g., the Edge schema, the XRel schema, and the XParent schema), we compare our work (i.e., the XPred schema) only with the XParent schema, because Jiang et al. [11] show that the XParent schema is more effective than the other model-mapping schemas.

4.3. Experimental Results

The purpose of this section is to evaluate the performance of data access operations of an XML storage system under the XParent schema and the XPred schema. In the rest of this section, we present the results of a series of data translation and query processing experiments.

4.3.1. Data Bulk Load

Figure 4.2 (a) shows the average ResponseTime for translating XML documents into records in an RDBS under the XParent and the XPred schemas when DataSize was 1MB, 5MB, 10MB, 15MB, 20MB and 25MB. The XPred schema greatly outperforms the XParent schema. In other words, translating XML documents takes less time under the XPred schema than under the XParent schema. Figure 4.2 (b) and (c) show the average CPUTime and I/OTime for the same translation under the two schemas and the same values of DataSize. Analyzing the results further, we found that the XPred schema outperforms the XParent schema in translating XML documents into the RDBS because XPred stores only three tables (the Table Path, the Table Node and the Table Data), as shown in Figure 4.2 (b) and (c). In particular, the XPred schema reduced the time for translating XML documents into the RDBS by up to about 61% (when DataSize was 10MB), but the larger the DataSize, the smaller the reduction. The improvement in translation and storage time shrank progressively once DataSize exceeded 10MB, as Figure 4.2 (b) and (c) show. Figure 4.2 (d) shows the detailed results of the data bulk load experiment.

[Panels (a) to (c): plots of ResponseTime, CPUTime and I/OTime (in seconds) against DataSize (1MB to 25MB) for XParent and XPred.]

(d)

DataSize  Schema   ResponseTime (S)  CPUTime (S)  I/OTime (S)
1MB       XParent  34.15             2.42         31.73
1MB       XPred    14.43             1.93         12.50
5MB       XParent  161.17            11.64        149.53
5MB       XPred    67.49             8.09         59.40
10MB      XParent  537.24            137.86       399.38
10MB      XPred    209.59            46.75        162.84
15MB      XParent  1,497.66          289.30       1,208.36
15MB      XPred    956.91            277.69       679.22
20MB      XParent  2,220.80          512.75       1,708.05
20MB      XPred    1,522.72          504.96       1,017.76
25MB      XParent  3,149.02          872.50       2,276.52
25MB      XPred    2,337.23          915.53       1,421.70

Figure 4.2: (a) The average ResponseTime, (b) the average CPUTime, (c) the average I/OTime, and (d) the detailed results of translating and storing data under the XParent schema and the XPred schema, when DataSize was 1MB, 5MB, 10MB, 15MB, 20MB, and 25MB.
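The relative saving in bulk-load time can be recomputed from the ResponseTime values in Figure 4.2 (d). The short Python sketch below does only that arithmetic. The helper name reduction_percent is hypothetical, and the numbers are copied from the figure.

```python
# Average ResponseTime in seconds, copied from Figure 4.2 (d).
XPARENT = {"1MB": 34.15, "5MB": 161.17, "10MB": 537.24,
           "15MB": 1497.66, "20MB": 2220.80, "25MB": 3149.02}
XPRED = {"1MB": 14.43, "5MB": 67.49, "10MB": 209.59,
         "15MB": 956.91, "20MB": 1522.72, "25MB": 2337.23}

def reduction_percent(baseline, improved):
    """Percentage by which `improved` is smaller than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

for size in XPARENT:
    print(f"{size}: {reduction_percent(XPARENT[size], XPRED[size]):.1f}%")
```

On these figures the saving peaks at 10MB (about 61%) and falls to about 26% at 25MB.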

4.3.2. Query Processing

Figure 4.3 (a) and (b) show the ResponseTime and CPUTime of query processing when DataSize was 1MB. All queries perform better under the XPred schema than under the XParent schema, because XPred needs fewer equijoins than XParent. Figure 4.3 (c) and (d) show the NumPhysicalIO and NumLogicalIO of query processing when DataSize was 1MB. As Figure 4.3 (c) shows, query processing incurs fewer physical I/O activities under the XPred schema than under the XParent schema, again because XPred needs fewer join operations. Likewise, the number of logical I/O activities is lower under the XPred schema for almost all queries in Figure 4.3 (d), especially for Q4, for which XParent needs 19 equijoins whereas XPred needs only 13. However, the NumLogicalIO of Q7 is higher for XPred than for XParent, as shown in Figure 4.3 (d). Processing Q7, which uses the except operator, requires heavy I/O activity under both schemas. XPred incurs more logical I/O than XParent for Q7 but less physical I/O. Because physical I/O is slower than logical I/O, the overall I/O performance for Q7 is still better under the XPred schema than under the XParent schema.

Figure 4.4 (a) and (b) show the ResponseTime and CPUTime of query processing under the XParent schema and the XPred schema when DataSize was 5MB. The results are very similar to those in Figure 4.3 (a) and (b) for a DataSize of 1MB. As the data size increases, ResponseTime and CPUTime also increase. Figure 4.4 (c) and (d) show the NumPhysicalIO and NumLogicalIO of query processing under the two schemas when DataSize was 5MB. They are similar to Figure 4.3 (c) and (d), again because XPred needs fewer join operations for query processing than XParent.

Figure 4.5 (a) and (b) show the ResponseTime and CPUTime of query processing under the XParent schema and the XPred schema when DataSize was 10MB. The queries perform better under the XPred schema than under the XParent schema, especially Q4 and Q7, whose ResponseTime is reduced by 44% and 42%, respectively, under the XPred schema. This is again because XPred spends less on equijoins than XParent. Figure 4.5 (c) and (d) show the NumPhysicalIO and NumLogicalIO of query processing under the two schemas when DataSize was 10MB. They are very similar to Figure 4.4 (c) and (d).

Figure 4.6, Figure 4.7 and Figure 4.8 show the ResponseTime, CPUTime, NumPhysicalIO and NumLogicalIO of query processing under the XParent schema and the XPred schema when DataSize was 15MB, 20MB and 25MB. These results are similar to Figure 4.5.

The experimental results in Figures 4.3, 4.4, 4.5, 4.6, 4.7 and 4.8 show that the XPred schema outperforms the XParent schema for almost all queries, although XPred is only slightly better than XParent for Q2 and Q6. The performance differences are caused by the number of joins needed for query processing under the two schemas. In particular, the XPred schema and the XParent schema need 13 and 19 equijoins for Q4, respectively. This is because the XPred schema additionally stores, with the information of each node, the NodeID of its predecessor node, which reduces the number of join operations needed to search for sibling nodes.

[Panels (a) to (d): plots over the query templates Q1 to Q7 for XParent and XPred.]

Figure 4.3: (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 1MB.

[Panels (a) to (d): plots over the query templates Q1 to Q7 for XParent and XPred.]

Figure 4.4: (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 5MB.

[Panels (a) to (d): plots over the query templates Q1 to Q7 for XParent and XPred.]

Figure 4.5: (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 10MB.

[Panels (a) to (d): plots over the query templates Q1 to Q7 for XParent and XPred.]

Figure 4.6: (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 15MB.

[Panels (a) to (d): plots over the query templates Q1 to Q7 for XParent and XPred.]

Figure 4.7: (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 20MB.

[Panels (a) to (d): plots over the query templates Q1 to Q7 for XParent and XPred.]

Figure 4.8: (a) ResponseTime, (b) CPUTime, (c) NumPhysicalIO, and (d) NumLogicalIO of query processing under the XParent schema and the XPred schema, when DataSize was 25MB.

Chapter 5 Conclusion

In this thesis, we explored performance issues in accessing XML documents stored in relational databases. A new model-mapping schema, called XPred, was proposed to manipulate XML documents over relational databases. The XPred schema adds an additional attribute (i.e., the predecessor node's ID) to the nodes in the Tables Node and Data. It was shown that a large number of join operations can be avoided because fewer searches of parent-child relationships are needed. In addition, the time required to translate and store XML documents in a relational database is reduced because less data is stored than under other schemas. The capability of the proposed methodology and algorithms was verified by a series of simulation experiments on different patterns of XML documents generated by the XML benchmark project (XMark) [1, 2], with encouraging experimental results.

For future research, we shall further explore the possibility of reducing the CPU time consumed by query processing. We will also try to balance the cost of space and time when extending our results to grandparent-child relationships in order to speed up query processing. XML data indexing is also an important topic for future XML applications, and we will work in that direction.

Bibliography

[1] A.R. Schmidt, F. Waas, M.L. Kersten, D. Florescu, I. Manolescu, M.J. Carey, and R. Busse, "XMark - An XML Benchmark Project," http://monetdb.cwi.nl/xml/, 2003.

[2] A.R. Schmidt, F. Waas, M.L. Kersten, D. Florescu, I. Manolescu, M.J. Carey, and R. Busse, "The XML benchmark project," Technical report INS-R0103, CWI, The Netherlands, April 30, 2001.

[3] World Wide Web Consortium, http://www.w3.org/, 2007.

[4] Sybase Corporation, "Using XML with the Sybase Adaptive Server SQL Databases," Technical Whitepaper, August 21, 1999.

[5] Michael Rys, "Microsoft SQL Server 2000 XML Enhancements," Microsoft Support WebCast, April 2000.

[6] J. Kyung-Soo, "A Design of Middleware Components for the Connection between XML and RDB," In Proceedings of the IEEE International Symposium on Industrial Electronics (IEEE ISIE), Pusan, Korea, pp. 1753-1756, 2001.

[7] R. Bourret, C. Bornhovd, and A. Buchmann, "A Generic Load/Extract Utility for Data Transfer Between XML Documents and Relational Databases," Second International Workshop on Advance Issues of E-Commerce and Web-Based Information Systems (WECWIS), Milpitas, California, pp. 134-143, June 8-9, 2000.

[8] Albrecht Schmidt, Martin Kersten, Menzo Windhouwer, and Florian Waas, "Efficient Relational Storage and Retrieval of XML Documents," In Proceedings of the 3rd International Workshop on the Web and Databases (WebDB), Dallas, Texas, May 18-19, 2000.

[9] Ismailcem Budak Arpinar, John Miller, and Amit P. Sheth, "An Efficient Data Extraction and Storage Utility for XML Documents," In Proceedings of the 39th Annual ACM Southeast Conference (ACMSE), Athens, Georgia, pp. 293-295, March 2001.

[10] D. Florescu and D. Kossmann, "A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database," INRIA Research Report No. 3680, Rocquencourt, France, May 1999.

[11] H. Jiang, H. Lu, W. Wang, and J.X. Yu, "Path Materialization Revisited: An Efficient Storage Model for XML Data," In Proceedings of the Australasian Database Conference (ADC), 2002.

[12] M. Yoshikawa and T. Amagasa, "XRel: A Path-Based Approach to Storage and Retrieval of XML Documents Using Relational Databases," ACM Transactions on Internet Technology (TOIT), Vol. 1, No. 1, pp. 110-141, August 2001.

[13] A. Schmidt, M. Kersten, M. Windhouwer, and F. Waas, "Efficient relational storage and retrieval of XML documents," In Proceedings of the 3rd International Workshop on the Web and Databases (WebDB), pp. 47-52, 2000.

[14] Haifeng Jiang, Hongjun Lu, Wei Wang, and Jeffrey Xu Yu, "XParent: An Efficient RDBMS-Based XML Database System," The 18th International Conference on Data Engineering (ICDE 2002), San Jose, California, pp. 335-336, February 26 - March 1, 2002.

[15] J. Clark and S. DeRose, "XML path language (XPath)," W3C Recommendation 16, http://www.w3.org/TR/xpath, 1999.

[16] World Wide Web Consortium (W3C), "XQuery 1.0: An XML Query Language," http://www.w3.org/TR/2003/WD-xquery-20030502, 2003.

Appendix A Raw Data of XML Data Loading

(a)

NO   ALLTime(S)  I/OTime(S)  ProcessTime(S)  LabelPathIOTime(S)  ElementIOTime(S)  DataIOTime(S)  DataPathIOTime(S)
1    31.36       28.99       2.36            0.24                11.35             7.59           9.82
2    32.03       29.49       2.54            0.22                10.65             8.48           10.14
3    33.94       31.38       2.56            0.22                12.40             8.03           10.73
4    34.80       32.35       2.45            0.18                13.04             8.26           10.86
5    31.67       29.35       2.32            0.31                10.82             8.17           10.04
6    36.69       34.52       2.17            0.27                12.50             9.89           11.87
7    33.20       30.43       2.77            0.25                11.36             8.01           10.80
8    37.44       34.94       2.50            0.33                15.14             8.29           11.18
9    36.13       33.17       2.95            0.27                12.58             8.62           11.71
10   34.69       32.23       2.46            0.20                12.20             8.17           11.65
11   30.70       28.16       2.55            0.19                10.44             7.56           9.97
12   39.61       36.39       3.22            0.30                12.06             12.39          11.64
13   34.00       31.12       2.89            0.25                11.77             8.16           10.94
14   36.98       33.76       3.23            0.31                12.36             8.96           12.12
15   33.16       30.45       2.71            0.25                11.19             8.10           10.91
16   32.31       30.31       2.01            0.25                11.45             7.59           11.02
17   33.78       31.12       2.66            0.41                11.96             8.11           10.65
18   34.63       32.36       2.27            0.22                11.29             10.12          10.72
19   31.72       29.43       2.29            0.24                10.85             8.15           10.20
20   34.16       31.83       2.33            0.31                12.42             8.54           10.55
AVG  34.15       31.73       2.56            0.26                11.89             8.56           10.88

(b)

NO   ALLTime(S)  I/OTime(S)  ProcessTime(S)  PathTableI/OTime(S)  NodeTableI/OTime(S)  DataTableI/OTime(S)
1    14.64       12.49       2.15            0.17                 3.56                 8.76
2    12.30       10.71       1.59            0.22                 2.90                 7.58
3    13.25       11.55       1.71            0.22                 3.40                 7.93
4    13.14       11.51       1.63            0.18                 3.29                 8.04
5    12.48       11.19       1.30            0.24                 3.20                 7.75
6    15.14       13.50       1.64            0.28                 3.74                 9.49
7    14.22       12.31       1.90            0.14                 3.78                 8.40
8    13.67       11.93       1.75            0.20                 3.57                 8.16
9    14.38       12.64       1.74            0.33                 3.76                 8.55
10   14.83       12.67       2.16            0.23                 4.01                 8.42
11   14.09       10.82       3.28            0.22                 3.52                 7.08
12   14.95       12.88       2.08            0.33                 3.98                 8.57
13   15.25       12.93       2.32            0.39                 3.63                 8.91
14   16.09       14.46       1.64            0.25                 6.04                 8.16
15   15.48       13.49       1.99            0.28                 3.98                 9.23
16   16.83       14.45       2.37            0.17                 3.90                 10.38
17   16.06       13.89       2.17            0.39                 4.08                 9.42
18   14.20       12.46       1.75            0.27                 4.13                 8.06
19   13.73       11.75       1.98            0.28                 3.07                 8.40
20   13.86       12.34       1.52            0.19                 3.53                 8.62
AVG  14.43       12.50       1.93            0.25                 3.75                 8.50

Figure A.1: The detailed experimental results for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 1MB.

No.   All         I/O         Process     LabelPath   Element     Data        DataPath
      Time (s)    Time (s)    Time (s)    I/O (s)     I/O (s)     I/O (s)     I/O (s)
1     208.42      189.52      18.90       0.31        67.76       54.23       67.23
2     153.86      143.69      10.17       0.15        53.65       36.70       53.18
3     156.31      146.75      9.56        0.21        58.84       37.10       50.61
4     182.83      170.46      12.37       0.35        63.23       48.61       58.28
5     156.31      145.29      11.02       0.11        57.34       39.76       48.07
6     151.09      140.76      10.33       0.23        51.67       40.30       48.56
7     146.02      135.54      10.47       0.25        52.30       36.04       46.96
8     147.22      137.90      9.32        0.28        52.96       37.74       46.92
9     160.75      149.02      11.73       0.36        56.67       38.52       53.47
10    151.17      142.34      8.83        0.34        54.66       36.02       51.32
11    175.53      161.08      14.45       0.25        60.19       45.57       55.08
12    161.75      150.93      10.82       0.40        57.18       39.84       53.51
13    155.80      144.53      11.27       0.34        54.26       39.45       50.47
14    161.13      146.95      14.18       0.42        57.66       38.51       50.36
15    165.55      154.09      11.46       0.36        56.11       40.93       56.70
16    174.11      160.18      13.93       0.41        59.36       44.04       56.37
17    178.98      164.70      14.29       0.34        60.37       43.95       60.04
18    151.19      141.21      9.97        0.36        53.80       37.55       49.50
19    141.19      131.61      9.57        0.23        51.13       34.32       45.94
20    144.27      134.12      10.14       0.16        51.58       37.38       45.00
AVG   161.17      149.53      11.64       0.29        56.54       40.33       52.38
(a)

No.   All         I/O         Process     Path Table  Node Table  Data Table
      Time (s)    Time (s)    Time (s)    I/O (s)     I/O (s)     I/O (s)
1     72.83       63.34       9.49        0.23        20.47       42.63
2     66.63       58.84       7.79        0.27        17.08       41.49
3     67.27       58.58       8.69        0.27        17.13       41.18
4     67.63       59.30       8.33        0.25        17.53       41.52
5     67.00       58.84       8.16        0.36        17.99       40.49
6     66.06       58.41       7.65        0.25        16.82       41.35
7     66.55       59.79       6.76        0.36        17.02       42.41
8     69.61       61.12       8.49        0.35        18.78       42.00
9     64.33       57.08       7.25        0.17        16.02       40.89
10    66.88       57.85       9.03        0.30        17.56       39.99
11    66.36       59.16       7.20        0.25        17.78       41.14
12    64.38       57.12       7.26        0.23        16.77       40.12
13    68.94       61.10       7.83        0.32        16.62       44.17
14    70.89       62.74       8.15        0.33        18.60       43.81
15    62.95       55.89       7.06        0.25        16.13       39.51
16    71.72       62.07       9.65        0.26        17.85       43.95
17    67.30       58.77       8.52        0.25        17.47       41.05
18    71.22       62.43       8.78        0.30        18.12       44.02
19    60.94       54.34       6.59        0.30        14.83       39.22
20    70.27       61.28       8.99        0.31        16.57       44.40
AVG   67.49       59.40       8.08        0.28        17.36       41.77
(b)

Figure A.2: Detailed experimental results for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 5MB.

No.   All         I/O         Process     LabelPath   Element     Data        DataPath
      Time (s)    Time (s)    Time (s)    I/O (s)     I/O (s)     I/O (s)     I/O (s)
1     536.45      355.37      181.09      0.30        132.50      97.07       125.50
2     431.91      385.97      45.94       0.36        133.79      114.72      137.10
3     426.73      343.72      83.01       0.25        124.07      98.86       120.55
4     550.36      482.06      68.29       0.49        183.49      111.41      186.68
5     484.03      412.92      71.11       0.35        151.75      115.85      144.98
6     747.67      536.49      211.18      0.95        185.37      171.32      178.85
7     441.81      379.93      61.88       0.31        129.78      92.19       157.65
8     441.25      392.96      48.29       0.61        143.90      107.04      141.42
9     460.44      391.36      69.08       0.38        143.28      115.26      132.45
10    596.69      497.40      99.29       0.39        220.87      124.35      151.79
11    651.98      371.30      280.68      0.58        118.27      138.17      114.29
12    552.19      400.96      151.23      0.57        138.90      124.47      137.03
13    878.14      612.43      265.71      0.26        154.19      114.95      343.03
14    503.55      412.97      90.58       0.58        147.70      115.69      148.99
15    481.80      396.24      85.55       0.67        147.91      104.89      142.77
16    695.70      449.40      246.30      0.37        143.19      142.92      162.92
17    555.41      411.82      143.58      0.55        138.35      120.35      152.58
18    444.02      366.68      77.33       0.50        135.12      101.00      130.06
19    423.14      362.54      60.60       0.49        126.20      121.99      113.86
20    441.44      380.42      61.01       0.34        148.03      101.25      130.80
AVG   537.23      420.40      120.09      0.46        147.33      116.69      152.66
(a)

No.   All         I/O         Process     Path Table  Node Table  Data Table
      Time (s)    Time (s)    Time (s)    I/O (s)     I/O (s)     I/O (s)
1     169.66      147.87      21.79       0.39        43.46       104.02
2     192.77      165.68      27.09       0.94        45.40       119.34
3     199.28      159.10      40.18       1.20        42.17       115.73
4     167.17      139.20      27.97       0.33        40.17       98.70
5     176.36      146.14      30.22       0.55        40.07       105.52
6     214.28      177.12      37.16       1.50        45.52       130.10
7     218.72      174.94      43.78       0.83        45.78       128.33
8     189.92      159.03      30.89       0.85        47.76       110.43
9     183.83      157.16      26.67       1.77        43.89       111.51
10    211.80      155.84      55.96       0.98        42.62       112.23
11    428.08      133.87      294.21      1.11        47.23       85.53
12    166.03      140.90      25.13       0.58        43.59       96.73
13    185.52      148.43      37.09       1.27        37.95       109.22
14    279.91      248.41      31.50       1.82        75.33       171.26
15    214.91      167.65      47.26       1.11        43.14       123.40
16    248.13      210.75      37.38       1.45        54.01       155.29
17    183.05      155.77      27.28       0.61        41.31       113.85
18    182.88      151.66      31.21       0.70        42.19       108.77
19    203.75      171.44      32.31       0.81        44.33       126.30
20    175.72      145.77      29.95       0.41        41.70       103.67
AVG   209.59      162.84      46.75       0.96        45.38       116.50
(b)

Figure A.3: Detailed experimental results for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 10MB.

No.   All         I/O         Process     LabelPath   Element     Data        DataPath
      Time (s)    Time (s)    Time (s)    I/O (s)     I/O (s)     I/O (s)     I/O (s)
1     1381.31     1160.71     220.61      2.00        405.32      303.48      449.90
2     1456.13     1209.74     246.39      1.18        385.23      363.31      460.02
3     1451.13     1167.92     283.21      3.69        373.92      287.76      502.54
4     1411.92     1090.87     321.05      2.63        307.84      316.91      463.50
5     1562.67     1282.33     280.34      0.62        590.37      248.79      442.55
6     1545.58     1230.69     314.89      1.50        408.00      335.07      486.12
7     1560.25     1258.01     302.24      0.87        389.39      424.56      443.19
8     1537.42     1283.08     254.34      0.83        409.64      510.00      362.61
9     1485.13     1017.94     467.18      1.80        359.97      299.06      357.11
10    1487.30     1195.87     291.43      1.28        381.58      389.42      423.59
11    1351.03     975.60      375.43      2.62        364.96      274.54      333.48
12    1530.38     1225.42     304.96      0.99        369.37      404.69      450.38
13    1464.03     1256.62     207.41      0.91        320.60      395.99      539.13
14    1777.53     1244.33     533.21      1.05        405.79      382.40      455.09
15    1441.09     1173.01     268.08      1.24        315.70      378.70      477.38
16    1387.03     1181.17     205.86      0.57        434.76      332.49      413.35
17    1542.95     1244.53     298.43      1.80        344.95      385.63      512.15
18    1568.16     1392.26     175.90      0.78        452.74      378.46      560.28
19    1505.81     1331.71     174.11      2.39        451.22      370.93      507.17
20    1506.38     1245.47     260.91      0.95        413.85      349.40      481.26
AVG   1497.66     1208.36     289.30      1.48        394.26      356.58      456.04
(a)

No.   All         I/O         Process     Path Table  Node Table  Data Table
      Time (s)    Time (s)    Time (s)    I/O (s)     I/O (s)     I/O (s)
1     994.69      749.22      245.47      2.09        172.72      574.40
2     1003.45     757.99      245.47      1.60        246.09      510.31
3     908.44      656.79      251.64      7.30        150.09      499.41
4     830.73      581.53      249.21      5.61        154.93      420.99
5     867.67      647.14      220.54      1.13        217.33      428.68
6     909.28      602.65      306.64      2.58        211.04      389.03
7     949.19      660.28      288.91      3.97        131.12      525.18
8     1007.36     661.12      346.24      3.25        165.77      492.10
9     931.16      592.63      338.53      2.82        137.01      452.80
10    918.94      581.07      337.87      11.60       155.42      414.06
11    887.23      723.51      163.73      6.58        207.60      509.33
12    913.25      583.06      330.20      6.09        205.21      371.76
13    826.27      446.89      379.37      1.25        123.52      322.12
14    908.19      726.18      182.01      3.72        131.66      590.80
15    1279.11     891.93      387.18      2.18        204.44      685.31
16    912.67      750.36      162.31      3.55        215.94      530.87
17    1124.50     737.48      387.02      1.20        184.30      551.98
18    996.16      747.30      248.85      6.92        193.72      546.66
19    928.23      669.45      258.78      3.53        164.37      501.55
20    1041.77     817.81      223.96      7.23        229.40      581.17
AVG   956.91      679.22      277.70      4.21        180.08      494.93
(b)

Figure A.4: Detailed experimental results for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 15MB.

No.   All         I/O         Process     LabelPath   Element     Data        DataPath
      Time (s)    Time (s)    Time (s)    I/O (s)     I/O (s)     I/O (s)     I/O (s)
1     2241.91     1733.26     508.64      3.22        583.47      509.88      636.69
2     2128.30     1438.00     690.29      1.33        455.21      482.10      499.37
3     2157.97     1606.65     551.32      5.51        618.24      420.51      562.40
4     2172.00     1718.50     453.50      1.56        556.62      466.35      693.97
5     2082.58     1624.51     458.07      1.56        632.08      406.97      583.91
6     2188.88     1743.61     445.26      2.97        669.32      545.13      526.20
7     2080.53     1606.72     473.81      1.81        512.28      495.11      597.52
8     2056.08     1600.16     455.92      2.16        539.41      481.09      577.51
9     2188.83     1775.33     413.50      1.19        552.85      518.29      703.00
10    2118.56     1732.12     386.44      1.66        497.91      532.10      700.45
11    2050.27     1647.37     402.90      1.63        576.71      417.85      651.18
12    2694.80     2019.61     675.19      3.16        667.58      562.19      786.68
13    2343.42     1762.80     580.62      0.97        558.72      476.90      726.22
14    2434.44     1960.03     474.40      1.53        691.37      586.71      680.42
15    1961.55     1464.05     497.50      1.05        436.51      413.32      613.16
16    2227.00     1683.14     543.86      1.61        573.96      559.93      547.64
17    2193.31     1505.60     687.71      1.26        485.84      467.09      551.41
18    2396.09     1829.89     566.20      2.07        614.25      496.85      716.73
19    2299.47     1709.59     589.88      5.33        491.44      439.47      773.36
20    2400.13     2000.04     400.09      2.16        653.57      567.65      776.66
AVG   2220.80     1708.05     512.76      2.19        568.37      492.27      645.22
(a)

No.   All         I/O         Process     Path Table  Node Table  Data Table
      Time (s)    Time (s)    Time (s)    I/O (s)     I/O (s)     I/O (s)
1     1408.36     843.28      565.08      14.06       237.20      592.03
2     1497.11     948.63      548.48      9.81        314.51      624.31
3     1516.45     980.89      535.56      4.83        323.90      652.16
4     1517.42     1125.48     391.94      4.45        317.97      803.07
5     1477.78     1074.97     402.81      3.91        317.88      753.18
6     1613.64     1064.22     549.43      10.60       365.35      688.27
7     1577.77     1030.93     546.84      6.28        335.97      688.68
8     1369.50     890.53      478.97      4.46        258.88      627.19
9     1505.25     1030.78     474.47      6.78        303.78      720.22
10    1462.70     1083.59     379.12      8.25        385.14      690.19
11    1137.19     802.53      334.66      2.61        231.80      568.12
12    1407.05     1024.65     382.40      4.97        285.65      734.03
13    1478.83     1082.65     396.18      6.95        315.97      759.72
14    1532.86     1131.87     400.99      8.51        269.96      853.40
15    1289.06     617.18      671.88      2.27        180.76      434.15
16    1549.64     1070.30     479.34      18.15       297.83      754.31
17    2034.27     1214.98     819.29      7.44        357.39      850.15
18    1716.27     1184.17     532.09      6.70        265.91      911.56
19    1599.20     1008.83     590.37      4.08        301.23      703.52
20    1764.09     1144.73     619.37      7.83        307.23      829.67
AVG   1522.72     1017.76     504.96      7.15        298.72      711.90
(b)

Figure A.5: Detailed experimental results for translating and storing XML data under (a) the XParent schema and (b) the XPred schema, when DataSize was 20MB.