Motivation - Design of a Model-Mapping Schema for XML-Relational Database Systems

In this thesis, a user query for manipulating XML data is represented as a FLWOR expression, one of the best-known forms of XQuery [16]. FLWOR expressions are popular because they resemble SQL commands, and they are often useful for joining two or more XML documents and for restructuring the result of a query. A FLWOR expression consists of five clauses: FOR, LET, WHERE, ORDER BY, and RETURN. The FOR clause introduces a variable (with a $ prefix) together with its path and the filename of the XML document. The LET clause can introduce additional variables (also with a $ prefix). A variable introduced by a FOR or LET clause denotes the set of nodes reached by a specific path in the XML document. In other words, the FOR and LET clauses of a FLWOR expression generate a sequence of tuples of bound variables, called the tuple stream. The WHERE clause specifies a conditional expression that is evaluated once for each tuple in the tuple stream. The ORDER BY clause sorts the tuple stream, and the RETURN clause defines the format of the results. When a model-mapping schema (e.g., the XParent schema) is adopted to store XML data, the FLWOR expressions in XQuery commands must be translated into SQL commands that manipulate the data in a relational database. The following example shows a user query in XQuery and its corresponding SQL command when the XParent schema is adopted.

XQuery

FOR $x IN document("video.xml")/Bib LET $y=$x/Movie_star

WHERE $y/Name="Will Smith"

AND $y/@SID=$x/Video/Cast_ID AND $x/Video/@Year="2007"

ORDER BY $x/Video/Title

RETURN $x/Video/Title AND $x/Video/Director

Figure 2.3: An example XQuery command.

Example 1 An example XQuery command and its corresponding SQL command under the XParent schema.

Figure 2.3 shows an XQuery command for retrieving data from the XML document shown in Figure 1.1. The command finds the title and director of the videos in which Will Smith acted and which were published in 2007. Figure 2.4 shows the corresponding SQL command under the XParent schema. This SQL command contains 17 equijoins and 8 selections, which makes it a costly query.

SELECT D4.Value, D5.Value

FROM LabelPath LP1, Data D1, DataPath DP1, DataPath DP2, LabelPath LP2, Data D2, LabelPath LP3, Data D3, DataPath DP3, DataPath DP4, LabelPath LP4, Data D4, DataPath DP5, LabelPath LP5, Data D5, DataPath DP6, LabelPath LP6, Data D6

WHERE LP1.ID=D1.PathID

Figure 2.4: The corresponding SQL command of the XQuery command in Figure 2.3 under the XParent schema.
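The clause-by-clause evaluation of a FLWOR expression described earlier can be illustrated with a small sketch. It is not an XQuery engine: it mimics the intended meaning of the query in Figure 2.3 in plain Python, binding one video and one movie star per tuple. The miniature document `VIDEO_XML`, its values, and the function name `titles_and_directors` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of video.xml. Element and attribute names follow
# the query in Figure 2.3; the content is illustrative.
VIDEO_XML = """
<Bib>
  <Video Year="2007">
    <Title>I Am Legend</Title>
    <Cast_ID>1</Cast_ID>
    <Director>Francis Lawrence</Director>
  </Video>
  <Video Year="2007">
    <Title>Alvin and the Chipmunks</Title>
    <Cast_ID>2</Cast_ID>
    <Director>Tim Hill</Director>
  </Video>
  <Movie_star SID="1"><Name>Will Smith</Name></Movie_star>
  <Movie_star SID="2"><Name>Jason Lee</Name></Movie_star>
</Bib>
"""


def titles_and_directors(xml_text, star_name, year):
    bib = ET.fromstring(xml_text.strip())

    # FOR / LET: bind variables, producing a stream of (video, star) tuples.
    tuple_stream = [
        (video, star)
        for video in bib.findall("Video")
        for star in bib.findall("Movie_star")
    ]

    # WHERE: evaluate the condition once per tuple and keep the matches.
    kept = [
        (video, star)
        for video, star in tuple_stream
        if star.findtext("Name") == star_name
        and star.get("SID") == video.findtext("Cast_ID")
        and video.get("Year") == year
    ]

    # ORDER BY: sort the surviving tuples by video title.
    kept.sort(key=lambda pair: pair[0].findtext("Title"))

    # RETURN: shape each tuple into the requested output.
    return [(v.findtext("Title"), v.findtext("Director")) for v, _ in kept]


result = titles_and_directors(VIDEO_XML, "Will Smith", "2007")
print(result)
```

On this sample document, only the tuple pairing "I Am Legend" with "Will Smith" survives the WHERE step.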

The XParent schema is simple and easy to implement. However, it needs a large number of join operations to answer a user query. In particular, it must locate every node that corresponds to an element or attribute appearing in the query, and it needs further join operations to verify the relationships (e.g., the parent-child relationships) between those nodes. When user queries become more complex than the one in Example 1, even more join operations are needed.
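To make the source of these joins concrete, the following sketch builds a tiny XParent-style database in SQLite and looks up the director of a video by its title. The table names come from Figure 2.4, but the column names other than `LabelPath.ID` and `Data.PathID` (`Len`, `Path`, `Pid`, `Cid`, `Did`, `Ordinal`, `Value`), the label-path strings, and the sample rows are assumptions, since this section does not list the XParent schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LabelPath (ID INTEGER PRIMARY KEY, Len INTEGER, Path TEXT);
CREATE TABLE DataPath  (Pid INTEGER, Cid INTEGER);
CREATE TABLE Data      (PathID INTEGER, Did INTEGER, Ordinal INTEGER, Value TEXT);
""")

# Assumed sample rows: two videos (nodes 2 and 8), each with a title and
# a director as child nodes.
conn.executemany("INSERT INTO LabelPath VALUES (?, ?, ?)", [
    (4, 3, "/Bib/Video/Title"),
    (6, 3, "/Bib/Video/Director"),
])
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)", [
    (4, 4, 1, "I Am Legend"),
    (6, 6, 1, "Francis Lawrence"),
    (4, 10, 1, "Alvin and the Chipmunks"),
    (6, 12, 1, "Tim Hill"),
])
conn.executemany("INSERT INTO DataPath VALUES (?, ?)", [
    (2, 4), (2, 6), (8, 10), (8, 12),
])

# Relating two sibling values needs two DataPath lookups (one per child)
# in addition to the LabelPath joins: five equijoins in total.
XPARENT_QUERY = """
SELECT D2.Value
FROM LabelPath LP1, Data D1, DataPath DP1,
     DataPath DP2, Data D2, LabelPath LP2
WHERE LP1.Path = '/Bib/Video/Title'
  AND LP1.ID = D1.PathID
  AND D1.Value = ?
  AND DP1.Cid = D1.Did
  AND DP2.Pid = DP1.Pid
  AND DP2.Cid = D2.Did
  AND D2.PathID = LP2.ID
  AND LP2.Path = '/Bib/Video/Director'
"""


def director_of(title):
    return [row[0] for row in conn.execute(XPARENT_QUERY, (title,))]


print(director_of("I Am Legend"))
```

Every additional element or attribute that a query relates to the same parent adds another DataPath join of this kind, which is how the join count grows with query complexity.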

Because join operations are very costly, the processing time of a query grows with the number of joins it contains. This observation motivates our research. In this thesis, we propose a new model-mapping schema based on a trade-space-for-time strategy that significantly reduces the join cost of processing various types of user queries, so that better performance can be achieved.

Chapter 3 XPred - Translation and Manipulation of XML Documents

In this chapter, we propose a new model-mapping schema, called XPred, to reduce the potentially large number of join operations needed when processing XML queries. Algorithms for translating and manipulating XML documents over relational database systems are also provided.

3.1 XPred Schema

The XPred schema is a model-mapping schema that can support any sophisticated application. The rationale behind XPred is to distribute the structural information among the nodes themselves, so that fewer join operations are needed when processing user queries.

In particular, every node of a given XML document stores the information of its predecessor within itself. This eliminates the join operation for parent-child traversal, so that the performance of query processing can be improved. In other words, the XPred schema reduces the potentially large number of join operations needed for processing user queries. The database schema of XPred is as follows:

Path(PathID, Length, LabelPath)

Node(NodeID, PathID, Ordinal, PredID)

Data(NodeID, PathID, Ordinal, PredID, Value)

Table Data

NodeID  PathID  Ordinal  PredID  Value
3       3       1        2       2007
4       4       1        2       I Am Legend
5       5       1        2       1
6       6       1        2       Francis Lawrence
7       7       1        2       1hrs. 40min.
9       3       1        8       2007
10      4       1        8       Alvin and the Chipmunks
11      5       1        8       2
12      6       1        8       Tim Hill
13      7       1        8       1hrs. 32min.
15      3       1        14      2007
16      4       1        14      The Perfect Holiday
17      5       1        14      3
25      13      1        20      31 wins,58 nominations
27      9       1        26      2
28      10      1        26      Jason Lee
29      11      1        26      male
30      12      1        26      25-Apr-70
31      13      1        26      2 wins,11 nominations
33      9       1        32      3
34      10      1        32      Gabrielle Union
35      11      1        32      female
36      12      1        32      29-Oct-72
37      13      1        32      5 wins,11 nominations

Table Path

PathID  Length  LabelPath

Figure 3.1: The three tables of the XML data graph in Figure 2.1 under the XPred schema.

Table Path stores the paths of an XML data graph, where the label-path of a node is the sequence of node names from the root to that node. The attribute PathID denotes the unique ID of each label-path. The attribute LabelPath holds the label-path itself, i.e., the sequence of node names. The attribute Length denotes the length of the label-path, counted from the root. Table Node stores the elements and attributes of an XML document. The attribute NodeID denotes a unique ID for each node. The attribute PathID denotes the ID of the node's label-path (i.e., a foreign key referencing PathID in Table Path). The attribute Ordinal denotes the ordinal number of the node among the nodes that have the same name and are connected to the same source node. Finally, the attribute PredID is the NodeID of the node's predecessor (i.e., its parent node). Table Data is the same as Table Node, except that it has an additional attribute Value, which stores the value of the corresponding element or attribute in the XML document. Figure 3.1 shows the three tables for the XML document in Figure 2.1 under the XPred schema.
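The three relations can be sketched as SQLite tables to show how the attributes fit together. The column types, the label-path strings, the two Node rows, and the helper `label_path_length` are assumptions for illustration; only the three Data rows are taken from Figure 3.1. The text does not say whether value-bearing nodes are also stored in Table Node, so this sketch keeps them in Table Data only.

```python
import sqlite3


def label_path_length(label_path):
    """Count the node names on a label-path, starting from the root."""
    return len([name for name in label_path.split("/") if name])


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Path (PathID INTEGER PRIMARY KEY, Length INTEGER, LabelPath TEXT);
CREATE TABLE Node (NodeID INTEGER PRIMARY KEY,
                   PathID INTEGER REFERENCES Path(PathID),
                   Ordinal INTEGER, PredID INTEGER);
CREATE TABLE Data (NodeID INTEGER PRIMARY KEY,
                   PathID INTEGER REFERENCES Path(PathID),
                   Ordinal INTEGER, PredID INTEGER, Value TEXT);
""")

# Assumed label-paths, guessed from the paths used in the query of Figure 2.3.
label_paths = {
    1: "/Bib",
    2: "/Bib/Video",
    3: "/Bib/Video/@Year",
    4: "/Bib/Video/Title",
    6: "/Bib/Video/Director",
}
conn.executemany(
    "INSERT INTO Path VALUES (?, ?, ?)",
    [(pid, label_path_length(lp), lp) for pid, lp in label_paths.items()],
)

# Assumed structural nodes: the root (node 1) and the first video (node 2).
conn.executemany("INSERT INTO Node VALUES (?, ?, ?, ?)", [
    (1, 1, 1, None),
    (2, 2, 1, 1),
])

# Rows copied from Table Data in Figure 3.1 (children of node 2).
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?, ?)", [
    (3, 3, 1, 2, "2007"),
    (4, 4, 1, 2, "I Am Legend"),
    (6, 6, 1, 2, "Francis Lawrence"),
])

# List every value stored directly under node 2, with its label-path.
rows = conn.execute("""
SELECT P.LabelPath, P.Length, D.Value
FROM Data D JOIN Path P ON P.PathID = D.PathID
WHERE D.PredID = 2
ORDER BY D.NodeID
""").fetchall()

for row in rows:
    print(row)
```

Because each Data row carries its own PredID, the children of a node are selected with a plain filter on that column.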

The design of the XPred schema is motivated by a potential performance problem of approaches based on model-mapping schemas. Typically, a user query must perform many join operations to resolve parent-child relationships in an XML document stored in a relational database, as shown in Example 1. Because join operations are costly, this has a significant impact on query performance. In this thesis, a trade-space-for-time strategy is adopted to reduce the number of join operations needed to process a user query. In particular, we add an attribute PredID to Tables Node and Data to provide a direct reference to each node's immediate predecessor (i.e., its parent node). This makes it easy to find the elements or attributes that share the same predecessor node in an XML document. Reducing the number of join operations in this way can significantly improve the performance of an XML storage system. Example 2 shows that the number of join operations is reduced when the XPred schema is adopted.
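A minimal sketch of the sibling lookup that PredID enables is given below, again using SQLite. The Data rows are taken from Figure 3.1; the column types, the label-path strings in Table Path, and the function name `director_of` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Path (PathID INTEGER PRIMARY KEY, Length INTEGER, LabelPath TEXT);
CREATE TABLE Data (NodeID INTEGER PRIMARY KEY, PathID INTEGER,
                   Ordinal INTEGER, PredID INTEGER, Value TEXT);
""")

# Assumed label-paths for PathID 4 and 6.
conn.executemany("INSERT INTO Path VALUES (?, ?, ?)", [
    (4, 3, "/Bib/Video/Title"),
    (6, 3, "/Bib/Video/Director"),
])

# Rows copied from Table Data in Figure 3.1.
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?, ?)", [
    (4, 4, 1, 2, "I Am Legend"),
    (6, 6, 1, 2, "Francis Lawrence"),
    (10, 4, 1, 8, "Alvin and the Chipmunks"),
    (12, 6, 1, 8, "Tim Hill"),
])

# Two values belong to the same parent exactly when their PredID values
# match, so the sibling test is one equijoin between the two Data aliases.
SIBLING_QUERY = """
SELECT D2.Value
FROM Path P1, Data D1, Data D2, Path P2
WHERE P1.LabelPath = '/Bib/Video/Title'
  AND P1.PathID = D1.PathID
  AND D1.Value = ?
  AND D2.PredID = D1.PredID
  AND D2.PathID = P2.PathID
  AND P2.LabelPath = '/Bib/Video/Director'
"""


def director_of(title):
    return [row[0] for row in conn.execute(SIBLING_QUERY, (title,))]


print(director_of("I Am Legend"))
```

The query uses three equijoins: two between Path and Data, and one on PredID. No separate parent-child table is consulted, at the price of storing PredID in every row.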

XPred-SQL

SELECT D4.Value, D5.Value

FROM Path P1, Data D1, Data D2, Path P2, Path P3, Data D3, Data D4, Path P4, Data D5, Path P5, Data D6, Path P6

WHERE P1.PathID=D1.PathID

Figure 3.2: The corresponding SQL command of the XQuery command in Figure 2.3 under the XPred schema.

Example 2 A corresponding SQL command of the XQuery command in Figure 2.3 under the XPred schema.

Figure 3.2 shows the SQL command that corresponds to the XQuery command in Figure 2.3 under the XPred schema. When XPred is adopted, the command contains 11 equijoins and 8 selections. Compared with Example 1, the number of join operations drops from 17 to 11, a reduction of 6/17, or about 35.3%.

As shown above, the XPred schema can greatly reduce the number of join operations needed when processing user queries. This is because XPred stores one additional piece of information in each node: the ID of its predecessor node. It lowers the cost of searching for a node's predecessor, an operation that is common in many approaches based on model-mapping schemas. In the following sections, we provide an algorithm for translating an XML document into a relational database under the XPred schema. Related algorithms for manipulating XML data are also provided.
