Basic Operations - Building Web Services with Java

The basic XML processing architecture shown in Figure 2.4 consists of three key layers.

At far left are the XML documents an application needs to work with. At far right is the application. In the middle is the infrastructure layer for working with XML documents, which is the topic of this section.

79 Processing XML

Figure 2.4 Basic XML processing architecture

For an application to be able to work with an XML document, it must first be able to parse the document. Parsing is a process that involves breaking the text of an XML document into small identifiable pieces (nodes). Parsers break documents into pieces such as start tags, end tags, attribute-value pairs, chunks of text content, processing instructions, comments, and so on. These pieces are fed into the application using a well-defined API implementing a particular parsing model. Four parsing models are commonly used:

• Pull parsing: The application always has to ask the parser to give it the next piece of information about the document. It’s as if the application has to “pull” the information out of the parser (hence the name of the model). The XML community has not yet defined standard APIs for pull parsing. However, because pull parsing is becoming popular, this could happen soon.

• Push parsing: The parser sends notifications to the application about the types of XML document pieces it encounters during the parsing process. The notifications are sent in reading order, as they appear in the text of the document. Notifications are typically implemented as event callbacks in the application code, and thus push parsing is also commonly known as event-based parsing. The XML community created a de facto standard for push parsing called Simple API for XML (SAX). SAX is currently released in version 2.0.

• One-step parsing: The parser reads the whole XML document and generates a data structure (a parse tree) describing its contents (elements, attributes, PIs, comments, and so on). The data structure is typically deeply nested; its hierarchy mimics the nesting of elements in the parsed XML document. The W3C has defined a Document Object Model (DOM) for XML. The XML DOM specifies the types of objects that are included in the parse tree, their properties, and their operations. The DOM is so popular that one-step parsing is typically referred to as DOM parsing. The DOM is a language- and platform-independent API. It offers many obvious benefits but also some hidden costs. The biggest problem with the DOM APIs is that they often don’t map well to the native data structures of programming languages. To address this issue for Java, the Java community has started working on a Java DOM (JDOM) specification whose goal is to simplify the manipulation of document trees in Java by using object APIs tuned to the common patterns of Java programming.

[Figure 2.4 labels: XML Document(s), Parser, Serializer, Character Stream, Standardized XML APIs, Application]

• Hybrid parsing: This approach combines characteristics of the other three parsing models to create efficient parsers for special scenarios. For example, one common pattern combines pull parsing with one-step parsing. In this model, the application thinks it’s working with a one-step parser that has processed the whole XML document from start to end. In reality, the parsing process has just begun. As the application keeps accessing more objects on the DOM (or JDOM) tree, the parsing continues incrementally so that just enough of the document is parsed at any given point to give the application the objects it wants to see.
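The pull model described in the list above can be sketched with the StAX API (javax.xml.stream), which was standardized for Java after this text was written and ships with current JDKs. The class and method names below (PullParsingSketch, startTagNames) are hypothetical, and the sample document is made up for illustration. What matters is the loop: the parser does nothing until the application asks for the next event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of any library.
class PullParsingSketch {

    // Returns the names of all start tags in document order.
    static List<String> startTagNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // The application drives the parse: each call to next()
                // "pulls" exactly one more piece out of the parser.
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String xml = "<order><item sku=\"A1\">Widget</item><item sku=\"B2\">Gadget</item></order>";
        System.out.println(startTagNames(xml)); // prints [order, item, item]
    }
}
```

Because the application owns the loop, it can stop early (for example, after the first matching element) without the parser reading the rest of the document.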

So many models for parsing XML exist because each one makes a different trade-off between memory efficiency, computational efficiency, and ease of programming.

Table 2.5 identifies some of the characteristics of the parsing models. In the table, control of parsing refers to who manages the step-by-step parsing process. Pull parsing requires that the application do that; in all other models, the parser takes care of this process.

Control of context refers to who manages context information such as the level of nesting of elements and their location relative to one another. Both push and pull parsing delegate this control to the application; all other models build a tree of nodes that makes maintaining context much easier. This approach makes programming with DOM or JDOM generally easier than working with SAX. The price is memory and computational efficiency, because instantiating all these objects takes up time and memory. Hybrid parsers attempt to offer the best of both worlds by presenting a tree view of the document but doing incremental parsing behind the scenes.
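To make "control of context" concrete, here is a minimal SAX sketch in which the application itself has to remember how deeply nested the current element is. The parser only reports start and end events; the depth counter is the application's job. The class names (PushParsingSketch, OutlineHandler) and the sample document are hypothetical; the SAX and JAXP classes used are the standard ones.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of any library.
class PushParsingSketch {

    // Receives event callbacks from the parser in reading order.
    static class OutlineHandler extends DefaultHandler {
        final List<String> lines = new ArrayList<>();
        private int depth = 0; // context the application must maintain itself

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            lines.add(indent(depth) + qName);
            depth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }
    }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    // Returns one line per element, indented two spaces per nesting level.
    static List<String> outline(String xml) {
        try {
            OutlineHandler handler = new OutlineHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document.
        for (String line : outline("<order><item>Widget</item><note/></order>")) {
            System.out.println(line);
        }
    }
}
```

With a tree-based model the same information is available by walking up from any node to its ancestors, which is why the text rates DOM and JDOM as easier to program against.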

Table 2.5 XML Parsing Models and Their Trade-offs

Model             Control of parsing  Control of context  Memory efficiency  Computational efficiency  Ease of programming
Pull              Application         Application         High               Highest                   Low
Push (SAX)        Parser              Application         High               High                      Low
One-step (DOM)    Parser              Parser              Lowest             Lowest                    High
One-step (JDOM)   Parser              Parser              Low                Low                       Highest
Hybrid (DOM)      Parser              Parser              Medium             Medium                    High
Hybrid (JDOM)     Parser              Parser              Medium             Medium                    Highest

In the Java world, a standardized API, the Java API for XML Processing (JAXP), exists for instantiating XML parsers and parsing documents using either SAX or DOM. Without JAXP, Java applications weren’t completely portable across XML parsers because different parsers, despite following SAX and DOM, had different APIs for creation, configuration, and parsing of documents. JAXP is currently released in version 1.2. It doesn’t support JDOM yet because the JDOM specification isn’t complete at this point.
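As a rough illustration of the portability JAXP provides, the sketch below obtains a DOM parser through the JAXP factory classes only, without naming any particular parser implementation. The class and method names (JaxpDomSketch, parse, countItems, firstItemSku) and the sample document are hypothetical; DocumentBuilderFactory and DocumentBuilder are the standard JAXP entry points.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of any library.
class JaxpDomSketch {

    // One-step parsing: the whole document is read into a DOM tree.
    static Document parse(String xml) {
        try {
            // The factory hides which parser implementation is actually used.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    static int countItems(Document doc) {
        return doc.getElementsByTagName("item").getLength();
    }

    // Assumes the document contains at least one <item> element.
    static String firstItemSku(Document doc) {
        NodeList items = doc.getElementsByTagName("item");
        return ((Element) items.item(0)).getAttribute("sku");
    }

    public static void main(String[] args) {
        // Made-up sample document.
        Document doc = parse("<order><item sku=\"A1\">Widget</item><item sku=\"B2\">Gadget</item></order>");
        System.out.println(countItems(doc));   // prints 2
        System.out.println(firstItemSku(doc)); // prints A1
    }
}
```

Swapping in a different JAXP-compliant parser should not require changes to this code, since only the factory lookup depends on the implementation.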

Although XML parsing addresses the problem of feeding data from XML documents into applications, XML output addresses the reverse problem: applications generating XML documents. At the most basic level, an application can directly output XML markup. In Figure 2.4, this is indicated by the application working with a character stream. This isn’t difficult to do, but handling the basic syntax rules (attribute quoting, special character escaping, and so on) can become cumbersome. In many cases, it might be easier for the application to construct a data structure (DOM or JDOM tree) describing the XML document that should be generated. Then, the application can use a serialization process to traverse the document tree and emit XML markup corresponding to its elements. This capability isn’t directly defined in the DOM and JDOM APIs, but most XML toolkits make it very easy to do just that.
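One way to serialize a DOM tree, shown here as a sketch rather than the only option, is an identity transformation through the JAXP transformation API (javax.xml.transform). This is an assumption about tooling: the text does not name a specific serializer. The class and method names (SerializationSketch, serializeItem) and the sample values are hypothetical. Note that the application hands over raw text and leaves quoting and escaping to the serializer.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class; not part of any library.
class SerializationSketch {

    // Builds a one-element DOM tree and serializes it to XML markup.
    static String serializeItem(String sku, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.setAttribute("sku", sku);               // raw value, no manual quoting
            item.appendChild(doc.createTextNode(text));  // raw text, no manual escaping
            doc.appendChild(item);

            // A Transformer created without a stylesheet copies its input unchanged,
            // which here means walking the tree and emitting markup.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        // The ampersand and angle bracket are emitted as entity references.
        System.out.println(serializeItem("A1", "Nuts & bolts <small>"));
    }
}
```

Writing the same markup by hand with string concatenation would require the application to escape the ampersand and the angle bracket itself, which is the cumbersome part the text refers to.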
