Extensible Stylesheet Language (XSL) is a specification from the World Wide Web Consortium (W3C) and is broken down into two complementary technologies: XSL Formatting Objects and XSL Transformations (XSLT). XSL Formatting Objects, a language for defining formatting such as fonts and page layout, is not covered in this book. XSLT, on the other hand, was primarily
designed to transform a well-formed XML document into XSL Formatting Objects.
Even though XSLT was designed to support XSL Formatting Objects, it has emerged as the preferred technology for all sorts of transformations. Transformation from XML to HTML is the most common, but XSLT can also be used to transform well-formed XML into just about any text file format. This will give XML- and XSLT-based web sites a major leg up as wireless devices become more prevalent because XSLT can also be used to transform XML into Wireless Markup Language or some other stripped-down format that wireless devices will require.
2.1 XSLT Introduction
Why is transformation so important? XML provides a simple syntax for defining markup, but it is up to individuals and organizations to define specific markup languages. There is no guarantee that two organizations will use the exact same markup; in fact, you may struggle to agree on consistent formats within the same group or company. One group may use <employee>, while others may use <worker> or <associate>. In order to share data, the XML data has to be transformed into a common format. This is where XSLT shines -- it eliminates the need to write custom computer programs to transform data. Instead, you simply create one or more XSLT stylesheets.
An XSLT processor is an application that applies an XSLT stylesheet to an XML data source.
Instead of modifying the original XML data, the result of the transformation is copied into something called a result tree, which can be directed to a static file, sent directly to an output stream, or even piped into another XSLT processor for further transformations. Figure 2-1 illustrates the transformation process, showing how the XML input, XSLT stylesheet, XSLT processor, and result tree relate to one another.
Figure 2-1. XSLT transformation
The XML input and XSLT stylesheet are normally two separate entities.[1] For the examples in this chapter, the XML will always reside in a text file. In future chapters, however, we will see how to improve performance by dealing with the XML as an in-memory object tree. This makes sense from a Java/XSLT perspective because most web applications will generate XML dynamically rather than deal with a series of static files. Since the XML data and XSLT stylesheet are clearly separated, it is very plausible to write several different stylesheets that convert the same XML into radically different formats.
[1] Section 2.7 of the XSLT specification covers embedded stylesheets.
XSLT transformation can occur on either the client or server, although server-side
transformations are currently dominant. Since a vast majority of Internet users do not use XSLT-compliant browsers (at the time of this writing), the typical model is to transform XML into HTML on the web server so the browser sees only the resulting HTML. In a closed corporate
environment where the browser feature set can be controlled, moving the XSLT transformation process to the browser can improve scalability and reduce network traffic.
It should be noted that XSLT stylesheets do not perform the same function as Cascading Style Sheets (CSS), which you may be familiar with. In the CSS model, style elements are applied to HTML or XML on the web browser, affecting formatting such as fonts and colors. CSS do not produce a separate result tree and cannot be applied in advance using a standalone processor as XSLT can. The CSS processing model operates on the underlying data in a top down fashion in a single pass, while XSLT can iterate and perform conditional logic on the XML data. Although XSLT can produce style instructions, its true role is that of a transformation language rather than a style language. XSL Formatting Objects, on the other hand, is a style language that is much more comparable to CSS.
For wireless applications, HTML is not typically generated. Instead, Wireless Markup Language (WML) is the current standard for cell phones and other wireless devices. In the future, new standards such as XHTML Basic may be used. When using an XSLT approach, the same XML data can be transformed into many forms, all via different stylesheets. Regardless of how many stylesheets are used, the XML data will remain unchanged. A typical web site might have the following stylesheets for a single XML home page:
homeBasic.xslt
For older web browsers homeIE5.xslt
Takes advantage of newer Internet Explorer features homeMozilla.xslt
Takes advantage of newer Netscape features homeWML.xslt
Transforms into Wireless Markup Language homeB2B.xslt
Transforms the XML into another XML format, suitable for "B2B-style" XML data feeds to customers
Schema evolution implies an upgrade to an existing data source where the structure of the data must be modified. When the data is stored in XML format, XSLT can be used to support schema evolution. For example, Version 1.0 of your application may store all of its files in XML format, but Version 2.0 might add new features that cannot be supported by the old 1.0 file format. A perfect solution is to write a single stylesheet to transform all of the old 1.0 XML files to the new 2.0 file format.
2.1.1 An XSLT Example
You need three components to perform XSLT transformations: an XML data source, an XSLT stylesheet, and an XSLT processor. The XSLT stylesheet is actually a well-formed XML
document, so the XSLT processor will also include or use an XML parser. Apache's Xalan is used for most of the examples in this book; the previous chapter listed several other processors that you may want to investigate. You can download Xalan from http://xml.apache.org. It uses and includes Apache's Xerces parser, but can be configured to use other parsers. The ability to swap out parsers is important because this gives you the flexibility to use the latest innovations as competing (and perhaps faster) parsers are released.
Example 2-1 represents an early prototype of a discussion forum home page. The complete discussion forum application will be developed in Chapter 7. This is the raw XML data, without any formatting instructions or HTML. As you can see, the home page simply lists the message boards that the user can choose to view.
Example 2-1. discussionForumHome.xml
<?xml version="1.0" encoding="UTF -8"?>
<discussionForumHome>
<messageBoard id="1" name="Java Programming"/>
<messageBoard id="2" name="XML Programming"/>
<messageBoard id="3" name="XSLT Questions"/>
</discussionForumHome>
It is assumed that this data will be generated dynamically as the result of a database query, rather than hardcoded as a static XML file. Regardless of its origin, the XML data says nothing about how to actually display the web page. For clarity, we will keep the XSLT stylesheet fairly simple at this point. The beauty of an XML/XSLT approach is that you can beef up the stylesheet later on without compromising any of the underlying XML data structures. Even more importantly, the Java code that will generate the XML data does not have to be cluttered up with HTML and user interface logic; it just produces the basic XML data. Once the format of the data has been defined, a Java programmer can begin working on the database logic and XML generation code, while another team member begins writing the XSLT stylesheets.
Example 2-2 lists the XSLT stylesheet that produces the home page. Don't worry if not
everything in this first example makes sense. XSLT is, after all, a completely new language. We will cover everything in detail throughout the remainder of this and the next chapter.
Example 2-2. discussionForumHome.xslt
<?xml version="1.0" encoding="UTF -8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<!-- match the document root -->
<xsl:template match="/">
<html>
<head>
<title>Discussion Forum Home Page</title>
</head>
<body>
<h1>Discussion Forum Home Page</h1>
<h3>Please select a message board to view:</h3>
<ul>
<xsl:apply-templates
select="discussionForumHome/messageBoard"/>
</ul>
</body>
</html>
</xsl:template>
<!-- match a <messageBoard> element -->
<xsl:template match="messageBoard">
<li>
<a href="viewForum?id={@id}">
<xsl:value-of select="@name"/>
</a>
</li>
</xsl:template>
</xsl:stylesheet>
The filename extension for XSLT stylesheets is irrelevant. In this book, .xslt is used. Many stylesheet authors prefer .xsl.
The first thing that should jump out immediately is the fact that the XSLT stylesheet is also a well-formed XML document. Do not let the xsl: namespace prefix fool you -- everything in this document adheres to the same basic rules that every other XML document must follow. Like other XML files, the first line of the stylesheet is an XML declaration:
<?xml version="1.0" encoding="UTF -8"?>
Unless you are dealing with internationalization issues, this will remain unchanged for every stylesheet you write. This line is immediately followed by the document root element, which contains the remainder of the stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
The <xsl:stylesheet> element has two attributes in this case. The first, version="1.0", specifies the version of the XSLT specification. Although this is the current version at the time of this writing, the next version of the XSLT specification is well underway and may be finished by the time you read this. You can stay abreast of the latest XSLT developments by visiting the W3C home page at http://www.w3.org.
The next attribute declares the XML namespace, defining the meaning of the xsl: prefix you see on all of the XSLT elements. The prefix xsl is conventional, but could be anything you choose.
This is useful if your document already uses the xsl prefix for other elements, and you do not want to introduce a naming conflict. This is really the entire point of namespaces: they help to avoid name conflicts. In XML, <a:book> and <b:book> can be discerned from one another because each book has a different namespace prefix. Since you pick the namespace prefix, this avoids the possibility that two vendors will use conflicting prefixes.
In the case of XSLT, the namespace prefix does not have to be xsl, but the value does have to be http://www.w3.org/1999/XSL/Transform. The value of a namespace is not necessarily a real web site, but the syntax is convenient because it helps ensure uniqueness. In the case of XSLT, 1999 represents the year that the URL was allocated for this purpose, and is not related to the version number. It is almost certain that future versions of XSLT will continue to use this same URL.