XML Services in Adaptive Server Enterprise
Chapter 1 Introduction to XML Services
XML in the database
Like HTML (Hypertext Markup Language), XML is a markup language and a subset of SGML (Standard Generalized Markup Language). XML, however, is more complete and disciplined than HTML, and it allows you to define your own application-oriented markup tags. These properties make XML particularly suitable for data interchange.
You can generate XML-formatted documents from data stored in Adaptive Server and, conversely, store XML documents and data extracted from them in Adaptive Server. You can also use Adaptive Server to search XML documents stored on the Web.
References
This chapter presents an overview of XML. For detailed information, refer to these Web documents:
XML is a markup language and subset of SGML that was created to provide functionality that goes beyond that of HTML for Web publishing and distributed document processing.
XML is less complex than SGML, but more complex and flexible than HTML. Although XML and HTML can usually be read by the same browsers and processors, XML has characteristics that make it better able to share documents:
XML documents possess a strict phrase structure that makes it easy to find and access data. For example, every element must have both an opening tag and a corresponding closing tag, such as <p>A paragraph.</p>.
XML lets you develop and use tags that distinguish different types of data, for example, customer numbers or item numbers.
XML lets you create an application-specific document type, which makes it possible to distinguish one kind of document from another.
XML documents allow different displays of the XML data. XML documents contain only markup and content; they do not contain formatting instructions. Formatting instructions are normally provided on the client.
You can store XML documents in Adaptive Server as:
XML in a field of a Java object
XML in a text or image column
XML in a char or varchar column
Parsed XML in an image column
The sample Order document is designed for a purchase order application. Customers submit orders, which are identified by a date and a customer ID. Each order item has an item ID, an item name, a quantity, and a unit designation.
It might display on screen like this:
ORDER
Date: July 4, 2003
Customer ID: 123
Customer Name: Acme Alpha
Items:
Item ID | Item Name | Quantity |
987 | Coupler | 5 |
654 | Connector | 3 dozen |
579 | Clasp | 1 |
The following is one representation of this data in XML:
<?xml version="1.0"?>
<Order>
<Date>2003/07/04</Date>
<CustomerId>123</CustomerId>
<CustomerName>Acme Alpha</CustomerName>
<Item> <ItemId>987</ItemId> <ItemName>Coupler</ItemName> <Quantity>5</Quantity> </Item>
<Item> <ItemId>654</ItemId> <ItemName>Connector</ItemName> <Quantity units="12">3</Quantity> </Item>
<Item> <ItemId>579</ItemId> <ItemName>Clasp</ItemName> <Quantity>1</Quantity> </Item>
</Order>
The XML document has two unique characteristics:
The XML document does not indicate type, style, or color for specifying item display.
The markup tags are strictly nested. Each opening tag (<tag>) has a corresponding closing tag (</tag>).
The XML document for the order data consists of:
The XML declaration, <?xml version="1.0"?>, identifying "Order" as an XML document.
XML represents documents as character data. In each document, you specify the character encoding (character set), either explicitly or implicitly. To explicitly specify the character set, include it in the XML declaration. For example:
<?xml version="1.0" encoding="ISO-8859-1"?>
If you do not include the character set in the XML declaration, the default, UTF-8, is used.
When the default character sets of the client and server differ, Adaptive Server bypasses normal character-set translations so that the declared character set continues to match the actual character set. See "Character sets and XML data".
User-created element tags, such as <Order>...</Order>, <CustomerId>...</CustomerId>, <Item>...</Item>.
Text data, such as "Acme Alpha," "Coupler," and "579."
Attributes embedded in element tags, such as <Quantity units="12">. This embedding allows you to customize elements.
If your document contains these components, and the element tags are strictly nested, it is called a well-formed XML document. In the example above, element tags describe the data they contain, and the document contains no formatting instructions.
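The components listed above can be explored with any XML parser. The following is a minimal sketch in Python, using the standard xml.etree.ElementTree module rather than any Adaptive Server facility. The helper names is_well_formed and read_order are hypothetical, and the attribute is spelled units here, following the DTD for Order documents.

```python
import xml.etree.ElementTree as ET

# The sample Order document (attribute name "units" follows the Order DTD).
ORDER_XML = """<?xml version="1.0"?>
<Order>
<Date>2003/07/04</Date>
<CustomerId>123</CustomerId>
<CustomerName>Acme Alpha</CustomerName>
<Item><ItemId>987</ItemId><ItemName>Coupler</ItemName><Quantity>5</Quantity></Item>
<Item><ItemId>654</ItemId><ItemName>Connector</ItemName><Quantity units="12">3</Quantity></Item>
<Item><ItemId>579</ItemId><ItemName>Clasp</ItemName><Quantity>1</Quantity></Item>
</Order>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def read_order(text):
    """Pull the order data out of the document by element name."""
    root = ET.fromstring(text)
    return {
        "date": root.findtext("Date"),
        "customer_id": root.findtext("CustomerId"),
        "customer_name": root.findtext("CustomerName"),
        "items": [
            {
                "id": item.findtext("ItemId").strip(),
                "name": item.findtext("ItemName"),
                "quantity": item.findtext("Quantity"),
                # Attributes embedded in the <Quantity> tag, if any.
                "attributes": dict(item.find("Quantity").attrib),
            }
            for item in root.findall("Item")
        ],
    }


order = read_order(ORDER_XML)
print(order["customer_name"], len(order["items"]))
# A closing tag that does not match its opening tag breaks strict nesting.
print(is_well_formed("<Order><Date>2003/07/04</Order>"))
```

Because each value is wrapped in a descriptive tag, the sketch can tell the customer ID "123" apart from any item ID without inspecting surrounding text.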
Here is another example of an XML document:
<?xml version="1.0"?>
<Info>
<OneTag>1999/07/04</OneTag>
<AnotherTag>123</AnotherTag>
<LastTag>Acme Alpha</LastTag>
<Thing>
<ThingId>987</ThingId>
<ThingName>Coupler</ThingName>
<Amount>5</Amount>
</Thing>
<Thing>
<ThingId>654</ThingId>
<ThingName>Connector</ThingName>
</Thing>
<Thing>
<ThingId>579</ThingId>
<ThingName>Clasp</ThingName>
<Amount>1</Amount>
</Thing>
</Info>
This example, called "Info," is also a well-formed document and has the same structure and data as the XML Order document. However, it would not be recognized by a processor designed for Order documents because the document type definition (DTD) that Info uses is different from that of the Order document. For more information about DTDs, see "XML document types".
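To illustrate why a processor designed for Order documents does not recognize an Info document, here is a small Python sketch using the standard xml.etree.ElementTree module. The function order_item_names is a hypothetical stand-in for such a processor, and both documents are shortened versions of the examples above.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the two sample documents.
ORDER_XML = (
    "<Order><CustomerId>123</CustomerId>"
    "<Item><ItemId>987</ItemId><ItemName>Coupler</ItemName></Item></Order>"
)
INFO_XML = (
    "<Info><AnotherTag>123</AnotherTag>"
    "<Thing><ThingId>987</ThingId><ThingName>Coupler</ThingName></Thing></Info>"
)


def order_item_names(text):
    """Hypothetical Order processor: relies on the Order element names."""
    root = ET.fromstring(text)
    if root.tag != "Order":
        raise ValueError("not an Order document: root element is " + root.tag)
    return [item.findtext("ItemName") for item in root.findall("Item")]


print(order_item_names(ORDER_XML))
try:
    order_item_names(INFO_XML)
except ValueError as err:
    print(err)
```

Both inputs are well-formed, so the parser accepts each of them. Only the application-level element names distinguish one document type from the other.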
Consider a purchase order application. Customers submit orders, which are identified by a Date and the CustomerID, and which list one or more items, each of which has an ItemID, ItemName, Quantity, and units.
The data for such an order might be displayed on a screen as follows:
ORDER
Date: July 4, 1999
Customer ID: 123
Customer Name: Acme Alpha
Items:
Item ID | Item Name | Quantity |
987 | Coupler | 5 |
654 | Connector | 3 dozen |
579 | Clasp | 1 |
This data indicates that the customer named "Acme Alpha," whose Customer Id is "123", submitted an order on 1999/07/04 for couplers, connectors, and clasps.
The HTML text for this display of order data is as follows:
<html>
<body>
<p>ORDER
<p>Date: July 4, 1999
<p>Customer ID: 123
<p>Customer Name: Acme Alpha
<p>Items:</p>
<table bgcolor=white align=left border="3"
cellpadding=3>
<tr><td><B>Item ID </B></td>
<td><B>Item Name </B></td>
<td><B>Quantity </B>
</td></tr>
<tr><td>987</td>
<td>Coupler</td>
<td>5</td></tr>
<tr><td>654</td>
<td>Connector</td>
<td>3 dozen</td></tr>
<tr><td>579</td>
<td>Clasp</td>
<td>1</td></tr>
</table>
</body>
</html>
This HTML text has certain limitations:
It contains both data and formatting specifications.
The data is the Customer Id and the various Customer Names, Item Names, and Quantities.
The formatting specifications are the indications for type style (<b>....</b>), color (bgcolor=white), and layout (<table>....</table>), and also the supplementary field names, such as "Customer Name".
The structure of HTML documents is not well suited for extracting data.
Some elements, such as tables, require strictly bracketed opening and closing tags, but other elements, such as paragraph tags ("<p>"), have optional closing tags.
Some elements, such as paragraph tags ("<p>"), are used for many sorts of data, so it is difficult to distinguish between a "123" that is a Customer ID and a "123" that is an Item ID, without specialized inference from surrounding field names.
This merging of data and formatting, and the lack of strict phrase structure, make it difficult to adapt HTML documents to different presentation styles and to use HTML documents for data interchange and storage. XML is similar to HTML, but includes restrictions and extensions that address these drawbacks.
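The extraction problem can be seen by running a shortened version of the HTML text through a generic HTML parser. The sketch below uses Python's standard html.parser module. The TextCollector class is a hypothetical helper that records each piece of text together with the most recent opening tag.

```python
from html.parser import HTMLParser

# A shortened version of the HTML order display, with unclosed <p> tags.
HTML_TEXT = """<html><body>
<p>Customer ID: 123
<p>Customer Name: Acme Alpha
<table>
<tr><td>987</td><td>Coupler</td><td>5</td></tr>
</table>
</body></html>"""


class TextCollector(HTMLParser):
    """Record (most recent opening tag, text) pairs for non-blank text."""

    def __init__(self):
        super().__init__()
        self.current_tag = None
        self.found = []

    def handle_starttag(self, tag, attrs):
        self.current_tag = tag

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.found.append((self.current_tag, text))


collector = TextCollector()
collector.feed(HTML_TEXT)
collector.close()
for tag, text in collector.found:
    print(tag, "|", text)
```

Every value arrives tagged only as "p" or "td". To learn that "123" is a customer ID, the application must interpret the field name embedded in the text, and to learn that "987" is an item ID it must rely on the cell's position in the table.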
XML document types
A document type definition (DTD) defines the structure of a class of XML documents, making it possible to distinguish between classes. A DTD is a list of element and attribute definitions unique to a class. Once you have set up a DTD, you can reference that DTD in another document, or embed it in the current XML document.
The DTD for XML Order documents, discussed in "A sample XML document", looks like this:
<!ELEMENT Order (Date, CustomerId, CustomerName, Item+)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT CustomerId (#PCDATA)>
<!ELEMENT CustomerName (#PCDATA)>
<!ELEMENT Item (ItemId, ItemName, Quantity)>
<!ELEMENT ItemId (#PCDATA)>
<!ELEMENT ItemName (#PCDATA)>
<!ELEMENT Quantity (#PCDATA)>
<!ATTLIST Quantity units CDATA #IMPLIED>
Line by line, this DTD specifies that:
An order must consist of a date, a customer ID, a customer name, and one or more items. The plus sign, "+", indicates one or more occurrences, so an element marked with a plus sign is required. A question mark in the same place indicates an optional element. An asterisk indicates that an element can occur zero or more times. (For example, if "Item+" in the first line above were written as "Item*", an order could contain no items, or any number of items.)
Elements defined by "(#PCDATA)" are character text.
The "<!ATTLIST...>" definition in the last line specifies that quantity elements have a "units" attribute; "#IMPLIED", at the end of the last line, indicates that the "units" attribute is optional.
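The element rules described above can be approximated in a few lines of Python. This is a hand-written sketch, not a DTD validator: the CONTENT_MODELS table and the conforms function are hypothetical, each content model is transcribed by hand into a regular expression over the sequence of child element names, and attributes are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Each content model becomes a pattern over "Name " tokens (note the
# trailing space). An empty pattern means "no child elements" (#PCDATA).
CONTENT_MODELS = {
    "Order": r"Date CustomerId CustomerName (Item )+",
    "Item": r"ItemId ItemName Quantity ",
    "Date": r"",
    "CustomerId": r"",
    "CustomerName": r"",
    "ItemId": r"",
    "ItemName": r"",
    "Quantity": r"",
}


def conforms(text):
    """Check every element's children against its content model."""
    root = ET.fromstring(text)
    for element in root.iter():
        model = CONTENT_MODELS.get(element.tag)
        if model is None:
            return False  # element type is not declared
        sequence = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, sequence) is None:
            return False
    return True


ITEM = "<Item><ItemId>987</ItemId><ItemName>Coupler</ItemName><Quantity>5</Quantity></Item>"
HEAD = "<Date>1999/07/04</Date><CustomerId>123</CustomerId><CustomerName>Acme Alpha</CustomerName>"

print(conforms("<Order>" + HEAD + ITEM + ITEM + "</Order>"))  # one or more items
print(conforms("<Order>" + HEAD + "</Order>"))                # no items
```

Replacing `(Item )+` with `(Item )*` in the pattern for "Order" mirrors the change from "Item+" to "Item*" in the DTD, and would allow an order with no items.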
The character text of XML documents is not constrained. For example, there is no way to specify that the text of a quantity element should be numeric, and thus the following display of data would be valid:
<Quantity units="Baker's dozen">three</Quantity>
<Quantity units="six packs">plenty</Quantity>
Restrictions on the text of elements must be handled by the applications that process XML data.
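As an illustration of an application-level restriction, the following Python sketch reports Quantity elements whose text is not a whole number. The function name non_numeric_quantities and the choice of "digits only" as the rule are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET


def non_numeric_quantities(order_xml):
    """Return the text of every <Quantity> that is not a run of digits."""
    root = ET.fromstring(order_xml)
    bad = []
    for quantity in root.iter("Quantity"):
        text = (quantity.text or "").strip()
        if re.fullmatch(r"[0-9]+", text) is None:
            bad.append(text)
    return bad


SAMPLE = (
    "<Order>"
    "<Item><Quantity>5</Quantity></Item>"
    "<Item><Quantity>three</Quantity></Item>"
    "<Item><Quantity>plenty</Quantity></Item>"
    "</Order>"
)
print(non_numeric_quantities(SAMPLE))
```

A check of this kind runs after parsing, because the DTD itself treats all three quantities as acceptable character text.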
The DTD of an XML document must follow the <?xml version="1.0"?> declaration. You can either include the DTD within your XML document or reference an external DTD.
To reference a DTD externally, use something similar to:
<?xml version="1.0"?>
<!DOCTYPE Order SYSTEM "Order.dtd">
<Order>
...
</Order>
Here's how an embedded DTD might look:
<?xml version="1.0"?>
<!DOCTYPE Order [
<!ELEMENT Order (Date, CustomerId, CustomerName,
Item+)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT CustomerId (#PCDATA)>
<!ELEMENT CustomerName (#PCDATA)>
<!ELEMENT Item (ItemId, ItemName, Quantity)>
<!ELEMENT ItemId (#PCDATA)>
<!ELEMENT ItemName (#PCDATA)>
<!ELEMENT Quantity (#PCDATA)>
<!ATTLIST Quantity units CDATA #IMPLIED> ]>
<Order>
<Date>1999/07/04</Date>
<CustomerId>123</CustomerId>
<CustomerName>Acme Alpha</CustomerName>
<Item>
...
</Item>
</Order>
DTDs are not required for XML documents. However, a valid XML document has a DTD and conforms to that DTD.
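The difference between a well-formed document and a valid one shows up when a document with an embedded DTD is read by a non-validating parser. The sketch below uses Python's standard xml.etree.ElementTree module, which reads the embedded DTD but does not validate against it. The document is a deliberately invalid variant of the Order example, with no Item elements.

```python
import xml.etree.ElementTree as ET

# Embedded DTD requires one or more <Item> elements, but the body has none.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Order [
<!ELEMENT Order (Date, CustomerId, CustomerName, Item+)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT CustomerId (#PCDATA)>
<!ELEMENT CustomerName (#PCDATA)>
<!ELEMENT Item (ItemId, ItemName, Quantity)>
<!ELEMENT ItemId (#PCDATA)>
<!ELEMENT ItemName (#PCDATA)>
<!ELEMENT Quantity (#PCDATA)>
<!ATTLIST Quantity units CDATA #IMPLIED>
]>
<Order>
<Date>1999/07/04</Date>
<CustomerId>123</CustomerId>
<CustomerName>Acme Alpha</CustomerName>
</Order>"""

# Parsing succeeds: the document is well-formed even though it is not valid.
root = ET.fromstring(DOC)
item_count = len(root.findall("Item"))
print(root.tag, item_count)
```

Detecting that this document violates its DTD requires a validating parser or an application-level check.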
If the declared character sets of your client and server differ, you must be careful when declaring the character set of your XML documents.
Every XML document has a character encoding that is either specified in the encoding declaration of the XML declaration or is UTF-8 by default.
If you store an XML document in a character column that is not text, Adaptive Server translates the document into the server's character set before storing it. This is the way Adaptive Server normally translates character data, and you must ensure that the declared character set of the XML document matches that of the server.
If you store an XML document in a text column, Adaptive Server recognizes the XML document from the XML declaration and does not translate the character set to that of the server. When you read such an XML document from the database, Adaptive Server does not translate the character set of the data to that of the client, since doing so might compromise the integrity of the XML document.
If you store an XML document in an image column, Adaptive Server performs no conversions. This is the way Adaptive Server normally processes image data.
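The following Python sketch shows, outside of Adaptive Server, why the declared character set must keep matching the actual bytes. It uses the standard xml.etree.ElementTree module and does not model Adaptive Server's own conversion behavior. The customer name with an accented character is invented for the example.

```python
import xml.etree.ElementTree as ET

NAME = "Caf\u00e9 Alpha"  # hypothetical customer name containing "é"

# Declared ISO-8859-1 and actually encoded as ISO-8859-1: read correctly.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<CustomerName>" + NAME + "</CustomerName>"
).encode("iso-8859-1")
print(ET.fromstring(latin1_doc).text == NAME)

# No encoding declaration: the parser assumes UTF-8.
utf8_doc = (
    '<?xml version="1.0"?><CustomerName>' + NAME + "</CustomerName>"
).encode("utf-8")
print(ET.fromstring(utf8_doc).text == NAME)

# Bytes translated to UTF-8 while the declaration still says ISO-8859-1:
# the document still parses, but the text is silently corrupted.
translated = latin1_doc.decode("iso-8859-1").encode("utf-8")
corrupted_text = ET.fromstring(translated).text
print(corrupted_text == NAME)

# ISO-8859-1 bytes with no declaration are not valid UTF-8: parsing fails.
undeclared = (
    '<?xml version="1.0"?><CustomerName>' + NAME + "</CustomerName>"
).encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    undeclared_parsed = True
except ET.ParseError:
    undeclared_parsed = False
print(undeclared_parsed)
```

The third case is the one that character-set translation of a stored document would produce if the declaration were left unchanged, which is why the declared and actual character sets must stay in step.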