Product Details
XML Data Management: Native XML and XML-Enabled Database Systems

By Akmal B. Chaudhri, Awais Rashid, Roberto Zicari

List Price: $49.99
Price: $34.21 & eligible for FREE Super Saver Shipping on orders over $25.

Availability: Usually ships in 24 hours
Ships from and sold by Amazon.com

33 new or used available from $14.91


Product Description

Provides a discussion of the various XML data management approaches employed in a range of products and applications. Topics covered range from using XML with Oracle9i or SQL Server to embedded XML databases to Tamino. Softcover.


Product Details

  • Amazon Sales Rank: #1014536 in Books
  • Published on: 2003-03-22
  • Original language: English
  • Number of items: 1
  • Binding: Paperback
  • 688 pages

Editorial Reviews

From the Back Cover

"This is an excellent book that combines a practical and analytical look at the subject."

—Leo Korman, Principal Software Engineer, KANA Software

As organizations begin to employ XML within their information-management and exchange strategies, data management issues pertaining to storage, retrieval, querying, indexing, and manipulation increasingly arise. Moreover, new information-modeling challenges also appear. XML Data Management—with its contributions from experts at the forefront of the XML field—addresses these key issues and challenges, offering insights into the advantages and drawbacks of various XML solutions, best practices for modeling information with XML, and developing custom, in-house solutions.

In this book, you will find discussions on the newest native XML databases, along with information on working with XML-enabled relational database systems. In addition, XML Data Management thoroughly examines benchmarks and analysis techniques for performance of XML databases.

Topics covered include:

  • The power of good grammar and style in modeling information to alleviate the need for redundant domain knowledge
  • Tamino's XML storage, indexing, querying, and data access features
  • The features and APIs of open source eXist
  • Berkeley DB XML's ability to store XML documents natively
  • IBM's DB2 Universal Database and its support for XML applications
  • Xperanto's method of addressing information integration requirements
  • Oracle's XMLType for managing document-centric XML documents
  • Microsoft SQL Server 2000's support for exporting and importing XML data
  • A generic architecture for storing XML documents in a relational database
  • XOO7, XMach-1, XMark, and other benchmarks for evaluating XML database performance
  • Numerous case studies demonstrate real-world problems, industry-tested solutions, and creative applications of XML data management solutions.

    Written for both XML and relational database professionals, XML Data Management provides a promising new approach to data management, one that is sure to positively impact the way organizations manage and exchange information.




    About the Author

    Akmal B. Chaudhri works for IBM developerWorks, where he is also Zone Editor for Special Projects. A recognized authority on objects and databases, he has been a regular presenter at many international conferences, including OOPSLA and Object World. In addition, he has edited several books on these topics.

    Awais Rashid is a Lecturer in the Computing Department of Lancaster University in the U.K. where he leads research into the application of new technologies, such as XML and aspect-oriented programming, and database systems. He has actively published on these topics and has organized a number of relevant international events.

    Roberto Zicari is a full Professor for Databases and Information Systems at the Johann Wolfgang Goethe University in Frankfurt/Main, Germany. He is an internationally recognized expert in Object Technology. He has consulted and lectured in Europe, North America, and Japan.


    Excerpt. © Reprinted by permission. All rights reserved.

    The past few years have seen a dramatic increase in the popularity and adoption of XML, the Extensible Markup Language. This explosive growth is driven by its ability to provide a standardized, extensible means of including semantic information within documents describing semi-structured data. This makes it possible to address the shortcomings of existing markup languages such as HTML and support data exchange in e-business environments.

    Consider, for instance, the simple HTML document in Listing P.1. The data contained in the document is intertwined with information about its presentation. In fact, the tags describe only how the data is to be formatted. There is no semantic information that the data represents a person's name and address. Consequently, an interpreter cannot make any sound judgments about the semantics as the tags could as well have enclosed information about a car and its parts. Systems such as WIRE (Aggarwal et al. 1998) can interpret the information by using search templates based on the structure of HTML files and the importance of information enclosed in tags defining headings and so forth. However, such interpretation lacks soundness, and its accuracy is context dependent.

    Listing P.1 An HTML Document with Data about a Person



    Person Information


    Name: John Doe


    Address: 10 Church Street, Lancaster LAX 2YZ,
    UK




    Dynamic Web pages, where the data resides in a backend database and is served using predefined templates, reduce the coupling between the data and its representation. However, the semantics of the data can still be confusing when exchanging information in an e-business environment. A particular item could be represented using different names (in the simplest case) in two systems in a business-to-business transaction. This enforces adherence to complex, often proprietary, document standards.

    XML provides inherent support for addressing the above problems, as the data in an XML document is self-describing. However, the increasing adoption of XML has also raised new challenges. One of the key issues is the management of large collections of XML documents. There is a need for tools and techniques for effective storage, retrieval, and manipulation of XML data. The aim of this book is to discuss the state-of-the-art in such tools and techniques.

    This preface introduces the basics of XML and some related technologies before moving on to providing an overview of issues relating to XML data management and approaches addressing these issues. Only an overview of XML and related technologies is provided because several other sources cover these concepts in depth.

    P.1 What Is XML?

    XML is a W3C standard for document markup. It makes it possible to define custom tags describing the data enclosed by them. An example XML document containing data about a person is shown in Listing P.2. Note that tags in XML can have attributes. However, for simplicity, they have not been used in this example.

    Listing P.2 An XML Document with Data about a Person




    Doe
    John


    10
    Church Street
    Lancaster
    LAX 2YZ
    UK


    Unlike the HTML document in Listing P.1, the document in Listing P.2 contains only the data about the person and no representational information. The data and its meaning can be read from the document and the document formatted in a range of fashions as desired. One standard approach is to use XSL, the eXtensible Stylesheet Language.
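    Because the element names carry the meaning, a program can pick out the person's data by name instead of guessing from formatting tags. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. Only the person, name, and address elements are named in the text; the child element names (last, first, number, street, city, postcode, country) and the read_person helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: person, name, and address come from the text;
# the child element names are assumptions made for illustration.
PERSON_XML = """
<person>
  <name>
    <last>Doe</last>
    <first>John</first>
  </name>
  <address>
    <number>10</number>
    <street>Church Street</street>
    <city>Lancaster</city>
    <postcode>LAX 2YZ</postcode>
    <country>UK</country>
  </address>
</person>
"""


def read_person(xml_text):
    """Return the leaf values keyed by 'parent/child' element names."""
    root = ET.fromstring(xml_text.strip())
    return {
        f"{parent.tag}/{child.tag}": child.text
        for parent in root      # name, address
        for child in parent     # last, first, number, ...
    }


person = read_person(PERSON_XML)
print(person["name/first"], person["name/last"])
print(person["address/city"])
```

    The same lookup would not be possible against the HTML version, where the tags say only how the text should be displayed.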

    The flexible nature of XML makes it an ideal basis for defining arbitrary languages. One such example is WML, the Wireless Markup Language. Similarly, the XML schema language used to describe the structure of XML documents is based on XML itself.

    P.1.1 Well-Formed and Valid XML

    Although XML syntax is flexible, it is constrained by a grammar that governs the permitted tag names, attachment of attributes to tags, and so on. All XML documents must conform to these basic grammar rules. Such conformant documents are said to be well formed and can be interpreted by an XML interpreter, which means it's not necessary to write an interpreter for each XML document instance.

    In addition to being well formed, the structure of a particular XML document can be validated against a Document Type Definition (DTD) or an XML schema. An XML document conforming to a given DTD or schema is said to be valid.
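    Well-formedness can be checked with any generic XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the is_well_formed helper and the two sample strings are illustrative assumptions. Note that this parser checks well-formedness only: it does not validate a document against a DTD or schema, which requires a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<person><name>John Doe</name></person>"
bad = "<person><name>John Doe</person></name>"  # tags closed in the wrong order

print(is_well_formed(good))
print(is_well_formed(bad))
```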

    P.1.2 Data-Centric and Document-Centric XML

    XML documents can be classified on the basis of the data they contain. Data-centric documents capture structured data such as that pertaining to a product catalog, an order, or an invoice. Document-centric documents, on the other hand, capture unstructured data as in articles, books, or e-mails. Of course, the two types can be combined to form hybrid documents that are both data-centric and document-centric. Listings P.3 and P.4 provide examples of data-centric and document-centric XML, respectively.

    Listing P.3 Data-Centric XML


    Doe

    1-234-56789-0
    2
    30.00


    Listing P.4 Document-Centric XML


    XML builds on the principles of two
    existing languages, HTML
    and SGML to create a simple
    mechanism . . .
    The generalized markup concept . . .
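
    One rough way to tell the two kinds of document apart is to look for mixed content, that is, elements that contain both running text and child elements. This is a common rule of thumb rather than a definition given in the text, and the element names in the two samples below (order, customer, item, isbn, quantity, price, para, abbr) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(element):
    """True if any element in the tree mixes child elements with text."""
    for node in element.iter():
        children = list(node)
        if not children:
            continue  # leaf elements hold text only, never mixed content
        own_text = (node.text or "").strip()
        tail_text = any((child.tail or "").strip() for child in children)
        if own_text or tail_text:
            return True
    return False


# Hypothetical data-centric sample: every value sits in its own leaf element.
data_centric = ET.fromstring(
    "<order><customer>Doe</customer>"
    "<item><isbn>1-234-56789-0</isbn><quantity>2</quantity>"
    "<price>30.00</price></item></order>"
)

# Hypothetical document-centric sample: prose interleaved with inline markup.
document_centric = ET.fromstring(
    "<para>XML builds on the principles of two existing languages, "
    "<abbr>HTML</abbr> and <abbr>SGML</abbr> to create a simple mechanism</para>"
)

print(has_mixed_content(data_centric))
print(has_mixed_content(document_centric))
```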

    P.2 XML Concepts

    This section provides an overview of basic XML concepts: DTDs, XML schemas, DOM, and SAX.

    P.2.1 DTDs and XML Schemas

    Both DTDs and XML schemas are mechanisms used to define the structure of XML documents. They determine what elements can be contained within the XML document, how they are to be used, what default values their attributes can have, and so on. Given a DTD or XML schema and its corresponding XML document, a parser can validate whether the document conforms to the desired structure and constraints. This is particularly useful in data exchange scenarios as DTDs and XML schemas provide and enforce a common vocabulary for the data to be exchanged.

    XML DTDs are subsets of SGML (Standard Generalized Markup Language) DTDs. An XML DTD lists the various elements and attributes in a document and the context in which they are to be used. It can also list any elements a document cannot contain. However, it does not define constraints such as the number of instances of a particular element within a document, the type of data within each element, and so on. Consequently, DTDs are inherently suitable for document-centric XML as compared to data-centric XML because data-typing and instantiation constraints are less critical in the former case. However, they can be and are being used for both types of documents.

    Listing P.5 shows a DTD for the simple XML document in Listing P.2. It describes which primitive elements form valid components for the three composite ones: person, name, and address. The keyword #PCDATA signifies that the element does not contain any tags or child elements and only parsed character data.
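
    The split between composite and primitive elements can be made concrete with a small script. The sketch below holds a hypothetical DTD in a string and extracts each element's content model with a regular expression. The person, name, and address elements come from the text; the child element names, the content_models helper, and the regular expression are assumptions. The pattern handles only flat, comma-separated content models, so it is not a general DTD parser.

```python
import re

# Hypothetical DTD: person, name, and address come from the text;
# the child element names are assumptions made for illustration.
PERSON_DTD = """
<!ELEMENT person (name, address)>
<!ELEMENT name (last, first)>
<!ELEMENT address (number, street, city, postcode, country)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT postcode (#PCDATA)>
<!ELEMENT country (#PCDATA)>
"""

# Matches declarations of the form <!ELEMENT name (a, b, c)> only.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")


def content_models(dtd_text):
    """Map each declared element to the list of names in its content model."""
    return {
        name: [part.strip() for part in model.split(",")]
        for name, model in ELEMENT_DECL.findall(dtd_text)
    }


models = content_models(PERSON_DTD)
composite = [name for name, parts in models.items() if parts != ["#PCDATA"]]
primitive = [name for name, parts in models.items() if parts == ["#PCDATA"]]

print("composite:", composite)
print("primitive:", primitive)
```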

    Listing P.5 A DTD for the Simple XML Document in Listing P.2











    XML schemas differ from DTDs in that the XML schema definition language is based on XML itself. As a result, unlike DTDs, the set of constructs available for defining an XML document is extensible. XML schemas also support namespaces and richer and more complex structures than DTDs. In addition, stronger typing constraints on the data enclosed by a tag can be described because a range of primitive data types such as string, decimal, and integer are supported. This makes XML schemas highly suitable for defining data-centric documents. Another significant advantage is that XML schema definitions can exploit the same data management mechanisms as designed for XML; an XML schema is an XML document itself. This is in direct contrast with DTDs, which require specific support to be built into an XML data management system.

    Listing P.6 shows an XML schema for the simple XML document in Listing P.2. The sequence tag is a compositor indicating an ordered sequence of subelements. There are other compositors for choice and all. Also, note that, as shown for the address element, it is possible to constrain the minimum and maximum instances of an element within a document. Although not shown in the example, it is possible to define custom complex and simple types. For instance, a complex type Address could have been defined for the address element.

    Listing P.6 An XML Schema for the Simple XML Document in Listing P.2





    ...
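
    Because an XML schema is itself an XML document, it can be read with the same tools as any other XML data. The sketch below parses a hypothetical schema fragment with Python's standard xml.etree.ElementTree module and reports the occurrence bounds of each element declared inside a sequence compositor. The schema fragment, its element types, the bounds on address, and the occurrence_bounds helper are assumptions made for illustration; minOccurs and maxOccurs both default to 1 in XML Schema when omitted.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment; the types and occurrence bounds are assumptions.
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="address" type="xs:string" minOccurs="1" maxOccurs="2"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def occurrence_bounds(schema_text):
    """Return {element name: (minOccurs, maxOccurs)} for elements in xs:sequence."""
    root = ET.fromstring(schema_text.strip())
    bounds = {}
    for sequence in root.iter(f"{{{XS}}}sequence"):
        for element in sequence.findall(f"{{{XS}}}element"):
            bounds[element.get("name")] = (
                element.get("minOccurs", "1"),  # schema default is 1
                element.get("maxOccurs", "1"),  # schema default is 1
            )
    return bounds


bounds = occurrence_bounds(SCHEMA)
print(bounds)
```

    Reading the schema this way does not validate anything; it only shows that schema definitions can be stored and queried like ordinary XML documents.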


    Customer Reviews

    Precisely what we needed (5 stars)
    At our company, we write Java applications. Soon, we got to the point that we needed a more formal way to read/write data than merely an ad hoc approach. We use XML. The obvious approach is to use a well tested relational database, like those supplied by IBM, Oracle or Microsoft. A problem was getting detailed, objective explanations of what would be involved with each choice. Each vendor is perfectly willing to be our "friend" and supply us with reams of documentation. But still...

    The chapters in this book that describe how to hook up XML to those 3 vendors' databases were excellent and clear.

    But what we ended up doing was going with something suggested in ANOTHER chapter - building an embedded XML database. You will not see this advocated by a vendor; there is no sale for them here. Other than this book, we found it tough to get lucid explanations of the pros and cons of this route. It will take more work, but we hope it will give better performance - no interprocess communication, for one thing. Plus of course no licence fees, and easier installation and management, since we will have access to, and own, all the source code. This was not our original intention, by any means. But the book's comparative analysis was so persuasive that we ended up taking this road. (Hopefully, it will not be a dead end.)

    That one chapter on embedded XML databases was, to us, the most precious thing in the entire book!

    Good overview of available products and strategies (5 stars)
    I think it is a very good book. It describes several actors in the XML data storage world. It also points out several strategies to deal with XML in relational databases.
    It is very easy to read and the language is very clear.
    Some experience with XML and how to store it is recommended in order to get the most out of it.
    I really enjoyed the chapter on eXist as it really goes into detail about the index and storage architecture. It stays quite high level, though.
    It helps you understand pros and cons of the different products and architectures (client/server as opposed to embedded).
    Everyone dealing with XML storage should read it.