
XHTML and XML Update and Overview:
Seminar Outline.

Computing & Information Services Department.
instructor: jim.cerny@unh.edu


Description.

For existing Web authors, this is a look at where the standards are heading with XHTML and XML for markup of pages and information. Emphasis on the concepts, jargon, and current tools. Not hands-on. Prerequisite: Basic HTML useful, but not essential.


    Background.

  1. History and context.
    • On the surface much Web page change seems to involve adding graphics, sounds, and other features, but there are really deep changes going on at the standards level.
    • Structure (semantics) and presentation (layout).
    • SGML as a metalanguage for authoring markup languages. Look at the family tree.
    • Issues of internationalization.
    • Issues of accessibility.
    • Planning for long-term, flexible, multiple use of information. Planning for things we don't even know yet!
    • The role of XML.
    • The role of XHTML.
    • Whether to be excited or dismayed?!
    • Contingent on support in software tools for editing, authoring, display, and other processing. Vendors are ambivalent in their support of standards, e.g., Microsoft's recurring wish to "embrace and extend."


  2. Concepts and components.
    • XML (eXtensible Markup Language). This is for structure or semantics, allowing you to develop your own language.
    • DTD (Document Type Definition). This is the grammar, the elements of the language and the rules for putting them together.
    • Schema. This serves the same purpose as a DTD, but is itself written in XML.
    • XSL (eXtensible Stylesheet Language). This is an application of XML for writing your own stylesheets, which transform and format XML documents.
    • DOM (Document Object Model). This is a programming interface for HTML and XML that allows scripts and other programs to read and update a document's content, structure, and style.
    • Elements, tags, attributes, and values. An element may consist of:
      <tag attribute="value">content</tag>
    • Well-formedness. Is the syntax correct?
    • Validity. Does it follow the rules of the grammar (DTD)?
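
    The well-formedness and DOM bullets above can be illustrated with a short sketch in Python, using only the standard library's xml.dom.minidom. The function name is_well_formed and the <note> sample document are assumptions made for illustration. Checking validity against a DTD would need a validating parser, which this sketch does not attempt.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as XML, i.e. its syntax is correct."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True


# The element pattern from the outline is well-formed.
print(is_well_formed('<tag attribute="value">content</tag>'))
# An unquoted attribute value or overlapping elements are not.
print(is_well_formed('<tag attribute=value>content</tag>'))
print(is_well_formed('<a><b>text</a></b>'))

# The DOM lets a program read and change the parsed document.
# The <note> document is a made-up sample.
doc = minidom.parseString('<note priority="high"><to>Web authors</to></note>')
note = doc.documentElement
note.setAttribute("priority", "low")
to_text = note.getElementsByTagName("to")[0].firstChild.data
print(note.toxml())
```

    A parser that only checks well-formedness knows nothing about which elements a grammar allows, so a document can pass this test and still be invalid.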



    Upgrading from HTML to XHTML.

  3. XHTML differences from HTML. Summarized from the XHTML 1.0 standard and an article by Peter Wiggin.
    • Requires a DOCTYPE declaration referencing a DTD (Document Type Definition):
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    • Root element must be <html> and designate the XHTML 1.0 namespace:
      <html xmlns="http://www.w3.org/1999/xhtml">
    • All element and attribute names must be in lower case:
      <img src="foobar.gif" align="left" />
    • All attribute values must be quoted:
      <table border="0">
    • All non-empty elements must be terminated.
      <li>item one</li>
    • All empty elements must be terminated.
      <hr />
    • Attribute values cannot be minimized:
      <input type="radio" checked="checked" />
    • Elements must nest, not overlap.
      <p>This is <b>bold</b>.</p>
    • Some elements are required:
      <head>  <title>  <body>
    • Become aware of the deprecated elements and of features planned for removal (such as frames). Strict XHTML does not allow them.
    • Become conscious of the rules for which elements may be contained in other elements. For example, in Strict XHTML the only element allowed in OL or UL lists is LI. And in turn there are rules for the elements allowed in LI elements.
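
    Several of the rules above can be checked mechanically. The following is a minimal sketch in Python, assuming only the standard library's xml.etree.ElementTree. The helper names check_xhtml_rules and local_name are hypothetical, and the sketch is not a substitute for a validator: it covers only the root element, lower-case names, required elements, and the rules that an XML parser enforces as well-formedness (quoting, termination, nesting, non-minimized attributes).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the XHTML 1.0 Recommendation.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def local_name(name):
    """Strip the '{namespace}' prefix that ElementTree adds to names."""
    return name.rsplit("}", 1)[-1]


def check_xhtml_rules(text):
    """Return a list of problems; an empty list means none were detected.

    Hypothetical helper: it covers only a few rules and is not a validator.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Unquoted or minimized attributes, unterminated elements, and
        # overlapping elements all stop the parse here.
        return ["not well-formed: %s" % err]

    problems = []
    if root.tag != "{%s}html" % XHTML_NS:
        problems.append("root must be <html> in the XHTML namespace")

    seen = set()
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip anything that is not an element
        name = local_name(elem.tag)
        seen.add(name)
        if name != name.lower():
            problems.append("element name not lower case: " + name)
        for attr in elem.attrib:
            attr_name = local_name(attr)
            if attr_name != attr_name.lower():
                problems.append("attribute name not lower case: " + attr_name)

    for required in ("head", "title", "body"):
        if required not in seen:
            problems.append("missing required element: " + required)
    return problems


# Made-up sample page used to exercise the checks.
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head>"
    "<body><p>This is <b>bold</b>.</p><hr /></body></html>"
)
print(check_xhtml_rules(PAGE))
print(check_xhtml_rules(PAGE.replace("<hr />", "<hr>")))
```

    Containment rules, such as which elements may appear inside OL or UL, live in the DTD, so they still need the W3C validation service or another validating tool.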


  4. A simple example in three versions: non-compliant HTML, Transitional-compliant XHTML, and Strict-compliant XHTML. Let's test each version with the validation service at the W3C.

  5. Dave Raggett's Tidy program will convert many HTML elements to XHTML compliance, at least to Transitional compliance.

    Run Tidy as a command-line program. To avoid complications in specifying file paths, let's assume that your files are in the same subdirectory as the Tidy executable and that you want to modify the original files in place, which is what the -m option does (so keep another copy of them in case you make a mess of things).
    • On Windows, open a Command Prompt window.
    • Change directory to the subdirectory.
    • Issue a command as described in the Tidy manual.
    • For example, to confirm the Tidy version and options:
      tidy -help
    • For example, to process a single index.html file, with errors logged:
      tidy -f err.txt -m index.html
    • For example, to process all the HTML files and log errors:
      tidy -f tiderr.txt -m *.html
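
    The same commands can be driven from a script when many files are involved. The following is a minimal sketch in Python, assuming a tidy executable is on the PATH. The function names build_tidy_command and run_tidy are hypothetical, and only the -f and -m options shown above are used.

```python
import glob
import shutil
import subprocess


def build_tidy_command(files, error_log):
    """Build the argument list for: tidy -f <error_log> -m <files...>"""
    return ["tidy", "-f", error_log, "-m"] + list(files)


def run_tidy(pattern="*.html", error_log="tiderr.txt"):
    """Run Tidy in place on every file matching pattern.

    Returns Tidy's exit status, or None if there was nothing to run.
    """
    files = sorted(glob.glob(pattern))
    if not files or shutil.which("tidy") is None:
        return None  # no matching files, or Tidy is not on the PATH
    return subprocess.run(build_tidy_command(files, error_log)).returncode


# Show the command for a single file without running it.
print(build_tidy_command(["index.html"], "err.txt"))
```

    Because -m overwrites the originals, it is worth calling run_tidy only on a working copy of the files.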

    Even more useful is the GUI version of Tidy. It allows either direct processing of a file via a graphical user interface or saving a configuration file for command-line processing.

  6. What we really need, of course, is an array of tools that are XHTML-aware. Besides converters like Tidy, we need standards-compliant browsers and authoring tools. See the browser XML support chart for browser status with XHTML's parent, XML. At this time, the major HTML authoring tools (Dreamweaver 3.0, FrontPage 4.0) do not include XHTML support.

  7. Other software tools may help in special circumstances. For example, there is the Demoroniser, by John Walker, designed to "correct moronic Microsoft HTML." Even if it is not useful, it injects an element of humor into the standards process!

  8. Jim's Summary Advice.
    • Don't panic!
    • If you are hands-on with HTML, begin experimenting now with XHTML. Pick some documents to convert. This will give you an awareness of the concepts and the technology.
    • If you are not hands-on, there is little risk in waiting for the conversion and support tools to get better.
    • Everyone should begin to reform their coding practices. Begin now to use and understand Cascading Style Sheets (CSS). That means no more FONT elements!



    XML.  Now or Later?

  9. XML is not yet ready for most Web development. It assumes a lot of processing software is in place, in effect a whole system.

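  10. If the page is processed as XML rather than as HTML, <style> and <script> code should be enclosed in a CDATA section whenever it contains < or & characters. Do not hide the code inside an HTML comment (<!-- -->) as well, because an XML parser may discard comments, and the code with them:
    <script type="text/javascript">
    //<![CDATA[
    document.write("<p>Hello World.</p>");
    //]]>
    </script>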
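
    What a CDATA section changes can be seen by parsing the same script text with an XML parser. The following is a minimal sketch in Python, assuming only the standard library's xml.etree.ElementTree. The variable names are made up for illustration, and the bare <script> elements are test fragments rather than complete pages.

```python
import xml.etree.ElementTree as ET

SCRIPT = 'document.write("<p>Hello World.</p>");'

# With CDATA the parser hands the script back as plain character data.
protected = ET.fromstring("<script><![CDATA[" + SCRIPT + "]]></script>")

# Without CDATA the <p> inside the string literal is parsed as a child
# element, so the script text is split apart.
exposed = ET.fromstring("<script>" + SCRIPT + "</script>")

# Hidden in a comment, the script is not part of the element's text at all.
commented = ET.fromstring("<script><!--\n" + SCRIPT + "\n//--></script>")

print(repr(protected.text))
print(repr(exposed.text), [child.tag for child in exposed])
print(repr(commented.text))
```

    An HTML browser, by contrast, treats everything up to the closing script tag as code, which is why the same page can behave differently depending on how it is served.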


  11. A Microsoft example. This is just to give a sense of the look and feel of the coding involved. Designing a full DTD is complex work, not done casually.

  12. A DocBook example by Paul Sand.


  13. But Wait, There's More!  RDF metadata.

  14. Metadata is data about data.

  15. Metadata across Web-based activities. RDF at W3C.

  16. Existing Dublin Core meta tags.

  17. Books and Glossaries.



