+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\r
+"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [\r
+<!ENTITY html2db "<code>html2db.xsl</code>">\r
+]>\r
+<html xmlns="http://www.w3.org/1999/xhtml"
+ xmlns:db="urn:docbook">\r
+<head>\r
+<title>This title is ignored</title>\r
+</head>\r
+<body>\r
+\r
+<h1>html2db.xsl</h1>\r
+\r
+<!-- The xmlns attribute escapes into the Docbook namespace -->\r
+<articleinfo xmlns="urn:docbook">\r
+ <author>\r
+ <firstname>Oliver</firstname>\r
+ <surname>Steele</surname>\r
+ </author>\r
+ <revhistory>\r
+ <revision>\r
+ <revnumber>1</revnumber>\r
+ <date>2004-07-30</date>\r
+ </revision>\r
+ <revision>\r
+ <revnumber>1.0.1</revnumber>\r
+ <date>2004-08-01</date>\r
+ <revdescription><para>Editorial changes to the\r
+ readme.</para></revdescription>\r
+ </revision>\r
+ </revhistory>\r
+ <date>2004-07-30</date>\r
+</articleinfo>\r
+\r
+<h2>Overview</h2>\r
+\r
+<p>&html2db; converts an XHTML source document into a Docbook output
+document. It provides features for customizing the output, so that
+the output can be tuned by annotating the source rather than by
+hand-editing the result. This makes it useful in a processing
+pipeline where the source documents are maintained in HTML, although
+it can also be used as a one-time conversion tool.</p>
+\r
+<p>This document is an example of &html2db; used in conjunction with\r
+the Docbook XSL stylesheets. The <a href="index.src.html">source\r
+file</a> is an XHTML file with some embedded Docbook elements and\r
+processing instructions. &html2db; compiles it into a <a\r
+href="index.xml">Docbook document</a>, which can be used to generate\r
+this output file (which includes a Table of Contents), a <a\r
+href="docs/index.html">chunked HTML file</a>, a <a\r
+href="html2db.pdf">PDF</a>, or other formats.</p>\r
+\r
+<h2>Features</h2>\r
+<dl>\r
+<dt>XSLT implementation</dt>\r
+<dd>This tool is designed to be embedded within an XSLT processing
+pipeline. &html2db; can be used in a custom stylesheet or integrated
+into a larger system. See <a href="#embedding">Overriding the
+built-in templates</a>.</dd>
+\r
+<dt>Customizable</dt>\r
+<dd>The output can be customized by means of additional markup in
+the XHTML source. See the section on <a
+href="#customization">customization</a>.</dd>
+\r
+<dt>Creates outline structure</dt>\r
+<dd><code>h1</code>, <code>h2</code>, etc. are turned into nested
+<code>section</code> and <code>title</code> elements (as opposed to
+<code>bridgehead</code> elements).</dd>
+\r
+<dt>Accepts a wide variety of XHTML</dt>\r
+<dd>In particular, &html2db; automatically wraps <dfn>naked item
+text</dfn> (text inside a table cell or list item that is not enclosed
+in a <code>&lt;p&gt;</code>) in a paragraph. Naked text is common in
+XHTML documents, but it needs to be clothed to create valid
+Docbook.<db:footnote><p>This feature is limited. See <a
+href="#implicit-blocks">Implicit Blocks</a>.</p></db:footnote></dd>
+\r
+</dl>\r
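+<p>The outline-structure rule can be modeled with a stack of open
+sections. The following is a minimal Python sketch, not code from
+&html2db; (which is written in XSLT). The <code>Section</code> class and
+the <code>nest_headings</code> function are hypothetical. The input is
+assumed to be the <code>h<var>n</var></code> elements already reduced to
+<code>(level, title)</code> pairs.</p>
+
```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical stand-in for a Docbook <section>: a title plus subsections."""
    title: str
    level: int
    children: list[Section] = field(default_factory=list)


def nest_headings(headings: list[tuple[int, str]]) -> list[Section]:
    """Group a flat sequence of (level, title) headings into nested sections."""
    root = Section(title="", level=0)
    stack = [root]  # the currently open sections, outermost first
    for level, title in headings:
        # Close every open section at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        section = Section(title=title, level=level)
        stack[-1].children.append(section)
        stack.append(section)
    return root.children
```
+
+<p>This sketch quietly nests an <code>h3</code> that follows an
+<code>h1</code> one level down. &html2db; itself does not handle that
+case, as noted under Implementation Notes.</p>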
+\r
+<h2>Requirements</h2>\r
+<ul>\r
+<li>Java: JRE or JDK 1.3 or greater.</li>\r
+<li>Xalan 2.5.0.</li>\r
+<li>Familiarity with installing and running JAR files.</li>\r
+</ul>\r
+\r
+<p>&html2db; might work with earlier versions of Java and Xalan, and\r
+it might work with other XSLT processors such as Saxon and\r
+xsltproc.</p>\r
+\r
+<h2>License</h2>\r
+<p>This software is released under the Open Source <a href="http://www.opensource.org/licenses/artistic-license.php">Artistic License</a>.</p>\r
+\r
+<h2>Installation</h2>\r
+<ul>\r
+<li>Install JRE 1.3 or higher.</li>\r
+<li>Install Xalan, if necessary.</li>\r
+<li>Download <code>html2db-1.zip</code> from <a href="http://osteele.com/sources/html2db-1.zip">http://osteele.com/sources/html2db-1.zip</a>.</li>
+<li>Unzip <code>html2db-1.zip</code>.</li>\r
+</ul>\r
+\r
+<h2>Usage</h2>\r
+<p>Use Xalan to process an XHTML source file into a Docbook file:</p>\r
+\r
+<pre class="example">\r
+java org.apache.xalan.xslt.Process -XSL html2db.xsl -IN doc.html > doc.xml
+</pre>\r
+\r
+<p>See <a href="index.src.html"><code>index.src.html</code></a> for an\r
+example of an input file.</p>\r
+\r
+<p>If your source files are in HTML rather than XHTML, you may find
+the <a href="http://tidy.sourceforge.net/">Tidy</a> tool useful. Tidy
+converts HTML to XHTML, and it can be added to the front of your
+processing pipeline.</p>
+\r
+<p>(If you need to process HTML and you don't know or can't figure out\r
+from context what a processing pipeline is, &html2db; is probably not\r
+the right tool for you, and you should look for a local XML or Java\r
+guru or for a commercially supported product.)</p>\r
+\r
+<h2>Specification</h2>\r
+\r
+<h3>XHTML Elements</h3>\r
+<p><code>code/i</code> stands for "an <code>i</code> element\r
+immediately within a <code>code</code> element". This notation is\r
+from XPath.</p>\r
+\r
+<p>XHTML elements must be in the XHTML namespace,
+<code>http://www.w3.org/1999/xhtml</code>.</p>
+\r
+<table>\r
+<tr>\r
+<th>XHTML</th>\r
+<th>Docbook</th>\r
+<th>Notes</th>\r
+</tr>\r
+\r
+<tr>\r
+<td><code>b</code>, <code>i</code>, <code>em</code>, <code>strong</code></td>\r
+<td><code>emphasis</code></td>\r
+<td>The <code>role</code> attribute is the original tag name</td>\r
+</tr>\r
+\r
+<tr>\r
+<td><code>dfn</code></td>\r
+<td><code>glossterm</code>, and also an <code>indexterm</code> with a <code>primary</code> term</td>
+</tr>\r
+\r
+<tr>\r
+<td><code>code/i</code>, <code>tt/i</code>, <code>pre/i</code></td>\r
+<td><code>replaceable</code></td>\r
+<td>In practice, <code>i</code> within monospace content usually marks replaceable text. If you mean emphasis, use <code>em</code> instead.</td>
+</tr>\r
+\r
+<tr>\r
+<td><code>pre</code>, <code>body/code</code></td>\r
+<td><code>programlisting</code></td>\r
+</tr>\r
+\r
+<tr>\r
+<td><code>img</code></td>\r
+<td><code>inlinemediaobject/imageobject/imagedata</code></td>\r
+<td>In an inline context.</td>\r
+</tr>\r
+\r
+<tr>\r
+<td><code>img</code></td>\r
+<td><code>[informal]figure/mediaobject/imageobject/imagedata</code></td>\r
+<td>If it has a <code>title</code> attribute or <code>db:title</code> it's wrapped in a <code>figure</code>. Otherwise it's wrapped in an <code>informalfigure</code>.</td>\r
+</tr>\r
+\r
+<tr>\r
+<td><code>table</code></td>\r
+<td><code>[informal]table</code></td>\r
+<td>XHTML <code>table</code> becomes Docbook <code>table</code> if it has a <code>summary</code> attribute; <code>informaltable</code> otherwise.</td>\r
+</tr>\r
+\r
+<tr>\r
+<td><code>ul</code></td>\r
+<td><code>itemizedlist</code></td>\r
+<td>But see the processing instruction <a href="#simplelist">below</a>.</td>\r
+</tr>\r
+</table>\r
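+<p>Several rows of the table above are decided by a single attribute.
+The sketch below restates those rows as a Python function over
+<code>xml.etree.ElementTree</code> elements. It is meant as an aid to
+reading the table, not as &html2db;'s implementation. The name
+<code>docbook_wrapper</code> and its <code>inline</code> flag are
+hypothetical: the stylesheet works out inline context from the
+surrounding markup. The <code>db:title</code> case for images and the
+<code>simplelist</code> processing instruction are not modeled.</p>
+
```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EMPHASIS_TAGS = {"b", "i", "em", "strong"}


def local_name(elem: ET.Element) -> str:
    """Return the tag name without the XHTML namespace."""
    return elem.tag[len(XHTML):] if elem.tag.startswith(XHTML) else elem.tag


def docbook_wrapper(elem: ET.Element, inline: bool = False) -> tuple[str, dict[str, str]]:
    """Return (Docbook element name, attributes) for a few rows of the table."""
    name = local_name(elem)
    if name in EMPHASIS_TAGS:
        # The original tag name is kept in the role attribute.
        return "emphasis", {"role": name}
    if name == "table":
        # A summary attribute makes a formal table (the summary becomes its title).
        return ("table" if "summary" in elem.attrib else "informaltable"), {}
    if name == "img":
        if inline:
            return "inlinemediaobject", {}
        # A title attribute makes a formal figure; db:title is not modeled here.
        return ("figure" if "title" in elem.attrib else "informalfigure"), {}
    if name == "ul":
        return "itemizedlist", {}
    raise ValueError(f"no mapping modeled for {name!r}")
```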
+\r
+\r
+\r
+<h3>Links</h3>\r
+<table summary="Link Translation">\r
+<tr>\r
+<th>XHTML</th>\r
+<th>Docbook</th>\r
+<th>Notes</th>\r
+</tr>\r
+\r
+<tr>\r
+<td><code>&lt;a name="<var>name</var>"&gt;</code></td>
+<td><code>&lt;anchor id="{$anchor-id-prefix}<var>name</var>"/&gt;</code></td>
+<td>An anchor within an <code>h<var>n</var></code> element is attached to the enclosing <code>section</code> as an <code>id</code> attribute instead.</td>
+</tr>
+
+<tr>
+<td><code>&lt;a href="#<var>name</var>"&gt;</code></td>
+<td><code>&lt;link linkend="{$anchor-id-prefix}<var>name</var>"&gt;</code></td>
+</tr>
+
+<tr>
+<td><code>&lt;a href="<var>url</var>"&gt;</code></td>
+<td><code>&lt;ulink url="<var>url</var>"&gt;</code></td>
+</tr>
+
+<tr>
+<td><code>&lt;a href="mailto:<var>address</var>"&gt;</code></td>
+<td><code>&lt;email&gt;<var>address</var>&lt;/email&gt;</code></td>
+</tr>\r
+\r
+</table>\r
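+<p>The link rules amount to a small decision procedure on the
+<code>a</code> element's attributes. Below is a hypothetical Python
+rendering of the table. The function name <code>translate_link</code> is
+illustrative and not part of &html2db;. Its <code>anchor_id_prefix</code>
+argument plays the role of the <code>anchor-id-prefix</code> stylesheet
+parameter described under <a href="#customization">Customization</a>.
+The special case for anchors inside headings is not modeled.</p>
+
```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def translate_link(attrs: dict[str, str], anchor_id_prefix: str = "") -> ET.Element:
    """Map the attributes of an XHTML <a> to a Docbook link element."""
    href = attrs.get("href")
    if href is None:
        # <a name="n"> becomes an anchor whose id carries the prefix.
        return ET.Element("anchor", id=anchor_id_prefix + attrs["name"])
    if href.startswith("#"):
        # Same-document links point at the prefixed id.
        return ET.Element("link", linkend=anchor_id_prefix + href[1:])
    if href.startswith("mailto:"):
        email = ET.Element("email")
        email.text = href[len("mailto:"):]
        return email
    # Anything else is treated as an external URL.
    return ET.Element("ulink", url=href)
```
+
+<p>With a prefix such as <code>intro.</code>, <code>&lt;a
+href="#setup"&gt;</code> becomes a <code>link</code> to
+<code>intro.setup</code>. This is how two documents compiled into one
+book can both use <code>setup</code> as an anchor name.</p>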
+\r
+<h3 id="tables">Tables</h3>\r
+\r
+<p>XHTML <code>table</code> support is minimal. &html2db; changes the\r
+element names and counts the columns (this is necessary to get table\r
+footnotes to span all the columns), but it does not attempt to deal\r
+with tables in their full generality.</p>\r
+\r
+<p>An XHTML <code>table</code> with a <code>summary</code> attribute\r
+generates a <code>table</code>, whose <code>title</code> is the value\r
+of that summary. An XHTML <code>table</code> without a\r
+<code>summary</code> generates an <code>informaltable</code>.</p>\r
+\r
+<p>Any <code>tr</code>s that contain <code>th</code>s are pulled to
+the top of the table and placed inside a <code>thead</code>. Other
+<code>tr</code>s are placed inside a <code>tbody</code>. This matches
+the common XHTML <code>table</code> pattern, where the first row is
+a header row.</p>
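+<p>The row reordering and the column count can be sketched as follows.
+This is a simplified Python model, not the stylesheet. The name
+<code>convert_table</code> is hypothetical. The sketch only looks at
+<code>tr</code> elements that are direct children of the
+<code>table</code>, so rows inside an XHTML <code>tbody</code> are missed.
+It also ignores <code>colspan</code> and copies only each cell's leading
+text.</p>
+
```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def convert_table(table: ET.Element) -> ET.Element:
    """Build a Docbook [informal]table with thead/tbody from an XHTML table."""
    rows = table.findall(f"{XHTML}tr")
    header = [r for r in rows if r.find(f"{XHTML}th") is not None]
    body = [r for r in rows if r.find(f"{XHTML}th") is None]
    # The column count is the widest row; colspan is ignored in this sketch.
    cols = max((len(list(r)) for r in rows), default=0)

    summary = table.get("summary")
    out = ET.Element("table" if summary is not None else "informaltable")
    if summary is not None:
        ET.SubElement(out, "title").text = summary
    tgroup = ET.SubElement(out, "tgroup", cols=str(cols))
    # Header rows go first, whatever their position in the source.
    for group_name, group_rows in (("thead", header), ("tbody", body)):
        if not group_rows:
            continue
        group = ET.SubElement(tgroup, group_name)
        for row in group_rows:
            docbook_row = ET.SubElement(group, "row")
            for cell in row:
                ET.SubElement(docbook_row, "entry").text = cell.text
    return out
```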
+\r
+<h3 id="implicit-blocks">Implicit Blocks</h3>\r
+<p>XHTML allows <code>li</code>, <code>dd</code>, and <code>td</code>\r
+elements to contain either inline text (for instance,\r
+<code>&lt;li&gt;a list item&lt;/li&gt;</code>) or block structure
+(<code>&lt;li&gt;&lt;p&gt;a block&lt;/p&gt;&lt;/li&gt;</code>). The
+corresponding Docbook elements require block structure, such as\r
+<code>para</code>.</p>\r
+\r
+<p>&html2db; provides limited support for wrapping naked text in\r
+these positions in <code>para</code> elements. If a list item or\r
+table cell item directly contains text, all text up to the position of\r
+the first element (or all text, if there is no element) is wrapped in\r
+<code>para</code>. This handles the simple case of an item that\r
+directly contains text, and also the case of an item that contains\r
+text followed by blocks such as paragraphs.</p>\r
+\r
+<p>Note that this algorithm is easily confused. It doesn't\r
+distinguish between block and inline XHTML elements, so it will only\r
+wrap the first word in <code>&lt;li&gt;some &lt;b&gt;bold&lt;/b&gt;
+text&lt;/li&gt;</code>, leading to badly formatted output. The
+workaround is to wrap troublesome content in explicit
+<code>&lt;p&gt;</code> tags.</p>
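+<p>In ElementTree terms, "all text up to the position of the first
+element" is an element's <code>text</code> attribute. The text that
+follows a child element is that child's <code>tail</code>. The minimal
+Python sketch below applies the rule (the function name is hypothetical)
+and reproduces the failure described above: the text after
+<code>&lt;b&gt;</code> is left unwrapped.</p>
+
```python
import xml.etree.ElementTree as ET


def wrap_leading_text(item: ET.Element) -> ET.Element:
    """Wrap the text before an item's first child element in a <para>.

    Only that leading text is wrapped, whether or not the next element is
    inline, so mixed content like "some <b>bold</b> text" gets split.
    """
    text = item.text
    if text is None or not text.strip():
        # Nothing naked before the first child (or no text at all).
        return item
    para = ET.Element("para")
    para.text = text.strip()
    item.text = None
    item.insert(0, para)
    return item
```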
+\r
+<h3 id="docbook-elements">Docbook Elements</h3>\r
+\r
+<p>Elements from the Docbook namespace are passed through as is.\r
+There are two ways to include a Docbook element in your XHTML\r
+source:</p>\r
+\r
+<dl>\r
+<dt>Global prefix</dt>\r
+<dd><p>A <dfn>fake Docbook namespace</dfn><db:footnote><p>The fake\r
+Docbook namespace is <code>urn:docbook</code>. Docbook doesn't really\r
+have a namespace, and if it did, it wouldn't be this one. See <a\r
+href="#docbook-namespace">Docbook namespace</a> for a discussion of\r
+this issue.</p></db:footnote>\r
+\r
+declaration may be added to the document root element. Anywhere in\r
+the document, the prefix from this namespace declaration may be used\r
+to include a Docbook element. This is useful if a document contains\r
+many Docbook elements, such as <code>footnote</code> or\r
+<code>glossterm</code>, interspersed with XHTML. (In this case,
+however, it may be more convenient to allow these elements in the XHTML
+namespace and add a customization layer that translates them to
+Docbook elements. See <a href="#customization">Customization</a>.)</p>
+\r
+<pre class="example"><![CDATA[\r
+<html xmlns="http://www.w3.org/1999/xhtml"\r
+ xmlns:db="urn:docbook">\r
+ ...\r
+ <p>Some text<db:footnote><db:para>and a footnote</db:para></db:footnote>.</p>
+]]></pre></dd>\r
+\r
+<dt>Local namespace</dt>\r
+<dd><p>A Docbook element may be introduced along with a prefix-less\r
+namespace declaration. This is useful for embedding a Docbook\r
+document fragment (a hierarchy of elements that all use Docbook tags)\r
+within an XHTML document.</p>
+\r
+<pre class="example"><![CDATA[\r
+ ...\r
+ <articleinfo xmlns="urn:docbook">\r
+ <author>\r
+ <firstname>...</firstname>\r
+ ...\r
+]]></pre></dd>\r
+</dl>\r
+\r
+<p>The source to <a href="index.src.html">this document</a>\r
+illustrates both of these techniques.</p>\r
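+<p>Both techniques work because a namespace-aware parser identifies an
+element by its namespace URI, not by the prefix written in the source.
+The Python sketch below parses a fragment that uses both the
+<code>db:</code> prefix and a prefix-less
+<code>xmlns="urn:docbook"</code> declaration. It finds the same kind of
+qualified name in each case. The sample document and the
+<code>docbook_elements</code> helper are illustrative only.</p>
+
```python
from __future__ import annotations

import xml.etree.ElementTree as ET

DOCBOOK_NS = "urn:docbook"

SOURCE = """\
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:db="urn:docbook">
  <body>
    <p>Some text<db:footnote><db:para>and a footnote</db:para></db:footnote>.</p>
    <articleinfo xmlns="urn:docbook"><author><firstname>Oliver</firstname></author></articleinfo>
  </body>
</html>"""


def docbook_elements(root: ET.Element) -> list[str]:
    """Local names of all elements in the fake Docbook namespace, in document order."""
    prefix = "{" + DOCBOOK_NS + "}"
    return [e.tag[len(prefix):] for e in root.iter() if e.tag.startswith(prefix)]
```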
+\r
+<p class="note">Both these techniques will cause your document to be\r
+invalid as XHTML. In order to validate an XHTML document that\r
+contains Docbook elements, you will need to create a custom schema.\r
+Technically, you then ought to place your document in a different\r
+namespace, but this will cause &html2db; not to recognize it!</p>\r
+\r
+\r
+<h3>Output Processing Instructions</h3>\r
+\r
+<p>&html2db; adds a few processing instructions to the output file.
+The Docbook XSL stylesheets ignore these, but if you write a\r
+customization layer for Docbook XSL, you can use the information in\r
+these processing instructions to customize the HTML output. This can\r
+be used, for example, to set the <code>a</code> <code>onclick</code>\r
+and <code>target</code> attributes in the HTML files that Docbook XSL\r
+creates to the same values they had in the input document.</p>\r
+\r
+<dl>\r
+<dt><code>&lt;?html2db attribute="<var>name</var>" value="<var>value</var>"?&gt;</code></dt>
+<dd>Placed inside a link element to capture the value of the <code>a</code> <code>target</code> and <code>onclick</code> attributes. <var>name</var> is the name of the attribute (<code>target</code> or <code>onclick</code>), and <var>value</var> is its value, with <code>"</code> and <code>\</code> replaced by <code>\"</code> and <code>\\</code>, respectively.</dd>
+
+<dt><code>&lt;?html2db element="br"?&gt;</code></dt>
+<dd>Represents the location of an XHTML <code>br</code> element in the\r
+source document.</dd>\r
+\r
+</dl>\r
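+<p>The escaping rule for <var>value</var> can be made concrete. The
+Python sketch below is a model built on assumptions. The function names
+are hypothetical, and the spacing of the generated instruction is taken
+from the description above, not checked against &html2db;'s output.
+Backslashes must be doubled before quotes are escaped. Doing it the other
+way round would double the backslash just added in front of each quote.
+A customization layer reading these instructions would apply the
+inverse, shown as <code>unescape_pi_value</code>.</p>
+
```python
def escape_pi_value(value: str) -> str:
    """Escape a value for <?html2db attribute="..." value="..."?>.

    Backslashes are doubled first, so the backslashes added in front of
    quotes are not themselves doubled.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


def attribute_pi(name: str, value: str) -> str:
    """Format the processing instruction that records an <a> attribute."""
    return f'<?html2db attribute="{name}" value="{escape_pi_value(value)}"?>'


def unescape_pi_value(escaped: str) -> str:
    """Reverse escape_pi_value: a backslash escapes the character after it."""
    out = []
    chars = iter(escaped)
    for ch in chars:
        out.append(next(chars, "") if ch == "\\" else ch)
    return "".join(out)
```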
+\r
+<p>You can also include <code>&lt;?db2html?&gt;</code> processing
+instructions in the HTML source document, and they will be copied\r
+through to the Docbook output file unchanged (as will all other\r
+processing instructions).</p>\r
+\r
+\r
+<h2 id="customization">Customization</h2>\r
+<h3>XSLT Parameters</h3>\r
+<dl>\r
+ <dt><code>&lt;xsl:param name="anchor-id-prefix" select="''"/&gt;</code></dt>
+ <dd>Prefixed to every id generated from <code>&lt;a name="<var>name</var>"&gt;</code>
+ and <code>&lt;a href="#<var>name</var>"&gt;</code>. This is useful to avoid
+ collisions between multiple documents that are compiled into the
+ same book. For instance, if a number of XHTML sources are assembled
+ into chapters of a book, you can process each source file with a prefix of
+ <code><var>docid</var>.</code>, where <var>docid</var> is a unique id
+ for each source file.</dd>
+
+ <dt><code>&lt;xsl:param name="document-root" select="'article'"/&gt;</code></dt>
+ <dd>The root element of the output document. It defaults to
+ <code>article</code>, and a document can override it with
+ <code>&lt;?html2db class="<var>name</var>"?&gt;</code> in its
+ <code>body</code>.</dd>
+</dl>\r
+\r
+<h3 id="processing-instructions">Processing instructions</h3>\r
+<p>Use the <code><?html2db?></code> processing instruction to\r
+customize the transformation of the XHTML source to Docbook:</p>\r
+\r
+<table>\r
+<tr>\r
+<th>Processing instruction</th>\r
+<th>Content</th>\r
+<th>Effect</th>\r
+</tr>\r
+\r
+<tr>\r
+<td><code>&lt;?html2db class="<var>xxx</var>"?&gt;</code></td>
+<td><code>body</code></td>
+<td>Sets the output document root to <var>xxx</var>. Useful for
+translating to <code>preface</code>, <code>appendix</code>, or <code>chapter</code>; the default is
+<var>$document-root</var>.</td>\r
+</tr>\r
+\r
+<tr id="simplelist">\r
+<td><code>&lt;?html2db class="simplelist"?&gt;</code></td>
+<td><code>ul</code></td>\r
+<td>Creates a vertical <code>simplelist</code>.<db:footnote><db:para>Note that the\r
+current implementation simply checks for the presence of <em>any</em>\r
+<code>html2db</code> processing instruction.</db:para></db:footnote></td>\r
+</tr>\r
+\r
+\r
+<tr>\r
+<td><code>&lt;?html2db rowsep="1"?&gt;</code></td>
+<td><code>[informal]table</code></td>
+<td>Sets the <code>rowsep</code> attribute on the generated <code>table</code>.<db:footnote><db:para>Note that the current implementation simply checks for the presence of <em>any</em> <code>html2db</code> processing instruction that begins with <code>rowsep</code>, and assumes the value is <code>1</code>.</db:para></db:footnote></td>
+</tr>\r
+</table>\r
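+<p>The matching shortcuts described in the notes above can be sketched
+in Python. This models the documented behavior under stated assumptions
+and is not the stylesheet itself. The helper names are hypothetical. The
+instructions are assumed to be direct children of the <code>body</code>,
+<code>ul</code>, or <code>table</code> they affect. The parser uses
+<code>TreeBuilder(insert_pis=True)</code> (Python 3.8 or later) so that
+processing instructions survive parsing.</p>
+
```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET


def parse_with_pis(source: str) -> ET.Element:
    """Parse XML, keeping processing instructions as nodes in the tree."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    return ET.fromstring(source, parser=parser)


def html2db_pis(elem: ET.Element) -> list[str]:
    """Return the data of <?html2db ...?> PIs that are direct children of elem."""
    found = []
    for child in elem:
        if child.tag is not ET.ProcessingInstruction:
            continue
        # A PI node's text is "target data", e.g. 'html2db class="simplelist"'.
        target, _, data = child.text.partition(" ")
        if target == "html2db":
            found.append(data)
    return found


def list_element(ul: ET.Element) -> str:
    """Documented shortcut: any html2db PI in a ul selects simplelist."""
    return "simplelist" if html2db_pis(ul) else "itemizedlist"


def table_rowsep(table: ET.Element) -> str | None:
    """Documented shortcut: a PI starting with "rowsep" means rowsep="1"."""
    return "1" if any(pi.startswith("rowsep") for pi in html2db_pis(table)) else None


def document_root(body: ET.Element, default: str = "article") -> str:
    """Root element named by <?html2db class="..."?> in the body, else the default."""
    for pi in html2db_pis(body):
        match = re.fullmatch(r'class="([^"]*)"', pi)
        if match:
            return match.group(1)
    return default
```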
+\r
+<h3 id="embedding">Overriding the built-in templates</h3>\r
+<p>For cases where the previous techniques don't allow enough
+customization, you can override the built-in templates. You will need
+to know XSLT in order to do this, and you will need to write a new\r
+stylesheet that uses the <code>xsl:import</code> element to import\r
+<code>html2db.xsl</code>.</p>\r
+\r
+<p>The <a href="examples.xsl"><code>example.xsl</code></a> stylesheet\r
+is an example customization layer. It recognizes the <code>&lt;div
+class="abstract"&gt;</code> and <code>&lt;p class="note"&gt;</code>
+classes in the <a href="index.src.html">source</a> for this document,\r
+and generates the corresponding Docbook elements.</p>\r
+\r
+\r
+<h2>FAQ</h2>\r
+<h3>Why generate Docbook?</h3>\r
+<p>The primary reason to use Docbook as an <em>output</em> format is\r
+to take advantage of the Docbook XSL stylesheets. These are a\r
+well-designed, well-documented set of XSL stylesheets that provide a\r
+variety of publishing features that would be difficult to recreate\r
+from scratch for HTML:</p>\r
+\r
+<ul>\r
+<li>Automatic Table-of-Contents generation</li>\r
+<li>Automatic part, chapter, and section numbering.</li>\r
+<li>Creation of single-page, multi-page, PDF, and WinHelp files from the same source document.</li>\r
+<li>Navigation headers, footers, and metadata for multi-page HTML\r
+documents.</li>\r
+<li>Link resolution and link target text insertion across multiple pages and numbered targets.</li>\r
+<li>Figure, example, and table numbering, and tables of these.</li>\r
+<li>Index and glossary tools.</li>\r
+</ul>\r
+\r
+<h3>Why write in XHTML?</h3>\r
+\r
+<p>Given that Docbook is so great, why not write in it?</p>\r
+\r
+<p>Where there are no legacy concerns, Docbook is probably a better
+choice for structured or technical documentation.</p>
+\r
+<p>Where the only legacy concern is the documents themselves, and not\r
+the tools and skill sets of documentation contributors, you should\r
+consider using an (X)HTML convertor to perform a one-time conversion
+of your documentation source into Docbook, and then switching\r
+development to the result files. You can use this stylesheet to\r
+perform this conversion, or evaluate other tools, many of which are\r
+probably appropriate for this purpose.</p>\r
+\r
+<p>Often there are other legacy concerns: the availability of cheap\r
+(including free) and usable HTML editors and editing modes; and the\r
+fact that it's easier to teach people XHTML than Docbook. If either
+of these is an issue in your organization, you may want to maintain
+documentation sources in XHTML instead of Docbook.</p>
+\r
+<p>For example, at <a href="http://www.laszlosystems.com/">Laszlo</a>,\r
+most developers contribute directly to the documentation. Requiring\r
+that developers learn Docbook, or that they wait on the doc team to\r
+get content into the docs, would discourage this.</p>\r
+\r
+<h3>Why not use an existing convertor?</h3>\r
+\r
+<p>This isn't the first (X)HTML to Docbook convertor. Why not use one
+of the existing ones?</p>
+
+<p>Each of the HTML to Docbook convertors that I could find had at
+least some of the following limitations, some of which stemmed from
+their intended use as one-time-only convertors for legacy documents:</p>
+\r
+<ul>\r
+<li>Many only operated on a subset of HTML, and relied upon hand\r
+editing of the output to clean up mistakes. This made them impossible\r
+to use as part of a processing pipeline, where the source is\r
+<em>maintained</em> in XHTML.</li>\r
+\r
+<li>There was no way to customize the output, except by (1) hand\r
+editing, or (2) writing a post-processing stylesheet, which didn't\r
+have access to the information in the XHTML source document.</li>\r
+\r
+<li>Many of them were difficult or impossible to customize and\r
+extend. They were closed-source, or written in Java or Perl (which I\r
+find to be difficult languages to use for customizing this kind of
+thing) and embedded in a larger system.</li>\r
+\r
+<li>They didn't take full advantage of the Docbook tag set and content\r
+model to represent document structure. For instance, they didn't\r
+generate nested <code>section</code> elements to represent\r
+<code>h1</code> <code>h2</code> sequences, or <code>table</code> to\r
+represent tables with <code>summary</code> attributes.</li>\r
+</ul>\r
+\r
+<h3>I got this error. What does it mean?</h3>\r
+<dl>\r
+<dt>Q. <code>Fatal Error! The element type "br" must be terminated by the matching end-tag "&lt;/br&gt;".</code></dt>
+<dd>A. Your document is HTML, not <em>X</em>HTML. You need to fix it, or run it through Tidy first.</dd>\r
+\r
+<dt>Q. My output document is empty except for the <code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;</code> line.</dt>
+<dd>A. The document is missing the XHTML namespace declaration (<code>xmlns="http://www.w3.org/1999/xhtml"</code>). See the <a href="index.src.html">source for this document</a> for an example.</dd>
+\r
+<dt>Q. Some of the headers and document sections are repeated multiple times.</dt>\r
+<dd>A. The document has out-of-sequence headers, such as <code>h1</code> followed by <code>h3</code> (instead of <code>h2</code>). This won't work.</dd>\r
+\r
+<dt>Q. <code>Fatal Error! The prefix "db" for element "db:footnote" is not bound.</code></dt>\r
+<dd>A. You haven't declared the <code>db</code> namespace prefix. See the <a href="index.src.html">source for this document</a> for an example.</dd>
+\r
+</dl>\r
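+<p>Three of the problems above can be caught before the stylesheet
+runs: HTML that is not well-formed XML, a missing XHTML namespace, and
+out-of-sequence headers. The following Python function is a hypothetical
+pre-flight check and is not part of &html2db;. It reports the first two
+problems as fatal and lists every skipped heading level.</p>
+
```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
HEADING = re.compile(r"\{" + re.escape(XHTML_NS) + r"\}h([1-6])")


def preflight(source: str) -> list[str]:
    """List problems that make html2db fail or produce empty or duplicated output."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError as err:
        # HTML that is not well-formed XML (e.g. an unclosed <br>).
        return [f"not well-formed XML: {err}"]
    if root.tag != "{" + XHTML_NS + "}html":
        # Without the XHTML namespace the output is empty.
        return ["root element is not html in the XHTML namespace"]
    problems = []
    previous = 0
    for elem in root.iter():
        match = HEADING.fullmatch(elem.tag)
        if match is None:
            continue
        level = int(match.group(1))
        if level > previous + 1:
            # e.g. h1 followed by h3, which makes html2db duplicate text.
            where = f"h{previous}" if previous else "the start of the document"
            problems.append(f"h{level} follows {where}; expected at most h{previous + 1}")
        previous = level
    return problems
```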
+\r
+\r
+<h2>Implementation Notes</h2>\r
+\r
+<h3>Bugs</h3>\r
+<ul>\r
+<li>Improperly sequenced <code>h<var>n</var></code> (for example\r
+<code>h1</code> followed by <code>h3</code>, instead of\r
+<code>h2</code>) will result in duplicate text.</li>\r
+</ul>\r
+\r
+\r
+<h3>Limitations</h3>\r
+<ul>\r
+<li>The <code>id</code> attribute is only preserved for certain\r
+elements (at least <code>h<var>n</var></code>, images, paragraphs, and\r
+tables). It ought to be preserved for all of them.</li>\r
+<li>Only the <a href="#tables">very simplest</a> table format is\r
+implemented.</li>\r
+<li>Always uses compact lists.</li>\r
+<li>The string matching for <code>&lt;?html2db
+class="<var>classname</var>"?&gt;</code> requires an exact match
+(spaces and all).</li>\r
+<li>The <a href="#implicit-blocks">implicit blocks</a> code is easily\r
+confused, as documented in that section. This is\r
+easy to fix now that I understand the difference between block and\r
+inline elements (I didn't when I was implementing this), but I\r
+probably won't do so until I run into the problem again.</li>\r
+\r
+</ul>\r
+\r
+\r
+\r
+\r
+<h3>Wishlist</h3>\r
+<ul>\r
+<li>Allow <code>&lt;?html2db attribute-name="<var>name</var>"
+value="<var>value</var>"?&gt;</code> at any position, to set arbitrary
+Docbook attributes on the generated element.</li>\r
+\r
+<li>Use a different technique from the <a href="#docbook-elements">fake
+namespace prefix</a> to name Docbook elements in the source, one that
+preserves the XHTML validity of the source file. For example, add an
+option to transform <code>&lt;div class="db:footnote"&gt;</code> into
+<code>&lt;footnote&gt;</code>, or to use a processing instruction
+(<code>&lt;div&gt;&lt;?html2db classname="footnote"?&gt;</code>).</li>
+\r
+<li>Parse DC metadata from XHTML <code>html/head/meta</code>.</li>\r
+\r
+<li>Add an option to use <code>html/head/title</code> instead of\r
+<code>html/body/h1[1]</code> for top title.</li>\r
+\r
+<li>Allow an <code>id</code> on every element.</li>\r
+\r
+<li>Add an option to translate the XHTML <code>class</code> into a\r
+Docbook <code>role</code>.</li>\r
+\r
+<li>Preserve more of the whitespace from the source document (especially within lists and tables) in order to make it easier to debug the output document.</li>
+</ul>
+
+<h3>Support</h3>
+<p>This is a work in progress. It serves my needs, but doesn't
+attempt to be much more general than that. If you run into anything
+it can't handle, please send a note, or better yet, a patch, to <a
+href="mailto:steele@osteele.com">steele@osteele.com</a>. I can't
+promise to address problems (I have a day job too), but knowing what
+people have run into will help me prioritize my work when I do have
+time to work on this.</p>
+\r
+\r
+<h3>Design Notes</h3>\r
+<h4 id="docbook-namespace">The Docbook Namespace</h4>\r
+<p>&html2db; accepts elements in the "Docbook namespace" in XHTML\r
+source. This namespace is <code>urn:docbook</code>.</p>\r
+\r
+<p>This isn't technically correct. Docbook doesn't really have a\r
+namespace, and if it did, it wouldn't be this one. <a\r
+href="http://www.faqs.org/rfcs/rfc3151.html">RFC 3151</a> suggests\r
+<code>urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN</code> as the\r
+Docbook namespace.</p>\r
+\r
+<p>There are two problems with the RFC 3151 namespace. First, it's long
+and hard to remember. Second, it's limited to Docbook v4.1.2, but
+&html2db; works with other versions of Docbook too, which would
+presumably have other namespaces. I think it's more useful to\r
+<em>under</em>specify the Docbook version in the spec for this tool.\r
+Docbook itself underspecifies the version completely, by avoiding a\r
+namespace at all, but when mixing Docbook and XHTML elements I find it\r
+useful to be <em>more</em> specific than that.</p>\r
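+<p>The RFC 3151 namespace is tied to a single version because the URN
+is a transcription of the DTD's formal public identifier. It therefore
+changes whenever the version in that identifier changes. The Python
+sketch below applies only the two transcription rules this identifier
+needs: whitespace becomes <code>+</code> and <code>//</code> becomes
+<code>:</code>. RFC 3151 also defines escapes for characters such as
+<code>+</code> and <code>:</code>, which are not handled here.
+<code>publicid_urn</code> is a hypothetical name.</p>
+
```python
def publicid_urn(public_id: str) -> str:
    """Transcribe a formal public identifier into a urn:publicid: URN.

    Partial RFC 3151 transcription: whitespace runs are normalized and
    become "+", and "//" becomes ":". Other escapes are not applied.
    """
    normalized = " ".join(public_id.split())
    return "urn:publicid:" + normalized.replace("//", ":").replace(" ", "+")
```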
+\r
+<h3>History</h3>\r
+<p>The original version of &html2db; was written by <a\r
+href="http://osteele.com">Oliver Steele</a>, as part of the <a\r
+href="http://laszlosystems.com">Laszlo Systems, Inc.</a> documentation\r
+effort. We had a set of custom stylesheets that formatted
+programming-language elements such as <code>classname</code> and
+<code>tagname</code> and added linking information to them, added
+tables of contents to the chapter documentation, and numbered the
+examples.</p>
+\r
+<p>As the documentation set grew, the doc team (John Sundman)\r
+requested features such as inter-chapter navigation, callouts, and\r
+index and glossary elements. I was able to beat all of these back\r
+except for navigation, which seemed critical. After a few days trying\r
+to implement this, I decided it would be simpler to convert the subset\r
+of XHTML that we used into a subset of Docbook, and use the latter to\r
+add navigation. (Once this was done, the other features came for\r
+free.)</p>\r
+\r
+<p>During my August 2004 "sabbatical", I factored the general html2db\r
+code out from the Laszlo-specific code, refactored and otherwise\r
+cleaned it up, and wrote this documentation.</p>\r
+\r
+<h3>Credits</h3>\r
+<p>&html2db; was written by <a href="http://osteele.com">Oliver Steele</a>, as part of the <a href="http://laszlosystems.com">Laszlo Systems, Inc.</a> documentation effort.</p>\r
+\r
+</body>\r
+</html>
\ No newline at end of file