DocBook for Writers of CLP Documentation

David de la Nuez


Table of Contents

Introduction
DocBook?
Getting Ready for DocBook
Need to Know
DocBook and CLP, Perfect Together
Tips and Suggestions
Resources

Introduction

The CLP User Guide is written in DocBook XML. This tutorial serves as an introduction to using DocBook for maintaining the Guide, as well as writing new documentation. There are countless DocBook and XML resources available both in print and online, so this tutorial will be limited in its scope to applying these two technologies to documenting CLP. See the section called “Resources” to learn more about DocBook and XML.

DocBook?

Why DocBook? Why not HTML or LaTeX? Here are a few of the reasons:

  • DocBook and the tools we use to work with it are Open Source.

  • LaTeX is nice for mathematical markup, but DocBook exists for marking up technical documentation.

  • Basic HTML is easy to learn and use, but it is very clumsy. DocBook does a good job of separating content from presentation (HTML does not do this particularly well, even with the use of CSS), allowing the writer to focus on what really matters, the content.

  • DocBook can be transformed into high-quality online and printed output (e.g. HTML and PDF, respectively) from a single source.

  • DocBook is very robust, thoroughly documented, and has a strong community behind it.

  • DocBook is a modern though mature standard for documentation of software (or other) projects which is at no risk of obsolescence.

  • The tools needed for creating and manipulating DocBook documents are typically part of existing *nix installations (including Cygwin) so little or no installation of new software is required to use DocBook (editing can be done in any text editor).

  • Many tedious tasks, such as the creation of a table of contents, are handled automatically by a good DocBook configuration.

Getting Ready for DocBook

Editing and publishing CLP documentation with DocBook requires that some important tools be installed. It is likely that all or most of these tools are already in place on a typical *nix (e.g. Linux) system. Windows users should strongly consider installing Cygwin, as this tutorial assumes a *nix environment for DocBook development. In fact, the following installation instructions are for Cygwin and Red Hat Linux (and should not be altogether different for another *nix system).

The necessary Cygwin packages can all be found in the "Doc" section of the categorical view of Cygwin's setup.exe. The user should verify that all of them are selected (because there may not be adequate dependency rules to ensure that all the correct packages are installed). The packages in question are:

  • docbook-xml42 [1]

  • docbook-xsl

  • libxml2

  • libxslt

  • xmlto

According to The Selfdocbook (XML Edition), the Red Hat Linux (7.3) packages needed are:

  • sgml-common and xml-common

  • docbook-style-xsl

  • docbook-dtds

  • xmlto

The Selfdocbook also lists a few other packages, but they are not necessary for HTML output. This tutorial does not yet address how to create output in other formats such as PDF.

The last of the packages, xmlto, is a shell script which facilitates the conversion of DocBook documents to HTML and other formats. If all the tools are properly installed, creating an HTML version of this tutorial, for example, is as trivial as typing xmlto html docbook4clp.xml at the command line. But before jumping in to work with DocBook, there are some important issues which need to be addressed.
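
Before going further, it can help to confirm that the command-line tools are actually on the PATH. The following is a minimal sketch, not part of the CLP tooling: the function name missing_tools is hypothetical, and the tool list assumes that xmllint and xsltproc are the programs provided by the libxml2 and libxslt packages.

```shell
#!/bin/sh
# Print the name of each tool given as an argument that is not found on PATH.
# missing_tools is a hypothetical helper name used only for this sketch.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list: xmlto, plus xmllint (libxml2) and xsltproc (libxslt).
missing=$(missing_tools xmlto xmllint xsltproc)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All DocBook tools found."
fi
```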

Need to Know

Knowledge of DocBook is like a security clearance: the user is on a need-to-know basis. That is, to start working with DocBook in a properly configured environment, a user needs to know very little, but there is always something more out there to learn. This section addresses a few details of DocBook that the typical user needs to know to get a first DocBook document up and running. Details will be left to the reader to fill in from other resources (see the section called “Resources”).

What makes an XML document a DocBook document? It is not difficult to write a well-formed XML document. The following example is well-formed XML:

  
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <book>
    <title>How CLP Won the West</title>
    <chapter>
      <title>In the Beginning</title>
      <para>
      There once was a large LP...
      </para>
    </chapter>
  </book>

This document is not much use, though, without some meaning for the tags in it. The DocBook DTD is what gives a document meaning. The following example works better, and constitutes a valid DocBook document:


  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
  <book>
    <title>How CLP Won the West</title>
    <chapter>
      <title>In the Beginning</title>
      <para>
      There once was a large LP...
      </para>
    </chapter>
  </book>

The only difference is the document type declaration, which states that the document is meant to adhere to the standard described in the file http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd (see the section called “Resources” for where to read more about document type declarations and the DocBook DTD). In other words, adding the declaration makes this little example a genuine DocBook document. In this case, the declaration uses an Internet address. However, in a properly configured environment, a network connection would not be necessary to work with the document, thanks to the "catalog" mechanism. A discussion of catalogs is beyond the scope of this document (see the section called “Resources” for more). Should catalogs not be properly configured on a given system, one could instead use a local path to the DTD in the document type declaration (e.g. /usr/share/docbook-xml42/docbookx.dtd).
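
The choice between the Internet address and a local path can be sketched as a small shell check. This is only an illustration: dtd_location is a hypothetical helper name, and the default local path is the example path mentioned above, which may differ from system to system.

```shell
#!/bin/sh
# The system identifier used in the examples in this tutorial.
DTD_URL="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"

# Print the local DTD path if that file exists, otherwise the Internet address.
# The default path is the example path from the text and is an assumption.
dtd_location() {
  local_dtd=${1:-/usr/share/docbook-xml42/docbookx.dtd}
  if [ -f "$local_dtd" ]; then
    printf '%s\n' "$local_dtd"
  else
    printf '%s\n' "$DTD_URL"
  fi
}

echo "System identifier to use: $(dtd_location)"
```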

Suppose the name of the file containing the example above is bookex.xml. To create a single HTML document from this file is as simple as typing one command:

$ xmlto html-nochunks bookex.xml

To create a multi-part HTML version is just as easy:

$ xmlto html bookex.xml
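
The multi-part form writes many files, so it can be convenient to send each kind of output to its own directory using xmlto's -o option. The sketch below assumes the directory names html-single and html-chunked, which are arbitrary, and the function name build_html is hypothetical.

```shell
#!/bin/sh
# Build both HTML forms of a DocBook file, each into its own directory.
# The directory names are assumptions chosen for this sketch.
# Usage: build_html bookex.xml
build_html() {
  doc=$1
  # Create the output directories first, in case xmlto does not.
  mkdir -p html-single html-chunked
  xmlto -o html-single html-nochunks "$doc" &&
  xmlto -o html-chunked html "$doc"
}
```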

A final and very important DocBook topic is that of "entities". For the purposes of writing CLP documentation (i.e. what follows is a gross simplification), entities are a way of "#include-ing" one document into another, and of using certain special characters which would otherwise confuse the tools used to process DocBook documents. The simplest example of the latter is the < symbol, which is used to begin tags in XML. Rather than putting the character directly into the document text, an entity can be used. Specifically, one would use the string &lt; instead. The other use of entities, as suggested above, is to split a document into convenient pieces. This is demonstrated in the section called “DocBook and CLP, Perfect Together”.
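
When pasting program text or command output into a DocBook file, the special characters can be replaced mechanically. The following is a minimal sketch using sed; the function name escape_xml is hypothetical, and it handles only the ampersand and the two angle brackets.

```shell
#!/bin/sh
# Replace the characters that XML treats as markup with entity references.
# The ampersand is handled first so that the ampersands introduced by the
# later substitutions are not escaped a second time.
escape_xml() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Prints: if (a &lt; b &amp;&amp; b &gt; c)
printf '%s\n' 'if (a < b && b > c)' | escape_xml
```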

DocBook and CLP, Perfect Together

The DocBook XML source of the CLP documentation is available via the COIN CVS repository in the COIN/Clp/Docs directory. The first file of interest is clpuserguide.xml. At the time of this tutorial's writing, the file looked like this:


<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
  <!ENTITY authors SYSTEM "authors.xml">
  <!ENTITY legal SYSTEM "legal.xml">
  <!ENTITY intro SYSTEM "intro.xml">
  <!ENTITY basicmodelclasses SYSTEM "basicmodelclasses.xml">
  <!ENTITY notsobasic SYSTEM "notsobasic.xml">
  <!ENTITY moresamples SYSTEM "moresamples.xml">
  <!ENTITY clpexe SYSTEM "clpexe.xml">
  <!ENTITY messages SYSTEM "messages.xml">
  <!ENTITY faq SYSTEM "faq.xml">
  <!ENTITY faqcontent SYSTEM "faqcontent.xml">
  <!ENTITY doxygen SYSTEM "doxygen.xml">
  <!ENTITY revhist SYSTEM "revhist.xml">
]>
<book id="clpuserguide" lang="en">
<bookinfo>
<title>CLP User Manual</title>
  &authors;
  &legal;
</bookinfo>
  &intro;
  &basicmodelclasses;
  &notsobasic;
  &moresamples;
  &clpexe;
  &messages;
  &faq;
  &doxygen;
  &revhist;
</book>

Essentially, clpuserguide.xml contains a series of entity declarations which refer to other XML files (e.g. <!ENTITY authors SYSTEM "authors.xml">), which are then included into the main file via use of the entities (e.g. &authors;). This allows a neat separation of chapters in the Guide, resulting in more manageable and readable source than would be possible without the use of entities. Neither the names nor the order of the entity declarations is particularly important, but it is good practice to follow the informal convention of naming the entity after the chapter filename, and declaring it in a sensible place with respect to the order of the chapters.
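
Because a declaration and its use are written separately, it is easy to declare an entity and forget to reference it. The sketch below lists entities that are declared in a file but never referenced in that same file. The function name unused_entities is hypothetical, and the sed pattern assumes one declaration per line, as in the listing above.

```shell
#!/bin/sh
# List entities declared in a file but never referenced in that file.
# Assumes one <!ENTITY name SYSTEM "..."> declaration per line.
unused_entities() {
  file=$1
  # Extract the name from each entity declaration ...
  sed -n 's/.*<!ENTITY[[:space:]]\{1,\}\([A-Za-z0-9_.-]*\)[[:space:]].*/\1/p' "$file" |
  while read -r name; do
    # ... and report it if no &name; reference appears in the file.
    grep -q -F -- "&$name;" "$file" || printf '%s\n' "$name"
  done
}

if [ -f clpuserguide.xml ]; then
  unused_entities clpuserguide.xml
fi
```

For the listing above this reports faqcontent, which is declared in clpuserguide.xml but not referenced there; it is presumably referenced from faq.xml instead.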

Editing a particular chapter of the Guide is a matter of editing a single, reasonably sized file. The addition of a new chapter merely entails the declaration of a new entity and the writing of a short additional line in clpuserguide.xml. Suppose a chapter on the barrier method of CLP were planned (it is, in fact). The chapter could be written in a file named barrier.xml, with a corresponding entity declared and used in clpuserguide.xml. If the barrier chapter were to precede, say, the chapter on the CLP executable, the new clpuserguide.xml would look like this (with changes set off by blank lines):


<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
  <!ENTITY authors SYSTEM "authors.xml">
  <!ENTITY legal SYSTEM "legal.xml">
  <!ENTITY intro SYSTEM "intro.xml">
  <!ENTITY basicmodelclasses SYSTEM "basicmodelclasses.xml">
  <!ENTITY notsobasic SYSTEM "notsobasic.xml">
  <!ENTITY moresamples SYSTEM "moresamples.xml">

  <!ENTITY barrier SYSTEM "barrier.xml">

  <!ENTITY clpexe SYSTEM "clpexe.xml">
  <!ENTITY messages SYSTEM "messages.xml">
  <!ENTITY faq SYSTEM "faq.xml">
  <!ENTITY faqcontent SYSTEM "faqcontent.xml">
  <!ENTITY doxygen SYSTEM "doxygen.xml">
  <!ENTITY revhist SYSTEM "revhist.xml">
]>
<book id="clpuserguide" lang="en">
<bookinfo>
<title>CLP User Manual</title>
  &authors;
  &legal;
</bookinfo>
  &intro;
  &basicmodelclasses;
  &notsobasic;
  &moresamples;

  &barrier;

  &clpexe;
  &messages;
  &faq;
  &doxygen;
  &revhist;
</book>

The barrier chapter source might look something like this:


<?xml version="1.0" encoding="ISO-8859-1"?>
<chapter id="barrier">
  <title>
  The CLP Barrier Method
  </title>
  <section>
    <para>
      The CLP barrier method can be used …
    </para>
  </section>
</chapter>

Note the absence of a document type declaration; it is not necessary (and in fact "illegal") in this context because this file is included in the main file via the entity mechanism (only one document type declaration is allowed).
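
Starting a new chapter file can be scripted. The following is a rough sketch only: new_chapter is a hypothetical helper, the skeleton places an empty paragraph directly under the chapter to stay minimal, and the two lines it prints must still be added to clpuserguide.xml by hand.

```shell
#!/bin/sh
# Write a skeleton chapter file named <id>.xml in the current directory and
# print the two lines that would need to be added to the main file.
# No document type declaration is written, because the chapter is pulled
# into the main document through an entity.
# Usage: new_chapter barrier "The CLP Barrier Method"
new_chapter() {
  id=$1
  title=$2
  cat > "$id.xml" <<EOF
<?xml version="1.0" encoding="ISO-8859-1"?>
<chapter id="$id">
  <title>
  $title
  </title>
  <para>
  </para>
</chapter>
EOF
  printf 'Declare:   <!ENTITY %s SYSTEM "%s.xml">\n' "$id" "$id"
  printf 'Reference: &%s;\n' "$id"
}
```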

With some content in the proposed barrier.xml and the appropriate changes made to clpuserguide.xml, a new HTML version of the Guide could be created in much the same manner as the small book example above was transformed to HTML:

$ xmlto html-nochunks clpuserguide.xml

or for a sectioned version:

$ xmlto html clpuserguide.xml

Most of the chapters and appendices in the Guide exist only to be used in the Guide. There is currently one exception, the FAQ. The FAQ is constructed in a way that allows its inclusion in the Guide as well as on the CLP website (i.e. we have a single source document for our frequently asked questions). The file pointed to by the entity faq, faq.xml, is a wrapper for the file faqcontent.xml (with corresponding entity faqcontent). faqcontent.xml has another wrapper in coin-web/Clp named faqwrapper.xml, which will be addressed elsewhere.

Tips and Suggestions

This tutorial, as well as the first DocBook release of the CLP User Guide, was written using the Emacs editor. Almost any text editor will do as a DocBook editor, but Emacs has its advantages. First, naturally, Emacs is Open Source. Second, there are Emacs modes tailored for editing XML documents which provide features such as syntax highlighting. One such mode is PSGML, which may be part of a system's default Emacs configuration (this appears to be the case with Cygwin, at the very least).

As the size of a DocBook project grows, so does the time it takes to transform it to HTML. If one wishes to simply check the validity of a document rather than wait for the entire HTML generation process to complete, the validating parser called by xmlto is easy enough to use:

$ xmllint --noout --postvalid --xinclude clpuserguide.xml

If there are no errors in the document, the parser will terminate without any explicit output. If there is in fact an error, a (sometimes) helpful error message will be printed by the parser.
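
The validation step and the HTML build can be chained so that the slower transformation runs only when the document is valid. This is a minimal sketch with a hypothetical function name, build_if_valid; it relies on xmllint returning a non-zero exit status when validation fails.

```shell
#!/bin/sh
# Validate a DocBook file and build the HTML only if validation succeeds.
# --noout suppresses printing of the parsed document, --postvalid validates
# after parsing, and --xinclude processes any XInclude elements first.
# Usage: build_if_valid clpuserguide.xml
build_if_valid() {
  doc=$1
  if xmllint --noout --postvalid --xinclude "$doc"; then
    xmlto html "$doc"
  else
    echo "Validation failed; HTML not rebuilt." >&2
    return 1
  fi
}
```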

With DocBook, as is the case with any other computer language, it is easiest to learn by example. The existing examples which are part of CLP are this tutorial, of course, and the User Guide itself. The Selfdocbook (XML Edition) is also an excellent example, as it is a DocBook document which includes its own source. Note that the source of this tutorial is available from the COIN CVS repository in the COIN/Clp/Docs/Howto directory.

The DocBook community is quite active, so the official mailing lists are highly recommended. See the section called “Resources” for more information on the lists as well as a number of other helpful resources.

Resources

Below is a list of some online resources for learning more about DocBook and XML.

  • DocBook.org: The official site for DocBook: The Definitive Guide (see below).

  • DocBook: The Definitive Guide: The number one reference for DocBook tags. The book is very much oriented toward users of the SGML version of DocBook, but is still the best resource available for CLP documenters.

  • The Selfdocbook (XML Edition) is another very useful reference. It is a self-documenting introduction to DocBook XML (it includes its own source, which makes it a great learning tool).

  • DocBook Wiki: Full of useful DocBook links.

  • The Official DocBook homepage: Not terribly useful, but it includes information on the DocBook mailing lists, and a page where one can download DocBook.

  • DocBook FAQ: A very handy list of frequently asked questions (with answers!) about DocBook.



[1] This is for version 4.2 of DocBook. Future versions will have a slightly different name.