This page discusses the role of DTDs in marking up a historic document. It does not go into the specifics of page authoring. All documents that are marked up for Project Gutenberg must be marked up according to a DTD, and then validated against that DTD. Here is a list of the DTDs available at present.
A DTD or schema is a description of the way a document is marked up. It contains such information as the permitted elements and attributes, the permitted content model of each element, and the attribute types. DTDs are particularly important in marking up historic documents, as they let any future reviewer of the document know what to expect. They are also important in preventing anarchy. Without a DTD or schema, the marker would be free to mark up the document in any way they wanted. Although this may be fun for the marker, it would not be fun for someone who had to review the document at a later date.
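As a minimal sketch of what such a description looks like, the following self-contained XML file carries a small DTD in its internal subset. The element and attribute names (poem, title, stanza, line, lang, n) are hypothetical, chosen only for illustration, and are not taken from any of the Project Gutenberg DTDs.

```xml
<?xml version="1.0"?>
<!DOCTYPE poem [
  <!-- Permitted elements and their content models (hypothetical names) -->
  <!ELEMENT poem   (title, stanza+)>
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT stanza (line+)>
  <!ELEMENT line   (#PCDATA)>
  <!-- Permitted attributes and their types -->
  <!ATTLIST poem   lang NMTOKEN #IMPLIED>
  <!ATTLIST stanza n    NMTOKEN #IMPLIED>
]>
<poem lang="en">
  <title>An Example</title>
  <stanza n="1">
    <line>First line of the stanza.</line>
    <line>Second line of the stanza.</line>
  </stanza>
</poem>
```

A validating parser would reject this document if, for example, a stanza appeared before the title or a line carried an undeclared attribute, which is the kind of predictability a later reviewer relies on.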
DTDs and schemas can be used to impose order on a document, and this is exactly what we want at the top level of the document, where we want to make sure all the markup blurbs and e-text blurbs are included. When we come to the historic document proper, we need a much looser state of affairs. The book or poem is already written, and we just want to make sure we have a DTD that describes its content.
We will not teach you the specifics of DTDs here. Several good tutorials and books are available, and the Guild also has an XML class. There are narrative descriptions of the various DTDs in the next few pages.
All the DTDs used must be free to use in perpetuity. (Note that this applies to several well-known DTDs, including DocBook.) We will maintain a series of suitable DTDs on this site. These DTDs will evolve, and hopefully improve, over time, but all old versions will be maintained. We intend to follow these general principles.
All 'gut' DTDs should have the same top-level structure. This structure is shown in the following diagram.
The following "pseudo code" explains the hierarchy and the nature of the content of each section.
<gutdoc>
  <gutblurb>
    [This contains all the meta information about the document that was developed by the original transcriber. It will probably not be displayed by the style sheet.]
  </gutblurb>
  <markupblurb>
    [This section contains the information about the marker of the document, including all details of the revision history. It will probably not be displayed by the style sheet.]
  </markupblurb>
  <gutcredit>
    [This short credit is designed to be displayed at the top of the document, e.g.]
    This document was marked up by [name], a member of the HTML Writers Guild, as part of Project Gutenberg. [date]. The original transcription was made by [name] [date]. For further information go to view/source.
  </gutcredit>
  <gutbook>
    The document proper goes here. The DTD for this will vary.
  </gutbook>
  <endmarkupblurb>
    [Typically this will contain any notes made by the marker pertaining to the document itself, including footnotes. It will probably not be displayed by the style sheet.]
  </endmarkupblurb>
  <endgutblurb>
    Most e-texts have a line or two of additional meta information.
  </endgutblurb>
</gutdoc>
Each of the non-document top-level elements (that is, every top-level element other than gutbook) should have the same content model, namely: (#PCDATA|para|subsect|title)*
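The shared top level described above could be declared roughly as follows. This is only a sketch, not the text of any actual 'gut' DTD: the order of the children of gutdoc is taken from the pseudo code, but treating all six as required, the parameter entity name blurb.content, the content models given to para, subsect and title, and the use of ANY as a stand-in for gutbook are all assumptions made for illustration. It is written as a standalone .dtd file, since parameter entity references inside declarations are only allowed in the external subset.

```xml
<!-- Sketch of a shared top level for the 'gut' DTDs (assumptions noted below) -->

<!-- Hypothetical entity name; the content model itself comes from the text -->
<!ENTITY % blurb.content "(#PCDATA|para|subsect|title)*">

<!-- Assumption: all six sections are required, in this order -->
<!ELEMENT gutdoc (gutblurb, markupblurb, gutcredit, gutbook,
                  endmarkupblurb, endgutblurb)>

<!-- Every non-document section shares one content model -->
<!ELEMENT gutblurb       %blurb.content;>
<!ELEMENT markupblurb    %blurb.content;>
<!ELEMENT gutcredit      %blurb.content;>
<!ELEMENT endmarkupblurb %blurb.content;>
<!ELEMENT endgutblurb    %blurb.content;>

<!-- Placeholder: each DTD (poems, plays, books) defines its own model here -->
<!ELEMENT gutbook ANY>

<!-- Assumed content models for the shared building blocks -->
<!ELEMENT title   (#PCDATA)>
<!ELEMENT para    (#PCDATA)>
<!ELEMENT subsect (#PCDATA|para|title)*>
```

Keeping the common model in one parameter entity means a change to the blurb structure is made in a single place, while only the gutbook declaration differs from one DTD to the next.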
At present, in addition to the XHTML DTD, there are four DTDs available: gutpoems1.dtd, gutplay1.dtd, and gutbook1.dtd, plus a DTD for books with poetry and plays included. There is also a series of elementary tutorials on the TEI DTDs (teixlite.dtd), starting at teidtds1.html
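A document is tied to one of these DTDs through its document type declaration, which is what a validating parser reads before checking the markup. The skeleton below is a sketch only: it assumes that gutdoc is the root element, as the common top-level structure suggests, and that gutpoems1.dtd sits in the same directory as the document. Neither the path nor the details of the DTD are confirmed here.

```xml
<?xml version="1.0"?>
<!-- Assumption: the DTD file sits beside this document; adjust the path as needed -->
<!DOCTYPE gutdoc SYSTEM "gutpoems1.dtd">
<gutdoc>
  <!-- gutblurb, markupblurb, gutcredit, gutbook, endmarkupblurb
       and endgutblurb go here, in that order -->
</gutdoc>
```

As written, the skeleton is well-formed but would not validate until the sections required by the DTD are filled in.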