SDD Part 0: Introduction and Primer to the SDD Standard
SDD Part 0 is a non-normative introduction to the Taxonomic Databases Working Group (TDWG) SDD (Structure of Descriptive Data) Standard. It provides background, an introduction and a primer to the SDD Standard, with examples. Because the SDD Standard is a work in progress, this document will be updated from time to time.
Version: 3 Dec. 2003
Edited: Kevin Thiele (Centre for Biological Information Technology, University of Queensland), with financial support from the Gordon and Betty Moore Foundation (www.moore.org).
This document has not yet been reviewed by the SDD Working Group of TDWG. It is intended to facilitate discussion.
To contribute to the discussion on the SDD Standard and to comment on this document, please join the SDD discussion list by emailing the SDD List Server or contribute to the SDD Wiki.
Complete documentation of the SDD Schema is available on the SDD web site.
TDWG maintains other standards that relate to the SDD standard, particularly the (Names) standard. Wherever possible, element names conform across standards.
In September 1998 the Taxonomic Databases Working
Group (TDWG) of the International Union of
Biological Sciences (IUBS) established the Structure of Descriptive Data
(SDD) subgroup. TDWG’s role is to facilitate and manage the development of
international standards in the taxonomic domain. The SDD subgroup was
established to develop an international XML-based standard for capturing and
managing descriptive data for organisms.
Development of the SDD standard was initiated in response to recognition that
the existing standard previously endorsed by TDWG – the
DELTA data standard developed
at CSIRO in Canberra from 1971 and adopted by TDWG as a descriptive data
standard in 1991 – had become inadequate (FAQ:
Why not continue to use DELTA?).
The SDD subgroup began discussing and scoping a standard through an email
discussion group in November 1999 (see the
SDD email list
archives). Considerable progress
has been made at face-to-face meetings amongst a small
group of core contributors, in Nov. 2001 (Canberra), Oct. 2002 (Sao Paulo), Feb. 2003 (Paris)
and October 2003 (Lisbon).
Version 0.9 of the SDD standard and Version 0.9 of this document were released on the TDWG website in December 2003.
In taxonomy, descriptions of taxa are one of the prime repositories for both raw and highly processed data. Virtually all known organisms have published descriptions in some form - indeed, it is a requirement under the International Codes of Botanical and Zoological Nomenclature that valid publication of a new taxon must include a diagnostic description. Descriptions of taxa form the core of biological monographs and of Floras and other field guides.
Descriptions in taxonomy take several forms. The most common and least tractable is the natural-language description (Box 1). A natural-language description is a semi-structured, semi-formalised description of an organism or (more usually) a taxon. Natural-language descriptions may be simple, short and written in plain language (when used in a popular field guide), or long, highly formal and written in specialised terminology (when used in a taxonomic monograph or other treatment).
Calidris canutus (Red Knot)
Stout wader with bill same length as head, crown unstreaked, narrow white bar
in wing, pale rump with grey barring, shortish olive legs. Non-breeding: grey
above with narrow pale edging to feathers, pale eyebrow, smudged sides to neck
with faint spotting. Juvenile: feathers of back edged white with dark
subterminal bar, breast more heavily spotted pale buff and flanks barred,
crown faintly streaked. Breeding: rufous underparts, feathers of back rufous
patterned with black. Voice: 'knut-knut', 'nyui', high-pitched 'toowit-wit'.
from Slater, P., Slater, P. & Slater, R. (2001) The Slater Field Guide to Australian Birds (Reed New Holland: Sydney)
Tithorea harmonia Godman & Salvin
Antennae orange, forewing short with pointed tip, white checks on wing edges, spots on
ventral hindwing margin paired, black bar on lower edge of forewing discal cell,
black bar above hindwing discal cell, discal bar reduced to a spot or absent.
from www.cs.umb.edu/~whaber/Monte/Ithomid/Tith-harm.html
Discaria pubescens (Brongn.) Druce
Rigid, spreading shrub to c. 1 m high and wide; stems glabrous. Leaves soon
deciduous, c. oblong, to 10 mm long, 3 mm wide, obtuse or minutely mucronate
within an apical notch, margins minutely toothed, surfaces glabrous or a few
hairs present near tip; stipules dark reddish-brown, c. 1 mm long, often
shallowly joined around the node, pubescent on inner face; spines stout, 1.5-4
cm long. Flowers white, solitary or in few-flowered axillary cymes, sometimes
congested on short apical shoots; pedicels 2-3 mm long; hypanthium c. 1.5 mm
long; sepals somewhat spreading, 1-1.5 mm long; petals attached at throat of
hypanthium, c. 1 mm long; stamens subequal to and weakly hooded by petals;
disc prominent, lining base of hypanthium, obscurely 5-angled; style minute.
Capsule prominently 3-lobed, 4-5 mm diam., the valves separating incompletely
at maturity and splitting dorsally and medially.
from Walsh, N.G. (1999) Rhamnaceae, in N.G.Walsh & T.J.Entwisle, Flora of Victoria Volume 4, Dicotyledons, Cornaceae to Asteraceae (Inkata Press: Melbourne)
A relatively small number of descriptions comprise fully structured data, such as Lucid LIF files (Box 2), DELTA descriptions (Box 3) and NEXUS data files.
Box 1.2.2 - a simple Lucid Interchange Format (LIF) file
#Lucid Interchange Format File v. 2.1
Box 1.2.3 - a simple DELTA file
*SHOW: Gentianella - character list. Last revised 16 April 1997.
Most descriptions of organisms - natural-language descriptions - are devoid of data markup and are almost entirely intractable for processing by analytical engines and data-mining routines. Structured descriptions, including descriptions in DELTA, Lucid and NEXUS formats, use more or less proprietary formats that are intimately tied to one or a small number of software implementations, and in general the software platform and the format evolve in tandem. In cases where packages provide tools to translate between formats (e.g. the Lucid-DELTA Translator and the DELTA CONFOR programs), translation is usually lossy (because the different formats maintain different data structures), and maintenance of the translation programs is difficult (since they must track changes made to proprietary formats on both sides of the translation). Further, if a software platform loses support, the data stored in its proprietary format become legacy data and cannot be easily maintained.
The SDD subgroup consider that an independent, international standard for descriptive data is important. Such a standard is crucial to enabling lossless porting of data between existing and future software platforms including identification, data-mining and analysis tools, and federated databases. The absence of such a standard is a major impediment to the greater use of digitised descriptive data, and brings substantial inefficiencies to taxonomy as a whole.
The SDD Standard intends to:
SDD will be XML-based, and will provide a schema for validation of documents.
SDD seeks to facilitate:
The simplest possible description comprises a single descriptive statement about an organism, taxon or object. An example of such a description is given in Box 5, and its SDD representation in Example 1.
Viola hederacea Labill.
Leaves simple
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="http://www.tdwg.org/2003/SDD_09" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.tdwg.org/2003/SDD_09 C:\DOCUME~1\KEVINT~1\Desktop\SDDBET~1\SDD_09.xsd">
  <GenerationMetadata TimeStamp="2002-11-08T10:00:00"
      GeneratorName="n/a, handcrafted instance document" GeneratorVersion="n/a"/>
  <ProjectDefinition>
    <Version>
      <Major>1</Major>
    </Version>
    <RevisionData>
      <Authors>
        <Agent ref="1"/>
      </Authors>
      <InitiationDate>1999-08-13T00:00:00</InitiationDate>
      <LastRevisionDate>2003-11-05T00:00:00</LastRevisionDate>
    </RevisionData>
    <AudienceSpecificData>
      <Representation audience="en5">
        <Title>The Genus Viola</Title>
        <Rights>
          <CopyrightStatement>(c) 2003 Centre for Occasional Botany</CopyrightStatement>
        </Rights>
      </Representation>
    </AudienceSpecificData>
    <Audiences defaultaudience="en5">
      <Audience audiencekey="en5" lang="en" ExpertiseLevel="5">
        <LabelText>Experts</LabelText>
      </Audience>
    </Audiences>
  </ProjectDefinition>
  <Terminology>
    <Characters>
      <Character key="1">
        <Label>
          <Representation audience="en5">
            <Text>Leaf complexity</Text>
          </Representation>
        </Label>
        <Type>nominal</Type>
        <Categorical>
          <States>
            <StateDefinition key="1">
              <Label>
                <Representation audience="en5">
                  <Text>Simple</Text>
                </Representation>
              </Label>
            </StateDefinition>
            <StateDefinition key="2">
              <Label>
                <Representation audience="en5">
                  <Text>Compound</Text>
                </Representation>
              </Label>
            </StateDefinition>
          </States>
        </Categorical>
      </Character>
    </Characters>
  </Terminology>
  <Entities>
    <Classes>
      <Class key="1">
        <FreeFormDescription>Viola hederacea Labill.</FreeFormDescription>
      </Class>
    </Classes>
  </Entities>
  <Resources>
    <Agents>
      <Agent key="1">
        <FreeFormDescription>Kevin Thiele</FreeFormDescription>
        <LastName>Thiele</LastName>
      </Agent>
    </Agents>
  </Resources>
  <Descriptions>
    <CodedDescription key="101">
      <Class ref="1"/>
      <RevisionData>
        <Authors>
          <Agent ref="1"/>
        </Authors>
        <InitiationDate>2003-08-13T10:23:11</InitiationDate>
        <LastRevisionDate>2003-08-13T10:23:11</LastRevisionDate>
      </RevisionData>
      <CharacterData>
        <Character ref="1">
          <State ref="1"/>
        </Character>
      </CharacterData>
    </CodedDescription>
  </Descriptions>
</Document>
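To see how the key/ref attributes in Example 1 tie a coded description back to its entity, character and state, the following Python sketch resolves them into a readable statement. It is only an illustration: it embeds a trimmed copy of Example 1 (metadata and resources removed), uses the standard library's ElementTree, and the helper names `label` and `resolve` are hypothetical and not part of SDD.

```python
import xml.etree.ElementTree as ET

# Namespace used by the SDD 0.9 instance document in Example 1.
NS = {"sdd": "http://www.tdwg.org/2003/SDD_09"}

# Trimmed copy of Example 1: only the parts needed to resolve the description.
SDD_XML = """
<Document xmlns="http://www.tdwg.org/2003/SDD_09">
  <Terminology><Characters>
    <Character key="1">
      <Label><Representation audience="en5"><Text>Leaf complexity</Text></Representation></Label>
      <Categorical><States>
        <StateDefinition key="1">
          <Label><Representation audience="en5"><Text>Simple</Text></Representation></Label>
        </StateDefinition>
        <StateDefinition key="2">
          <Label><Representation audience="en5"><Text>Compound</Text></Representation></Label>
        </StateDefinition>
      </States></Categorical>
    </Character>
  </Characters></Terminology>
  <Entities><Classes>
    <Class key="1"><FreeFormDescription>Viola hederacea Labill.</FreeFormDescription></Class>
  </Classes></Entities>
  <Descriptions>
    <CodedDescription key="101">
      <Class ref="1"/>
      <CharacterData><Character ref="1"><State ref="1"/></Character></CharacterData>
    </CodedDescription>
  </Descriptions>
</Document>
"""


def label(elem):
    """Return the label text of a character or state definition (hypothetical helper)."""
    return elem.findtext("sdd:Label/sdd:Representation/sdd:Text", namespaces=NS)


def resolve(root):
    """Turn each coded statement into an (entity, character, state) tuple of labels."""
    # Index entities and characters by their key attribute.
    classes = {c.get("key"): c.findtext("sdd:FreeFormDescription", namespaces=NS)
               for c in root.iterfind("sdd:Entities/sdd:Classes/sdd:Class", NS)}
    characters = {c.get("key"): c
                  for c in root.iterfind("sdd:Terminology/sdd:Characters/sdd:Character", NS)}
    statements = []
    for desc in root.iterfind("sdd:Descriptions/sdd:CodedDescription", NS):
        entity = classes[desc.find("sdd:Class", NS).get("ref")]
        for coded in desc.iterfind("sdd:CharacterData/sdd:Character", NS):
            char = characters[coded.get("ref")]
            # State keys are looked up within the referenced character.
            states = {s.get("key"): label(s)
                      for s in char.iterfind("sdd:Categorical/sdd:States/sdd:StateDefinition", NS)}
            for state in coded.iterfind("sdd:State", NS):
                statements.append((entity, label(char), states[state.get("ref")]))
    return statements


for entity, character, state in resolve(ET.fromstring(SDD_XML)):
    print(f"{entity}: {character} = {state}")
```

Run against the trimmed document, this prints the single statement from Box 5 in expanded form: the entity name, the character label and the state label.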
SDD documents are structured using seven high-level XML elements. Four of these (<Document>, <GenerationMetadata>, <ProjectDefinition> and <Resources>) are mandatory, while the remaining three are optional.
<Document> is the root of an SDD document, and encloses all other elements.
The <GenerationMetadata> element is used to specify metadata about the process (application or script) that created the current SDD document or data stream, such as the name of the generating application and the date and time at which the document was created.
The <ProjectDefinition> element is used to capture metadata about the project from which the document data are sourced, including details of authors and contributors to the project, the project status, publication and revision dates, sources of data etc.
The <Terminology> element defines a list of characters and their states used to describe the entities described in the document.
The <Entities> element defines a list of entities (such as taxa and specimens) for which descriptions are provided in the document.
The <Resources> element provides for definitions of resources (images, notes, contributors etc) referred to elsewhere in the document.
The <Descriptions> element contains descriptions (either coded or marked-up natural language) of the document's entities.
A valid SDD document must include <Document>, <GenerationMetadata>, <ProjectDefinition> and <Resources> elements. In addition, it may contain a <Terminology> section alone (if used to provide character and state resources from a project), an <Entities> and <Descriptions> section alone (if used to carry natural language descriptions with no markup), or <Terminology>, <Entities> and <Descriptions> elements (in which case it may carry coded or marked-up natural language descriptions).
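The composition rules in the preceding paragraph can be checked mechanically. The Python sketch below is a minimal illustration and not a substitute for validation against the SDD schema: it only inspects which top-level sections are present. The function name `check_sections` is hypothetical, and accepting a document that contains none of the optional sections is an assumption, since the text does not say whether that case is allowed.

```python
import xml.etree.ElementTree as ET

# Top-level sections that must appear inside <Document>.
MANDATORY = ("GenerationMetadata", "ProjectDefinition", "Resources")

# Combinations of optional sections described in the text.
# Assumption: a document with none of the optional sections is also accepted.
ALLOWED_OPTIONAL = {
    frozenset(),
    frozenset({"Terminology"}),
    frozenset({"Entities", "Descriptions"}),
    frozenset({"Terminology", "Entities", "Descriptions"}),
}


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag name, if present."""
    return tag.rsplit("}", 1)[-1]


def check_sections(xml_text):
    """Return a list of problems with the top-level composition (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    if local_name(root.tag) != "Document":
        problems.append("root element is not <Document>")
    present = {local_name(child.tag) for child in root}
    for name in MANDATORY:
        if name not in present:
            problems.append(f"missing mandatory <{name}>")
    optional = frozenset(present & {"Terminology", "Entities", "Descriptions"})
    if optional not in ALLOWED_OPTIONAL:
        problems.append("unsupported combination of optional sections: "
                        + ", ".join(sorted(optional)))
    return problems


terminology_only = (
    '<Document xmlns="http://www.tdwg.org/2003/SDD_09">'
    "<GenerationMetadata/><ProjectDefinition/><Terminology/><Resources/>"
    "</Document>"
)
print(check_sections(terminology_only))  # no problems expected for this case
```

A document that carries <Descriptions> without <Entities>, or that omits <Resources>, would produce one message per problem, which makes the sketch usable as a quick pre-check before full schema validation.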
FAQ: Why are SDD documents so verbose and complex?
Example 1 describes only the most basic of SDD structures. There are two ways to go further: either use the links at left below for more information about specific SDD tasks, or click on an element in the Schema diagram at right below to navigate to information specific to that element.

KRT Last Edit: 31 Dec 03