SDD document: Definition of the scope of descriptive data

TDWG working group: Structure of Descriptive Data (SDD)

Introduction

The following discussion is an attempt to answer some general questions about descriptive data and how our work differs from the work of other TDWG standard groups.

What are descriptive data?

Definition: Descriptive data inform about the state of repeatably observable, inherent properties of objects (= individual organisms) and the classes (= taxa) to which these objects belong.

REREAD the rest according to the changed definition above!

Specimens in natural history collections are special named cases of objects. Descriptive data should not be limited to observations that are possible on preserved dead objects, although these data are very important for studying type specimens in taxonomy.

The demand for "repeatability" should be understood as "potential repeatability". Observations on a single object are not repeatable if the object is unlikely to be encountered again (birds observed during flight) or whenever an observation method destroys the feature that is being observed (or even the entire specimen).

Other observations may depend on the state of the object. For example, the observation of behavior requires a living state, and certain flower colors in herbarium specimens require good preservation. Some observations can only be made during a certain time after death (e. g. molecular features deteriorate with time: enzymes fast, DNA slower).

Furthermore, the object may have the observed state only with a given frequency, which may reflect population polymorphism or environmental factors. Note that the probability/uncertainty of repeatability (see "Certainty modifiers") is different from repeatability with a given frequency (see "Frequency modifiers").
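The difference between the two kinds of modifier can be illustrated with a minimal sketch. The class and field names used here (`StateObservation`, `frequency`, `certainty`) are hypothetical and are not taken from any SDD schema; the sketch only assumes that both modifiers can be expressed as values between 0 and 1 and shows that they are recorded independently of each other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StateObservation:
    """One statement that a character shows a given state (hypothetical model)."""
    character: str
    state: str
    # Frequency modifier: share of objects in the class expected to show the state.
    frequency: Optional[float] = None
    # Certainty modifier: confidence of the observer that the statement is correct.
    certainty: Optional[float] = None

    def __post_init__(self):
        # Both modifiers are assumed to be proportions in the range [0, 1].
        for name in ("frequency", "certainty"):
            value = getattr(self, name)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


# Polymorphic population: the observer is confident that white flowers are rare.
rare_but_certain = StateObservation(
    "flower color", "white", frequency=0.1, certainty=0.95
)
# Doubtful reading: the state may be universal, but the observation is uncertain.
common_but_doubtful = StateObservation(
    "flower color", "blue", frequency=1.0, certainty=0.4
)
```

Keeping the two values in separate fields means that a rare but well-documented state is never confused with a common state that was observed under poor conditions.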

Data that are attached to the object only by the method of handling or processing it (collection data, preparation data, transfer information, etc.) are not inherent to the object. Where processing methods are needed to make property observations, these should be defined in the definition of a character. Deviations and unusual cases may be stored in character observation annotations.

Images or other media captures are highly relevant to descriptive data. However, they are usually closer to raw data documenting an observation (similar to raw HPLC data) and differ from structured analytical descriptive statements about objects.

The definition of descriptive data given above is not limited to morphological data. Morphological data are important because they:

Specifically, the definition includes:

Some types of data are for pragmatic reasons often included in descriptive data sets, but form a border area:

These data are equally well treated in organism observation or specimen collection databases, and the data in descriptive databases are usually derived from these data sources. However, the synthesis and interpretation of all available distribution or interaction data into a coherent picture for a class of organisms is a separate piece of scientific knowledge and is usually handled as descriptive data.


Fig. 1. Relation between descriptive data and other biodiversity data areas. "Species pages" are a combination of descriptive data with data derived from other sources.

 

Why are people doing it?

The driving force behind most of the interest in descriptive data is identification of organisms. A taxonomic name service or species bank is the portal to biodiversity information if the name is already known. A descriptive data service is, however, the primary information portal if a name is not yet known. An online interactive key is basically the "query mechanism" for descriptive data.

The emphasis of much work in SDD is the need to achieve fast and good identification results. This implies that characters need to be ranked according to their suitability for a given identification task. Their suitability will often differ depending on the group that is being identified and depending on previously answered descriptive questions. Furthermore, successful identification needs an exact terminology, where the concepts used by describer and identifier are identical or sufficiently similar to reach the same conclusions about the object at hand. Short text labels for characters or states are often insufficient to achieve this. Images and other media can help, but often considerably more effort is needed to achieve exact and understandable definitions. Finally, identification should be error tolerant during the process of answering multiple questions, and it should contain a verification phase after a taxon name has been reached (through searching similar taxa and asking differential questions, or through displaying a full description, possibly with graphic illustrations in the case of morphological features).
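Character ranking and error tolerance can be sketched on a toy data set. Everything in the following example is an assumption made for illustration: the taxon matrix is invented, and "suitability" is reduced to one simple measure, the number of taxon pairs that a character separates. Real identification tools may weight characters quite differently. The function names (`separating_power`, `rank_characters`, `remaining_taxa`) are hypothetical.

```python
from itertools import combinations

# Invented toy matrix: taxon -> character -> set of possible states.
# Every taxon is assumed to be scored for every character.
MATRIX = {
    "Taxon A": {"leaf shape": {"ovate"}, "flower color": {"white"}, "habit": {"herb"}},
    "Taxon B": {"leaf shape": {"ovate"}, "flower color": {"blue"}, "habit": {"herb"}},
    "Taxon C": {"leaf shape": {"linear"}, "flower color": {"white", "blue"}, "habit": {"herb"}},
    "Taxon D": {"leaf shape": {"linear"}, "flower color": {"red"}, "habit": {"shrub"}},
}


def separating_power(character, taxa, matrix=MATRIX):
    """Count the taxon pairs whose state sets for this character do not overlap."""
    return sum(
        1
        for a, b in combinations(sorted(taxa), 2)
        if not matrix[a][character] & matrix[b][character]
    )


def rank_characters(taxa, asked=(), matrix=MATRIX):
    """Rank the characters not yet asked, best separator first, ties by name."""
    characters = {c for t in taxa for c in matrix[t]} - set(asked)
    return sorted(characters, key=lambda c: (-separating_power(c, taxa, matrix), c))


def remaining_taxa(answers, tolerance=0, matrix=MATRIX):
    """Keep the taxa that contradict at most `tolerance` of the answers so far."""
    return [
        taxon
        for taxon, states in matrix.items()
        if sum(1 for ch, st in answers.items() if st not in states[ch]) <= tolerance
    ]


# The ranking changes once an answer has narrowed the candidate taxa.
first_ranking = rank_characters(list(MATRIX))
candidates = remaining_taxa({"leaf shape": "ovate"})
second_ranking = rank_characters(candidates, asked=["leaf shape"])
```

In this sketch the ranking is recomputed from the remaining candidates after each answer, which reflects the dependence on previously answered questions. A `tolerance` above zero keeps a taxon in play even if one answer contradicts its description, so a single mistaken answer does not eliminate the correct taxon.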

A related task is the documentation of organism identifications in biodiversity research. If specimens in natural history collections were identified using interactive keys, the identification steps could automatically provide a basic description of the specimen that could be stored as descriptive data. On the next revision of the specimen identification, these data could be used to confirm the identification or help to understand what taxonomic concept ("potential taxon") was used.

The documentation of identifications is also highly important for field or laboratory studies where no voucher specimens are preserved in a collection (collection managers consider this a sin, but nevertheless this is the dominant practice in most applied scientific areas). Currently it is extremely difficult to assess which taxonomic concept was used in published biodiversity data (e. g. studies of ecological interactions). Although highly desirable, currently not even the identification literature used is routinely cited. Digital documentation would not only document which descriptive information was used for identification, but also which characteristics were actually verified. Although such use may currently be unrealistic in field studies, digital technology is progressing fast, and in laboratory studies the use of digital identification and documentation tools would already save valuable time for the researcher. An important feature in this context is the storage of original descriptive observation data like individual measurements. These data should be stored directly and the synthesized statistical summary information automatically created from them.
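The last point, storing individual measurements and deriving the summary from them, can be sketched as follows. The function name `summarize`, the choice of statistics, and the sample values are assumptions made for illustration only; the text above does not prescribe which summary values are to be computed.

```python
import statistics


def summarize(measurements):
    """Derive summary statistics from raw measurements.

    The raw values remain the stored record; the summary is recomputed on demand.
    """
    values = sorted(measurements)
    if not values:
        raise ValueError("at least one measurement is required")
    return {
        "n": len(values),
        "min": values[0],
        "max": values[-1],
        "mean": statistics.mean(values),
        # A sample standard deviation needs at least two values.
        "sd": statistics.stdev(values) if len(values) > 1 else None,
    }


# Invented raw data: spore lengths in micrometres, stored as observed.
spore_length_um = [10.0, 12.0, 11.0, 13.0, 14.0]
summary = summarize(spore_length_um)
```

Because the summary is always computed from the stored values, adding a further measurement later only requires calling `summarize` again, and the summary cannot drift out of step with the observations.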

Beyond all this, descriptive data are also a general attempt to document the properties of biodiversity on Earth. Information about chemical or molecular properties is relevant for many medicinal or biotechnological purposes. Information about organism interactions (e. g. pollinators, host-pathogen, predator-prey) or the interaction with the environment (e. g. growth of plants in different soils) is relevant for agriculture as well as for understanding ecological networks.

Finally, descriptive data are extensively used for phylogenetic and non-phylogenetic classifications. Taxonomy relies heavily on descriptive data to reach the conclusions that are then synthesized into synonymy and taxonomic hierarchies.

Gregor Hagedorn; Vers. 1; 21. February 2003



First published 2003-02-20, last update: 2003-10-24.
