
SDD Part 0: Introduction and Primer to the SDD Standard

3.3.1 Specifying the list of characters and states

A fundamental part of most SDD documents is a list of characters and states defined under the <Terminology> element. Characters and their states are the descriptors that will be used to describe the document's entities (taxa or other objects).

A valid SDD document may comprise character and state definitions without any described entities (such a document would be useful for sharing character and state definitions among several projects). Another valid SDD document may lack character definitions altogether (the <Terminology> element is optional) if it is to hold natural-language descriptions with no markup. If the document contains coded descriptions or marked-up natural language descriptions, a <Terminology> element is necessary.

Characters and states are defined in SDD documents in a list in the <Terminology> section. They are provided with key values by which they are then referenced throughout the document. Characters and states once defined may also be arranged into hierarchies or trees - see the topic Arranging characters into hierarchies (trees) for more information on this.

Example 3.3.1.1 defines a simple multistate character (Dorsal fin) with two states (higher than long and longer than high). Character and state labels are represented in English and German.
 

Example 3.3.1.1 - Specification of a simple multistate character in SDD

  <Terminology>
    <Characters>
      <Character key="1">
        <Label>
          <Representation audience="en5">
            <Text>Dorsal fin</Text>
          </Representation>
          <Representation audience="de5">
            <Text>Rückenflosse</Text>
          </Representation>
        </Label>
        <Type>nominal</Type>
        <Categorical>
          <States>
            <StateDefinition key="1">
              <Label>
                <Representation audience="en5">
                  <Text>higher than long</Text>
                </Representation>
                <Representation audience="de5">
                  <Text>höher als lang</Text>
                </Representation>
              </Label>
            </StateDefinition>
            <StateDefinition key="2">
              <Label>
                <Representation audience="en5">
                  <Text>longer than high</Text>
                </Representation>
                <Representation audience="de5">
                  <Text>länger als hoch</Text>
                </Representation>
              </Label>
            </StateDefinition>
          </States>
        </Categorical>
      </Character>
    </Characters>
  </Terminology>

Each character is defined in a <Character> element in the <Characters> collection. Characters must be furnished with key values (e.g. <Character key="1">), which are used to refer to the character elsewhere in the SDD document. Key values for characters must be positive integers and must be unique across the set of characters specified in the document. In large projects, particularly collaborative ones, characters should be furnished with key values that are unique across the entire project, even if only a subset of the characters is reported in any given SDD document.

<Character> elements have the following subelements (mandatory elements in italic, optional elements in plain text)

<Label> is used to provide a short, descriptive label for each character, such as would appear in a listing of characters.

<Type> specifies the type of the character. Characters may be of the following types:

In Example 3.3.1.1 the character is nominal. An ordinal character is defined in the same way, but in this case the states of the character are arranged in their natural sequence.
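
A minimal sketch of an ordinal character follows. It assumes that the <Type> value is written ordinal, by analogy with nominal in Example 3.3.1.1. The character (Body size), its three states and all key values are hypothetical and serve only as illustration, and only English labels are given. The states are listed in their natural sequence, from smallest to largest.

```xml
<Character key="2">
  <Label>
    <Representation audience="en5">
      <Text>Body size</Text>
    </Representation>
  </Label>
  <!-- Assumed value, by analogy with "nominal" in Example 3.3.1.1 -->
  <Type>ordinal</Type>
  <Categorical>
    <States>
      <!-- States are given in their natural order: small, medium, large -->
      <StateDefinition key="1">
        <Label>
          <Representation audience="en5">
            <Text>small</Text>
          </Representation>
        </Label>
      </StateDefinition>
      <StateDefinition key="2">
        <Label>
          <Representation audience="en5">
            <Text>medium</Text>
          </Representation>
        </Label>
      </StateDefinition>
      <StateDefinition key="3">
        <Label>
          <Representation audience="en5">
            <Text>large</Text>
          </Representation>
        </Label>
      </StateDefinition>
    </States>
  </Categorical>
</Character>
```

Apart from the <Type> value and the ordering of the states, the structure is the same as for the nominal character in Example 3.3.1.1.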

The <Categorical> element holds data relevant to nominal and ordinal (categorical) characters, such as the character's states.

States of categorical characters are defined in two or more <StateDefinition> elements in the <States> collection. Within each <StateDefinition> element, the state is defined and labelled in the same way as a character.

In the example, the alternative language representations for the character and state labels are provided using a collection of <Representation> elements, each containing the label text in a <Text> element. Each <Representation> element references a specific audience using an audience attribute (see the topic Specifying languages and audiences for the document for more information on audiences).

Example 3.3.1.2 defines a simple numerical character (Tail length, measured in mm and specified using maximum, minimum and mean values). Again, labels for the character and for the statistical measures are represented in English and German.

Example 3.3.1.2 - Specification of a simple numerical character in SDD

  <Terminology>
    <StatisticalMeasures>
      <StatisticalMeasure key="1">
        <Label>
          <Representation audience="en">
            <Text>Minimum value</Text>
            <Abbreviation>Min</Abbreviation>
          </Representation>
          <Representation audience="de">
            <Text>Minimum</Text>
            <Abbreviation>Min</Abbreviation>
          </Representation>
        </Label>
      </StatisticalMeasure>
      <StatisticalMeasure key="2">
        <Label>
          <Representation audience="en">
            <Text>Maximum value</Text>
            <Abbreviation>Max</Abbreviation>
          </Representation>
          <Representation audience="de">
            <Text>Maximum</Text>
            <Abbreviation>Max</Abbreviation>
          </Representation>
        </Label>
      </StatisticalMeasure>
      <StatisticalMeasure key="3">
        <Label>
          <Representation audience="en">
            <Text>Mean value</Text>
            <Abbreviation>Mean</Abbreviation>
          </Representation>
          <Representation audience="de">
            <Text>Durchschnitt</Text>
            <Abbreviation>Durchschnitt</Abbreviation>
          </Representation>
        </Label>
      </StatisticalMeasure>   
    </StatisticalMeasures>
    <Characters>
      <Character key="1">
        <Label>
          <Representation audience="en5">
            <Text>Tail length</Text>
          </Representation>
          <Representation audience="de5">
            <Text>Schwanzlänge</Text>
          </Representation>
        </Label>
        <Type>interval</Type>
        <Numerical>
          <StatisticalMeasures>
            <StatisticalMeasure ref="1" key="1" />
            <StatisticalMeasure ref="2" key="2" />
            <StatisticalMeasure ref="3" key="3" />
          </StatisticalMeasures>
          <MeasurementUnit Postfix="true">
            mm long
          </MeasurementUnit>
        </Numerical>
      </Character>
    </Characters>
  </Terminology>

The three statistical measures to be used in defining the numerical character (minimum, maximum and mean in this case) are first defined in the <StatisticalMeasures> collection of <Terminology>. Statistical measures are defined using <Label> and <Representation> elements as discussed for multistate characters above, allowing multilingual and multi-audience representations of all wordings. Note that a statistical measure need only be defined once, and may then be re-used for multiple characters.

Statistical measures must be furnished with key values (e.g. <StatisticalMeasure key="1">), which are used to refer to the measure elsewhere in the SDD document. Key values for statistical measures must be positive integers and must be unique across the set of measures specified in the document. In large projects, particularly collaborative ones, measures should be furnished with key values that are unique across the entire project, even if only a subset of the measures is reported in any given SDD document.

The character is defined using a <Character> element in the <Characters> collection, as described for the multistate character above. Since the character in this case is of type interval, the <Numerical> element is used instead of the <Categorical> element to hold the data necessary to define the character.

Within the <Numerical> element, the particular defined statistical measures to be used for the character are specified using <StatisticalMeasure> elements of the <StatisticalMeasures> collection. Each measure to be used is furnished with a key attribute, and referenced back to the predefined statistical measure using a ref attribute. For example, in the measure defined by <StatisticalMeasure ref="1" key="1" />, key is the value by which the measure will be referred to within the context of this character elsewhere in the document, and ref is the reference to the Minimum value measure as defined.
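
As a sketch of how predefined measures might be re-used, the fragment below adds a second numerical character to the <Characters> collection of Example 3.3.1.2. It assumes that the <StatisticalMeasures> definitions from that example are present in the same <Terminology> element. The character (Body length) and its key value are hypothetical, and the fragment further assumes that a character may reference only some of the defined measures (here minimum and maximum, without the mean).

```xml
<Character key="2">
  <Label>
    <Representation audience="en5">
      <Text>Body length</Text>
    </Representation>
  </Label>
  <Type>interval</Type>
  <Numerical>
    <StatisticalMeasures>
      <!-- ref points to the measure defined under Terminology;
           key identifies the measure within this character -->
      <StatisticalMeasure ref="1" key="1" />  <!-- Minimum value -->
      <StatisticalMeasure ref="2" key="2" />  <!-- Maximum value -->
    </StatisticalMeasures>
    <MeasurementUnit Postfix="true">
      mm long
    </MeasurementUnit>
  </Numerical>
</Character>
```

No new measure definitions are needed for this character: the ref values resolve to the Minimum value and Maximum value measures already defined once for the document.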

<MeasurementUnit> is used to define the units appropriate to the character being defined. The Postfix attribute of <MeasurementUnit> specifies where the unit will be placed when the character is listed: if Postfix is true, the unit will be listed after the value (e.g. 7 mm long); if Postfix is false, the unit will precede the value (e.g. pH 7).
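
The prefix case might be written as in the following sketch, which again assumes the measure definitions of Example 3.3.1.2. The character (Water pH), its key value, its type and the choice of the mean as its only measure are hypothetical; the point of interest is the Postfix attribute.

```xml
<Character key="3">
  <Label>
    <Representation audience="en5">
      <Text>Water pH</Text>
    </Representation>
  </Label>
  <Type>interval</Type>
  <Numerical>
    <StatisticalMeasures>
      <StatisticalMeasure ref="3" key="1" />  <!-- Mean value -->
    </StatisticalMeasures>
    <!-- Postfix="false": the unit precedes the value, e.g. "pH 7" -->
    <MeasurementUnit Postfix="false">
      pH
    </MeasurementUnit>
  </Numerical>
</Character>
```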

KRT Last Edit: 31 Dec 03