Provides the root element for as long as terminology and descriptions must be in the same document!
The version of the SDD standard used is defined in the namespace declaration and needs no separate data element.
This information refers to the last process that created this document; it does not imply that the data were authored there. It is intended for debugging purposes, and for improving import quality if certain generator versions produce abnormal code.
Optionally allows a generating application to identify which export routine created the document; some applications may have several alternative export routines.
Identifies the authors of the generating application, not the authors of the terminology, descriptions, and resources!
This is the copyright string of the generating application, not the copyright of the terminology, descriptions, and resources!
The date on which the generating application actually updated (or, if no updates occurred, created) the current xml file.
Required information defining the project itself. It covers the entire document, i. e. terminology, descriptions, and resource collection.
Item descriptions are defined as optional to allow projects which publish only terminology.
Global resource definitions containing URIs or actually embedded resources (e. g. encoded images).
Application-specific information is placed in processing instructions (PIs). @@@DISCUSS: Is this OK? Can PIs easily be parsed out by an application? @@@ Recommendation: each application may read out its own information. Any information intended for other targets should be preserved and output when a new document is generated. This is designed to support idempotent round-tripping of data between two applications. Consequently, applications should not rely on any dependency between these settings and the items or the terminology.
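The recommendation can be sketched with Python's standard-library ElementTree (Python 3.8 or later), assuming a hypothetical application target name `myapp` and placeholder element names. The application picks out only the processing instructions addressed to it and leaves all others in the tree, so they are written out again unchanged. Note that with this parser setup ElementTree only retains PIs located inside the root element.

```python
import xml.etree.ElementTree as ET

def split_processing_instructions(xml_text, own_target):
    """Return (root, own, foreign) for the PIs found inside the root element.

    'own' holds the PIs addressed to own_target; 'foreign' holds all others,
    which stay in the tree so that they are serialized again on export.
    """
    # insert_pis=True keeps processing instructions as nodes in the tree
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    root = ET.fromstring(xml_text, parser=parser)
    own, foreign = [], []
    for pi in root.iter(ET.ProcessingInstruction):
        # ElementTree stores a PI as "target data" in its .text attribute
        target = pi.text.split(" ", 1)[0]
        (own if target == own_target else foreign).append(pi)
    return root, own, foreign

# Placeholder document; element names and PI targets are hypothetical.
doc = """<Document>
  <?myapp default-view="tree"?>
  <?otherapp window="maximized"?>
  <Terminology/>
</Document>"""

root, own, foreign = split_processing_instructions(doc, "myapp")
print([pi.text for pi in own])  # ['myapp default-view="tree"']
# The foreign PI is still part of the tree and is serialized again:
print(ET.tostring(root, encoding="unicode"))
```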
The labels of audience definitions are required and must be unique for a given language ('lang' attribute).
The labels of frequency definitions are required and must be unique for a given audience definition (as defined through "keyref" attribute on LinguisticSet).
The labels of modifier definitions are required and must be unique for a given audience definition (as defined through "keyref" attribute on LinguisticSet).
The labels of character definitions are required and must be unique for a given audience definition (as defined through "keyref" attribute on LinguisticSet).
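The uniqueness rules above can also be checked outside the schema. A minimal sketch, assuming the labels have already been extracted as (scope, label) tuples, where the scope is the audience key referenced by a LinguisticSet; the function name and sample labels are hypothetical.

```python
from collections import Counter

def duplicate_labels(pairs):
    """Return the (scope, label) tuples that occur more than once, sorted.

    Labels only need to be unique within one scope, so the same label
    under two different audiences is not reported.
    """
    counts = Counter(pairs)
    return sorted(pair for pair, n in counts.items() if n > 1)

labels = [
    ("en-expert", "leaf shape"),
    ("de-expert", "leaf shape"),   # same label, different audience: allowed
    ("en-expert", "leaf shape"),   # repeated within one audience: reported
    ("en-expert", "petal colour"),
]
print(duplicate_labels(labels))  # [('en-expert', 'leaf shape')]
```

For audience definitions, whose labels are unique per language, the same function can be fed (lang, label) pairs; longer tuples such as (character, audience, label) cover composite scopes.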
The following xpath selects all CharacterGroupItem elements anywhere in the document. This is in fact more general, and therefore more computationally intensive, than necessary; a better xpath expression would be Terminology/CharacterGroupDefinitions/CharacterGroupDefinition//CharacterGroupItem. The expression must include all item nodes, regardless of their position in the tree structure. However, combining a defined path with an "all descendants" path seems to be impossible under the restrictions that XML Schema imposes on xpath expressions in identity constraints.
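To illustrate the difference between the two selectors, here is a sketch using Python's ElementTree, whose limited XPath support (unlike XML Schema identity constraints) does accept a descendant step in the middle of a path. The document is a hypothetical, namespace-free fragment; the item under `Other` exists only to show the difference in scope.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<Root>
  <Terminology>
    <CharacterGroupDefinitions>
      <CharacterGroupDefinition>
        <CharacterGroupItem key="g1">
          <CharacterGroupItem key="g1.1"/>
        </CharacterGroupItem>
      </CharacterGroupDefinition>
    </CharacterGroupDefinitions>
  </Terminology>
  <Other><CharacterGroupItem key="stray"/></Other>
</Root>""")

# General selector: every CharacterGroupItem anywhere below the root
everywhere = [e.get("key") for e in doc.iterfind(".//CharacterGroupItem")]

# Narrower selector: a fixed prefix, then all descendants below it
scoped = [e.get("key") for e in doc.iterfind(
    "Terminology/CharacterGroupDefinitions/CharacterGroupDefinition//CharacterGroupItem")]

print(everywhere)  # ['g1', 'g1.1', 'stray']
print(scoped)      # ['g1', 'g1.1']
```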
The labels of character group definitions are required and must be unique for a given audience definition (as defined through "keyref" attribute on LinguisticSet).
The labels of character state definitions must be unique within each character and audience definition. This is checked by testing global uniqueness in combination with the unique character label. @@@NOTE: THIS DOES NOT WORK PROPERLY YET!@@@
Required information defining the project itself.
Sequence of authors and/or editors (at least one is required).
A contributor is not a full Creator of the project but is related to it. Output to, or mapping onto, Dublin Core elements should differentiate between the three different elements within Creators.
Either a full date or a year (1970-2100) is required.
Either a full date or a year (1970-2100) is required.
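A minimal validation sketch for such a value, assuming it arrives as a string that is either an ISO 8601 calendar date (YYYY-MM-DD) or a four-digit year. Applying the 1970-2100 range only to the year-only form is an assumption of this sketch, as is the function name.

```python
import re
from datetime import date

def is_valid_date_or_year(value: str) -> bool:
    """Accept a full date (YYYY-MM-DD) or a bare year from 1970 to 2100.

    Assumption: the 1970-2100 range is applied to the year-only form;
    full dates are only checked for calendar validity.
    """
    if re.fullmatch(r"[0-9]{4}", value):
        return 1970 <= int(value) <= 2100
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects e.g. February 29 in non-leap years
    except ValueError:
        return False
    return True

print(is_valid_date_or_year("2004"))        # True
print(is_valid_date_or_year("2003-02-29"))  # False
```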
The version number or code as defined by the project creators
The default audience is used whenever the setup of the consuming application specifies no other preference. The user interface of the application may then allow the user to choose a different available audience or language. Elements that are language-specific but not audience-specific use the language of the default audience.
The terminology is designed by the biological specialist(s). It is the class definition, defining the semantics and structure of the data in the item descriptions.
An audience is a combination of language (including dialect) and expertise (pupil, beginner, expert). Multiple audiences can be defined for the same language and expertise, distinguished only by their label.
The key attribute of AudienceDefinition is an arbitrary string. It is referenced in all LinguisticSet elements to declare the intended audience.
ExpertiseLevel is restricted to values from 1 to 5. These categories allow applications using the SDD schema to communicate the expected expertise to each other. The recommended interpretation is:
1 = elementary school (year 1 to 6);
2 = middle school (year 7 to 10);
3 = high school (year 11 and above) and general public (trying to avoid any specialized terminology or jargon);
4 = university students or (partly) trained personnel (using terminology, but avoiding or explaining problematic terminology);
5 = experts (using the full range of terminology).
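Because the values carry a recommended interpretation, an application might keep them in a lookup table and reject out-of-range input when reading a document. A sketch in Python; the names are illustrative, and the descriptions are shortened from the list above.

```python
# Recommended interpretation of ExpertiseLevel (shortened from the list above).
EXPERTISE_LEVELS = {
    1: "elementary school (year 1 to 6)",
    2: "middle school (year 7 to 10)",
    3: "high school (year 11 and above) and general public",
    4: "university students or (partly) trained personnel",
    5: "experts",
}

def parse_expertise_level(raw: str) -> int:
    """Convert an ExpertiseLevel value, rejecting anything outside 1-5."""
    level = int(raw)
    if level not in EXPERTISE_LEVELS:
        raise ValueError(f"ExpertiseLevel must be 1-5, got {level}")
    return level

level = parse_expertise_level("4")
print(EXPERTISE_LEVELS[level])  # university students or (partly) trained personnel
```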
Defines the semantics and labels of numeric measures (e. g. mean, min, max, s.d.). Unlike most other elements of the terminology, these definitions are constrained by the SDD model. In the current version they cannot be extended by the designer of the terminology.
A single label for each global state set, to identify it in the user interface
Each definition defines a fixed key value, multilingual label and glossary information (user extensible to new audiences) and attributes describing generalized semantics.
Please observe the following "Best practices recommendation": use the method, type, and value attributes rather than relying on the key strings, whenever this information is sufficient (e. g. for formatting routines or many query/identification purposes). Using type/method/value information allows your code to work if the list of definitions of statistical measures is extended.
Globally defined sets of states which can be reused in multiple characters. System and user definable definitions use identical element names to simplify key/keyref definitions.
This set contains "Special states", providing standardized reasons why data are missing. Unlike most other elements of the terminology, these are constrained by the SDD model and cannot be extended.
Special states all identify a reason why data are not known. In a single item they should occur only once per character. However, for a class (e. g. a genus) it is up to the collation process whether to create multiple special states or not; information like "unknown or not applicable" may be of interest for analytical purposes. The type is based on CharacterStateDefinitionType, but key values are restricted to an enumeration and no resources are allowed.
The labels and abbreviations given for special states are only recommendations. They can be freely changed as long as the semantics are preserved.
Second set of special states, applicable to computed characters.
These special states are already predefined here; they will be used as soon as a mechanism for computed characters is introduced.
These state sets are user definable.
A single label for each global state set, to identify it in the user interface
Frequency modifiers are used to describe state frequency (usually, rarely, etc.). They are defined globally but must be enabled for each character to be usable.
A group of related frequency modifier definitions, which can, e.g., be used in user interfaces to allow adding a set of definitions in a single step.
A single label for each global frequency definition set, to identify it in the user interface
Note that the upper and lower limits of several frequency modifiers within a set may overlap!
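An editing application may still want to show designers where ranges overlap. A sketch, assuming each modifier is given as a (label, lower, upper) tuple with limits as fractions between 0 and 1; the labels and limits below are invented for illustration. Ranges are treated as closed intervals, so touching limits count as an overlap.

```python
from itertools import combinations

def overlapping_pairs(modifiers):
    """List pairs of frequency modifiers whose [lower, upper] ranges overlap.

    Overlap is permitted; this only reports it, e.g. for display
    in a terminology editor.
    """
    result = []
    for (a, lo_a, hi_a), (b, lo_b, hi_b) in combinations(modifiers, 2):
        # Two closed intervals overlap when each starts before the other ends
        if lo_a <= hi_b and lo_b <= hi_a:
            result.append((a, b))
    return result

freqs = [("rarely", 0.0, 0.25), ("sometimes", 0.2, 0.6), ("usually", 0.6, 1.0)]
print(overlapping_pairs(freqs))  # [('rarely', 'sometimes'), ('sometimes', 'usually')]
```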
Modifiers are used to modify state expression in descriptions (strongly, at the tip, etc.). They are defined globally but must be enabled for each character to be usable.
A group of related modifier definitions, which can, e.g., be used in user interfaces to allow adding a set of definitions in a single step.
A single label for each global modifier definition set, to identify it in the user interface
Characters are defined in a flat list; multiple hierarchical views are implemented through the char. group definition below.
Character group definitions define flat subsets as well as hierarchical character trees. They are used for hierarchical display or for filtering character subsets.
@@@DISCUSS: should character group hierarchies be recursively definable, as long as the resulting tree is acyclic?
Item descriptions may contain multiple items.
The item is a defined object that is described. This may be an abstract taxonomic concept (taxon, disease, etc.) or a physical object (individual specimen, part of individual, etc.).
The item may be defined by its taxonomic name, by the published source of the description, and by a specimen identifier.
These definitions may be free-form text or links to other database components.
@@@ Needs discussion! Currently defined both as attributes on Item and as elements within it. Compare the use of metadata as elements: should these rather be elements or attributes?
This element contains an authored or autogenerated free-form item description ('natural language description'). It may be completely or partially marked up with elements similar to those in the coded description. If all markup except the wording content is removed, the original description can be losslessly recovered.
Retains the full, unchanged original wording of the natural language description. Character group, character, or state markup may be added (partial or complete), but these may not change the original wording sequence.
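The lossless-recovery property can be illustrated with Python's ElementTree: if markup elements only wrap spans of the original wording and add no text of their own, concatenating all text nodes restores the wording. The element names in this sketch are placeholders, not necessarily the exact SDD markup elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up fragment; element names are placeholders.
marked_up = ET.fromstring(
    "<NaturalLanguageDescription>"
    "<CharGroup>Leaves</CharGroup> "
    "<Char><State>ovate</State> to <State>lanceolate</State></Char>, "
    "<Char><State>glabrous</State></Char>."
    "</NaturalLanguageDescription>"
)

def plain_wording(elem):
    """Drop all markup and return the concatenated text content."""
    return "".join(elem.itertext())

print(plain_wording(marked_up))  # Leaves ovate to lanceolate, glabrous.
```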
Character group markup is used to mark organism parts, methodological sections, etc.
In most cases the states are recognized first; character markup can then be deduced from the associations between characters and states defined in the terminology.
Wording between character groups or characters is necessary if markup is incomplete.
In most cases the states are recognized first; character markup can then be deduced from the associations between characters and states defined in the terminology.
Wording between character groups or characters is necessary if markup is incomplete.
"authored" descriptions may never be overwritten by a natural language reporting process, whereas "generated" descriptions may be updated. Both "authored" and "generated" descriptions may have markup, but do not need to.
Contains multiple resources (e. g. images) described by a single set of DC metadata. A description may consist of resources alone!
The coded description is entirely controlled by the vocabulary and structures defined in the Terminology section. It contains keyrefs to descriptors and modifiers (plus numerical values for measurements). Free-form text is allowed in Reported- or InternalNotes only. Separating data and terminology allows rearranging and refactoring the terminology, multilingual support through central terminology translations, and multiple hierarchical views.
@@@ Needs discussion! Currently defined both as attributes on Item and as elements within it. Compare the use of metadata as elements: should these rather be elements or attributes?
@@@ Needs discussion! Currently defined both as attributes on Item and as elements within it. Compare the use of metadata as elements: should these rather be elements or attributes?
Global resource definitions containing URIs or actually embedded resources (e. g. encoded images).
Optionally the full resource data may be embedded (as alternative or in addition to uri)
Defines the base type of a single LinguisticSet; it contains only a label and is used, e.g., for GlobalStateSetDefinition, FrequencyDefinitionSet, or ModifierDefinitionSet.
Defines an extended base type of a single LinguisticSet
Extends basic label type with a single wording element
Extends basic label type with a single wording element (with additional attributes for modifier wordings)
The wording for modifiers has additional attributes to define position relative to state (Postfix, AddBlank).
If Postfix is true the wording is output after the state wording, else before
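A sketch of how a reporting routine might combine the two wordings. Postfix follows the rule above; treating AddBlank as a switch for the separating blank, and defaulting it to true, are assumptions of this sketch, and the function name is hypothetical.

```python
def apply_modifier(state_wording: str, modifier_wording: str,
                   postfix: bool = False, add_blank: bool = True) -> str:
    """Combine a state wording with a modifier wording.

    postfix:   True places the modifier after the state, else before it.
    add_blank: assumed to control whether a blank separates the two parts.
    """
    sep = " " if add_blank else ""
    if postfix:
        return state_wording + sep + modifier_wording
    return modifier_wording + sep + state_wording

print(apply_modifier("hairy", "densely"))                            # densely hairy
print(apply_modifier("lobed", "at the tip", postfix=True))           # lobed at the tip
print(apply_modifier("green", "ish", postfix=True, add_blank=False)) # greenish
```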
Extends basic label type with a complex wording element; used in character grouping nodes and character references
Collection of audience-dependent linguistic sets (label etc., no wordings)
Contains labels and definitions, but no wording information
Collection of audience-dependent linguistic sets (label etc., single wording element)
A separate set of elements can be defined for each audience.
The basic wording type contains labels, definitions, and wording.
Collection of audience-dependent linguistic sets (label etc., single wording element)
Collection of audience-dependent linguistic sets (label etc., complex wording element)
A separate set of elements can be defined for each audience.
Allows pointing to the id of a resource as defined in the ResourceDefinition section.
This is currently the only linguistic set container which is defined by language rather than audience!
A short, concise title.
Free-form text containing a longer description of the project.
Free-form description of geographic coverage of descriptions available in the current project.
Comma-separated list. Use the TDWG geographical standard. Use 'global' for world-wide scope.
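A sketch of reading this field in Python. It only splits and trims the list and treats 'global' as world-wide scope; it does not validate codes against the TDWG standard. The function name and the example region codes are illustrative.

```python
def parse_geographic_coverage(value: str):
    """Split a comma-separated coverage list into region codes.

    Returns None for world-wide scope ('global'), otherwise the list of
    trimmed, non-empty entries. Codes are not validated here.
    """
    codes = [part.strip() for part in value.split(",") if part.strip()]
    if codes == ["global"]:
        return None
    return codes

print(parse_geographic_coverage("GER, FRA"))  # ['GER', 'FRA']
print(parse_geographic_coverage("global"))    # None
```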
Free-form text giving the reference if the information came mostly from one specific publication (printed or digital).
Currently contains only a ReportedNotes element but may be extended later
Defines a character in the terminology
Note on LinguisticSet contents: Character labels should be unique within the entire CharacterDefinitions collection (separately for each audience definition).
At least a single descriptor must be present (@@@ currently not implemented in the schema! @@@)
Optionally, globally defined state sets may be referenced and thus made available for a character. In general, at least the special states should be referenced here.
If Selections is missing, all descriptor states will be included, else only the selected ones.
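The selection rule can be sketched as follows, assuming the states are available as an ordered list of keys and the selections as a list of keys, or None when the Selections element is absent; the function name and state keys are hypothetical.

```python
def enabled_states(all_states, selections=None):
    """Return the states of a referenced state set that apply to a character.

    all_states: state keys of the referenced set, in their defined order.
    selections: the selected keys, or None when Selections is missing.
    """
    if selections is None:
        return list(all_states)  # no Selections: every state is included
    chosen = set(selections)
    # Keep the order of the state set, not the order of the selection list
    return [s for s in all_states if s in chosen]

states = ["red", "green", "blue"]
print(enabled_states(states))                   # ['red', 'green', 'blue']
print(enabled_states(states, ["blue", "red"]))  # ['red', 'blue']
```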
These definitions constrain which global measure definitions can appear in items of this character.
Unit such as mm, µm, or °C. Attributes allow the unit to be output before the value (e.g. pH 7.0) or without a separating blank. The content allows some xhtml formatting to support units such as mm<sup>2</sup>.
The key attribute must be unique and is referred to in the item descriptions. The keyref refers to the measure semantics defined in the global measure definitions.
(@@@PLACEHOLDER!@@@) Enabling of 0-n single frequency modifiers or 0-n frequency modifier sets for all states in a character.
(@@@PLACEHOLDER!@@@) Enabling of 0-n single modifiers or 0-n modifier sets for all states in a character.
Refers in coded item descriptions to a Character. MAY LATER NOT BE NEEDED!
Like CharacterReferenceType, but for usage in the NaturalLanguageDescription markup container (including Wording elements)
CharacterReferenceType_NL is similar to CharacterReferenceType, but not derived by extension, since in addition to adding the Wording element, the state element must be changed to a different type that allows Wording inside it.
Defines an entire character group (flat list or tree hierarchy)
Defines label displayed when a grouping is selected in the user interface
Purposes are standardized to simplify application interoperability.
Setting this purpose in a character grouping is a recommendation to applications with a user interface to use this as the default hierarchy for any editing or reporting purpose. The application may, however, enable the user to select any character grouping.
Setting this purpose in a character grouping is a recommendation to applications with a user interface to use this as the default hierarchy for editing the item description data set. The application may, however, enable the user to select any character grouping.
Setting this purpose in a character grouping is a recommendation to applications with a user interface to use this as the default hierarchy for editing the terminology. The application may, however, enable the user to select any character grouping.
Setting this purpose in a character grouping is a recommendation to applications to use this as the default hierarchy for building guided keys (e. g. dichotomous keys).
Setting this purpose in a character grouping is a recommendation to applications to use this as the default hierarchy for interactive identification.
Setting this purpose in a character grouping is a recommendation to applications to use this as the default hierarchy for natural language reporting.
MinimumExpertiseLevel: the designer of the subset expects the user to have a certain minimum expertise level. @@@ Needs discussion! @@@
The designer of a character grouping defines it as 'complete' to declare that it is intended to include all characters of the terminology. A terminology editing application can use this information e. g. to warn the designer about missing characters, to display special dialog boxes after the creation of a new character, etc.
This attribute is currently not used, instead the key of the StateSet has been fixed for special states to SpecialStates. Needs discussion!
Categorizing characters into basic property types (e. g. color, 2-dim. shape, 3-dim. shape, surface texture, taste, smell, behaviour, physiology, measurements, etc.) greatly improves the analysis and management of larger character sets and is therefore recommended. Note: only a single character grouping should have this hierarchy type. (This is not enforced in the schema; how can it be enforced? Other types may occur multiple times, i. e. one cannot make a UNIQUE statement on the attribute!)
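Since the schema cannot express this constraint, an application could check it after loading a document. A sketch over (grouping key, hierarchy type) pairs; the type string "BasicPropertyType" and the grouping keys are placeholders, not necessarily the values used by the schema.

```python
def violating_groupings(groupings, single_type="BasicPropertyType"):
    """Return the keys of all groupings of single_type if it occurs more than once.

    groupings: iterable of (key, hierarchy_type) pairs.
    An empty list means the "at most one grouping" rule is satisfied.
    """
    keys = [key for key, htype in groupings if htype == single_type]
    return keys if len(keys) > 1 else []

groupings = [("g1", "BasicPropertyType"), ("g2", "PartHierarchy"),
             ("g3", "BasicPropertyType")]
print(violating_groupings(groupings))  # ['g1', 'g3']
```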
A hierarchy that organizes characters by method, e.g. field observation, light microscopy, electron microscopy, molecular methods, culture techniques, etc.
A hierarchy that organizes characters by a morphological "contains" hierarchy: plant = root/stem/leaf, leaf = base/stipules/petiole/lamina, etc.
Defining a grouping as a flat subset marks it as intended only for filtering purposes and prevents it from being displayed as a choice for a hierarchy in a user interface. Note that, conversely, the filter selection dialog should not be restricted to these groupings. Any character grouping, including part, method, or basic property type hierarchies, is a valuable filter defining a character subset.
Used for character groupings that fall into none of the categories above.
A node in a character group
Enables the designer to annotate nodes in the grouping and add management comments.
The natural language wording for a character node needs to be defined here. The relevant or possible wording is constrained by the path in the tree, so it must be defined in the tree, not in the flat character list. In a methodological tree the wording may have to add part hierarchy information; in a part hierarchy, it may have to add methodological information.
The key for the character group item has been defined as required to document that an xs:key constraint exists on this attribute. It seems impossible to make existence of key optional and require keyrefs to only point to these existing keys.
Used in global and local character state definitions
Internal notes of the designer, not multilingual.
Resources linked to a state definition, e. g. images illustrating expression of the state.
This attribute is currently not used, instead the key of the StateSet has been fixed for special states to SpecialStates. Needs discussion!
Observation of a character state in an item description (compare also StateDefinitionType!)
The three frequency element variants are distinguished by their attributes!
direct single frequency value
direct frequency range
reference to a globally defined frequency modifier.
Contains ReportedNotes
Internal notes are present only once, not multilingual!
Like CharacterStateType, but for usage in the NaturalLanguageDescription markup container
Analog to the CharacterStateReferenceType, for measures
Allows pointing to the id of a resource as defined in the ResourceDefinition section.
Container for a list of resource references (as defined in the ResourceDefinitions section).
NEEDS DISCUSSION!
A single person or an author team.
A single person or an author team.
Either a full date or a year (1970-2100) is required.
Either a full date or a year (1970-2100) is required.
@@@ NEEDS DISCUSSION! Originally we proposed to use a set close to DublinCoreMetadata. This is still documented here, although it does not seem fully appropriate!
Wording as used in the item: NaturalLanguageDescription container
string restricted to 1..20 character length
string restricted to 0..255 character length
string restricted to 1..255 character length
string required to be at least 1 character long (excluding empty string)
Double precision numeric value in the range of [0..1]
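Applications that build values outside an XML toolchain may want the same checks in code. A sketch of validators mirroring the restrictions above; the validator names are illustrative, not the schema's type names. Python's `len` counts characters (code points), as XML Schema length facets do for strings.

```python
def length_check(min_len, max_len=None):
    """Return a validator for strings whose length lies in [min_len, max_len]."""
    def check(s: str) -> bool:
        return len(s) >= min_len and (max_len is None or len(s) <= max_len)
    return check

is_short_string = length_check(1, 20)         # 1..20 characters
is_string_255 = length_check(0, 255)          # 0..255 characters
is_nonempty_string_255 = length_check(1, 255) # 1..255 characters
is_nonempty_string = length_check(1)          # at least 1 character

def is_fraction(x: float) -> bool:
    """Double in the closed range [0..1]; NaN fails both comparisons."""
    return 0.0 <= x <= 1.0

print(is_short_string("leaf"), is_fraction(0.5))  # True True
```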
Restricted to specific integer values, indicating expertise from schoolchildren to taxonomic expert. See the separate documentation for the interpretation of values.
Allows basic character formatting using xhtml elements plus three semantic elements (citationauthor, taxonauthor, taxon; intended to be rendered formatted and for analysis). Note that no further formatting is supported within the semantic elements.
logical markup: emphasis: usually rendered italic
physical markup: italic that could not be interpreted as em or taxon markup
physical markup: subscript
physical markup: superscript
logical markup: strong: usually rendered bold
Author of a referenced citation. Recommended report rendering: may be either stripped or rendered as small caps
Author of a taxon. Recommended report rendering: see citationauthor
Recommended report rendering: italics
Extends the FormattedSimpleTextType and, in addition to basic character formatting with <sup>, <sub>, <i>, <b>, etc., also allows the use of <img> and <a> elements. Further elements may be added in later versions of this schema.
image element, needs further attributes added to work!!
anchor/hyperlink element, needs attributes added to work!!
Extends the FormattedExtendedTextType and allows the following block level elements as well: p, ol, ul, li, h1-h6. THIS could be set to a full xhtml fragment definition, but must be without html/header/body elements!
p element, needs attributes added to work!! Change to mixed content. If possible reuse xml!