### Copyright © TDWG (Taxonomic Databases Working Group, www.tdwg.org), 2004. This file is a special version of the SDD XML schema. It may be used only for viewing convenience and may not be distributed independently from the primary schema files (SDD.xsd, UBIF.xsd, UBIF_TypeLib.xsd, etc.). The inclusion of all parts starts below: !###

XML schema to encode descriptive data in biology and other subjects. The primary goal of the design is to increase the knowledge and availability of knowledge about the diversity of life on earth. However, it may be used in many other areas (including medicine, pathology, archeology, anthropology) wherever objects or classes of objects are described for later reidentification.

The schema was designed by the Structure of Descriptive Data (SDD, http://160.45.63.11/Projects/TDWG-SDD/index.html) group. SDD was established in 1999 as a subgroup of the Taxonomic Databases Working Group (TDWG, www.tdwg.org) of the International Union of Biological Sciences (IUBS). The author of the current schema version and of all annotations is G. Hagedorn, Berlin. The requirements for an SDD schema were elaborated in 6 major meetings of the SDD group and in discussions over the SDD email list. Over 60 people contributed to these discussions. However, the help, criticism and energy of Bob Morris, Kevin Thiele, Bryan Heidorn, Guillaume Rousse, Steve Shattuck, Donald Hobern, Trevor Patterson and Nicolas Bailly are especially acknowledged!

Copyright © TDWG, 26. June 2004. Licensed under GNU GPL 2 (http://www.gnu.org/licenses/gpl.html) - with the following restriction: This is a preliminary version (0.91!) for testing purposes. Permission to use this schema is granted to all scientific or commercial projects for a testing period of up to 3 years. After this time computer programs using this schema must either be discontinued or converted to the final version of this schema.

Conventions:
Element or attribute names starting with underscores (__) are present in the schema for discussion purposes only and should be used only experimentally. Annotations containing @ indicate unfinished points of discussion.

Note: blockDefault="#all" in xs:schema blocks element substitution and also prevents instance documents from using derived types in elements typed to the base type (which would otherwise be possible using xsi:type=""). - finalDefault is not set; further type derivation is currently not considered problematic. Please contact us if you believe otherwise. Note that, according to the W3C discussion forum, the developers of XML Schema are considering dropping the final attribute in the upcoming XML Schema version 1.1. - Nillable: xsi:nil is not supported in SDD documents (the schema declaration nillable="false" is the default and is not explicitly stated).

This imports the UBIF schema file (SDD uses the same namespace as UBIF!). DescriptiveData must be placed inside the UBIF top-level Datasets/Dataset structure as the last element. Because of keyref constraints, this schema depends on the imported UBIF root! Descriptive data itself that are specific to SDD, i. e. descriptive terminology, coded and natural language descriptions and stored identification keys. The labels of audience definitions are required and must be unique for a given language. The labels of glossary definitions are required and must - in combination with the SensuLabel - be unique for a given language/audience combination. The key of coding status values must be unique. The labels of coding status values are required and must be unique for a given language/audience combination. Modifier set keys are required and must be unique. The labels of modifier sets are required and must be unique for a given language/audience combination. Modifier keys are required and must be unique regardless of modifier type (certainty, frequency, etc.). The labels of modifier definitions are required and must be unique for a given language/audience combination. Additional character-type-specific key. Additional character-type-specific key. Additional character-type-specific key. The labels of character definitions are required and must be unique for a given language/audience combination. This provides a joint key for states either defined locally within a character (= StateDefinition), or referenced from ConceptStates (StateReference; provides a new local key). This is the only key for them; no separate keys for locally defined/concept reference within the character are defined. Note that state keys are unique across all characters, not only within each character. The labels of concept tree definitions (= entire trees, not the nodes within the trees) are required and must be unique for a given language/audience combination. This collects all keys of Concept elements. 
Note that no UniqueLabelText constraint is defined for concept tree nodes; the labels on tree nodes are optional and not required to be unique. They are expected to be displayed together with their parents and thus obtain their uniqueness from the context (the path from the root should be unique, but this is not expressed in the schema). Also, the XPath selector selects all Concept elements anywhere in the document, which is more general (and therefore more computation-intensive) than necessary. A better XPath expression would be Terminology/ConceptTrees/ConceptTree//Concept, which includes all nodes regardless of their place in the tree structure. However, combining a defined path with an "all child" path is impossible under the restrictions imposed on XPath expressions in XML Schema identity constraints. Further, since keyed elements in a tree are collected from the document root rather than from the tree root, the node element names must be unique names in the entire schema. A related problem is that we desire to use the same element name (Concept) for node definitions and references to these nodes (from within NaturalLanguageDescriptions). Although the latter has a ref instead of an id, Schema would normally complain that the id is missing, rather than understanding that only elements possessing an id are to be selected for the xs:key constraint. Using a combined XPath ".//Nodes/Concept" is a solution. It does, however, work only if the id of the root node is not included. Since no reason exists to do so (this id is redundant) we accept this. All concept state id values must be unique in an entire project. Compare ConceptKey. A joint key for all CodedDescription or NaturalLanguageDescription elements. This identifies an entire stored key (i. e. not the nodes/steps in the key). The labels of stored key definitions (i. e. for the entire key) are required and must be unique for a given language/audience combination. 
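As a rough illustration of the Concept key discussion above, the following Python sketch mimics the effect of the combined selector ".//Nodes/Concept" outside of a schema validator. The sample document is a heavily simplified, hypothetical fragment (the real SDD nesting and attributes are richer), and the function name duplicate_concept_ids is an assumption made only for this example. It shows that the combined path picks up node definitions but skips Concept references that carry a ref instead of an id.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified instance fragment (not a complete SDD document).
SAMPLE = """
<DescriptiveData>
  <Terminology>
    <ConceptTrees>
      <ConceptTree id="t1">
        <Nodes>
          <Concept id="c1">
            <Nodes>
              <Concept id="c2"/>
            </Nodes>
          </Concept>
          <Concept id="c1"/>
        </Nodes>
      </ConceptTree>
    </ConceptTrees>
  </Terminology>
  <NaturalLanguageDescriptions>
    <NaturalLanguageDescription id="d1">
      <Concept ref="c1"/>
    </NaturalLanguageDescription>
  </NaturalLanguageDescriptions>
</DescriptiveData>
"""


def duplicate_concept_ids(root):
    """Return the ids used more than once among Concept node definitions."""
    # ".//Nodes/Concept" selects only Concept elements whose parent is Nodes,
    # so the reference inside NaturalLanguageDescription is not collected.
    ids = [concept.get("id") for concept in root.findall(".//Nodes/Concept")]
    counts = Counter(ids)
    return sorted(key for key, n in counts.items() if n > 1)


root = ET.fromstring(SAMPLE)
print(duplicate_concept_ids(root))
```

In this sample the id "c1" is deliberately defined twice, which is the situation an xs:key constraint would reject.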
This collects all ids of nodes in stored keys (Lead elements). Compare the note on Concept about a potentially better xpath. Within UBIF (Unified Biosciences Information Framework), this represents the node where the SDD-specific data start Default settings etc. = SDD- specific additions to general dataset metadata Default Audience and Concept tree for interactive identification. If configuration data exist at all, at least the default language should be configured! The default language and audience is used whenever the consuming application has no other preference specified. The user interface of the application may start with the default audience representations and then allow to choose a different audience. The default concept tree used to arrange characters in interactive identification. [ATTR: ref] If a class hierarchy is present, the hierarchy used for inheriting and aggregating descriptive data in the class hierarchy must be defined. [ATTR: ref] Defines the operational terminology (parts, characters, states, etc.) in which descriptions are expressed. The terms are defined by the biological specialist(s). They are used in the descriptions through references to their 'id' attributes). General vocabulary lists and rules; applicable outside of the domain of descriptive data. In contrast to enumerated types in the schema, new values and labels for new languages/audiences can be defined. A list of audiences addressed in the project. An Audience is an extension of language (including dialect), esp. with expertise (pupil, beginner, expert). @@ to be discussed! @@ For natural language reporting some rules can be defined per language rather than per audience. If a rule for a language used in an audience definition is missing, applications may add a default language rule to the project data. [ATTR: language, dir] @@This whole sequence is not functional, just a bunch of ideas for discussion! @@ Should for each of Or, And, etc. an entire delimiter-group be defined? 
@@ Should only 'Or' be defined and 'And' etc. left to the override mechanism available anyway in the concept trees? @@ unclear whether used. DeltaAccess defines on the character level whether states are combined with 'or', 'and', 'to', or 'with'. This has not yet been worked out for SDD! Instead, SDD originally attempts to succeed with delimiters alone. Combining delimiter rules with conditionally different operators is a problem, however!@@ @@ unclear whether this would be used. For states that intergrade: 'red to orange'. The words listed in this collection should not be capitalized at the start of a sentence, or decapitalized when sentences are joined. The labels of audience definitions are required and must be unique for a given language ('language' attribute). Defines the semantics and labels of coding status values (e. g., unknown, not applicable, not interpretable). Coding status values (= 'missing data indicators', = 'special states') provide standardized reasons why data are missing. Unlike most elements in Terminology, these are constrained by the SDD model and can only be extended by revising the SDD standard (may be changed to user-definable in a later version of SDD). Labels are already user-definable to support multiple audiences. The labels and abbreviations given are only recommendations. They can be freely changed as long as the semantics are preserved. [ATTR: id] Contains detailed and fundamental terminological definitions (of object/part, object types, property, method, property state, etc.). These are referenced in character, state, etc. definitions, but may also be used independently. The Glossary entry for a single concept (object, method, property, state, etc.), which may be expressed in multiple audience-specific representations. [ATTR: id] The audience values must uniquely identify the Representations within each GlossaryEntry. Certainty, frequency, and other modifiers modify categorical states or statistical measures. 
Modifiers are defined for the entire project, but must be enabled for characters in concept nodes to be available when editing descriptions Modifiers within a group may be ranked/ordered; in descriptions, modifiers from the same group are combined with 'or'. All modifiers in a group must be of the same type (certainty, frequency, etc.). Characters are the operationalized concepts used in descriptions. They define categorical states and quantitative values or statistical measures. Characters are defined in an unordered flat list. They can be used alone or in combination with modifiers or concept trees. Hierarchies of property, object part, observation methods or other concepts. Concepts can be operationalized by referring to characters (only these allow scoring of data in the descriptions). Concept states (property or kind-of-part, reusable in multiple characters) and char. dependencies are expressed here as well. Concept trees may also be used to define flat character subsets for filtering purposes. @@DISCUSS: should concept tree hierarchies be recursively definable, as long as the resulting tree is acyclical?@@ Importantly, this would allow to define generalization and part-of relations between parts/structures! [ATTR: id] Authored or auto-generated free-form descriptions, which may be completely or partially marked up with elements similar to those in coded descriptions. Note: If all coded markup except the Text content is removed, the original natural language text description can be recovered without changes (lossless). [ATTR: id] Largely language-independent descriptions entirely controlled by Terminology. Both coded and nat. lang. may describe either abstract class concepts (taxon, disease, etc.) or physical objects (individual specimens). [ATTR: id] Dichotomous or multifurcating authored keys (incl. legacy data) (Note: Identification keys may also be created dynamically based on data in terminology and descriptions. 
These keys are intended to represent only manually authored keys, whether from capturing legacy data or newly designed.) [ATTR: id] ==== GENERAL DECLARATIONS START === The following types are used in the Terminology/General section. They define generic concepts not in principle restricted to the use in descriptive data. --- Audiences are an extension of language/culture codes, to capture expertise and other factors (= registers within a language). An Audience is a combination of an enumerated expertise category (pupil, beginner, expert) and a free-form scope definition. As a result, multiple audiences can be defined for the same expertise, distinguished only by their label. A concise label for the audience; plus optional details clarifying the role / definition of the audience. Properties describing machine-readable partial semantics for an audience definition. ExpertiseLevel is restricted to values from 0-5. These categories allow to communicate expected expertise between different applications using UBIF. Recommended interpretations: 0 = expertise level undefined 1 = elementary school (year 1 to 6); 2 = middle school (year 7 to 10); 3 = high school (year 11 above) and general public (trying to avoid any specialized terminology or jargon); 4 = university students or (partly) trained personnel (using terminology, but avoiding or explaining problematic terminology); 5 = experts (using the full range of terminology). The value that is referenced whenever an audience="..." attribute is used in audience-specific elements. Collection of AudienceDef-type elements [ATTR: audiencekey] The audiencekey string is referenced in all audience specific elements (labels, definitions). Recommendation: audience keys should consist of the expertise level from 0-5 (followed by period-number if a second audience for the same language and expertise level is defined). Examples: '0', '3.12' Audience references are always an optional addition to a language reference! 
(Note: If audience definitions are present, the audience attribute of AudienceRef should be treated as having a default value pointing to the first audience with expertiselevel=0 (undefined). Setting audience "default=0" in schema (together with the keyref identity constraint on audience) would require all documents to have audience definitions, which was considered undesirable.) --- Coding status allows expressing reasons why data are missing (not coded). Project-wide definition of CodingStatus values. Properties describing machine-readable partial semantics for a coding status value. Provided to support generic application code that continues to function if additional codes are defined. @@ Both proposals need elaboration and discussion! To be coded / Not to be coded / Cannot be coded / Coded successfully. NotEvaluated / CannotExist / DoesNotExist / Exists. Enumeration used in CodingStatus/Specification. These required values enable applications to interpret user-defined coding status values. To Be Coded -- Information has not yet been entered, but it is planned to do so. Not To Be Coded -- Information has not yet been entered, and it is not planned to do so (esp. because resources are lacking and other characters should have priority). Cannot Be Coded -- Information cannot be entered due to objective (inapplicable character) or subjective (cannot interpret available data) reasons. Coded Successfully -- Information has been entered successfully. Enumeration used in CodingStatus/Specification. These required values enable applications to interpret user-defined coding status values. Not Evaluated -- The presence of information has not yet been evaluated. Cannot Exist -- Information cannot exist for logical reasons (i. e. a character with a coding status having this value is inapplicable). Does Not Exist -- Information should exist, but extensive research has failed to find it. Exists -- Information has already been found, but may not yet have been entered. 
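To make the two proposed CodingStatus/Specification enumerations above more concrete, here is a minimal Python sketch of how a consuming application might interpret a user-defined coding status through them. The class names CodingPlan and InformationExistence, the string spellings of the first enumeration, and the sample status ids are assumptions for illustration only; the annotations themselves mark both proposals as still under discussion.

```python
from enum import Enum


class CodingPlan(Enum):
    """First proposal: is coding planned / possible / done? (hypothetical name)"""
    TO_BE_CODED = "ToBeCoded"
    NOT_TO_BE_CODED = "NotToBeCoded"
    CANNOT_BE_CODED = "CannotBeCoded"
    CODED_SUCCESSFULLY = "CodedSuccessfully"


class InformationExistence(Enum):
    """Second proposal: does the information exist at all? (hypothetical name)"""
    NOT_EVALUATED = "NotEvaluated"
    CANNOT_EXIST = "CannotExist"
    DOES_NOT_EXIST = "DoesNotExist"
    EXISTS = "Exists"


# Hypothetical project-defined status values, each mapped to the
# machine-readable specification that generic code can rely on.
PROJECT_STATUSES = {
    "not_applicable": InformationExistence.CANNOT_EXIST,
    "unknown": InformationExistence.NOT_EVALUATED,
    "searched_in_vain": InformationExistence.DOES_NOT_EXIST,
}


def is_inapplicable(status_id):
    """A character is inapplicable if its status says the data cannot exist."""
    return PROJECT_STATUSES[status_id] is InformationExistence.CANNOT_EXIST


print(is_inapplicable("not_applicable"))
print(is_inapplicable("unknown"))
```

The point of the mapping is that labels such as "not applicable" can vary per project or audience while generic code only inspects the enumerated specification.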
Refers to CodingStatus values (e. g., from within descriptions). Refers to a CodingStatus value (Terminology/General/CodingStatusValues/Status/@id) -------------------------------- START Modifiers (uses polymorphism!) --------------------------- The modifier type system covers expressions of certainty, frequency, manner, degree, etc. that can be added to existing character value or state expressions. The modifier system is complex and uses abstract base and derived types both for modifier definitions and for references applying these modifiers to statements in descriptions. Quick overview of the primary entry types: Modifiers are defined in ModifierSet elements. Recommended applicability of sets to characters is defined in the concept trees. Single modifiers are applied to descriptive statements using the PolymorphicModification/PolymorphicModificationMarkup groups. 1. --- Modifier definitions a) Modifiers are grouped into sets because of ranking (ordering) within a set (and for management purposes). All modifiers in a set must be of the same modifier type (e. g., all are frequencies), else ranking would not be meaningful. A set of modifiers of a single type that has a label and may define order/rank for the contained modifiers. Label expressing the concept or scope of each modifier set. All Representations within a Label must have different language values. = 'Modifiers are ranked'. If true, the sequence of modifier elements in instance documents is semantically meaningful (as in 'weakly' - 'moderately' - 'strongly'). If false, the sequence is intended for display purposes only. Refers to a ModifierSet, used in ConceptTree//Concept to define recommended modifier sets. The ref attribute refers to a modifier set (Terminology/Modifiers/ModifierSet) b) Single modifier definitions. Abstract base type and derived types to be used in instance documents. Note that 'FrequencyModifier', 'CertaintyModifier', etc. 
could have been named 'FrequencyModifierDef', etc.; they have been abbreviated to improve the readability of instance documents in case xsi:type is used. Abstract base type for state or character modifier definitions (certainty, frequency, etc.) -- Character modifiers: Abstract base type for modifiers applicable to character types in principle. Definition of certainty modifiers (perhaps, probably, etc.) An estimate of a probability range for verbal modifiers, defined through two attributes. The upper/lower limits of probability modifiers may overlap. The default values are 0-1, indicating that no estimate was possible. If present and true, the current modifier indicates that the state to which it refers is present or true only due to a misinterpretation. The probability range should be 0 to 0 = certainly false. Definition of spatial modifiers (proximal, distal, at base, at tip, etc.) In version SDD 1.0 this element is defined only to support forward compatibility; no specification details are defined for this modifier type yet. Definition of temporal modifiers (earlier, later, in summer, in spring, etc.) In version SDD 1.0 this element is defined only to support forward compatibility; no specification details are defined for this modifier type yet. Definition of character modifiers not yet covered by the categories above (open extension!) -- Categorical state modifiers: Abstract base type for modifiers applicable only to categorical states. Definition of frequency modifiers (rarely, usually, etc.) An estimate of a frequency range for verbal modifiers, defined through two attributes. The upper and lower limits of several frequency modifiers may overlap. The default values are 0-1, indicating that no estimate was possible. Definition of modifiers restricted to single categorical statements, esp. modifiers/adverbs of degree and manner (strongly, very, darkly, etc.). 
(Note: the grammatical concept of adverbs of manner often includes the certainty modifiers, which should not be included here!) - (It is expected that this list may have to be extended in future SDD versions, creating additional specific modifier types for those lumped in OtherModifiers) - (Open questions: a) can approximations ('ca.', 'roughly') be handled as CertaintyModifiers or is a separate type desirable? b) should manner, degree, intensity become separate types? c) Specification of spatial and temporal modifiers must be elaborated!) c) Collections of modifier definitions. Abstract base type and derived types to be used in instance documents. The ModifierSet type refers to these collections in a polymorphic way. This allows to define a collection of ModifierSet elements, each set containing multiple modifiers of a single modifier type. Abstract base type of a collection of modifiers of a single type. In instance documents one of the following non-abstract types must be used. (This is an abstract type, specific derived types will be used in instance documents!) [ATTR: id] -- Character modifiers: (Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type) (Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type) (Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type) (Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type) -- Categorical state modifiers: (Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type) (Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type) d) Group combining the various derived collection types into a polymorphic structure (options are an explicit choice or the use of base type plus xsi:type). 
(General note: Places in the schema where a polymorphic design is intended (different derived types in place of an abstract base type) are highlighted using model groups whose names start with 'Polymorphic'. @@If two similar group names are present in the schema, only the first will be used; the one starting with '__' shows an alternative design!) Collection of modifier definitions of a single type: Point for type polymorphism (multiple derived types in place of an abstract base type). Collection of modifier definitions of a single modifier type. Requires an xsi:type specification in instance documents: "CertaintyModifiers": express the certainty of categorical or statistical statements ('perhaps', 'probably', 'almost certainly'). 'True-by-misinterpretation' modifiers are included as a special case of 'certainly false'. "FrequencyModifiers": used to describe state frequency (usually, rarely, etc.). In descriptions frequency range estimates can also be stated numerically! "Spatial-/TemporalModifiers": only predefined, no specifications yet! It is believed that specifications may be desirable here in the future. Defining them requires significant work, however. "OtherModifiers": (manner, degree, intensity; e. g., 'strongly', 'weakly'). These convey their specific semantics only to human consumers (or processors parsing and interpreting label text). Collection of modifier definitions of a single type: Point for type polymorphism (multiple derived types in place of an abstract base type) (Unlike most similar polymorphisms, this is not a collection; each set may occur only once.) Express the certainty of categorical or statistical statements ('perhaps', 'probably', 'almost certainly'). 'True-by-misinterpretation' modifiers are included as a special case of 'certainly false'. Only predefined, no specifications yet! It is believed that specifications may be desirable here in the future. Defining them requires significant work, however. 
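The xsi:type-based variant described above can be pictured with a small Python sketch that reads the declared collection type from each modifier set. The element names Modifiers and ModifierDefs, the sample ids, and the function name modifier_types are assumptions made for this illustration; the set of accepted type names is taken from the list above and may not be exhaustive.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Type names listed in the annotation above (possibly incomplete).
KNOWN_TYPES = {
    "CertaintyModifiers",
    "FrequencyModifiers",
    "SpatialModifiers",
    "TemporalModifiers",
    "OtherModifiers",
}

# Hypothetical, simplified instance fragment; element names are assumptions.
SAMPLE = f"""
<Modifiers xmlns:xsi="{XSI}">
  <ModifierSet id="ms1">
    <ModifierDefs xsi:type="FrequencyModifiers">
      <Modifier id="m1"/>
      <Modifier id="m2"/>
    </ModifierDefs>
  </ModifierSet>
  <ModifierSet id="ms2">
    <ModifierDefs xsi:type="CertaintyModifiers">
      <Modifier id="m3"/>
    </ModifierDefs>
  </ModifierSet>
</Modifiers>
"""


def modifier_types(root):
    """Map each ModifierSet id to the derived type named in xsi:type."""
    result = {}
    for mset in root.findall("ModifierSet"):
        defs = mset.find("ModifierDefs")
        # ElementTree exposes namespaced attributes as "{namespace}local".
        type_name = defs.get(f"{{{XSI}}}type")
        if type_name not in KNOWN_TYPES:
            raise ValueError(f"unexpected modifier collection type: {type_name!r}")
        result[mset.get("id")] = type_name
    return result


print(modifier_types(ET.fromstring(SAMPLE)))
```

Because every set declares exactly one derived type, all modifiers inside it share that type, which is what makes ranking within a set meaningful.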
Only predefined, no specifications yet! It is believed that specifications may be desirable here in the future. Defining them requires significant work, however. Other, so far untyped modifiers of manner, degree, intensity (e. g., 'strongly', 'weakly'). These convey their specific semantics only to human consumers (or processors parsing and interpreting label text). Used to describe state frequency (usually, rarely, etc.). In descriptions frequency range estimates can also be stated numerically! Modifiers of degree or manner, specific to categorical states (very, strongly, etc.). 2. --- Simple Modifier references (used in coded descriptions). a) Abstract base types. Abstract base type for an actual modification of a statement. In instance documents the following derived types will be used, either referring to a defined modifier category, or giving explicit numerical ranges/values. Refers to any kind of modifier definition type (Terminology/Modifiers/ModifierSet/*/Modifier/@id). Abstract base type including all references to CharacterModifierDef. Abstract base type including all references to StateModifierDef. Abstract base type, adding ProbRangeAttributeGroup. Currently used only for Frequency modifiers, where exact frequency values may optionally be given in descriptions. (Attribute modeling group used in StateModificationPlusProbabilities/Markup. In theory the attributes could be inherited from UBIF complex type ProbabilityRange, but this would require multiple inheritance!) Lower value of a probability range (values 0 to 1 inclusive). Note: to specify a single, exact value, set both lower and upper attributes to this value! Upper value of a probability range (values 0 to 1 inclusive). Note: to specify a single, exact value, set both lower and upper attributes to this value! b) Derived types to be used in instance documents. Note that 'Frequency', 'Certainty', etc. 
may have been named 'FrequencyModifierRef', 'CertaintyModifierRef', etc.; they have been abbreviated to improve the readability of instance documents in case xsi:type would have been used. -- Reference to character modifiers: Refers to a certainty character modifier Refers to a certainty modifier (Terminology/Modifiers/ModifierSet/CertaintyModifiers/Modifier/@id) Refers to a spatial character modifier Refers to a spatial modifier (Terminology/Modifiers/ModifierSet/SpatialModifiers/Modifier/@id) Refers to a temporal character modifier Refers to a "Temporal" modifier (Terminology/Modifiers/ModifierSet/TemporalModifiers/Modifier/@id) Refers to a character modifier not covered by the types above Refers to an "OtherModifer" modifier (Terminology/Modifiers/ModifierSet/OtherModifers/Modifier/@id) -- Reference to categorical state modifiers: Refers to a frequency modifier (e. g., from within categorical character data) Refers to a Frequency modifier (Terminology/Modifiers/ModifierSet/FrequencyModifiers/Modifier/@id) Refers to a state modifier (e. g., from within categorical character data) Refers to an "StateModifer" modifier (Terminology/Modifiers/ModifierSet/StateModifiers/Modifier/@id) c) Groups combining the various derived types into a polymorphic structure (options are an explicit choice or the use of base type plus xsi:type). Modifier reference in CodedDescr.: Point for type polymorphism (multiple derived types in place of an abstract base type) (The element sequence in instance documents is informative!) Abstract modifier reference, applying a modifier to a descriptive statement. In instance documents a specific derived type must be used. [ATTR: ref] Modifier reference in CodedDescr.: Point for type polymorphism (multiple derived types in place of an abstract base type). In object-oriented programming the following choice should be replaced with a polymorphic design, using a collection of the common base type! 
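The remark above about object-oriented code can be sketched as follows: a common base class for modifier references, derived classes named after the 'Certainty' and 'Frequency' reference types, and one list typed to the base class. This is only an illustrative Python sketch; the base class name StatementModification, the describe helper, and the sample ids are assumptions, and the optional lower/upper attributes on Frequency follow the probability range rules (0 to 1 inclusive) stated earlier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StatementModification:
    """Common base of all modifier references (hypothetical name)."""
    ref: str  # id of the referenced modifier definition


@dataclass(frozen=True)
class Certainty(StatementModification):
    """Reference to a certainty modifier."""


@dataclass(frozen=True)
class Frequency(StatementModification):
    """Reference to a frequency modifier, optionally with an explicit range."""
    lower: Optional[float] = None
    upper: Optional[float] = None

    def __post_init__(self):
        for value in (self.lower, self.upper):
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError("probability values must lie between 0 and 1")
        if self.lower is not None and self.upper is not None and self.lower > self.upper:
            raise ValueError("lower must not exceed upper")


def describe(mod: StatementModification) -> str:
    """Render a modification without caring about its concrete type."""
    return f"{type(mod).__name__}:{mod.ref}"


# One collection of the common base type replaces the explicit choice;
# the list order stands in for the element sequence in the instance document.
modifications: List[StatementModification] = [
    Certainty("m3"),
    Frequency("m1", lower=0.2, upper=0.2),  # single exact value: lower == upper
]
print([describe(m) for m in modifications])
```

Setting lower and upper to the same number expresses a single exact value, as the attribute notes recommend.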
(The element sequence in instance documents is informative!) [ATTR: ref (= for all elements above)] Modifier reference or value in CodedDescr.: Point for type polymorphism (multiple derived types in place of an abstract base type) (The element sequence in instance documents is not informative!) Abstract modifier reference, applying a modifier to a single categorical state. In instance documents a specific derived type must be given (e. g., xsi:type = 'Frequency'). [ATTR: ref, and depending on derived type: lower/upper] Modifier reference or value in CodedDescr.: Point for type polymorphism (multiple derived types in place of an abstract base type) Currently limited to a single frequency and modifier per state, in an explicit attempt to simplify the SDD data model! [ATTR: ref (= for all elements above)] Public notes or comments, for multiple languages. Applications may, e. g., report the text in brackets after the character state. The audience values must uniquely identify the Representations within each Note element (no duplicate audience allowed). Flag states, which applications may use as a template for new descriptions. Templates may be categorical states or coding status values (but currently not measures). Rules for finding templates: a) For class description (i. e. using Header/ClassName) find all higher classes (according to ClassHierarchy) and copy the template states from there. b) For unit/object description (Header/Unit) find the Class assigned to the Unit and copy template states directly from this class (@ and higher classes?). It is expected that the scoring is revised by an expert; thus template states may be defined in cases where they apply only to the majority of subclasses. @@Can this perhaps be handled by new kind of CodingStatus instead?@@ 3. --- Modifier references extended with Text element (used in natural language markup). 
a) Abstract base types Abstract base type adding a Text element for markup Abstract base type including all references to CharacterModifierDef Abstract base type including all references to StateModifierDef Abstract base type, adding ProbRangeAttributeGroup. Currently used only for Frequency modifiers, where exact frequency values may optionally be given in descriptions. b) Derived types to be used in instance documents. Note: each of the derived types could alternatively be derived from the simple reference (CertaintyMarkup from Certainty, etc.). To allow a polymorphic collection of any markup modifier type, all are derived from the abstract StatementModificationMarkup type, however. The derivation by restriction currently changes only the annotation of the ref attribute! To make it more specific, a future schema version could use modifier-specific simple types derived from ModifierRelationID as types of the ref attribute. -- Reference to character modifiers: Variant of Certainty (modifier reference), with Text inside. Refers to a Certainty modifier (Terminology/Modifiers/ModifierSet/CertaintyModifiers/Modifier/@id) Variant of Spatial (modifier reference), with Text inside. Refers to a Spatial modifier (Terminology/Modifiers/ModifierSet/SpatialModifiers/Modifier/@id) Variant of Temporal (modifier reference), with Text inside. Refers to a "Temporal" modifier (Terminology/Modifiers/ModifierSet/TemporalModifiers/Modifier/@id) Variant of OtherMod (modifier reference), with Text inside. Refers to an "OtherModifer" modifier (Terminology/Modifiers/ModifierSet/OtherModifers/Modifier/@id) -- Reference to categorical state modifiers: Variant of Frequency (modifier reference), with Text inside. Refers to a Frequency modifier (Terminology/Modifiers/ModifierSet/FrequencyModifiers/Modifier/@id) Variant of StateMod (modifier reference), with Text inside. 
Refers to an "OtherModifer" modifier (Terminology/Modifiers/ModifierSet/StateModifers/Modifier/@id) c) Groups combining the various derived types into a polymorphic structure (options are an explicit choice or the use of base type plus xsi:type). Modifier reference in NLD: Point for type polymorphism (multiple derived types in place of an abstract base type) (The element sequence in instance documents is informative!) Abstract modifier reference, applying a modifier to a descriptive statement. In instance documents, a specific derived type must be specified (e. g., xsi:type = 'FrequencyMarkup'). [ATTR: ref, and depending on derived type: lower/upper] Modifier reference in NLD: Point for type polymorphism (multiple derived types in place of an abstract base type) (The element sequence in instance documents is informative!) [ATTR: ref (= for all elements above)] Modifier reference or value in NLD: Point for type polymorphism (multiple derived types in place of an abstract base type) (The element sequence in instance documents is informative!) Abstract modifier reference, applying a modifier to a descriptive statement. In instance documents, a specific derived type must be specified (e. g., xsi:type = 'FrequencyMarkup'). [ATTR: ref, and depending on derived type: lower/upper] Modifier reference or value in NLD: Point for type polymorphism (multiple derived types in place of an abstract base type) (The element sequence in instance documents is informative!) [ATTR: ref (= for all elements above)] -------------------------------- END Modifiers --------------------------- ==== GENERAL DECLARATIONS END === ==== GLOSSARY START === Glossary entries are largely (but not exclusively) defined by audience-specific representations. An entry in the terminological glossary, providing an attribute "id" by which the entry can be referred to. Audience-specific representations of a glossary entry. All audience-specific versions must define the same concept. 
If, for example, a fructification would be considered a 'berry' in French but not in Chinese (i. e. the definitions differ in breadth) and both concepts are used in different descriptions in a project, these concepts must be placed in different GlossaryEntry elements (not in different Representations). The different concepts could still be translated into the other languages (term allows phrases as well as words). [ATTR: audience] Multiple citations (publication + page number) Creators, Revision status, and dates of this audience-specific glossary definition. Constrained vocabulary (Structure, PropertyTerm, FunctionalConcept, MethodOrProcedure, ChemicalCompound, Modifier, NomenclatureTerm, OtherTerm) @@ Please comment on necessity of this! @@ Ontological relations General relations between terms Kind-of or is-a relationship (class inheritance hierarchy). This may inform about structures (a sepal is a kind of leaf), properties (metallic color is a kind of color, 2-dimensional shape is a kind of shape), property states (ochre is a kind of brown, subglobose is a kind of globose), functional terms (an anther is a kind of sexual organ) etc. {directional!} Each concept term may occur only once in the collection. Terms that have identical or nearly identical definitions (e. g., technical terminology and plain English equivalent). {bidirectional} Each concept term may occur only once in the collection. A term of opposite meaning. Usually a single term, but may refer to two synonymous terms. {bidirectional} Each concept term may occur only once in the collection. Related concepts and terms. Used to express unspecific relations not yet expressed in the previous relationships. The list of related terms may also be viewed as a keywords list! {bidirectional} Each concept term may occur only once in the collection. Misinterpretations are especially useful for improving error tolerance in identifications. May refer to structure and property terms.
Example: Cyathium is misinterpretable-as flower. @@?? necessary in addition to structural kind-of relations? {directional} Each concept term may occur only once in the collection. (applicable only to parts/structures) Part-of (= aggregation) relationship (class composition hierarchy). Notes: Both KindOf and PartOf relationships define 'broader terms'. Multiple part-of parents are possible if different composition concepts exist. {directional!} Each concept term may occur only once in the collection. Only for structures. Example: The thumb is adjacent to the index finger, connected to the palm of the hand, and part of the hand. {bidirectional} {@@This term seems to be particularly problematic and will not be included in the first release of SDD} Each concept term may occur only once in the collection. Only for structures. Example: The thumb is adjacent to the index finger, connected to the palm of the hand, and part of the hand. {bidirectional} Each concept term may occur only once in the collection. Developmental and evolutionary relations To express developmental (temporal, = "develops from") processes that change one structure into another. Examples: seedling develops-from seed; zygote develops-from male and female gametes (i. e. multiple parent terms).{directional!} Each concept term may occur only once in the collection. All children and the parent are homologous. Usually applied to structure terms. {directional} Each concept term may occur only once in the collection. Used as an alternative to Phyl.- DerivedFrom if no ancestral term is available. {bidirectional} Each concept term may occur only once in the collection. Audience-specific definitions primarily aimed at human consumption, but with the intent to be useful to computer linguistic ontological agents as well. The head term (phrase of one to several words) representing the concept (structure, property, method, modifier, character, state, etc.) that is being defined. 
If different definitions exist for a term (e. g., following different scientific schools), a distinguishing label (to be added after Term + "sensu") should be provided. @@ alternative names for element: ConceptLabel, ConceptQualifier?@@ @@ConceptLabel has been added to the UNIQUE definition, but what happens when it is missing still needs testing. Will terms still be required to be unique? Definition text, explaining the concept (meaning, semantics) of a structure, character, state, etc. A single paragraph long; but a new line (line feed character) may be used.
Optional URI of an external definition, in addition to the internal Definition above. Audience-dependent resources used in the definition (e. g., images with text, videos with speech, or images intended for audiences of different expertise). Each media resource may occur only once in the collection.
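The "Term + sensu + label" convention and the uniqueness question raised above can be illustrated with a minimal Python sketch. The function names display_term and duplicate_keys are hypothetical and not part of SDD. The sketch assumes that a missing ConceptLabel is treated as an empty string when testing uniqueness, which is exactly the point the annotation marks as still needing testing, so this is one possible reading rather than the schema's defined behavior.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def display_term(term: str, concept_label: Optional[str] = None) -> str:
    """Return 'Term sensu Label' if a distinguishing label exists, else the bare term."""
    return f"{term} sensu {concept_label}" if concept_label else term


def duplicate_keys(
    entries: Iterable[Tuple[str, Optional[str]]]
) -> List[Tuple[str, str]]:
    """Return (term, label) combinations occurring more than once.

    Assumption: a missing label counts as the empty string, so two
    unlabeled definitions of the same term collide.
    """
    counts = Counter((term, label or "") for term, label in entries)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical glossary terms for a single audience.
    terms = [("berry", None), ("berry", "school A"), ("berry", None)]
    print([display_term(t, l) for t, l in terms])
    print(duplicate_keys(terms))
```

Under this reading, two definitions of the same term are acceptable only if at least one of them carries a distinguishing label.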
Refers to a Glossary entry (e. g., from tree nodes or character states) Refers to a glossary entry (Terminology/Glossary/GlossaryEntry) GlossaryEntry reference. Note: a model group is used so that the keyref identity constraint may be defined only once in a central place. Schema 1.0 does not allow to define keyrefs on types, only elements! Reference to the definition of term or concept in the glossary. This glossary entry may provide definitions for multiple audiences and may include media resources like images. [ATTR: ref] (This identity constraint is placed on a global element!) Collection of glossary entries (identified by their id) Defines the type of a concept tree (list of enumerated values to support application interoperability). ==== GLOSSARY END === ==== TERMINOLOGY START === -------------------------------- START Characters and dependent objects (states, statistical measures) --------------------------- 1. --- Character definitions (characters = data recording and analysis variables, depending on observed part, property, and observation or measurement methodology) a) Abstract base type and derived types to be used in instance documents. Defines a character in the terminology. Abstract base type, one of the extensions below must be used in instance documents Only a simple label for presenting characters in a flat list is defined here. (Abbreviated char. labels for tabular reports, natural language wordings, etc. can be defined in concept trees!) All Representations within a Label must have different language/audience values. Meta information, rating characters under various aspects. Intended to guide a best- next character algorithm. # Derived from AbstractCharacter to be used in instance documents (non-abstract type). Categorical data include nominal and ordinal data (DELTA types UM/OM and NEXUS types). Other terms for categorical data in statistics are 'qualitative data' or 'attributes'. 
The term 'attribute' has been avoided in SDD because it has different definitions in statistics, programming, databases, DELTA, etc. Both 'qualitative' and 'attribute' are ambiguous as to whether ordinal/ranked variables are included or excluded. Extension of the common character properties with those specific to categorical data (= 'states'). An optional specification of the kind of categorical character variable. The available measurement scales are 'nominal' and 'ordinal'. The distinction between linear ordering and other kinds of ordering is made separately! Any categorical variable can assume only a limited number of discrete values. Thus data recorded in a CategoricalCharacter are always discrete (= discontinuous or meristic). However, the measured property may either be naturally discrete ('male/female', 'aseptate/uniseptate/biseptate/muriform'), or it may be continuously varying and partitioned into discrete categories ('no/few/many hairs', 'orange to red'). Only in the latter case can the between-operator be used on neighboring states. Some characters may have complex state relations (trees), or the homology of multiple states may be unknown. A conservative phylogenetic analysis may want to treat each state as a separate column with a binary coding of presence/absence of a specific state value. What would be a good term for this? Candidates: AnalyzeStatesSeparately, AnalyzeStatesAsPresentAbsent, TreatStatesAsIndependentVariablesInAnalysis. Mappings between categorical states (e. g., subovate may be mapped to ovate to simplify identification choices). Each mapping defines a source and a destination state. Both From and To may point multiple times to the same state, but the combination From + To must be unique. Both states must be defined in the current character (validated through identity constraint!) [ATTR: ref] Both To and From should point to a different character than the current (not validated).
No explicit character reference is required, since state references are unique within a dataset. [ATTR: ref] A state may be mapped to multiple other states in the same character, or multiple states may be mapped to a single state, but the combination of From and To may only occur a single time. (States are defined outside the type specific tree, since categorical states may be present in addition to numerical data) (The element sequence in instance documents is informative!) Local definition of a state [ATTR: id] Reference to a single concept state (as defined project-wide at a concept tree node); extended with an id definition so that the state in the context of the current character can be referred to from descriptions. [ATTR: id, ref] References to project-wide defined ConceptStates (defined at the nodes of concept trees) must be unique within each character. This is achieved by a uniqueness constraint (local to each character) on the ref attribute of StateReference. The id attribute is already unique through the general CharacterStateKey. The labels of character state definitions are required to be unique within each character and audience definition. Note that this includes both the locally defined states and the referenced concept states. # Derived from AbstractCharacter to be used in instance documents (non-abstract type) Quantitative data include data like the DELTA types IN/RN. They are not supported by NEXUS. Extension of the common character properties with those specific to numerical measurements Especially including a more detailed measurement scale. --- Note: Unlike the states in categorical characters, the applicability of statistical measures to a character is not defined in the character. Any measure used in a description constitutes valid information. However, a list of recommended measures for sets of characters may be defined in concept nodes. An optional specification of the kind of numerical character variable. 
The numeric scales are 'interval' and 'ratio'. Interval differs from ratio in that the zero value is an arbitrary point (e.g. in °C/°F), so that ratios should not be calculated. If true, an application may issue a warning if sample measurements are not integer. Note that most statistical measures are real values for integer data (min/max/TotalRange being exceptions). Data are continuous if theoretically any value is possible with a sufficiently fine measurement method. They are discrete if only certain values are possible and gaps between values exist. The value must be false for ValuesAreInteger=true. It may also be false for real numeric values (esp. for ratio data based on counts). An inclusive range defined through two attributes into which all measured values and most statistics (mean, extremes, ranges, etc.) should fall. Only dimensionless statistics (variance, sample size) are not to be tested against the plausibility range. This does not define a schema constraint; applications may ignore this, enforce it strictly, or issue warnings when violated. [ATTR: lower, upper] Circular data are a special kind of MeasurementScale='interval'. If this data element is present, lower and upper define the values joining the circle. Example: '0, 360' for compass values, '0, 24' for hours of day. Compare Zar 1984: 422ff. Mappings of numerical ranges to categories (like DELTA Key States) Each mapping defines a lower and an upper value to map numerical ranges to categorical states in the same character. A CompareWith attribute defines which kind of statistical measure (mean, confidence interval, or min/max) is used for the comparison. An inclusive range defined through two attributes ('lower', 'upper'), plus a 'comparewith' attribute defining the preferred kind of measure. [ATTR: lower, upper, comparewith] The type of statistical measure with which the mapping range defined through Lower/UpperValue is compared.
This may be a central value (mean, median), the range (quantile, confidence interval, etc.) or the extremes (minimum/maximum). Currently only these three categories are defined. The categorical state corresponding to the range defined in From. [ATTR: ref] Refers to a measurement unit (like mm, µm, °C) defined in Terminology/General. To simplify integration of descriptions from different data sources, different (but compatible!) measurement units may be used in different descriptions. However, the unit set here is recommended for data input and reports. Further, this is the default measurement unit if numerical data in a description declare no unit. Measurement units apply only to those statistical measures not marked with IsDimensionless='true'. The number of figures in normal (non-dimensionless) measures assumed to be significant for all data in this character. Note that in sample values the 'significant' attribute also records the number of significant figures for individual measurements. Free-form information about accuracy of measurement?? Accuracy characterizes how close a measured value is expected to be to the true value. @@ Free-form would mean language- and audience-dependent text that cannot be included in analysis, i.e. this would be a rather specific internal annotation. Any way to improve this? Ideally a numeric value for the accuracy of measurements would be desirable! Free-form information about precision of measurement?? Precision measures how closely repeated independent measurements agree with each other (but not necessarily with the true value; compare accuracy). @@ Methods should ideally be defined in Glossary entries. Or should this become free-form text? [ATTR: ref] # Derived from AbstractCharacter to be used in instance documents (non-abstract type) Extension of the common character properties with those specific to color measurements (i.e. color expressed as a color range/area, rather than as named categories). (Not yet used!)
Mappings of color polygon values to categorical states An inclusive range defining a color range through color vertices forming a polygon in color space. The categorical state corresponding to the range defined in From. [ATTR: ref] Note to above: The color character above is only one example of the future derivations expected, such as algorithmically described shapes, molecular sequences (genome/proteome), or molecular patterns (RFLP, AFLP, etc.). c) Collections of modifier definitions. Abstract base type and derived types to be used in instance documents. The ModifierSet type refers to these collections in a polymorphic way. This makes it possible to define a collection of ModifierSet elements, each set containing multiple modifiers of a single modifier type. For categorical states. Used in concept (= 'project-wide') and local character state definitions. Any use of a character state in descriptions is a reference to an object of this type or one of its derivations. If present and true, the current state/category allows unconstrained text not tied to a truly analytical state. Such states (which may be labeled: 'Text', 'Other:', 'none of the above, please specify:') avoid forcing the choice of a potentially inappropriate category during data entry, especially while the terminology is still under development. DELTA text characters are modeled using these states, but they can also occur in combination with categorical states. UnconstrainedText states are somewhat similar to the 'unknown' coding status, since the free-form text information is not available to most analytical processors (incl. identification programs). (This 2nd annotation contains detailed information not entered in the first annotation, which is visible in the standard schema diagrams.) The name for this data element was contentious. Proposals were: Bob: IsIsolatedState with default false. Gregor: IsAnalyticalState, StateComparisonIsRecommended, or IsWellDefinedState, all with default true.
ImpreciseEquality with default false? Furthermore, one may want to make a distinction between a category saying "enter free form text here" and one explicitly saying "none of the above". However, the action of choosing a separate free form text state instead of scoring a category (if available) and adding free-form note text, implies that choosing free-form text is always of the type "none of the above", whether this is explicitly stated in the text state label or not. CharacterAbstractStateDef plus a new, character-local CharacterState id CharacterAbstractStateDef plus ConceptState id, used to define generic states at concepts that can be re-used in multiple characters Character, state, and measure references Refers to a character (e. g., from within concept trees or from descriptions). It consists only of a reference to a Character definition id. ref refers to a character definition id (Terminology/Characters/Character) Refers to a character state (e. g., from descriptions). It consists only of a reference to a Character state definition id. The ref attribute refers to a character state id. A collection of state references (CharacterStateRef type) [ATTR: ref] Refers to a project-wide definition of a categorical state at a concept node Refers to a concept state (those defined within the concept tree, which may be used in multiple characters). Statistical measures: The base semantics and labels are already available through UBIF. At concepts node further elaboration may occur: a) wording and value formatting b) definition of recommended measure sets. A kind of local extension of the base definition of a statistical measure; used inside in concepts, adding, e. g., formatting information. Properties describing machine-readable partial semantics for a statistical measure. Provided to support generic application code that continues to function if additional measures are defined. Simple statistical measures not requiring a parameter (mean, variance, sample size). 
Statistical measures with a parameter value like confidence interval, percentile, etc. A default value for the parameter of the measure. Example: 0.95 for the upper limit and -0.95 for the lower limit of the 95% confidence interval. Format rules as used in the xslt format-number function. # = significant digits; 0 (zero) = signif. digits or insignif. leading/trailing zeros; '.' = decimal point, ',' = group separator. Note that this is NOT culture sensitive in xslt!!! - Examples: "0,0#" formats 5 / 0.59 as 5,0 / 0.59. "# ###,#" formats 5000 / 0.59 as 5 000 / .6. (Rules for exponential formats or percent may be added in later versions of SDD!) @@ (to be deleted if the simple pattern approach above is sufficient!) @@ This or a format string ?@@ @@ This or a format string ?@@ @@ This or a format string ?@@ Note: How can we handle measures as well as values from repeated observations (samples) with the same mechanism? When mapping numerical ranges to categorical states (essentially creating a histogram), there are several options for which statistical measures are used in the mapping. Using the central value compares a point with the mapping range, whereas using ranges or extremes results in a comparison of two kinds of ranges for overlap. Only the central value method can guarantee an unambiguous partitioning into categories. However, the ranges or extremes methods may be desirable because of their improved error tolerance.
Central measure -- The first central measure encountered (mean, median, mode) is used as the basis of comparison. If none is found, but ranges or extremes are present, a central value is calculated based on these.
Ranges -- Any range that is not the extremes (quantile, percentile, confidence interval, mean plus/minus s.d., etc.) is used for the comparison if available. If none is found, the extreme values are used.
Extremes -- The extreme range values (= minimum and maximum) are used as the basis of comparison.
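The three comparison methods above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the schema: the class and function names (Measures, RangeMapping, comparison_interval, map_to_states) and the method keys "central", "ranges" and "extremes" are hypothetical, and the fallback central value is computed as the midpoint of the range or extremes, which the annotation does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Measures:
    """Statistical measures available for one numerical character (all optional)."""
    mean: Optional[float] = None
    median: Optional[float] = None
    mode: Optional[float] = None
    range_lower: Optional[float] = None  # e.g. a quantile or confidence limit
    range_upper: Optional[float] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None


@dataclass
class RangeMapping:
    """Inclusive numerical range mapped to one categorical state."""
    lower: float
    upper: float
    state: str


def comparison_interval(m: Measures, compare_with: str) -> Optional[Tuple[float, float]]:
    """Return the (low, high) interval compared with the mapping ranges, or None."""
    ranges = (m.range_lower, m.range_upper)
    extremes = (m.minimum, m.maximum)
    if compare_with == "central":
        for value in (m.mean, m.median, m.mode):
            if value is not None:
                return (value, value)
        # Assumption: the calculated central value is the midpoint of the
        # range or, failing that, of the extremes.
        for low, high in (ranges, extremes):
            if low is not None and high is not None:
                mid = (low + high) / 2
                return (mid, mid)
        return None
    if compare_with == "ranges":
        candidates = (ranges, extremes)  # falls back to the extremes
    elif compare_with == "extremes":
        candidates = (extremes,)
    else:
        raise ValueError(f"unknown comparison method: {compare_with!r}")
    for low, high in candidates:
        if low is not None and high is not None:
            return (low, high)
    return None


def map_to_states(m: Measures, mappings: List[RangeMapping], compare_with: str) -> List[str]:
    """Return all states whose inclusive mapping range overlaps the comparison interval."""
    interval = comparison_interval(m, compare_with)
    if interval is None:
        return []
    low, high = interval
    return [mp.state for mp in mappings if low <= mp.upper and high >= mp.lower]
```

With mapping ranges that do not touch, the "central" method yields at most one state, whereas "ranges" and "extremes" may return several overlapping states, which is the error-tolerance trade-off described above.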
d) Group combining the derived character types into a polymorphic structure (options are an explicit choice or the use of base type plus xsi:type). Character definition: Point for type polymorphism (multiple derived types in place of an abstract base type) Collection of character definitions; requires an xsi:type specification in instance documents. Character definition: Point for type polymorphism (multiple derived types in place of an abstract base type) Non-abstract types derived from AbstractCharacter. In OO programming, a polymorphic collection of the base type may be used! ---------------- The following types are used in the descriptions to code data by reference to characters, states, and modifiers defined in the Terminology. a) abstract and non-abstract derived types used in coded descriptions Note: The non-abstract derived types are to be used in instance documents. The type names have been shortened to simplify the appearance in instance documents, especially if an xsi:type would be used (Char xsi:type='CatSummaryData'). Abstract base type. Used in CodedDescription/CodedData/Char to make statements for a single character in a class or unit. [ATTR in CharSummaryData base type:] ref (= to char. definition) origin (= enumeration; data may be original data or derived from other sources like calculation, mapping, aggregation/generalization, inheritance) @@Is there a better name for 'origin'? Character modifiers, modifying all categorical states, statistical measures, etc. collectively. A character may occur multiple times in a description with different modifiers ('in winter/summer', 'at base/tip', etc.) or origins (e. g. from samples). If origin='Calculated' and data are based on a specific sample that is present within the description, this sample may be identified here. [ATTR: ref] Media specific to the character and the current object or class described. Example: microscopic picture of spore shape in a specimen.
Coding status values like Inapplicable, unknown, etc.; may have a free-form Note, but not modifiers. [ATTR: ref] Note: In a unit (= specimen) description this should be an alternative to categorical or numerical data and limited to one status value per character (not enforced by the schema). However, for classes (e. g., a genus) it is up to the aggregation/generalization process whether to create multiple status values ("unknown or not applicable") or not. Public notes or comments on the entire character statement, i. e. all status values and states, measures (depending on type), etc. together. Multiple languages are supported. Applications may, e. g., report the text in brackets after all other data. The audience values must uniquely identify the Representations within each Note element (no duplicate audience allowed). Provenance of value/state. The current data may be original data or may be cached information derived from other sources. The origin of the derivation may be a calculation, a mapping, an aggregation/generalization (class hierarchy, from below), or an inheritance (class hierarchy, from above). # Derived from abstract CharSummaryData to be used for categorical (char. state) data in instance documents (non-abstract type) Type-specific extension of the base character data type. States are 'scored' in a description by referring to a state defined in the current character. [ATTR: ref] Distinguishes different types of state collections. 'AndSet' and 'OrSet' define state distributions that are not explicitly ordered in instance documents. Applications may reorder states using the state order defined in Terminology or state frequency values/ranking. For the corresponding 'AndSeq'/'OrSeq' the sequence of states in instance documents defines the preferred order of states (distinguishing, e. g., between 'round or elliptic' and 'elliptic or round'). WithSeq expresses a specially worded form of 'AndSeq'.
With 'Between' the scored states form a range around the true value ('orange' to 'red'). # Derived from abstract CharSummaryData to be used for numerical (statistical measures) data in instance documents (non-abstract type) Type-specific extension of the base character data type. Refers to a measurement unit like mm, µm, °C, defined in Terminology/General. If missing, the 'recommended measurement unit' declared in the character definition is to be assumed. Note: although each data item may use different units, they should all be compatible and convertible (like °F/°C/°K, ml/mm3). This is not controlled by the schema! # Derived from abstract CharSummaryData to be used for numerical (statistical measures) data in instance documents (non-abstract type) An inclusive range defining a color range through color vertices forming a polygon in color space. d) Group combining the derived character types into a polymorphic structure (options are an explicit choice or the use of base type plus xsi:type). Character reference in CodedDescriptions: Point for type polymorphism (multiple derived types in place of an abstract base type) (The element sequence in instance documents is not informative and may be changed at any time.) [ATTR: ref] Within a single coded description each character state reference may occur only once, i. e. it is not possible to state "flowers blue, or blue". (The uniqueness constraint must involve the id of the description, otherwise no two descriptions could use the same character! However, the character reference is not necessary, since state ids are unique across characters.) Note that this still allows repeated occurrence of character states in the Sample containers. Within a single coded description and within each character, a coding status reference may occur only once, i. e. it is not possible to state "not applicable, or not applicable" (it is possible to state "unknown, or not applicable"). 
(The uniqueness constraint must involve the id of the description, and the character id, since coding status values are not defined globally for all characters.) Character reference in CodedDescriptions: Point for type polymorphism (multiple derived types in place of an abstract base type). In object-oriented programming the following choice should be replaced with a polymorphic design, using a collection of the common base type! (The sequence of elements in instance documents is not informative and may be changed at any time) [ATTR: ref] The uniqueness constraints on character state references and on coding status references described above apply. [ATTR: ref] The uniqueness constraint on coding status references described above applies. [ATTR: ref] The uniqueness constraint on coding status references described above applies.
a) abstract and non-abstract derived types used in sample data Abstract base type. Used in CodedDescription/SampleData/Sample/SamplingUnit. [ATTR: ref (to def. of character)] # Derived from abstract CharSampleData to be used for categorical (char. state) data in instance documents (non-abstract type) States are 'scored' in a description by referring to a state in the character definition. All notes and modifiers are applicable to this element. [ATTR: ref] # Derived from abstract CharSampleData to be used for numerical data in instance documents (non-abstract type) in coded descriptions (Sample/SamplingUnit). [ATTR: value (xs:double, a directly measured/observed value. Not for statistical measures; these cannot occur in sampling units)] Public notes or comments, for multiple languages. Applications may, e. g., report the text in brackets after the value. The audience values must uniquely identify the Representations within each Note element (no duplicate audience allowed). A single value of a single measurement for a character in a sampling unit. This may not be used for ranges, minimum, mean, etc., which cannot possibly occur on sampling units. Significant figures. 1.300 has 4 significant figures, 72000 may have 2, 3, or more significant figures. # Derived from abstract CharSampleData to be used for ColorRange data in instance documents (non-abstract type) An inclusive range defining a color range through color vertices forming a polygon in color space. d) Group combining the derived character types into a polymorphic structure (options are an explicit choice or the use of base type plus xsi:type).
Character reference in SampleData: Point for type polymorphism (multiple derived types in place of an abstract base type) (The sequence of elements in instance documents is not informative and may be changed at any time) [ATTR: ref] Within a single character inside a sampling unit each character state reference may occur only once. Character reference in SampleData: Point for type polymorphism (multiple derived types in place of an abstract base type) In object-oriented programming the following choice should be replaced with a polymorphic design, using a collection of the common base type! (The sequence of elements in instance documents is not informative and may be changed at any time) [ATTR: ref] Within a single character inside a sampling unit each character state reference may occur only once. [ATTR: ref, value] Within a single character inside a sampling unit each character state reference may occur only once. [ATTR: ref] c) types used inside the CharSummaryData-derived types A categorical state including frequency, state modifier, and Notes. Similar to StateData, this one is intended for CodingStatus references. It supports notes, but no modifiers! [ATTR: ref] Public notes or comments, for multiple languages. Applications may, e. g., report the text in brackets after the status value wording. The audience values must uniquely identify the Representations within each Note element (no duplicate audience allowed). (Compare discussion in StateData) Measure references and values in CodedDescr.: Point for type polymorphism (multiple derived types in place of an abstract base type) --- Summary statistics (univariate statistical measures) like distribution parameters, sample size, etc. Two alternative types (with/without parameter) may occur in any sequence. (The sequence of elements in instance documents is not informative and may be changed at any time.)
Abstract measure reference, applying a modifier to a descriptive statement. In instance documents a specific derived type must be given (e. g., xsi:type = 'UnivarStatMeasureData'). [ATTR: ref] --- Individual measures have no separate Modifiers/Notes. However, a numerical character may occur multiple times in coded descriptions, e. g., to separately express width at base and at center. Measure references and values in CodedDescr.: Point for type polymorphism (multiple derived types in place of an abstract base type) --- Summary statistics (univariate statistical measures) like distribution parameters, sample size, etc. Two alternative types (with/without parameter) may occur in any sequence. (The element sequence in instance documents is not informative and processors may reorder it.) (The element sequence in instance documents is not informative!) Simple measures like mean, variance, or sample size. [ATTR: ref, value] Statistical measures like confidence interval or percentile, expressed using an additional parameter par. [ATTR: ref, par, value] --- The ref attributes in both types point directly to enumerations in UBIF (UnivarStatMeasureEnum/WithParam). An elaboration for measure definitions is supported at concept nodes but optional. --- Individual measures have no separate Modifiers/Notes. However, a numerical character may occur multiple times in coded descriptions, e. g., to separately express width at base and at center. d) abstract and non-abstract derived types used in natural language descriptions. Lacking multiple inheritance mechanisms in xml schema, these Markup versions have been derived independently. They are designed to be closely related to corresponding types in the coded description, however. Abstract base type. Used in NaturalLanguageDescriptions. Note: although Text and CodingStatus scoring is common to all derived types, it can not be defined here. 
The markup of natural language should follow the original text sequence and type derivation would impose an xml schema sequence constraint. # Extends the abstract CharacterMarkup for use with categorical (char. state) data Inapplicable, unknown, etc. It may have an associated Note, but no modifiers. [ATTR: ref] Character state data permitting Text elements within. [ATTR: ref] Text related to a specific state that is not covered by either the state definition or modifiers. When converting NLD to coded descriptions, this will become a free-form text note. Any text within a char. that has not yet been identified as one of the following elements. [ATTR: parsed; normally = false] # Extends the abstract CharacterMarkup for use with numerical (statistical measures) data as well as a list of sample measurement values. Inapplicable, unknown, etc. It may have an associated Note, but no modifiers. [ATTR: ref] A univariate statistical measure like mean, variance, or sample size. [ATTR: ref, value] A univariate statistical measure like confidence interval or percentile, expressed using an additional parameter. [ATTR: ref, par, value] The value is stored in an attribute of type double. The original text of the value may follow inside in the optional Text element. Note that the string in text will usually use a different number format than the English format required by xml. [ATTR: value] Text related to a sample value. Any text within a char. that has not yet been identified as one of the following elements. [ATTR: parsed; normally here = false] (A "ColorRangeMarkup" is not supported at the moment, since color polygon measurement data embedded in natural language descriptions are not known!) d) Group combining the derived character types into a polymorphic structure (options are an explicit choice or the use of base type plus xsi:type). 
Character reference in NLD: Point for type polymorphism (multiple derived types in place of an abstract base type) During NLD parsing usually the states are initially recognized. However, character markup can always be deduced from the associations between char. and states defined in the terminology. [ATTR: ref] Character reference in NLD: Point for type polymorphism (multiple derived types in place of an abstract base type) In most cases initially the states are recognized, but character markup can always be deduced from the associations between char. and states defined in the terminology. [ATTR: ref] [ATTR: ref] The following NLD type refers to concept nodes and has no corresponding types in SummaryData/SampleData: Used in NaturalLanguageDescriptions. Refers to concepts (i. e. nodes defined in concept trees) This is necessary if markup is incomplete. [ATTR: parsed (= should implicitly be false, not modeled!)] c) types used inside the CharacterMarkup types Variant of CodingStatusData to be used inside the NaturalLanguageDescription markup container. Additional information regarding the coding status The text representing the coding status information itself Variant of StateData to be used inside the NaturalLanguageDescription markup container. Text related to a specific state that is not covered by either the state definition or modifiers. When converting NLD to coded descriptions, this will become a free-form text note. Similar to ReportedNote in coded descriptions, but without the Representation/language layer. Used to markup parts that do not correspond to the terminology and shall remain free-form text when a conversion to coded description is attempted. For single values (singleton observation or values in a sample). Ultimately this should contain only the value itself. Text related to a sample value. A single value of a single measurement for a character in a sampling unit. 
This may not be used for ranges, minimum, mean, etc., which cannot possibly occur on sampling units. (Used inside Quantitative markup) (Used inside Quantitative markup) -------------------------------- END Characters and dependent objects (states, statistical measures) --------------------------- Concept tree and node definitions Defines an entire concept tree (which may be a single tree node containing a flat list) Label to identify the current object in the user interface All Representations within a Label must have different language values. Concepts describing the entire tree using a constrained vocabulary to support application interoperability The type of a tree is constrained to an enumerated list to support application interoperability. Usage of concept tree that is intended by its designers; constrained to an enumerated list to support application interoperability. Important Roles are InteractiveIdentification, NaturalLanguageReporting, or Filtering. Trees that have no value are implicitly visible in applications only when designing the terminology! They are not normally shown to consumers of descriptive data. Each enumerated role value may be listed only once. True if the intention of the designer of a concept tree is that all characters should be included in the tree. A terminology editing application may then use this to warn about missing characters or directly offer inserting newly created characters in all such trees. @@Only a placeholder for discussion! Many concept trees, especially those defining structures, are specific to taxa! Since taxa are generalized to classes in SDD, this should not be called "TaxonomicScope". However, "ClassScope" seems to be very confusing as well. The root node of the tree. Note that it has a label in addition to the tree label. The tree label uniquely identifies a tree when selecting it among a list of all trees, whereas the root node label can be very short and is shown when a single tree is displayed. 
[ATTR: id] A node in a concept tree. Concepts may be basic properties (color, shape, texture), structural types (fruit types), methods (naked eye, hand lens, microscope) or other hierarchical generalizations that can be applied to characters (e. g., relative region: tip versus base of structure) Tree nodes may remain unlabeled! All Representations within a Label must have different language/audience values. A set of project-wide state definitions tied to the part (e. g., for fruit: capsule, berry, nutlet, ...), property (e. g., for color: red, green, ..., for shape: round, ovate, ...), method, etc. described in the current concept tree. ConceptStates become operational for descriptions only when a StateReference has been added in a specific character. The definition of concept states is identical to the local definition of states within a character. Using concept states simplifies the management of terminology and improves data analysis (states from different characters can be compared if they refer to identical concept states). [ATTR: id] The labels of concept state definitions are required to be unique within each concept state set (i. e. at a node in the tree) and audience definition. If a concept state has been added to the set above, a state ref. should immediately be added to characters listed here. This occurs not through schema mechanisms, but is a contract with SDD applications. Only applications modifying concept states/measures sets or the Character references herein need to fulfill this contract. [ATTR: ref] References to project-wide defined ConceptStates set (i. e. nodes within concept trees) must be unique within each character. Inheritable def. automatically apply to all characters/concepts starting at this node. The modifiers contained in the listed modifier sets are considered applicable to all characters placed in the current branch of the concept tree. Note: In descriptions, all modifiers are principally valid in all characters. 
However, editing tools are expected to offer only the recommended modifiers defined here. Reference to a set of modifiers. The element sequence in instance documents is not informative! [ATTR: ref] A set of univariate statistical measures (e. g., mean, min, max, s. d., sample size) considered applicable to all numerical (sic!) characters placed in the current branch of the concept tree. In descriptions, all measures are valid in all numerical characters. However, editing tools are expected to normally offer only the recommended measures defined here. In addition to listing measures, these elaborations provide for improved report generation (wording, value formatting). Note: Statistical measures applicable to ordinal or nominal data (min, median, mode, etc.) are not yet supported, since they can easily be calculated ad-hoc from the frequency distributions of categorical states that are supported. Within each description, the applicability of all character references within the branch or the tree starting with the current concept may optionally be governed by rules depending on the presence of categorical states in the same description. Note: rules for individual characters (rather than a set) can be defined in the terminal nodes. By default the characters below this node are inapplicable. They become applicable if any of the listed controlling character/state combinations is present in a description. Modifier references must be unique within each set (but different sets may contain the same modifier) By default the characters below this node are applicable. They become inapplicable if any of the listed controlling character/state combinations is present in a description. Modifier references must be unique within each set (but different sets may contain the same modifier) Meta information, rating characters under various aspects. Intended to guide a best-next character algorithm. A node either contains other nodes, or contains a single character reference. 
It may also be empty to decouple the definition of hierarchies (e. g., a complete part hierarchy) from characters defined at a given moment. Element may be missing, which results in the option to have empty nodes with neither a character nor further nodes. [ATTR: id] Characters are the 'leaves' of the tree. Each character is embedded in a node providing labeling information in the context of the current tree (which is usually different from the default character label). A single character may appear in several places in the tree, if this is desired. [ATTR: ref] Concept tree and node references Refers to an entire concept tree Refers to a node in a concept tree (Terminology/ConceptTrees/ConceptTree/...) Refers to a node in a concept tree (e. g., to refer to a set of concept states defined at this node) Refers to a node in a concept tree (Terminology/ConceptTrees/ConceptTree/...) ==== TERMINOLOGY END === ==== DESCRIPTIONS START === Descriptions are either natural language with optional markup or coded descriptions. Both are derived from the same base type: Abstract base type for NaturalLanguageDescription and CodedDescription. The id attribute is currently not used in keyrefs from within this schema. However, it is considered generally useful to uniquely identify descriptions in federated situations. Subject of the description is either an abstract class (e. g., a biological species) or an individual object or unit (e. g.,a specimen). Refers to a class name (= in biology a taxon name) [ATTR: ref, @@check classifier design: add. attributes?] Refers to an individual physical object (e. g., a biological specimen). This may refer to observed objects as well as to collected and preserved objects. The identification (= a class name) is defined in the ExternalDataInterface/Units list. [ATTR: ref] A description may be further defined through a published data source for the nat. language or coded description. 
If Citation is missing, it is assumed that the compiler or editor of the data is the original source of information. A description may have a limited geographical scope, if geographical variability is known to exist or is expected. @@Should we define additional scopes for the description, e. g., host plants for pathogens, or should we simply provide a free-form text element like this? @@Also compare the Scope/GeographicalScope and Scope/SourcePublications structure in the metadata for the entire dataset. This should probably be reflected here! Creators, Revision status, and dates of individual description (compare RevisionData in Metadata) Contains resources like images that are not specific to a character (else add them to character elements below). Each media resource may occur only once in the collection. Defines an id for a coded or natural language description Descriptions entered as free-form text with optional (and potentially incomplete) markup referring to concepts (= char. tree nodes), characters, and states as defined in the terminology. Retains the full, unchanged original wording of the natural language description. Concept, character, or state markup may be added (partial or complete), but these should not change the original wording sequence. In contrast to CodedDescriptions, no uniqueness constraints are defined. Characters may occur multiple times and even a state may occur multiple times within a character (e. g., if differences are treated by modifiers in the terminology, but in the NLD other states occur between them). The element is optional to allow generation of empty descriptions and to support descriptions with media resources alone! (The element sequence in instance documents is informative!) This is necessary if markup is incomplete. [ATTR: parsed (= should implicitly be false, not modeled!)] Markup of concepts above the character level, e. g., organism parts or methodological sections. 
[ATTR: ref] Coded description data are highly controlled by the vocabulary and structures defined in the Terminology, using references to characters, states, modifiers, numerical values for measurements. They also support a limited amount of free-form text (in Notes or Annotation only). Separating data and terminology allows rearranging and refactoring the terminology, multilingual support through central terminology translations, and multiple hierarchical views. Coded descriptions must fulfill more rigorous consistency requirements than natural language descriptions and are more suitable for analysis. Furthermore, language-dependent annotations are minimized so that data can be easily reorganized and translated into multiple languages. Summary data for aggregated or summarized data (using statistical measures, state distributions, etc.). The element is optional to support descriptions containing only sample data or media resources. Note: Characters are NOT required to have unique ref attributes! Data for one character may be recorded with different modifications (in spring/autumn, at tip/base). Raw sample data are recorded here. The analysed and generalized (e. g. using statistical measures) results are normally also reflected under SummaryData (with origin='calculated' and BasedOnSample identifying a sample ID). (The sequence of Sample elements in instance documents is not informative and may be changed at any time!) A container for direct ('raw') measurement results in a study. All sampling observations are assumed to be made under identical conditions. Descriptions may contain an unlimited number of Samples. [ATTR: id, random (= is random sample, if false sample may or may not be biased)] [Currently not used!] 
Refers to either a natural language or coded description Refers to a specific sample (//CodedDescription/SampleData/Sample) A special subtype of CodedDescription are original sampling data, which are organized into referable Sample containers: A container for a sampling, with repeated sampling units, each of which may record multiple characters that are observed together. Public notes on the sample (circumstances, etc.) that are not already identified in the description header. Multiple languages are supported (although rarely required). The audience values must uniquely identify the Representations within each Note element (no duplicate audience allowed). Optionally a full or partial date/time of start and end of the sampling event may be recorded. A sampling unit may be an individual organism, a leaf of a tree, a piece of tissue, etc. In each sampling unit multiple characters may have been observed together ('paired observations'; example: 'leaf shape, length, and width' of a single leaf). Value frequencies (e. g., '2.3': observed 4 x) are not supported; they are useful when only a single character variable is supported, but complicate paired observations unnecessarily. Char. values with a frequency should be entered in repeated SamplingUnits. (The sequence of SamplingUnit elements in instance documents should be preserved. It has no analytical semantics, but it may be relevant if data entry is compared with the source.) Within a single SamplingUnit each character reference may occur only once. Multiple SamplingUnits in a Sample may and should use the same characters. (The latter, i. e. consistent presence of characters, is not validated by the schema.) Defines an id for a sample. This is used when analysis data in coded description are based on a specific sample If true, the sample is a random sample. If false, the sample may or may not be biased. 
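The sampling-unit rules above (each character reference at most once per SamplingUnit, value frequencies expanded into repeated SamplingUnits, consistent presence of characters not checked by the schema) can be illustrated with a small sketch. It is not part of the schema: the Python class and function names (SamplingUnit, Sample, expand_frequency, missing_characters) and the character reference strings are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SamplingUnit:
    """One observed unit (e.g. a single leaf); maps character ref -> value."""
    observations: dict = field(default_factory=dict)

    def record(self, char_ref, value):
        # Within a single sampling unit each character reference may occur only once.
        if char_ref in self.observations:
            raise ValueError(f"character {char_ref!r} already recorded in this unit")
        self.observations[char_ref] = value


@dataclass
class Sample:
    """Container for raw measurements; 'random' mirrors the random attribute."""
    id: str
    random: bool = False
    units: list = field(default_factory=list)  # unit order is preserved


def expand_frequency(sample, char_ref, value, count):
    """Enter a value observed 'count' times as 'count' repeated sampling units."""
    for _ in range(count):
        unit = SamplingUnit()
        unit.record(char_ref, value)
        sample.units.append(unit)


def missing_characters(sample):
    """Application-level check (not done by the schema): per unit, list the
    characters that other units of the same sample use but this unit lacks."""
    all_refs = set()
    for unit in sample.units:
        all_refs |= unit.observations.keys()
    return [sorted(all_refs - unit.observations.keys()) for unit in sample.units]


if __name__ == "__main__":
    sample = Sample(id="s1", random=True)
    expand_frequency(sample, "leaf_length", 2.3, 4)  # '2.3' observed 4 x
    paired = SamplingUnit()
    paired.record("leaf_length", 2.7)
    paired.record("leaf_width", 1.1)  # paired observation on the same leaf
    sample.units.append(paired)
    print(len(sample.units))
    print(missing_characters(sample))
```

A consuming application would typically run a check like missing_characters only as a warning, since the schema itself permits sampling units that omit a character.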
Refers to a specific sample inside CodedDescriptions Refers to a specific sample (//CodedDescription/SampleData/Sample) ==== DESCRIPTIONS END === ==== IDENTIFICATION KEYS START === Stored identification keys (esp. manually designed as opposed to automatically generated) are stored in a separate section: Defines a stored identification key (dichotomous or multifurcating key) that has been digitized from printed publications or manually created to express expert knowledge that would not be available in dynamically created dichotomous keys (using Ratings from terminology and a 'find next best character' to minimize the average search tree). Label to identify the current object in the user interface. All Representations within a Label must have different language values. If the key is derived from a published data source this is cited here. If Citation is missing, it is assumed that the compiler or editor of the data is the original source of information. A description may have a limited geographical scope, if geographical variability is known to exist or is expected. @@Should we define additional scopes for the description, e. g., host plants for pathogens, or should we simply provide a free-form text element like this? @@Only a placeholder for discussion! Unit/Class descriptions do not have taxonomic scope, only the project. An identification key could have it, but on the other hand it could also be inferred from the taxa contained in the key nodes!!! Creators, Revision status, and dates for this key. The root node of the stored identification key. Note: Applications will generally ignore the Statement element in the root node when the key is selected as a whole. However, if a key shall be used both as independent key and as a branch node in another key, Statement must be defined. In both cases CodedStatements may be used to define statements that are applicable to the entire key (i. e. they are implied in the selection of the key). 
[ATTR: id] A node in a stored identification key, containing the lead statement to follow and optionally the next question, or terminating at class identification, subkey, or node reference. The id attribute for nodes in a stored key is required because an xs:key constraint exists on this attribute. It seems impossible in xml schema to make existence of ids optional but require those present to be unique and the target of keyrefs that point to these existing keys. If the user agrees with the statement (expressed as free-form text), then the node will be followed. (The audience-specific representations provide abbreviations, which in picture keys may be used as alt-text of the image. ExportToken will usually not be used, but a separate type seemed to be unnecessary.) The audience values must uniquely identify the Representations within each Statement. Statements in coded terminology that are equivalent to the Statement text. This information may be used to switch between stored identification keys and interactive identification (multiple entry keys). Boolean statements like 'calyx black or petals white or cream (but not yellow)' can be expressed. Within CodedStatements, each state reference may occur only once. A node contains either further nodes (= Leads), a single reference to another identification key or key node, or a class reference (biology: a taxon) as the result of an identification. Optional question that is answered by the Statement elements in each of the Leads below. Note that in most traditional keys the question is empty and only the alternative statements are written. The audience values must uniquely identify the Representations within each QuestionText. The set of alternative lead statements (which may be answers to QuestionText) At least two alternatives leading further on in the key must be provided. This element defines the tree recursively. Refers to a class name (in biology a taxon name) [ATTR: ref, @@check classifier design: add. 
attributes?] Refers to another stored identification key in the Keys section. This feature allows cross references between keys. [ATTR: ref] Refers to arbitrary identification key nodes within the current or other keys, to allow building reticulations into the key. @@ This may need further discussion and testing! Allowing to jump into other keys requires the leads (=node) key to be unique across all keys, not only within a key!@@ [ATTR: ref] Refers to an entire stored identification key (e. g., if a key is referenced as a subkey from within another key) Refers to the key attribute of an entire stored identification key Refers to a node in a stored key (e. g., for reticulating keys) Refers to a node in a stored identification key Boolean combination of states are currently supported only in CodedStatements inside IdentificationKeys. A discussion about whether this is generally desirable is encouraged! boolean operators, modeled after usage in MathML unary boolean function n-ary boolean functions (inclusive or, 'and/or') (exclusive or, 'either/or') Choice of state, measure, recursion. The 'group' schema model has been used because of different multiplicity of unary and n-ary functions apply is a kind of bracket around a function, the (boolean) function is defined by the first element inside. Modeled after MathML2. @@we have to discuss, whether these should be full coded data types (including modifiers) or not States are 'scored' by referring to a state in the character definition. All notes and modifiers are applicable to this element. [ATTR: ref] ==== IDENTIFICATION KEYS END === ==== Other basic types used by SDD (compare also the types used by UBIF) Used in descriptive data (not in terminology): Collections of states in instance documents may be ordered (sequence) or unordered (set), and may be connected with 'and', 'or', 'with', or 'between'. 
Since set/sequence and operators are dependent on each other, the two aspects are combined into a 'model' enumeration Unordered set of states, combined with 'or' -- Multiple states scored for a character in a description form a set. The order of states has no special meaning and may be changed. In natural language output the states should be combined with 'or' to express that in individual objects (that belong to the class that is being described), the states may occur together or alone. Ordered sequence of states, combined with 'or' -- Multiple states scored for a character in a description form a sequence, i. e. the state order carries some semantics and should be preserved in output. The sequence semantics is not explicitly defined, but intelligible to human consumers and presumably relates to some concept of relevance or importance. In natural language output the states should be combined with 'or' to express that in individual objects (that belong to the class that is being described), the states may occur together or alone. Unordered set of states, combined with 'and' -- Multiple states scored for a character in a description form a set. The order of states has no special meaning and may be changed. In natural language output the states should be combined with 'and' to express that in any individual object (that belongs to the class that is being described), the states will always occur together. Example: two colors that occur together in a pattern. Ordered sequence of states, combined with 'and' -- Multiple states scored for a character in a description form a sequence, i. e. the state order carries some semantics and should be preserved in output. The sequence semantics is not explicitly defined, but intelligible to human consumers and presumably relates to some concept of relevance or importance. 
In natural language output the states should be combined with 'and' to express that in any individual object (that belongs to the class that is being described), the states will always occur together. Example: a black part with small red markings, is more appropriately described as 'black and red' than 'red and black'. One state occurring together with others of secondary relevance. -- This is a special case of AndSeq, and in many circumstances (except natural language generation) may be treated as AndSeq. Example: "Green with brown" (often this may be two characters, e. g. base color and dot color). True value lying between (usually two) states -- Example: "Between oval and elliptic" = "Oval to elliptic". Defines the type of a concept tree (list of enumerated values to support application interoperability). Categorizing characters into basic property types (e. g., color, 2-dim. shape, 3-dim. shape, surface texture, taste, smell, behavior, physiology, measurements, etc.) greatly improves the analysis and management of larger character sets and is therefore recommended. [@ Note: Only a single concept tree should have this hierarchy type. (not enforced in schema, how can it be enforced? Other types occur multiple times, i. e. one cannot make a UNIQUE statement on attribute!) @] A hierarchy that organizes characters by observation method or instrumentation, e. g., field observation, light microscopy, electron microscopy, molecular methods, culture techniques, etc. A hierarchy that organizes characters by a morphological or anatomical "contains" or "part-of" hierarchy: plant = root/stem/leaf, leaf = base/stipules/petiole/lamina, etc. A hierarchy that organizes structural parts in a kind-of hierarchy (e. g., a 'teliospore' is a kind of 'spore') Used for concept trees that fall into none of the categories property, method, part. Such trees may be intended only for internal purposes (e. g., defining dependency rules) or for browsing by the user. 
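The state collection models described earlier in this section (unordered set or ordered sequence, combined with 'or', 'and', 'with', or 'between') can be sketched as a small rendering function. This is only an illustration under assumptions: of the model names used here only AndSeq appears in the annotations above, the other names (OrSet, OrSeq, AndSet, WithSeq, Between) are guessed by analogy, and the English operator words and comma punctuation are placeholders for what the wording definitions of a real project would supply.

```python
# Hypothetical mapping from model name to English operator word.
OPERATORS = {
    "OrSet": "or",
    "OrSeq": "or",
    "AndSet": "and",
    "AndSeq": "and",
    "WithSeq": "with",
    "Between": "to",
}


def render_states(states, model):
    """Join state labels according to the (assumed) collection model name."""
    op = OPERATORS[model]
    states = list(states)
    if model.endswith("Set"):
        # Order carries no meaning in a set, so it may be normalized.
        states.sort()
    # For *Seq and Between models the given order is preserved.
    if len(states) <= 2 or model in ("WithSeq", "Between"):
        return f" {op} ".join(states)
    return ", ".join(states[:-1]) + f" {op} " + states[-1]


if __name__ == "__main__":
    print(render_states(["black", "red"], "AndSeq"))
    print(render_states(["green", "brown"], "WithSeq"))
    print(render_states(["oval", "elliptic"], "Between"))
```

Sorting the unordered models is optional; it is done here only to show that an application is free to reorder sets but must keep sequences as recorded.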
PresentationTable concept trees are small sets of usually a few characters that allow data to be displayed in a tabular arrangement. It is possible to define tables in more than 2 dimensions. By default the innermost dimension is considered cells in a row, the next rows in a table. Any further dimension may be displayed as multiple 2-dimensional tables one below the other. However, applications may also offer a browser based on pivot tables. - Note: Trees of type PresentationTable should not be offered in the user interface when selecting a browsing tree. A concept tree of type "SubsetFilter" is intended only for the purpose of filtering characters. It will often be a flat list of characters. Applications should not offer it as a choice when the user selects a hierarchy for displaying or reporting purposes. Note that conversely, the filter selection dialog in applications should not be restricted to trees of type SubsetFilter. Any concept tree, including part, method or property hierarchies may be used as a filter to define character subsets. Defines the intended roles that a designer may assign to a concept tree (list of enumerated values to support application interoperability). Setting this value in a concept tree is a recommendation to applications with a user interface to offer this tree for editing the description data set (the application may, however, enable the user to select any concept tree). Setting this value in a concept tree is a recommendation to applications with a user interface to offer this tree for building stored identification keys (e. g., dichotomous keys). Setting this value in a concept tree is a recommendation to applications with a user interface to offer this tree for interactive identification. Setting this value in a concept tree is a recommendation to applications to use this for creating a report of the character terminology. 
(Note that no TerminologyEditing value is defined; all concept trees should be available when designing the terminology. However, the tree marked as TerminologyReporting may be used as the initial editing view.) Setting this value in a concept tree is a recommendation to applications to offer this tree for natural language reporting. Setting this value in a concept tree is a recommendation to applications to offer this tree for filtering purposes. Some trees are explicitly (separately) typed as being intended exclusively for filtering/subset definition; but many trees are useful for filtering purposes. Defines the origin of data that may have been entered, calculated, aggregated or inherited The data are directly entered by a machine or human agent. These are the original data that all other cached data (Origin unequal 'OriginalData') are based upon. The data are calculated from other data using a calculation rule. Examples: a ratio calculated from other characters, a mean calculated from a sample that is available under SampleData/Sample (if a mean is calculated from data no longer available, it would be recorded as 'OriginalData'). The data are calculated from other data based on a mapping definition (either from numeric to categorical, or from fine-grained categorical to coarse-grained categorical). The data are derived from data in classes placed below the current class in the class hierarchy. This applies both to aggregating data from objects to classes, and to generalizing lower classes to higher classes. Note: BioLink calls this 'Compile from below'. The data are derived from data in classes placed above the current class in the class hierarchy. Defines the origin of concept/character ratings. Similar to DataOriginEnum, but fewer enumerated values. The data are directly entered by a machine or human agent. 
Concept ratings may inherit from ratings at higher concept nodes, and character ratings may inherit from all concept nodes they belong to (possibly in multiple concept trees). A rating of 1 (low) to 5 (high), with 3 as central value, plus indication whether inherited (= calculated based on related definitions) or defined directly. inherited = inherited from a concept parent. Concept ratings are inherited to all further concepts and to all characters in a branch of the concept tree. A collection of ratings to rate the consistency, etc. of a character or concept. Relevant during interactive identification to rank the remaining characters for discriminative power and convenience. How convenient is the character or concept for identification? [ATTR: rating, origin] How available is the character or concept for identification? The rating should be low if it is only available for a short time during the life of an object, or only expressed with low frequency in populations. [ATTR: rating, origin] How reliable is the character or concept for identification? This should include both variability of values and variability in scoring the objects. [ATTR: rating, origin] How convenient is the character or concept for identification? [ATTR: rating, origin] MinimumExpertiseLevel: the designer of the concept tree expects the user to have a certain minimum expertise level. [ATTR: origin] inherited = inherited from a concept parent. Concept ratings are inherited to all further concepts and to all characters in a branch of the concept tree. Formatted text with an additional attribute "parsed". Used for Text elements in the NaturalLanguageDescription container. The following 4 types define a base element (which may later carry BlankBefore/BlankAfter attributes if this should be necessary) and variants of wording definitions for natural language report rendering. These types are used exclusively in the Audience-specific LabelPlusWording1-3 container types.
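The rating inheritance described above (a directly defined rating of 1 to 5, otherwise the rating of the nearest concept parent) can be illustrated with a minimal sketch. The class `ConceptNode`, the function `effective_rating`, and the origin strings "direct" and "inherited" are hypothetical names chosen for illustration; they are not taken from the schema. The sketch covers a single concept tree only and does not address how ratings from multiple concept trees would be combined.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ConceptNode:
    """Hypothetical stand-in for a concept (or character) in one concept tree."""
    name: str
    rating: Optional[int] = None            # directly defined rating, 1 (low) to 5 (high)
    parent: Optional["ConceptNode"] = None  # concept parent, None for the tree root


def effective_rating(node: ConceptNode) -> Tuple[Optional[int], Optional[str]]:
    """Return (rating, origin) for a node.

    A rating defined on the node itself is reported as "direct"; otherwise the
    nearest ancestor that defines a rating supplies it and the origin is
    "inherited". If no rating is defined anywhere, (None, None) is returned.
    """
    current: Optional[ConceptNode] = node
    while current is not None:
        if current.rating is not None:
            if not 1 <= current.rating <= 5:
                raise ValueError("rating must be between 1 and 5")
            origin = "direct" if current is node else "inherited"
            return current.rating, origin
        current = current.parent
    return None, None


leaf = ConceptNode("Leaf", rating=4)
margin = ConceptNode("Leaf margin", parent=leaf)
print(effective_rating(margin))
```

An application could cache the resolved values together with their origin, so that a later edit of a parent rating can invalidate only the inherited entries.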
A text element used to define wordings for natural language output. Currently the handling of blanks is assumed to be through leading and trailing blanks present in character content. If this should not work due to automatic trimming, the type may require two optional attributes like BlankBefore / BlankAfter of type BooleanTripleState. Currently the type is simply a synonym of StringWithFormatting, but this may later be changed! Natural language wording for elements without content (= 'SimpleWording'). Wording for elements that have no further children in the natural language wording tree, e. g., char. states. Natural language wording for container elements with non-repeated content (e. g., modifiers around states) (= 'ContainerWording') Wording output before the contained elements. For characters this is the main character wording that is output before the states. (Optionally both before and after may be present) Wording output after contained elements. In the case of a character this is the wording after all states, or after numerical data and after a measurement unit where present. Natural language wording for elements with repeated content like characters that contain multiple modifiers + states. (= 'Array-' or 'ContainerWording') Normally the delimiters defined in the language rules will be used. However, they can be overridden here. Natural language wording for operators (and, or, with, to, etc.). Text used for operator unless the condition in IfNextElement is fulfilled. Contains 2 attributes, containing blank-separated lists of multiple starting patterns for next element. Example: In Spanish 'y' becomes 'e' if next word starts with 'i' or 'hi', but not 'hie'. Use 'i hi'/'hie' for StartWith/ButNotWith to define this. Text used if condition is fulfilled. @@ check later whether still necessary! This delimiter is used if only 2 elements are present.
Examples: en: ' or ', de: ' oder ' If 3 or more elements are present, this delimiter is used between all elements, except before the last element. Examples: en: ', ', de: ', ' If 3 or more elements are present, this delimiter is used between the second-but-last and the last element. Examples: en: ', or ', de: ' oder ' The following types are audience-specific (i. e. they refer by a ref mechanism to audiencekey values). Note that some types are used only a single time, but it was thought more transparent to define all audience-specific collections and representations through types rather than make this dependent on the frequency of use. A label = collection of audience-specific label representations (without abbreviations or natural language reporting wordings). Used, e. g., for concept trees or modifier sets. Audience-specific simple label representation (= without abbreviations or natural language reporting wordings) [ATTR: audience] Audience-specific label representations (without abbreviations or natural language reporting wordings). Used, e. g., for concept trees or modifier sets. Text of the normal label, intended for screen display or reports that accommodate unabbreviated labels. Label (incl. abbreviations) Audience-specific label representations (incl. abbreviations) [ATTR: audience] Audience-specific label representations (incl. abbreviations) Restricted to 50 characters maximum length, including blanks (recommended to be much shorter!). Label abbreviations are especially important when displaying information in a tabular format. When missing, applications may abbreviate the label (this may create duplicates, however). Highly constrained version of the label (max. 12 characters, only uppercase letters, no blanks). Defined to support exports to formats requiring very short and simple names or labels, especially phylogenetic or statistical analysis software like NEXUS or SAS. Small multimedia resource to be displayed in addition to the label. 
It should be quickly recognizable and will usually not be informative enough to base decisions on it alone. Example: in a concept tree a leaf icon image is provided for the node containing leaf characters. [ATTR: ref] A set of representative multimedia resources that convey the meaning even when used without a Text representation (but applications may choose to combine text and media). Example: display shape images to select a state during identification. If more than one resource is defined here, the assumption is that they will normally all be consumed before making a selection. The size of the resource should be sufficiently concise to view ca. 6 images from different labels concurrently on the intended output device, or to listen to ca. 6 audio extracts before making a selection. - Both Icon and these selector resources are audience-specific (e. g., image with abbreviation, bird-call with spoken text). [ATTR: ref] All Representations within a Label must have different language values. Label (incl. abbreviations and a single wording) Language/audience-specific label representations (incl. abbreviations and a single nat. lang. reporting wording) [ATTR: audience] Extends LabelPlusAbbreviationRepr with a single wording element. Label (incl. abbreviations and a wording before and after the contained elements) Audience-specific label representations (incl. abbreviations and wording for natural language reports) [ATTR: audience] Extends LabelPlusAbbreviationRepr with a wording before and after the contained elements. Label (incl. abbreviations and a wording text before, after, and between the contained elements) Audience-specific label representations (incl. abbreviations and wording for natural language reporting) [ATTR: audience] Extends LabelPlusAbbreviationRepr with a complex wording element. Used in concept tree nodes and character references. Allows text to be defined before, after, and between elements; used during natural language reporting.
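The wording rules described above for natural language reporting (one delimiter for exactly 2 elements, another between elements of longer lists, a third before the last element, and an operator text that changes depending on how the next word starts) can be sketched as follows. The function and parameter names (`join_elements`, `operator_text`, `between_two`, `between`, `before_last`) are illustrative assumptions and not names from the schema; only the example delimiters and the Spanish 'y'/'e' rule come from the annotations.

```python
from typing import Sequence


def join_elements(items: Sequence[str],
                  between_two: str = " or ",
                  between: str = ", ",
                  before_last: str = ", or ") -> str:
    """Join wording elements using the three delimiter kinds.

    - exactly 2 elements: `between_two`
    - 3 or more elements: `between` everywhere except before the last
      element, where `before_last` is used
    """
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return between_two.join(items)
    return between.join(items[:-1]) + before_last + items[-1]


def operator_text(default: str, alternative: str, next_word: str,
                  start_with: str, but_not_with: str = "") -> str:
    """Choose the operator wording depending on the start of the next word.

    `start_with` and `but_not_with` are blank-separated lists of starting
    patterns, as in the Spanish example: 'i hi' / 'hie'.
    """
    word = next_word.lower()
    matches = any(word.startswith(p) for p in start_with.split())
    excluded = any(word.startswith(p) for p in but_not_with.split())
    return alternative if matches and not excluded else default


print(join_elements(["red", "green", "blue"]))
print("padres " + operator_text("y", "e", "hijos", "i hi", "hie") + " hijos")
```

Keeping the delimiters as parameters mirrors the statement that the language-rule defaults can be overridden per element.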
Container for multiple audience-specific representations of a (publicly reported) Note as text (optionally with basic formatting). Used, e. g., inside state, statistical measure, coding status, etc. references in descriptions. [ATTR: audience] Audience-specific representation of a (publicly reported) Note as text (optionally with basic formatting). The type provides an audience reference in an attribute. [The presence of the (seemingly superfluous) text element has two advantages: 1. Cleaner typing; adding an audience attribute directly to FormattedSimpleText type would require multiple inheritance. 2. In nat. language markup, Text surrounds all verbatim text. Retrieving all Text content retrieves the original text prior to markup.] --- Abstract base type for some vocabulary definitions: Abstract base type used to derive concepts in Terminology/General and Terminology that require only a single label and wording (states, coding status, etc.); the label natural language wording has only a single text element. Audience-specific labels, abbreviations, media icons/selectors & wording All Representations within a Label must have different language/audience values. Abstract base type used for stat. measures and modifier definitions (certainty, frequency, etc.); the label natural language wording has text before and after! Label with abbreviations and wording for natural language reports. All Representations within a Label must have different language/audience values. Note: It would be possible to define a VocabularyW3Base abstract base type, but this would be used only for concept nodes. === EXTENSIONS of UBIF (Unified Biosciences Information Framework) elements ProxyData objects: @@Currently this is a DUMMY, pending any decision about how to handle classifiers! It is kept in the schema to mark the places where scope classifiers are needed. 
### Defines an element with a ref attribute pointing to a ClassName in Entities (in biology: Class = Taxon) - plus: additional classifier references, e. g., to further define the sex, generation, or life cycle stage of descriptions or class names in keys. Include file for the main SDD schema. This file isolates a number of derived simple types used to define ID-based relations between object definitions and object references. For each kind of relation in SDD a specific type is used. The use of the type is intended to clarify the relations, which otherwise are hidden in the XML schema identity constraints that are difficult to study. Bob Morris proposed using this to help when working with tools like Castor. Clearly, these types are technically redundant, and the semantics could also be documented separately (and are already in the identity constraints), but they do very little harm either. They are isolated in this include file so that they do not clutter the type list in the main SDD schema file. ---- Relation types used in general declarations (defined to help in type-safe programming; this duplicates information also defined in schema identity constraints): The recommended pattern is the expertise level, optionally followed by a dot and a short character code (if multiple audiences are defined). GH: The old pattern, which included the language, was: "(([a-z][a-z])|([A-z][A-z]-[A-z][A-z]))([1-5]|([1-5][a-z]))?" for, e. g., de, en, EN-US, fr3, en4a, en4b. A possible new pattern would be [0-5]|([0-5]\.[0-9]+), e. g., 0, 3, 5.42. For the moment, however, it is better not to use any pattern! Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes.
Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. ---- Relation type used in glossary (defined to help in type-safe programming; this duplicates information also defined in schema identity constraints): Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. ---- Relation types used in terminology (defined to help in type-safe programming; this duplicates information also defined in schema identity constraints): Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. ---- Relation types used in descriptions (defined to help in type-safe programming; this duplicates information also defined in schema identity constraints): Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. 
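The idea behind these relation types (distinct types that are derived from RelationID without changes, purely so that tools and programmers can tell one kind of key from another) has a close analogue in statically checked code. The sketch below uses Python's `typing.NewType`; the names `CharacterKey`, `StateKey`, and `lookup_state` are hypothetical and do not correspond to actual type names in the schema.

```python
from typing import Dict, NewType

# Base relation identifier; at runtime it is an ordinary string.
RelationID = NewType("RelationID", str)

# Hypothetical derived relation types. They add no constraints of their own,
# but a static type checker treats them as distinct from each other.
CharacterKey = NewType("CharacterKey", RelationID)
StateKey = NewType("StateKey", RelationID)


def lookup_state(states: Dict[StateKey, str], ref: StateKey) -> str:
    """Resolve a state reference against the state definitions."""
    return states[ref]


states = {StateKey(RelationID("s1")): "ovate"}
print(lookup_state(states, StateKey(RelationID("s1"))))

# A checker such as mypy would reject the following call, because a
# CharacterKey is not a StateKey, even though both are strings at runtime:
# lookup_state(states, CharacterKey(RelationID("c1")))
```

As with the schema types, the distinction is redundant at runtime and only documents and checks the intended relations.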
---- Relation types used in identification keys (defined to help in type-safe programming; this duplicates information also defined in schema identity constraints): Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. ### Copyright © TDWG (Taxonomic Databases Working Group, www. tdwg.org), 2004. This file is a special version of the Unified Biosciences Information Framework (UBIF) XML schema. It may be used only for viewing convenience and may not be distributed independently from the primary schema files (UBIF.xsd, UBIF_TypeLib.xsd, etc.). The inclusion of all parts starts below: !###

Unified Biosciences Information Framework (UBIF) XML schema for data exchange and integration across knowledge domains. The schema has been designed for biological data, but is applicable to other knowledge areas as well. It is based on work of the TDWG SDD and ABCD subgroups and is currently jointly authored by the SDD, ABCD, TaxonName subgroups and by GBIF (Global Biodiversity Information Facility). The framework may be used without changes for new schemata; no registration is necessary. Its main features are:
* A foundation of shared simple and complex types, including some enumerations to simplify world-wide data integration and interoperability across language barriers.
* A top-level structure of Datasets collections containing independent Dataset objects. The collection is purposely semantically neutral; relations between Dataset objects have to be discovered by the data consumer or are assumed to be implicit in the protocol requesting the data.
* Derivation metadata that support tracing and debugging the online transformation history of data. They provide important technical information about access providers and the path of potentially multiple portals involved.
* Metadata describing the principal data collection from which the dataset was derived. The dataset may represent the entire source dataset or it may be filtered, normalized, or enriched with secondary information. A dataset is never an aggregation of multiple data collection sources with different authorship, copyright, or other IPR; these are assumed to be delivered as separate datasets. Note: Derivation and content/source metadata together provide all necessary information for UDDI support.
* External data interface (EDI) providing a standard mechanism to link to external data providers for knowledge domains outside of the scope of the current dataset. This includes a collection of supported object linking mechanisms involving globally unique identifiers and resolving mechanisms. Proxy objects can replace links in cases where a specific object is not (or not yet) available in an external data source, and they cache a minimalized data interface on the assumption that access is asynchronous, slow, or may be temporarily unavailable. Furthermore, these cached data provide semantic information to human readers, preserving the semantics of a link even if it has become permanently broken.
* A single "payload" element which must come from a different namespace. Note that within a Datasets collection each Dataset object may have a payload from a different external schema. It is the responsibility of the consumer to decide which dataset payload it is interested in or can process.
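The last feature in the list above (each Dataset carries one payload from a different namespace, and the consumer decides which payloads it can process) can be illustrated with a short sketch using Python's standard `xml.etree.ElementTree`. The UBIF namespace URI and the `OtherPayload` element used here are placeholders invented for the example; only the SDD namespace `http://www.tdwg.org/2004/SDD` and the element names Datasets, Dataset, and DescriptiveData appear in the schema annotations. The sample document is not a valid UBIF instance, since the derivation, proxy, and metadata parts are omitted for brevity.

```python
import xml.etree.ElementTree as ET

UBIF_NS = "http://example.org/UBIF"     # placeholder, NOT the real UBIF namespace
SDD_NS = "http://www.tdwg.org/2004/SDD"  # SDD namespace as given in the annotations

SAMPLE = f"""<Datasets xmlns="{UBIF_NS}">
  <Dataset>
    <DescriptiveData xmlns="{SDD_NS}"/>
  </Dataset>
  <Dataset>
    <OtherPayload xmlns="http://example.org/other"/>
  </Dataset>
</Datasets>"""


def payloads_in_namespace(xml_text: str, wanted_ns: str) -> list:
    """Return the payload elements whose namespace the consumer understands.

    Dataset objects are independent and their order carries no meaning,
    so each one is simply inspected in turn.
    """
    root = ET.fromstring(xml_text)
    prefix = f"{{{wanted_ns}}}"  # ElementTree writes qualified tags as {namespace}local
    found = []
    for dataset in root.findall(f"{{{UBIF_NS}}}Dataset"):
        for child in dataset:
            if child.tag.startswith(prefix):
                found.append(child)
    return found


for payload in payloads_in_namespace(SAMPLE, SDD_NS):
    print(payload.tag)
```

A consumer that supports several payload schemata could call the function once per namespace and ignore the Dataset objects for which nothing is returned.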

Conventions: Element or attribute names starting with underscores (__) are present in the schema for discussion purposes only and should be only experimentally used. Annotations containing @ indicate unfinished points of discussion.
Note: blockDefault="#all" in xs:schema prevents derived types from being used in instance documents in elements typed to the base type (which would otherwise be possible using xsi:type=""). - finalDefault is not set; further type derivation is currently not considered problematic. Please contact us if you believe otherwise. Note that according to the W3C discussion forum, the developers of XML Schema are considering dropping the final attribute in the upcoming XML Schema version 1.1. - Nillable: xsi:nil is not supported in UBIF documents (schema declaration nillable="false" is default, not explicitly stated).

Copyright © TDWG (Taxonomic Databases Working Group, www. tdwg.org), 20. July 2004. Licensed under the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version (http://www.gnu.org/licenses/gpl.html). Schema designed and annotations authored by G. Hagedorn & W. Berendsohn, Berlin with help from members of the SDD, ABCD, TaxonName subgroups.

The Datasets collection is the only root element allowed in UBIF: Root element for files or data streams. Multiple Dataset objects are completely independent. Potential relationship may be detected by the consumer, but are not expressed in the UBIF format. The sequence of Dataset objects has no semantics and does not have to be preserved. The version of the UBIF standard used is defined in the namespace declaration and needs no separate data element. A single file or data stream may consist of multiple data sets A history (tree) of all automatic or semi-automatic data derivations (transformations) through computer programs: database export, filtering, merging, or unmodified data provision through portals. The elements immediately in this element describe the process that created the current xml document. [ATTR: datetime (= When was it done?), gooduntil (= caching interval)] Data from other knowledge domains to which the data set refers may be represented by collections of proxy data objects. In the absence of available external databases a proxy object may be used as a local placeholder. The data inside the proxy object usually provide a reduced interface data model that abstracts from a potentially more complex external data model. Examples: persons, publications, geographical localities, media resources, but also class names (biology: taxa) and objects/units (biology: specimens). Metadata referring to the principal source of the entire data collection (the metadata scope may be wider than the objects actually contained in the data set). The 'payload' of the dataset exchanged using UBIF. At this point a new namespace is defined (and usually the default namespace is redefined). Note that if an xsi:schemaLocation is desired, it should not be defined here but added to an xsi:schemaLocation attribute in the Datasets root element. 
Example from SDD instance document: <DescriptiveData xmlns="http://www.tdwg.org/2004/SDD"> === Data derivation, transformation, and derivation history: Describes the providers and application/script(s) that produced the current data set, plus a derivation history of all automatic or semi-automatic transformations with negligible or automated content changes. Derivation examples: a) Generation of file from a database, b) Adding/removing data to/from an existing UBIF xml file, c) Passing data through a portal without intentionally changing any data. The information provided here is intended to a) facilitate debugging b) react to known deficits of generators, esp. if generators produce syntactically correct but semantically faulty data (misapplication of data elements, etc.) c) evaluate the quality and scope of archived data, especially whether the data contained in the document are complete or an excerpt from a larger data set. d) inform about options to update/refresh data [ATTR: datetime (= When was it done?)] = Date and time (UTC or local time with timezone information) at which the current document or data stream was created by the generator. Using UTC (Coordinated Universal Time = Greenwich Mean Time) is recommended. [ATTR: gooduntil (= information about expiration of validity for caching purposes)] Which tool did it? Metadata about the software (application, script, etc.) that performed the derivation/transformation. [ATTR: name, version, notes, routine] (Detailed attribute annotations exist, but are not visible in graphical schema view!) Name of the application performing the transformation. The term 'application' should be understood in a loose sense; it may be a script that is not part of a larger application (compare the Routine attribute, which may provide the detailed name of scripts that are part of an application!). Version of the application that has generated this document.
The attribute should not be named 'Version' to avoid confusion with the version of the content (see content Metadata). Additional information about the generating application that is not part of the name or version. If the copyright of the generating application is specified, it should be understood that this does not affect the content copyright of the data. Optionally allows a generating application to identify which of possibly multiple transforming routines (database code, xslt, etc.) was used. This attribute may also be used, to identify different conditions under which the export routine may behave differently. What was done? Metadata describing actions on the entire data set, recording especially the intent to transform data or pass them on unchanged. This element may be missing if the actions are variable/cannot be traced. However, consumers may wish to avoid datasets containing untraceable derivation actions. -- Note: The combination of Actions in this and previous derivations (see DerivationHistory) implicitly informs about the completeness of data (relevant when comparing archival data sets). The kind of action is described using three required boolean attributes: ATTR: addition, removal, normalization = All 3 are false if the data are passed on without changes (except for information in the Derivation element itself). - The scope of action may optionally be described in: ATTR: affectedobjecttype = if empty the Action record is a summary for the entire dataset, else actions on specific object types may be described separately. ATTR: within = if true, the action changes values and structures inside objects rather than presence/absence of entire objects in their collections. ATTR: description = optional free-form text description of the action. If the dataset is based on a query, use: [ATTR: uri = If online query can be expressed as a single URI, it is recommended to provide this (even if another mechanism has actually been used). 
Executing a request to this URI is expected to return the same data set if no content updates occurred and updated information otherwise. If the only web query mechanism is a wizard-like interface involving multiple steps, a Query/URI may not exist. ATTR: description = Optional description of the query in any format considered intelligible to human readers. For example, it may be a set of rules in a programming language, an xpath, sql, or oql expression, or plain language (English is recommended, but not required). The format itself should be explained unless plain language is used. The information is intended for human consumption to improve the interpretation of documents that archive extracts and snapshots at a certain point in time.] Who did it? Technical contacts are those to whom questions about accessibility of a provider or resource should be directed. Who did it? Administrative contacts [= Content contacts] are those to whom questions and feedback about data, or restrictions on use of the data should be directed. Optional description of the derivation actions, acknowledgement, copyright, etc. statements. The statement should be complete and identify the speaker (Technical/AdministrativeContact should not be expected to be displayed). - This is the only item in Derivation expected to be displayed on web reports addressing the general public. All other items in Derivation are normally displayed only on technical pages. -- Note: Claiming copyright/database rights on derivations may interfere with the usability of data and is not recommended. Care must be taken to avoid violating the rights of holders of the original content copyright! The derivation history includes all automatic or semi-automatic transformations with negligible or automated content changes.
It does NOT include the history of content revisions and expansions, possibly combined with changes of copyright or ownership; this history must be acknowledged in the Description, Owner and IPR statements in Metadata. Whenever a data provider receives a dataset already containing derivation data, it will put these unchanged into previous derivations and add its own data as a new outer layer. Thus the outermost Derivation is the most recent (immediate) one, the innermost the first. Usually this contains only a single node! The history is not an array, but a recursion of Derivation within Derivation! However, multiple earlier derivations may be present if information has been merged. Example: SDD descriptions are enriched with images created by a geography server and based on ABCD collection data. Datasets should be kept separate wherever possible, e. g. in the case of specimen data from multiple collections. [ATTR: datetime] When did it occur? Date and time (UTC or local time with timezone information) at which the current document or data stream was created by the generator. Using UTC (Coordinated Universal Time = Greenwich Mean Time) is recommended. The data in this Dataset are guaranteed not to change until this date. No guarantee is given after this date and a cache should be refreshed. If the provider cannot guarantee that the data will not be changed until a future date, this attribute should be omitted. Used for transformation Action elements inside DerivationMetadata. This extension mechanism is designed for future versions of the UBIF standard. It might, e. g., allow listing references to ids of objects affected by an action. Most transformations would probably not give such information, but it could be valuable where given. Normalizations are actions that change data in a way intended to improve data quality without changing the meaning. Examples: standardization of collectors' names or abbreviations.
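The layering of derivations described above (each provider wraps the derivation data it received in a new outer Derivation, so the outermost is the most recent and the innermost the first, with several earlier derivations possible after a merge) can be sketched as a small recursive structure. The class `Derivation`, its fields, and the helper names are hypothetical and chosen for illustration; the schema's actual element and attribute layout is richer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Derivation:
    """Hypothetical in-memory model of one derivation step."""
    application: str   # name of the tool that performed the step
    created: str       # date and time of the step, ideally in UTC
    previous: List["Derivation"] = field(default_factory=list)


def wrap(received: List[Derivation], application: str, created: str) -> Derivation:
    """Add a new outer layer around the derivations received with the data.

    The received derivations are kept unchanged; more than one is only
    expected when information from several sources has been merged.
    """
    return Derivation(application, created, list(received))


def chronological(derivation: Derivation) -> List[str]:
    """List application names from the first derivation to the most recent."""
    order: List[str] = []
    for earlier in derivation.previous:
        order.extend(chronological(earlier))
    order.append(derivation.application)
    return order


export = Derivation("database export", "2004-07-01T10:00:00Z")
portal = wrap([export], "portal", "2004-07-02T08:30:00Z")
print(chronological(portal))
```

Because the history is a recursion rather than an array, a merge simply produces a node with several entries in `previous`, and the traversal still visits every earlier step before the step that consumed it.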
Primarily intended for changes within the major objects. However, a collection of objects may be normalized as well, e. g. if duplicate objects are removed. If empty the Action record is a summary for the entire dataset, else actions on specific object types may be described separately. If within is true, the action changes values and structures inside objects rather than presence/absence of entire objects in their collections. Optionally an unconstrained text with details of the transformation action and completeness of data relative to the source of the current transformation. It may include both an account of the actions as well as a narrative describing the purpose of the transformation. This should address human readers. It is intended for technical usage, not to be displayed in consumer-oriented web pages. English language is recommended but not required. === Meta data about the entire data collection from which the data set was derived: Metadata referring to the principal source of the entire data collection (thus the metadata scope may be wider than the objects actually contained in the data set). If a history of the data collection (revised or expanded in various projects or at different institutions) exist, this must be reflected in the IPR statements and possibly in the list of Owners. Language-specific header information [ATTR: language] The Language values must uniquely identify the Representations within Description. Language-independent expressions of limited geographical, taxonomic, etc. scopes. In the case of projects in progress, 'scope' may define the planned or intended, rather than the achieved scope (or coverage). Compare also Coverage in Description (which is language-specific). (Items from Scope may be added to DC.Coverage) A data collection may have a limited geographical scope. Example: 'Germany', 'Austria'. A data collection may have a limited class scope (biology: taxonomic scope). 
Example: 'Hymenoptera' Information in the entire dataset may come from these (printed or digital) publications. Note that if data are not just copied from publications into independent descriptions, but revised and combined with expert knowledge, SourcePublications should not be used. (Such a process creates an independent new work and the publications are only cited in the descriptions.) @@ E.g. ecological like 'Temperate rainforest', 'insectivores' (bats, birds, mammals ...), temporal (Jurassic fossils)... Problem: these should be external subject vocabularies that should be linked to... Library of Congress subject headings may be useful. Number and date of current version The major version number ('1' in 1.2) as defined by the content creators. An optional minor version number ('2' in 1.2) Unconstrained text specifying status + optional number, e. g., 'beta', 'alpha', 'rc/release candidate', 'internal'. If missing, release status is assumed. Citable 'publication date' of the current version (comp. RevisionData/ Initiation- and LastRevisionDate for version- independent dates). This date must be missing if the current version is not yet published! (= DC.Date.issued; http://purl.org/dc/terms/issued) Note: currently no mechanism exists to record the date of the first version release. Is this needed? Creators, Revision status, and dates of the entire data collection from which the current dataset is derived. A globally unique ID-string, distinguishing a data collection (which may be identical or larger than the current dataset) from all others. The value should never be changed once it has been introduced. To refer to objects within the dataset from elsewhere, this value is combined with the object. If you don't have this, it will be difficult to compare versions of data collections. Recommendation: Avoid choosing simple names that are likely to be used multiple times ('plants', 'French bees', etc.).
Authors working at research institutions that allow their name to be used as a permanent identifier (even if the author stops working there) may use institutional-URI/personal-or-team-name/ data-collection-label (example: xyz.de/hagedorn/coelomycetes). Note that this is only an identifier and does NOT help to locate real web resources. Language-specific content metadata (title, description, etc.) with *required* Language attribute added. A short, concise title. Does not support any formatting! (= DC.Title) General Note on DublinCore translation: In addition to those that can be transformed from UBIF metadata, an additional DC.Type='dataset' should be added. Free-form text containing a longer description of the project. (= DC.Description) Free-form text describing geographic, taxonomic, or other coverage aspects of terminology or descriptions available in the current project. (= DC.Coverage) Optionally an image media resource containing an icon/logo symbolizing the project. [ATTR: ref] URL pointing to an online source related to the current project, which may or may not serve an updated version of the terminology or descriptions. === Proxy data objects (representing external resources) and references to these objects: Collections of non-abstract data proxy elements, forming an interface to potentially existing, more complex object representations. Class (biology: taxon) names used in the project. Each proxy object contains a name - either locally defined or representing an external resource defined in a linking mechanism - and defines a local id attribute that may be referred to multiple times from within the project. Biology: Object in a nomenclator [ATTR: id] The labels of proxy objects must be unique for a given language. Optional hierarchy (= tree, biology: taxonomy) of classes defined above. A hierarchy may be incomplete, i. e. some ClassName objects may not be in the hierarchy. ClassHierarchies may be locally defined or represent an external source.
Biology: Taxonomic hierarchy, or arbitrary set of taxa. [ATTR: id] The labels of proxy objects must be unique for a given language. Units are physical objects (biology: specimens) that are collected, described, or observed. In biology a collected object is often called a specimen. Biology: Object in a collection (= specimen) or an observation. Units may be identified or assigned to a Class name. [ATTR: id] The labels of proxy objects must be unique for a given language. Documentation of persons/organizations involved in the authoring, compiling, editing, etc. of the data set. @@ The specific elements are only a preliminary sketch, this should be synchronized with TDWG ABCD! [ATTR: id] The labels of proxy objects must be unique for a given language. Publications used in the project, defined through proxy objects (= local or external link, see under Agents). Printed or digital publication (including database source) [ATTR: id] The labels of proxy objects must be unique for a given language. Geographical locations (often country names, but potentially on any level), defined through proxy objects (= local or external link, see under Agents). An example of an external gazetteer referred to is the TDWG Geography standard. [ATTR: id] @@A geographical locality data interface would be highly welcome. It could be based on something similar to the DarwinCore Locality elements. The Unit interface could express its locality information through reference to this proxy Interface! Geographical coordinates (decimal longitude/latitude) of the location from which the object or observation was collected. The labels of proxy objects must be unique for a given language. Resource definitions containing links like URLs or actually embedding the resource (e. g. encoded images). These are proxy objects (= local or external link, see under Agents). [ATTR: id] The labels of proxy objects must be unique for a given language. 
Measurement units like mm-square, °C, ml, pH, and dimensionless scaling factors like %, promille. [ATTR: id] The labels of proxy objects must be unique for a given language. Abstract base type for proxy objects representing external resource objects (publications, class names, specimens, etc.). Provides a free-form label (this may be locally defined and the only data item if no external object is available) plus an ID-based link to an external object. Human readable representation. This may be the only data item if no machine readable ObjectLink exists. Example for a publication: "Smith 1998. Flora of Erehwon, XY Publishers." Even if an external ID exists, the Label is required. It preserves the semantics of the proxy object (= keep interpretable by humans) even if the machine-readable object links are broken. Label should be updated automatically (without human control) only after a human decided that the semantic management of an external object provider can be fully trusted. Some Labels like scientific taxon names or publication references can be expressed more or less language-independent, others like geographic names are always language dependent. @@Discussion necessary: language type is currently extended with neutral and unknown codes ('-', '?'), is this necessary?@@ The Abbreviation element provided is not necessary for all proxies, but especially useful for class names (e. g., for tabular reports) and publication abbreviations (author/year style). Defines an ID of an external object or one to several services providing it. The format in which the object is returned is undefined and needs to be interpreted by the receiving application. Ideally, common standards (TDWG, MARC, etc.) should be used. === Class names (biology: taxon names): Used for class names (biology: taxon names). Provides a locally defined simple free-form text plus an optional link to an external resource object. 
This may be changed to allow entering a structured form of taxonomic names (Genus/Higher taxon, rank, optional specific/infraspecific epithets, authors). However, note that simply splitting into taxon name and authors does not work, because authors may be in the middle of the parts of the taxon name (e. g. in botanical autonyms). Currently the development of the TDWG taxon names standard should be awaited first. Note that Class names are not restricted to accepted names (also referred to by Synonyms in ClassHierarchyNode type) Extensions of ProxyBase specific to ClassNameProxy For biological taxonomic names: order, family, species, etc. Derived from an enumerated value list. This element needs to be interoperable; formatting often depends on specific ranks rather than on relative place in the hierarchy alone. Defines an element with a ref attribute pointing to a ClassName in ExternalDataInterface (in biology: Class = Taxon) Refers to a class name (biology = 'taxon'; ExternalDataInterface/ClassNames/ClassName) A collection of ClassRef type elements Reference to a class name (in biology = taxon name) defined in ExternalDataInterface/ClassNames [ATTR: ref] === Class hierarchy (biology: taxon concepts): Used for class hierarchies (taxonomies) Extensions of ProxyBase specific to ClassHierarchyProxy For example, SDD supports taxonomic (order/family/genus etc.) and non-taxonomic (weed species, diseases, herb/shrub/tree) hierarchies. For many analytical purposes it is relevant whether a hierarchy is based on phylogenetic (= evolutionary) relatedness or whether it is an operational categorization. Note: a conventional taxonomic hierarchy should be considered phylogenetic until proven to be not. Root of the recursive tree A node in a class hierarchy tree (biology: taxonomical hierarchy) A node either contains a class reference (biology: taxon) and optionally (if it is a higher level class) further child Nodes, or it is anonymous and contains only further child Nodes. 
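The node rule just described (a node carries a class reference, child nodes, or both, but never neither) can be checked with a few lines of code. The following Python sketch is only an illustration: the `HierarchyNode` class, its field names, and the ids `c1` to `c3` are hypothetical stand-ins and are not part of the schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HierarchyNode:
    """Hypothetical in-memory stand-in for a class hierarchy node."""
    # Reference to a ClassName id; None means the node is anonymous.
    class_ref: Optional[str] = None
    # Child nodes; may be empty only if class_ref is present.
    children: List["HierarchyNode"] = field(default_factory=list)


def validate(node: HierarchyNode) -> bool:
    """Return True if no node in the subtree is empty."""
    if node.class_ref is None and not node.children:
        return False  # neither class reference nor children
    return all(validate(child) for child in node.children)


# An anonymous root grouping two classes, one with a child class.
tree = HierarchyNode(children=[
    HierarchyNode(class_ref="c1", children=[HierarchyNode(class_ref="c2")]),
    HierarchyNode(class_ref="c3"),
])
```

Here `validate(tree)` accepts the tree, while a bare `HierarchyNode()` anywhere in it would be rejected.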
Nodes may not be empty. (The complex choice/sequence expresses the A, or B, or A and B constraint which is difficult to express in xml-Schema.) The class (biology: taxon; with optional synonyms) that identifies the node. Refers to a class name (in biology a taxon name) [ATTR: ref] Rather specific to biology: Taxa above rank of species have a lower taxon by which they are typified. Rather specific to biology: Taxa of species rank or below have a physical unit (specimen) by which they are typified. Collected and preserved unit(s) (biology: specimens) by which the name is typified. (The expression of synonyms may be essential for reports and to convey the concept of a class to information consumers.) References to project-wide defined ConceptStates (defined at the nodes of concept trees) must be unique within each character. This is achieved by a uniqueness constraint (local to each character) on the ref attribute of StateReference. The id attribute is already unique through the general CharacterStateKey. If class identification is present, further nodes are optional. The class identification may be missing, but then further Nodes are required. A collection of objects with ClassHierarchyNode type Defines an element with a ref attribute pointing to a ClassHierarchy. === Units (biology: specimen, 'Objects' in earlier versions of SDD): Used to define objects that are collected, described, or observed (collected objects may be preserved permanently in a specimen collection). In biology a collected object is often called a specimen. Provides either a simple free-form descriptive label ('so-and-so in freezer 14, with tag 1233'), or a link to an external collection unit. Note that the term 'Unit' as used here has no relation to 'measurement units' or 'organization units'. Extensions of ProxyBase specific to UnitProxy @@ SomeElementsAnalyzedBySDD: These are just the preliminary elements identified by SDD to be necessary as local extensions. 
A decision needs to be made, compare the DWC-based present in an alternative interface group! @@ Identification of specimen object. The information may come from the service provider. If the service provider only provides a name, this must be compared with and if necessary added to the list of ClassNames so that a ClassName reference may be used here. This may point to a higher taxon (family, order, or even "plantae") to indicate incomplete, broad identifications. [ATTR: ref] Default is 'certain'; 'Abies cf. alba' would be recorded as 'uncertain'. False = object has not been collected and preserved (it may still be databased in an observation database and have an ExternalID!). The default for this element is true, i. e. if the element is missing the object has been collected/preserved. ### To be decided! Extensions of ProxyBase specific to UnitProxy This is derived from DarwinCore, "version 1.25 2003/05/24 11:14:24 John Wieczorek", but reworked, as a first attempt, into structures compatible with UBIF usage. The following is not yet a serious proposal, just a basis for further work. Most likely this is too rich at the moment for a simplified interface... DarwinCore 'core' fields A description indicating whether the record represents an object or observation (e.g., tissue sample, living organism, voucher specimen, germplasm/seed, genetic information, etc.) The code (or acronym) identifying the institution administering the collection in which the object or observation record is cataloged. No global registry exists for institutional codes; use the code that is "standard" in your discipline. This attribute must contain no spaces. The code (or acronym) identifying the collection within the institution in which the object or observation record is cataloged. This attribute must contain no spaces. The alphanumeric value identifying an individual object or observation record within the collection. 
It is highly recommended that each record is uniquely identified within a collection by this value. It is also recommended that each record is universally uniquely identified by the combination of InstitutionCode, CollectionCode and CatalogNumberText. The name(s) of the collector(s) of the original data for the object or observation. Date in which the object or observation was collected from the field. Each part of the date may be missing. (= DarwinCore: YearCollected, MonthCollected, DayCollected, VerbatimCollectingDate) ATTR: year = four digit year; month = two digit month of year; day = two digit day of month An identifying string applied to the object or observation at the time of collection. Serves as a link between field notes and the object or observations. An identifying string applied to a set of objects or observations resulting from a single collecting event. Notes taken in the field for the object or observation, or a reference to such notes. The combination of all geographic elements less specific than locality. "Like" query operations on this element will search for a substring that might be in any of the higher geography elements. The full, unabbreviated name of the continent or ocean from which the object or observation was collected. The full, unabbreviated name of the island group from which the object or observation was collected. The full, unabbreviated name of the island from which the object or observation was collected. The full, unabbreviated name of the country or major political unit from which the object or observation was collected. The full, unabbreviated name of the state, province, or region (i.e., the next smaller political region than Country) from which the object or observation was collected. The full, unabbreviated name of the county, shire, or municipality (i.e., the next smaller political region than StateProvince) from which the object or observation was collected. 
The description of the locality from which the object or observation was collected. Need not contain geographic information provided in other geographic fields. Geographical coordinates (decimal longitude/latitude) of the location from which the object or observation was collected. Includes geodetic datum and an optional verbatim text representation. The upper limit of the distance (in meters) from the given latitude and longitude describing a circle within which the whole of the described locality must lie. Use NULL where the uncertainty is unknown, cannot be estimated, or is not applicable (e.g., because there are no coordinates). The minimum and maximum altitude in meters above (positive) or below (negative) sea level of the collecting locality. (= DarwinCore.MinimumElevationInMeters / MaximumElevationInMeters) A text representation of the altitude in its original format in the source database. The minimum distance in meters below the surface of the water at which the collection was made; all material collected was in this range. (= DarwinCore.MinimumDepthInMeters / MaximumDepthInMeters) A text representation of the depth in its original format in the source database. A reference to the methods used for determining the coordinates and uncertainties. This includes DarwinCore GeoreferencingMethod and GeoreferencingReferences The extent to which the georeference has been verified to represent the location where a Cataloged Item was collected. The name(s) of the person(s) who applied the currently accepted ScientificName to the object or observation. The date on which the unit (specimen, observations, strain, culture, animal) was identified as having the ScientificName. A standard term to qualify the identification of the object or observation when doubts have arisen as to its identity (e.g., "cf.", "aff.", "subspecies in question", etc.). The name of the phylogenetic kingdom in which the object or observation is classified. 
The name of the phylogenetic phylum (or division) in which the object or observation is classified. The name of the phylogenetic class in which the object or observation is classified. The name of the phylogenetic order in which the object or observation is classified. The name of the phylogenetic family in which the object or observation is classified. The full name of the lowest level taxon to which the object or observation can be identified (e.g., Family, Genus, Genus+" "+SpecificEpithet, Genus+" "+SpecificEpithet+" "+SubspecificEpithet, etc.). The name of the genus in which the object or observation is classified. The specific epithet of the scientific name applied to the object or observation. The subspecific epithet of the scientific name applied to the object or observation. The author of the ScientificName. Can be more than one author in a concatenated string. Should be formatted according to the conventions of the applicable taxonomic discipline. A list of one or more nomenclatural types (including type status and typified taxonomic name) represented by the object (e.g., "holotype of Ctenomys sociabilis. Pearson O. P., and M. I. Christie. 1985. Historia Natural, 5(37):388."). Does not apply to observations. Free text references to information not covered elsewhere (e.g., URLs to specimen details, photographs, publications, etc.). DarwinCore Curatorial The sex of a biological individual represented by the cataloged object or observation (e.g., male, female, hermaphrodite, gynandromorph, not recorded, indeterminate, transitional - between sexes, for sequential hermaphrodites). The age class, reproductive stage, or life stage of the biological individual (e.g., juvenile, adult, eft, nymph, etc.) referred to by the catalog number. A concatenated list of preparations and preservation methods (skin, skull, skeleton, whole animal (Ethanol), slide, etc.) for the object. Includes tissue preparations (frozen, EDTA, etc.). Does not apply to observations. 
GenBank Accession number(s) associated with the biological individual(s) referred to by the cataloged object. A list of previous or alternative fully qualified catalog numbers for the same object or observation, whether in the current collection or in any other. The fully qualified identifier (InstitutionCode+" "+CollectionCode+" "+CatalogNumberText) of the related object or observation, preceded by the nature of the relationship (e.g., "(sibling of) MVZ Mamm 1234"). The current disposition of the cataloged item (e.g., "in collection", "lost", "voucher elsewhere", etc.). Free text comments accompanying the object or observation record. DarwinCore Microbial Fate of the isolate between isolation and deposit in the present collection. The backward sequence of deposits is used separated by "<" meaning "received from". Each entry may contain the name of the collection, (month and) year of the acquisition. In parentheses may be entered: strain designation or collection numbers (only when confusion is possible between two or more numbers from the same collection) and/or a name when a name change has occurred. Example: [in Bacillus sphaericus DSM 488] NCTC, Nov. 1973 (Bacillus loehnisii) < T. Gibson, 1935 < Kral Collection (Bacillus probatus) Name of the Depositor The date on which the unit (strain, culture, animal) was deposited in the collection. 
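The strain history convention described above (deposits listed backward, separated by "<" meaning "received from") lends itself to simple splitting. The following Python sketch is only an illustration under the assumption that "<" never occurs inside an individual entry; the function name `parse_history` is hypothetical and not defined by DarwinCore or this schema.

```python
from typing import List


def parse_history(history: str) -> List[str]:
    """Split a strain history into deposit steps, most recent first.

    Assumes '<' is used only as the 'received from' separator.
    """
    return [step.strip() for step in history.split("<") if step.strip()]


# The example history from the annotation above (without the bracketed prefix).
example = ("NCTC, Nov. 1973 (Bacillus loehnisii) < T. Gibson, 1935 "
           "< Kral Collection (Bacillus probatus)")
steps = parse_history(example)
```

Reversing `steps` gives the chronological order, starting with the earliest known holder of the isolate.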
Substrate from which the strain was isolated (soil, water, blood, leaf, etc) Name of the person performing the isolation into pure culture Method used to isolate the strain Any specific conditions related to cultivation and maintenance of the strain such as culture medium, atmospheric and light conditions, temperature, etc Names of chromosomal markers of the strain Type and parent of mutant if strain is a mutant strain Name of the race of the strain and authors of the race Name of the alternate state of the strain and authors of the alternate state Any specific properties of the strain (enzyme production, metabolites production, degradation, etc) Any specific applications that the strain may have, such as in bioremediation, inoculants, biologic control, etc Hazard group, pathogen class, plague type Any specific disease that the strain may cause Defines an element with a ref attribute pointing to a Unit (biology: observation or specimen) defined in ExternalDataInterface. @GH@: Discuss whether to add a separate element for collection abbreviation (cached information from provider or from Refers to a Unit object identifier (biology = 'specimen') Extension of UnitRef with a required type status attribute (NomenclaturalTypeStatusOfUnitsEnum) The type status of a unit (biology: specimen). See the enumerated type for further information. === Publications, references, and citations: Used for resources like publications, laboratory notes, speeches, etc. Provides either a simple free-form text, or a connection to an external resource. Extensions of ProxyBase specific to PublicationProxy @@GH: Two proposals for publication-specific extensions of the proxy base data. Both have advantages and I can imagine either solution. The important thing would be to select a common solution for SDD, ABCD, TaxonNames, LinneanCore, etc.! GENERAL Note: Some parts of publication representations are already available as proxy base data. 
These are: - The unconstrained text form as commonly found in published references (i.e. not atomized) belongs in the Label. - The URL location of the article on the web and the DOI (digital object identifier) can be found in ObjectLinks. Extensions of ProxyBase specific to PublicationProxy This structure is based on the Linnean Core proposal and checked against the DiversityReferences and ReferenceManager(TM) data structures. It would provide a relatively satisfying full structure usable in the absence of other literature management systems. Note: Many aspects of reference managers such as keywords, abstracts, availability, or reference types are not supported in the current data interface. However, they may be added and managed inside the generic extension mechanism, see "CustomExtensions" above. @@Open question: How to reference software? Year as appearing on the publication. Compare TruePublicationDate below. Effective date of publication; may be different from year stated on/in the publication. Important for taxonomic or other priority. [ATTR: year (required), month, day (optional). Typed as gYear, gMonth, gDay; note that gMonth requires '01' instead of '1' for 'Jan.'] Series of books or articles (the latter may be published in edited books, journals, or on the web). Series title Series editors Printed book: monograph or edited book with articles Book title (monograph or edited book) Book creators are authors if Book is used alone or in combination with Series or Chapter, but editors if used in combination with article Volume or part in a series Total range of pages, including foreword, appendices, index and plates/figures. International Standard Book Number Number of the edition of a book. Publisher, reprint year, note, etc. for historical books that are reissued. 
Periodical/magazine/journal information We really need BPH and TL2 as standard dictionaries to drive these titles Standardized abbreviated form of title International Standard Serial Number Publishers of a book, periodical, or independently published article. The name of the publisher (publishing company or institution, including universities or scientific societies). The location where the item being referenced was published, such as a city and state. Articles may, e. g., be published in periodicals, edited books, the internet. Volume of periodical (empty if article appears in edited book) Part or issue of a periodical volume (empty if article appears in edited book) Pages of article. This may include table, or figure numbers for the reference. Examples: '23-41', '341 pp.', or '20, 22-24, 32' (for non-consecutive pages). Optional information about a chapter, section, etc. that has the same authors as the publication in which it is contained. Compare Article for authored chapters in edited books. Number of chapter, section, etc. as used in the publication. Pages of current part ('22-34') Extensions of ProxyBase specific to PublicationProxy This structure is less satisfying in the absence of a literature management system, but it provides some atomization helpful in finding or filtering local proxy data and in associating locally recorded data with external databases at a later time. For article, chapter, or monographic book the authors, for an entire edited book the editors. The editors of the book in which a chapter appears are not listed here, but as part of the Source text string. Title of the immediate publication (i.e. title of authored chapter, but not of source book or journal). Year as appearing on the publication. Compare TruePublicationDate below. True date of publication, especially if different from stated year. Important for taxonomic or other priority. 
All remaining information, including Periodical/Volume for articles, or edited book for articles and chapters in a book, with the exceptions of the separate Pages (see below). Pages of article. This may include table, or figure numbers for the reference. Examples: '23-41', '341 pp.', or '20, 22-24, 32' (for non-consecutive pages). International Standard Book Number. @@Although this is an ideal key, this element may be dropped from the selective structure! Only very few references are covered by entire books with ISBN. Articles in journals are far more frequent and it would be more valuable to better support those. Defines an element with a ref attribute pointing to a Publication (ExternalDataInterface/Publications/Publication) A collection of elements of PublicationRef type. [ATTR: ref] --- The following types build on the PublicationProxy infrastructure: Combines a publication resource reference with a detail location within that reference (esp. page number) Refers to a publication as defined under ExternalDataInterface/Publications [ATTR: ref] Location within publication where the cited data can be found: Page, table, figure number, database record, html document bookmark, etc. (Note: this is not the page range of the entire article!). If publication is a non-persistent web resource that may change or disappear, the date at which the citation was verified to be appropriate should be recorded. It may later be updated, but not through a link checker verifying only technical access: the semantics of the citation have to be verified! If publication is a non-persistent web resource that can no longer be verified, the date it was found to have disappeared (or became semantically inappropriate) may be recorded. Verbatim name as it appears in citation. 'sub name xy' @@ Do we need this? I think the use case may be considered an extension to the Location element. Example: "p. 
3, sub Ustilago"@@ A collection of Citation-type elements === Agents (persons, organization, software agent): Used for Agent documentation (an Agent is a person, project, organization, or software agent). Currently used for authors, editors, contributors, and translators. Ideally it connects to an outside definition or documentation of the Agent. Extensions of ProxyBase specific to AgentProxy (The Agent-specific proxy extension is partly modeled after elements defined in vCard 3.0 and Jabber, see http://www.jabber.org/jeps/jep-0054.html.) (Mostly vCard:Org) Full organization or corporate name in multiple languages (en: 'Botanical Garden of ...', de: 'Botanischer Garten von ...'). (vCard:Org.OrgName) The standard Label mechanism also supports acronyms/abbreviations (no vCard equivalent!). For collections, the organisation abbreviation maps to Darwin Core 2: Institution Code. If Agent contains no person definition: the unit within the organization the agent represents, else a list of the various organisational units to which a person may belong. (vCard:OrgUnit) (vCard:OrgUnit) (There is no equivalent to vCard:FN/full name here, this is already covered by proxy Label above). For the problems involved in atomizing names from different cultures compare http://dublincore.org/documents/1998/02/03/name-representation/ See also http://efgblade.cs.umb.edu/twiki/bin/view/SDD/ProxyDataAgentProxy on our own WIKI. @@ To be decided before schema can be published! @@ The full name in preferred sorting sequence, i. e. with main name first. Use case: sorting, reporting in sorted lists. Examples: 'Duarte, Amália Mourinha' (pt), 'Pina de Morales, Ana Maria' (es). (vCard:Sort-String) Professional or academic title of individual person (prefer using Role for job titles!) (vCard:Title) Enumeration of male, female, unknown (vCard no equivalent) Birthday of person. (vCard:BDay, may include time) Death date of a deceased person. 
(vCard: not surprisingly no equivalent) (Software agents are not handled by vCard!) (Software agents probably need to be extended in future versions.) Role of Person or Organization in context. This element can be used to provide a title such as "Database Administrator" or "Curator" even when no individual person is named. (vCard:Role) @@Note gh: I see a problem with the unparsed address proposals in the original ABCD model and in two of the alternatives presented here, in that the Label for the Agent often requires the addition of city/country to disambiguate multiple agents with the same name (vCard:Adr) @@ To be decided before schema can be published! @@ Telephone/fax/modem numbers (vCard:Tel) [ATTR: number = should be provided in the ITU Recommendation E.164 international format ("+CountryCode AreaCode Number") (vCard:Tel.Number) ATTR: devicetype = voice, fax, mobile, pager, modem (identical with vCard:Tel.Voice etc.; if several are on a single phone number list the phone number with each device type!) ATTR: usagenote = free-form text for constraints on use e. g. "weekdays only" or "home number" (partly: vCard:Tel.Home/Work flags) ATTR: preferred = preferred number, may occur multiple times for different device types (vCard:Tel.Pref)] E-mail addresses (vCard:Email) E-mail address for contact (vCard:Email.UserID; this also has Home/Work flags not supported here) [ATTR: preferred (vCard:Email.Pref)] URI pointing to a homepage with further information. Note: If the Agent has a permanent URN representation, it is expected in ObjectLink in the base type. (vCard:URL, vCard supports only 1 URL) URL for person or organization [ATTR: preferred] URL of logo or icon image; usually of organization but may also be used by a person. (vCard:Logo) (Note: vCard:Note maps to Annotation in the base type!) ### To be decided! PROPOSAL 1: Atomized structure Family names, generational names, clan name, parents/grandparents personal names, etc. 
This (= last name in western cultures) may be compound ('Fischer von Waldheim', 'da Selva', 'Silvano Morales'). Depending on culture it is not necessarily the name of the parents nor common to the married couple and children, thus 'family name' should be avoided even though used in vCard. (vCard:N.Family) Prefix to name that should be output before name, but is usually not included in sorting. Examples: 'Prof.', 'Dr.', 'von', 'Lord'. (vCard:N.Prefix) Suffix to name that should be output after name, regardless whether it is in sorting sequence (Inherited, Given) or not. Examples: 'Jun.', 'III.'. (vCard:N.Suffix) The name given to a person as a personal name (= first or christian name in western cultures, including 'middle initials') may contain several words ('Ana Maria', 'Jerry B.'). Applicable only to persons. (vCard:N.Given + vCard:N.Middle) May differ from the first given name: second given name, nickname ('Bob' for 'Robert'), etc. (vCard:Nickname) ### To be decided! PROPOSAL 2: Name-variant structure @@ Seq. temporarily made optional @@ Preferred version of complete name in forward sequence as defined by the culture of the name-bearer. Use case: reporting. Examples: 'Maria Amália Mourinha Duarte' (pt), 'Ana Maria Pina de Morales' (es), A version of the name in forward sequence used in informal usage. Use case: reporting. Example: 'Bob Morris' for 'Prof. Dr. Robert Morris', 'Amália Mourinha Duarte' (pt), 'Ana Pina de Morales' (es). ### To be decided! Proposal 1: ABCD-style single string Contact address. Each element should be one address; do not use multiple elements for each line! (vCard:Adr.POBox + .ExtAdr + .Street + .Locality + .Region + .PCode + .Ctry) [ATTR: language, preferred (vCard:Pref)] @@vCard defines further attributes: Home/Work, Postal/Parcel, Dom/Intl Also, vCard atomizes the address, see proposal 2 below. Perhaps at least the country should be specified in ISO 2-letter codes? ### To be decided! 
Proposal 2: Similar to ABCD-style, but using UDDI-style address lines Contact address. (vCard:Adr.POBox + .ExtAdr + .Street + .Locality + .Region + .PCode + .Ctry) [ATTR: language, preferred (vCard:Pref)] Address line ### To be decided! Proposal 3: model following the atomized vCard fields 1:1. (vCard:Adr.POBox) (vCard:Adr.ExtAdr) (vCard:Adr.Street) (vCard:Adr.Locality) (vCard:Adr.Region) (vCard:Adr.PCode) (vCard:Adr.Ctry) @@vCard defines further attributes: Home/Work, Postal/Parcel, Dom/Intl Abstract base type for AgentRef and MicroAgent. The ref attribute is optional here! Reference to an Agent (ExternalDataInterface/Agents/Agent) Provides a minimalized local Agent definition together with an optional Agent reference (ref attribute). In principle this is derived from AgentRef, but to properly do it Person or role name (e. g., 'head of department') (voice phone) Defines an element with a required ref attribute pointing to an Agent (ExternalDataInterface/Agents/Agent) Makes the optional base type attribute required. A collection of AgentRef-type elements, i. e. Agents forming a team like an author team. (The sequence of elements in instance documents is informative!) [ATTR: ref] --- The following types build on the AgentProxy infrastructure: Extension of AgentRef with a role attribute and three attributes recording object-specific contributions. The first time an agent (creator or contributor) has edited/made a contribution to an object. If a creator has contributed both as an author and later as an editor of data, two references in these two roles will exist and the contribution dates will be recorded separately. The number of contributions by a specific agent (editing, revising, adding to an object). A collection of RichAgentRef elements. (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] 
Restriction of RichAgentRef to Creator roles only. Collection (sequence) of Agent elements of type CreatorRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] Restriction of RichAgentRef to either Creator or Contributor (but not Owner) roles. Collection (sequence) of Agent elements of type CreatorContributorRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] Restriction of RichAgentRef to Contributor roles only. Collection (sequence) of Agent elements of type ContributorRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] Restriction of RichAgentRef to Owner roles only (contribution attributes prohibited). Owners and IPRStatements (incl. identity constraints). Entities having legal possession of the data collection content. Owners are defined only for the entire data collection, not for individual descriptions etc. (= http://www.loc.gov/ marc.relators/own) Copyright, terms of use, license and other IPR-related statements like disclaimer or acknowledgement. Giving a copyright statement and a (if possible public) licence is highly recommended! (=DC.Rights) [ATTR: language] The Language values must uniquely identify the Representations within IPRStatements. Collection (sequence) of Agent elements of type OwnerRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) 
[ATTR: ref, role] --- Note: A modeling problem is that in instance documents Agents within a role are usually ordered (sequence), but different roles are not (authors+editors = editors+authors). UBIF 1.0, until beta 14 (available on WIKI!), attempted to solve the problem by introducing a 2-layer collection with Creators/AgentRole[@role='aut']/Agent[@ref]. Now this has been abandoned because it introduced too much complexity. --- types related to Agent references: A collection (seq) of name strings, used for publication authors or editors and for collectors, i. e. whenever the identity of an Agent is doubtful and cannot be associated with an Agent without doubt. Authors or Editors expressed only as string, e.g. in publications where the identity of creators can often not be discovered. Optionally, the ref attribute may refer to an agent if the relation between string and Agent can be assessed. (The sequence of elements in instance documents is informative!) [ATTR: ref] Reference to an Agent (ExternalDataInterface/Agents/Agent) RevisionData (creators, dates, revision) for the entire project/data set or individual objects. If RevisionData exist at all, at least one creator (author or editor) is required. (= DC.Creators) General contributors, or translators. (= DC.Contributors) @@Request for discussion: Translator-Contributors are currently not listed on individual Representation elements. Only a general statement about all translations together can be made. Should this be changed? Also: should one Representation be marked as 'Original/SourceForTranslation'? @@ Date/time when the intellectual content (project, term, description, etc.) was created. Applications may initially set this to the system date for new data objects, but authors must be able to change it to an earlier date if necessary. If for legacy data this is imprecisely known, it may be missing here. Earlier versions in other data formats should then be mentioned in the copyright or acknowledgement statements. 
(= DC.Date.Created) Date/time when the last modification of the object was made. If in online data sources the provider can not assess this, the current date/time may be substituted. For legacy data this may be set to the file date of imported data, or estimated. (= DC.Date.Modified) Intended to be a rough estimate by authors/editors rather than exact statements. RevisionStatus refers primarily to the correctness of data already entered, but includes an estimate of completeness relative to the stated scope (e. g. taxonomic or geographic scopes in the project definition). However, if the project goal is to describe the frequent species of a group, the project status may be 'FullyRevised' even if many other species in the group are missing. === Geography: Used for resources like geographical names or places. Provides either a simple free-form text, or a connection to an external resource. @@ Problem: in contrast to class names, and even publications, locality names are necessarily language-specific! Extensions of ProxyBase specific to LocalityProxy Geographical coordinates (decimal longitude/latitude) of the location from which the object or observation was collected. Includes geodetic datum and an optional verbatim text representation. Defines an element with a ref attribute pointing to a Locality (ExternalDataInterface/Geography/Locality) A collection of LocalityRef-type elements. The sequence of elements in instance documents is semantically irrelevant and may be changed. Reference to a locality defined in ExternalDataInterface/Geography/Locality [ATTR: ref] === Media (especially images, audio/video): Extends resource proxy type with optional encoded data content (esp. images embedded in xml document) and with a Type (Image/Audio/Video, etc.). Extensions of ProxyBase specific to MediaResourceProxy Type of medium, based on DCMI Type vocabulary (= DC.Type) An optional caption for a resource, esp. if it will be presented embedded in another document. 
Captions can be provided in multiple languages. Differs from the resource Label, which is more closely related to a 'title'. @@ Issue: captions, even in multiple languages, may be obtained from the service provider. Even then it may be desirable to override them! Do we need two collections: InheritedCaption and CaptionOverride? This seems to be awkward whenever there is no ServiceProvider! Also, Label can contain a "title" only in a single language! @@ Creators, Revision status, and dates for the media resource Optionally the full resource data may be embedded (as an alternative or in addition to defining a URI). Note: A resource like an image should be directly encoded, i.e. not wrapped into a MIME object first. Defines an element with a ref attribute pointing to a MediaResource (ExternalDataInterface/MediaResources/MediaResource) A collection of MediaResourceRef elements. The sequence of elements in instance documents is semantically relevant and should be preserved. (the sequence in instances is informative!) [ATTR: ref] [Not yet used] A media resource element embedded in a group is provided solely to allow reuse together with the necessary identity constraints for the ref attribute. Limitations of xml Schema prevent the definition of identity constraints on the MediaResourceRef type itself. (the sequence in instances is informative!) [ATTR: ref] === Measurement units: Provides an extensible definition mechanism for measurement units like meter, mm, µm, liter/litre, °C, m/s, etc. May also be used for dimensionless scaling factors like %! Label contains a language/culture-specific long form of the measurement unit, e. g., 'liter' (en-us) or 'litre' (en-uk) for 'L.' Label and InternationalAbbreviation text allow some xhtml formatting to support, e. g., "mm2". Note: "International Standard ISO 31 (Quantities and units)", 1992, may be relevant here, but it seems not to be available online. Printed version: ISO Standards Handbook: Quantities and units. 
3rd ed., International Organization for Standardization, Geneva, 1993, 345 p., ISBN 92-67-10185-4, 182.00 CHF. A useful online resource is http://hem.fyristorg.com/ojarnef/fys/ metric-units-comp.txt Extensions of ProxyBase specific to MeasurementUnitProxy A scientific abbreviation considered language and audience independent. It may contain formatting to express "mm2". Note that the Abbreviation element available in most label types does not support formatting! True if unit is SI unit or a derived unit acceptable in scientific publications. False for local/historical units like feet and velocity in fathoms per fortnight :-). True indicates that unit should be output before the value (as in 'pH 7.0'). Default is false. Describes relations to other units that can be expressed through a simple multiplication factor (i. e. not cubic meter = meter * meter * meter, or Celsius to Fahrenheit) @@ Do we really need multiple relations or is a single relation to the base unit sufficient? @@ Ideally the relation should always be defined towards the base unit, e. g., km, cm, mm, µm all to meter. Multiply current unit with this factor to obtain related unit referenced above. Refers to a MeasurementUnit (attribute ref is required) Abstract base type for MeasurementUnitRef and MicroMeasurementUnit. Here the ref attribute is optional! ref refers to a measurement unit id (Terminology/General/MeasurementUnits) Provides a minimalized measurement unit identified through a local (and presumably international) abbreviation - together with an optional Measurement Unit proxy reference (ref attribute). A scientific abbreviation considered language and audience independent. It may contain formatting to express "mm2". === Public objects carrying a key also generally provide for developer annotations/comments (undefined language), version extensions for future versions of UBIF, and custom extensions (= "application annotations"). 
Version extension (Ext), CustomExtensions, and Annotation/comment free-form text. Internal notes/management comments (not multilingual). Annotations should be displayed only in a 'designer' or 'revision' mode and are expected to be invisible to users who only want to consume or apply the data. They are appropriate for rough, unedited comments, but should not contain confidential information. Extension mechanism to implement forwards compatibility in a new version of the standard (i. e. old applications can process newer data versions; compare backwards compatibility using optional elements anywhere). Community extension mechanism, e. g., for application-specific data. To allow forward compatible extensions of UBIF and derived schemata, an extension container for the target namespace is provided for use by the designers of the schema. Only the developers of the standard namespace may place elements here! This provides an extension mechanism to the standard model that may be used, e. g., to store application-specific data. Recommendation: UBIF applications that both import and export data may implement the loss-less round-tripping of data. The information of all imported custom extensions, even those that are not interpreted, should be preserved as string and later exported unchanged. Each custom extension contains xml content defined in another namespace. This may either be application-specific, or several applications may agree on common custom extensions. [ATTR: name, version] The content of CustomExtension is not further validated against a schema by validating xml processors. However, it must be well-formed xml and it is not possible to directly store a text string (content model mixed="false"). Identifier chosen by the target application(s) for which the content in the extension container is intended. 
The only purpose of this attribute is that application(s) generating a type of custom extension recognize the target identifier, while other applications just pass this through. Optional information about which version of the custom extension definition has been used. === Key/ref infrastructure for linking within a data set: This allows to define (and redefine) the value type for keys and keyrefs Note: the use of attribute groups instead of globally defined and referred attributes is a work-around for problems occurring with attribute definitions in included library schemata. The use of global attributes by ref caused validation or namespace problems, even though this library has no target namespace (chameleon pattern); Spy 2004.4 says, e. g., ... attributes that need to be qualified because your schema uses attributeForm = qualified or global attributes. You must specify a prefix for your schema namespace. An optional attribute to add a human-readable equivalent to the numeric primary identity key, intended to simplify debugging. The attribute can be discarded or updated at any time. Applications should not produce exports containing this attribute, instead it can be generated using xslt (based on labels/abbreviations). An optional attribute to add a human-readable equivalent to the numeric ref to simplify debugging. A debugref always points to the associated debugkey. The attribute can be discarded or updated at any time. Applications should not produce exports containing this attribute, instead it can be generated using xslt (based on labels/abbreviations reached through key/ref). === Options to link using URLs or GUID + resolving mechanisms (used especially for UBIF data proxies): The object linking mechanisms used by the ProxyBase type may also be used by other objects! LifeScience ID (without the constant prefix 'urn:lsid:'). 
3 to 4 parts separated by colons; the 1st part is the URL of a life science authority service that provides metadata on how to obtain the object referenced in parts 2 (namespace = data collection), 3 (object ID) and 4 (optional object version). Example: lsid.gbif.org:DataCollectionID:ID/1§31~b+:v2 Digital object identifier (an ID scheme advanced by the library community). A URL directly providing an object representation. In contrast to the URN types LSID or DOI this should resolve directly. The URL may be a query string (with ID embedded), for example: "http://x.y.fr/pub/au=smith?yr=1998". Multiple URLs may be defined to reduce the likelihood of failure. [The element sequence in instance documents is informative and should be preserved.] === Basic type library: === Basic generic types normalized string required to contain at least 1 character (this removes the xml string anomaly, i. e. either element/attribute may be optional, but if they are required the content may not be an empty string) normalized string restricted to 1..50 character length to be used for abbreviations (the recommended length of abbreviations is usually much shorter, but 50 characters should be a normalized string restricted to 1..255 character length (i. e. required, may not be empty string) Double precision numeric value in the range of [0..1] Colors defined as RGB (red-green-blue) values, hex-encoded and combined into a string, as in HTML. Example: #EE88FF. Colors may also be expressed as HSV (hue-saturation-value), but this is convertible to RGB. RGB is preferred because it is used in HTML. HTML also allows a shortened version with only 3 hexadecimal digits. A pattern supporting both would be: #(([0-9]|[a-f]|[A-F]){3}|([0-9]|[a-f]|[A-F]){6}) Derived string type with restricting patterns Life Science ID (= string restricted by a regular expression pattern). Annotation of the pattern: 5 to 6 parts separated by colon 1. The string URN (case-insensitive) 2. 
The string LSID (case-insensitive) 3. AuthorityID = DNS token with at least 2 parts plus a top-level domain with 2-5 characters (case-insensitive) (In earlier LSID specs this was assumed to be a DNS name; the final spec. however says: "The authority identification is usually an Internet domain name. In this case it is recommended that it be owned by the organization that assigns an LSID in question. Such organization is responsible for ensuring the uniqueness of the string created from the namespace, object and revision identifications. In the case where the authority identification string is not an Internet domain name, the authority should take care to ensure that it is a unique string and if possible, register that unique string with the organization that is currently the authority for the URN Namespace Identifier (NID) "lsid"" 4. Data collection identifier/namespace: non-whitespace characters except colon (case-sensitive) 5. Object ID: non-whitespace characters except colon (case-sensitive) 6. Object version (optional): non-whitespace characters except colon (case-sensitive) Earlier, more specific specs at http://www.i3c.org/wgr/ta/resources/lsid/docs/LSIDSyntax9-20-02.htm had more restrictions on authority (DNS name!) and fewer characters beyond US-ASCII and digits. A pattern matching the earlier spec. was (extended with "local"): pattern value="[Uu][Rr][Nn]:[Ll][Ss][Ii][Dd]:((local)|(([0-9A-Za-z\-]+\.){2,}[A-Za-z]{2,5}))(:[0-9A-Za-z][0-9A-Za-z\(\)\+,\.=;$" _!\*'\-]+){2,3}" Example: urn:LSID:www.gbif.net:DataCollection.Namespace:ID/$17+731_b:v2.1 Compare LSID, this omits the prefix 'urn:lsid:' Digital Object Identifier (standalone, not embedded into URI syntax) Pattern based on http://www.doi.org/handbook_2000/enumeration.html#2.2 which states that all DOIs start with "10." then a free prefix, then "/", then suffix. 
An additional constraint, not expressed here but possible to implement, would be that, according to the Appendix, the pattern "\S/" (single character followed by slash) for the suffix (i.e. after the first slash) is reserved for future extensions. String containing a format pattern of the type used in the xslt format-number function. A generic or higher taxon name (monomial) under the bacteriological, botanical, viral, and zoological codes, with a pattern to fulfill the following rules: a) First character must be upper case [A-Z]; b) Second and following characters must be lower case [a-z], i.e. without accentuation but with e diaeresis ("ë") being allowed as an exception in botany; c) From the third character on, a hyphen may occur as well. Note that Genus hybrid flags are expected to be stored separately! Based on ABCD, S.Blum 12/2002. W.Berendsohn 12/2003. The rules above should apply to generic names under all codes; if an exception is discovered, the change in constraints should be implemented as an extension [SB]. Note that a maximum length of 255 characters is stipulated to simplify the design of persistent databases [GH]. Notes regarding the admission of ë and hyphen (only for botany):
ICBN St. Louis: Art. 60.6. Diacritical signs are not used in Latin plant names. In names (either new or old) drawn from words in which such signs appear, the signs are to be suppressed with the necessary transcription of the letters so modified; for example ä, ö, ü become, respectively, ae, oe, ue; é, è, ê become e, [...]. The diaeresis, indicating that a vowel is to be pronounced separately from the preceding vowel (as in Cephaëlis, Isoëtes), is permissible.
Bacteriology: Diacritic signs are not used in names or epithets in bacteriology [Rule 64].
ICZN, Article 11: "Mandatory use of Latin alphabet... a scientific name must when first published have been spelled only in the 26 letters of the Latin alphabet; the presence of diacritic marks, apostrophes, diphthongs or the additional letters of the Scandinavian alphabet does not render the name unavailable, but marks must be removed, diphthongs separated and the Scandinavian letters transliterated." Also: digits or symbols must be spelled out in Latin; hyphenation must be contracted.
The pattern should prevent a hyphen as the last character! Two hyphens in a row are still possible, but considered irrelevant. Example: "Epichloë".
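The rules for generic names given above can be sketched as a regular expression. The following Python snippet is only an illustration and not the pattern defined in the schema itself: the names `GENERIC_NAME` and `is_generic_name` are hypothetical, and requiring at least two characters is an assumption read from rule b).

```python
import re

# Hypothetical pattern (an assumption, not copied from the schema):
# upper-case first letter, lower-case second letter (a-z or ë),
# then optional letters or hyphens, never ending in a hyphen.
GENERIC_NAME = re.compile(r"[A-Z][a-zë](?:[a-zë\-]*[a-zë])?")

MAX_LENGTH = 255  # maximum length stipulated in the annotation


def is_generic_name(name: str) -> bool:
    """Return True if name satisfies rules a) to c) and the length limit."""
    return len(name) <= MAX_LENGTH and GENERIC_NAME.fullmatch(name) is not None
```

With this sketch, a name such as 'Epichloë' is accepted, while a lower-case initial, a hyphen in second position, or a trailing hyphen is rejected. Two hyphens in a row are deliberately left possible, as stated in the annotation.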
A specific or infraspecific epithet name string under the bacteriological, botanical, viral, and zoological codes, with a pattern to fulfill the following rules: a) contains only lower case characters [a-z] or e-diaeresis (ë). Note that this data type cannot be used for cultivar names, which may contain blanks and accented or other letters. The pattern should prevent a hyphen as the last character! Two hyphens in a row are still possible, but considered irrelevant. Example: "vitis-ideae". === The following Range, Date, and Coordinate types describe frequently recurring simple type combinations in an element with attributes -- Element with 2 attributes to define a range: Lower and upper value as required attributes (no default values) Lower and upper probability value as required attributes (no default values) Contains lower/upper estimate attributes; used, e. g., for certainty and frequency! The default values are 0 and 1, indicating that no estimate was possible. -- RGB color polygon expressed as a list of RGB values (these should form a single polygon when connected, which is not validated in the schema!) A single color value or a color polygon defining an area in color space (i. e. not a spatial polygon having a color!) A single point in color space, or multiple points forming vertices of a polygon area in color space. When using a polygon this defines an estimated color range into which the single or variable true color values of the object fall. -- Types for composite Gregorian calendar date/time (points in time where parts may be missing), following the seven property model described, e. g., in xml Schema 1.1 (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel). Instead of gYear, gMonth, gDay, integer types with constraining facets are used for two reasons: a) each of them may have a timezone, which may lead to inconsistent data with multiple timezones; b) the lexical representation seems to be occasionally poorly implemented (e.g. 
where '31' or '---5' are accepted, whereas valid examples are '---31', '---05', and '---05+02:00'). In addition to the seven property model, additional text attributes for either imprecise additions or complete verbatim dates are added. Note that incomplete dates in most cases are calendar specific and incomplete non-Gregorian dates cannot be expressed. Furthermore, for complete dates it may be unclear whether a reformed or unreformed date has been used (e.g. in Russia in the 19th century). Date separated into attributes so that any part of the date may be missing [ATTR: year = four digit year; month = two digit month of year; day = two digit day of month; verbatim = unparsed textual date representation; supplement = text adding to or modifying the exact dates, e. g., 'end of summer', 'first half of year', 'first decade of month', '1888-1892'; timezone = expressed as integer according to the xml schema seven property model] The four digit year in the Gregorian calendar (in Western cultures usually without a suffix or with 'AD/Anno Domini', 'CE/Common Era'; negative years with 'BC/Before Christ', 'BCE/Before Common Era'). Whether a year 0 is used or not differs between a true Gregorian calendar and recent astronomic usage; xml schema is likely to change its position, see xml schema draft 1.1. Thus database designers should not use 0 as a missing value representation for year. two digit day Text in addition to or modifying the exact date components, e. g., 'end of summer', 'first half of year', 'first decade (of month)', '1888-1892'. An uninterpreted text representation of the original date information (date range, 'summer', perhaps unreformed Russian dates, etc.); as close as possible to the (digital/printed/handwritten) information source. Timezone expressed in minutes. In the seven property model (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel) the timezone has a range of +/- 14 hours (14 * 60 = 840 minutes). 
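The timezone range described above can be illustrated with a small Python sketch. It is not part of the schema: the function name `format_timezone` is hypothetical, and the '+hh:mm' output form is an assumption modeled on the '---05+02:00' example.

```python
MAX_OFFSET_MINUTES = 14 * 60  # 840, the limit given for the seven property model


def format_timezone(minutes: int) -> str:
    """Convert a timezone offset in minutes to a '+hh:mm' style string.

    Raises ValueError if the offset lies outside +/- 14 hours.
    """
    if not -MAX_OFFSET_MINUTES <= minutes <= MAX_OFFSET_MINUTES:
        raise ValueError(f"timezone offset out of range: {minutes}")
    sign = "-" if minutes < 0 else "+"
    hours, mins = divmod(abs(minutes), 60)
    return f"{sign}{hours:02d}:{mins:02d}"
```

For example, an offset of 120 minutes corresponds to '+02:00', and any value beyond 840 minutes in either direction is rejected.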
Date + Time separated into attributes so that any part of the date may be missing. [ATTR: see CompositeDate type, plus: time] '24' may only occur if both minute and second are zero (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel). The normal range should be 0-59, but 60 may occur for UTC leap-seconds (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel). An additional validator may choose to validate this. The simplest validation would attempt to convert those composite date instances that contain all seven elements to an xs:dateTime value. -- Types for geographical coordinates Latitude of geographical coordinates in decimal degrees (i.e. 30° 30' would be expressed as 30.5) Longitude of geographical coordinates in decimal degrees (i.e. 30° 30' would be expressed as 30.5) ATTR: latitude, longitude (in decimal degrees), geodeticdatum (esp. if different from a Greenwich-based datum). Longitude is expressed from -180 to 180°, East longitude being plus and West longitude being minus. Where knowledge of the geodetic datum is readily available it should be passed on. However, in most situations no undue resources should be invested into researching the geodetic datum when this is unknown. Many geodetic datum systems result in differences of only up to 100 m, some up to several hundred meters. For many purposes in biodiversity sciences these differences are acceptable. The 'World Geodetic System 1984 (WGS-84)' is the most commonly used geodetic datum. It is used, e. g., by the 'Global Positioning System (GPS)'. Other important systems are also in use (e. g., ITRF, ETRS89, NZGD2000, OSGB36, ED50, see also http://www.ncgia.ucsb.edu/education/curricula/giscc/units/u015/tables/table03.html or http://www.colorado.edu/geography/gcraft/notes/datum/edlist.html). 
The differences between WGS-84 and International Terrestrial Reference Frame (ITRF) are in the centimeter range worldwide, and ETRF 89 and NAD 83 are identical to WGS-84 for Europe and North America, respectively. -- As an exception to what has been said above, historical coordinates (for most countries up to ca. 1900, much later for France) may be based on a prime meridian other than Greenwich/Airy (e. g., the NTF datum uses Paris as its prime meridian, 2.33723° east of Greenwich). An uninterpreted text representation of the coordinate data (latitude/longitude, UTM, TRS, etc.), as close as possible to the (digital/printed/handwritten) information source. === Various complex types Three attributes provide options to express sex as code (enumerated vocabulary), free-form text (perhaps interpreted), or verbatim (uninterpreted original version). At least one attribute should be present; this cannot be validated by the schema. Controlled vocabulary to express sex status for clinical human or biological purposes. The string present in the source database, either in addition to or instead of code (especially if no mapping to the controlled vocabulary has been implemented yet, or if a specific value cannot be mapped). This differs from verbatim in that it claims no special status and may contain any amount of interpretation relative to the original source (e. g., a specimen label). An uninterpreted text representation of the original sex information; as close as possible to the (digital/printed/handwritten) information source. Telephone, fax, etc. number ATTR: number = should be provided in the ITU Recommendation E.164 international format ("+CountryCode AreaCode Number") (vCard:Tel.Number) ATTR: devicetype = voice, fax, mobile, pager, modem (identical with vCard:Tel.Voice etc.; if several flags apply to a single phone number list the phone number multiple times!) ATTR: usagenote = free-form text for constraints on use e. g. 
"weekdays only" or "home number" (partly: vCard:Tel.Home/Work flags) ATTR: preferred = preferred number, may occur multiple times for different device types (vCard:Tel.Pref) Numbers should be provided in the ITU Recommendation E.164 international format ("+CountryCode AreaCode Number"). Note that telephone device types are not necessarily exclusive (voice/fax, mobile/modem, etc.) and vCard 3.0 allows multiple for a single number. However, in UBIF this can be represented by adding a single number multiple times for each device type. This attribute should not have a default value voice, even though this is the most likely case. However, an exporting database may not have properly reported the type, or the type may be indicated only in the usage note. Free-form text for constraints on use e. g. "weekdays only" or "home number" (partly: vCard:Home/Work flags) === Extension of xs:language and a reference element using Language Union of xs:language with '-' for language-neutral (e.g. scientific names) and '?' for unknown. Language follows RFC 3066 'Tags for the Identification of Languages': a two-letter code taken from ISO 639 part 1 or a three-letter code taken from ISO 639 part 2, followed optionally by a two-letter country code taken from ISO 3166. (Notes: When a language has both a two-letter and three-letter code, use the two-letter code. RFC 3066 replaces RFC 1766.) Defines an element with a required 'language' attribute Complex types that add attributes 'language' or 'preferred' to the simple types String, String255, anyURI: Note: the use of attribute groups instead of globally defined and referred attributes is a work-around for problems occurring with attribute definitions in included library schemata. (single 'language' attribute) Attribute for Language, used by-reference (single 'language' attribute) Attribute for Language, used by-reference (single 'preferred' attribute) Elements with preferred = true indicate recommendation by the data provider. 
The consumer may have reasons to make a different choice. Note on current usage: these types are used by ABCD and UBIF, but not by SDD (which uses mostly audiences instead of language) String (i. e. xs:string with minimum length=1) extended with *optional* language attribute String255 (i.e. xs:string with length 1-255), extended with *optional* language attribute String (i. e. xs:string with minimum length=1) extended with *optional* preferred attribute String255 (i.e. xs:string with length 1-255), extended with *optional* preferred attribute String (i. e. xs:string with minimum length=1) extended with *optional* language and preferred attributes String255 (i.e. xs:string with length 1-255), extended with *optional* language and preferred attributes xs:anyURI extended with *optional* Preferred attribute === Some text data support limited xhtml. (Could appropriate elements from xhtml be imported and encapsulated here?) Collection of language-specific label representations Language-specific label representation [ATTR: language] Language-specific simple label, using simple formatted text Label text in a specific language. Restricted to 50 characters maximum length, including blanks (recommended to be shorter!). Label abbreviations are especially important when displaying information in a tabular format. Collection of language-specific label representations Language-specific label representation [ATTR: language] LabelRepr with short inherited Text extended with longer Details text. Optional text of unconstrained length, elaborating details of the ShortText Text with primary language plus multiple optional translations; used, e. g., in PublicationProxy type. A string, e. g. the title of a publication, having a single primary language. [ATTR: language] Translations from the primary language [ATTR: language] === Statements are a special form of complex text expressions Text, optional Details (both free-form text) and optional URI. 
A concise representation of a statement (copyright, acknowledgement, etc.). Recommended to be as short as possible, but actual length is unconstrained. Optional text of unconstrained length, elaborating details of the ShortText An optional resource on the net providing details on the statement (may be used as an alternative to the long text). A sequence of various intellectual property right (= IPR) statements, with a language attribute on the entire sequence. Other forms of IPR declaration not yet covered (e.g., database rights); also used in cases where an automatic converter can not decide whether a statements is copyright, licence, etc. Copyright may include the information that the data has been released to the public domain. To be used if data are placed under a public license (GPL, GFDL, OpenDocument). Placing data under a public license while maintaining copyright is recommended! (= DC.Rights.Licence; new 2004) Defines conditions under which the data may be analyzed, distributed or changed. "Terms of use" includes concepts like "Usage conditions" and "Specific Restrictions". Disclaimer statement, e. g. concerning responsibility for data quality or legal implications. A free form text acknowledging support (e. g. grant money, help, permission to reuse published material, etc.) === The following types are currently unused (August 2004), but may be used in the future or by other standards. [Unused!] Valid states are true, false, and default. A name whose only value is "default", used for union definitions. [Unused!] Valid states are true, false, and default. A name whose only value is "default", used for union definitions. === Enumerations to support interoperability: Internal formatting note: Annotations of individual enumerated values should be written as ^"short label" + " -- " + "detailed information". An xslt transforms such schema annotations into a data document that can directly be used in user interfaces. 
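The formatting convention for enumeration annotations can be illustrated as follows. The annotation mentions an xslt for this purpose; the Python function below is only a hypothetical stand-in (the name `split_annotation` is an assumption) showing how a 'short label -- detailed information' string could be split into its two parts.

```python
def split_annotation(text: str) -> tuple[str, str]:
    """Split 'short label -- detailed information' into (label, details).

    If no ' -- ' separator is present, the whole text is treated as the
    label and the details part is an empty string.
    """
    label, _sep, details = text.partition(" -- ")
    return label.strip(), details.strip()
```

Only the first ' -- ' is treated as the separator, so a details text that itself contains ' -- ' is kept intact.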
-- a) Generic enumerations
Revision Status is applied to the entire data collection as well as to individual objects (a specimen, a class description, etc.). Exact semantics are only defined for the first and the last category. The semantics of the intermediate levels (1 to 5) may be chosen freely by the user, but the relative position should be maintained. If, for example, three revision steps are planned (2 intermediate, reaching FullyRevised on third), it is recommended to use RevisionLevel2, RevisionLevel4, FullyRevised.
Unrevised -- The data have been input, but no separate revision was performed.
Revision level 1 of 5 -- For example, in a collection less than ca. 20 % of the data are revised, or on a single object only a plausibility check has been performed.
Revision level 2 of 5 -- For example, in a collection ca. 21-40 % of the data are revised, or on a single object the data are compared carefully with the source.
Revision level 3 of 5 -- For example, ca. 41-60 % of the data are revised.
Revision level 4 of 5 -- For example, ca. 61-80 % of the data are revised.
Revision level 5 of 5 -- For example, more than 80% revised (but not yet completed).
Revision completed -- This does not necessarily imply that the data are complete in a scientific sense. They are completely revised only under the available time and the goals set for the project.
Restricted to integer values from 0 to 5. 0 is defined as unspecified level, and 1 to 5 indicates expertise from schoolchildren to taxonomic expert. See the description of the values for recommendations for interpreting and choosing the expert level.
0 = Unspecified expertise level -- Use this if the expertise level cannot be assessed (e. g. when exporting data) or is considered irrelevant.
1 = Elementary school (year 1 to 6)
2 = Middle school (year 7 to 10)
3 = High school (year 11 and above) and general public -- When addressing this level specialized terminology or jargon should be avoided.
4 = University students or (partly) trained staff -- This level uses specialized terminology, but avoids or explains problematic terms.
5 = Experts -- This level uses the full range of terminology
This enumeration is identical with the DCMI Type Vocabulary (http://dublincore.org/documents/dcmi-terms/, as of 6/2004), except that an additional type "Other" has been added. Its purpose is to provide a framework of broad media or resource type terms, without the technical detail provided by the large number of MIME types. The annotations are largely based on those from the DublinCore metadata initiative vocabulary.
Collection -- A collection is an aggregation of items. The term collection means that the resource is described as a group; its parts may be separately described and navigated.
Dataset -- A dataset is information encoded in a defined structure (for example, lists, tables, and databases), intended to be useful for direct machine processing.
Event -- An event is a non-persistent, time-based occurrence. Metadata for an event provides descriptive information that is the basis for discovery of the purpose, location, duration, responsible agents, and links to related events and resources. The resource of type event may not be retrievable if the described instantiation has expired or is yet to occur. Examples - exhibition, web-cast, conference, workshop, open-day, performance, battle, trial, wedding, tea-party, conflagration.
Image -- An image is a primarily symbolic visual representation other than text. For example - images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation. Note that image may include both electronic and physical representations.
Interactive Resource -- An interactive resource is a resource which requires interaction from the user to be understood, executed, or experienced.
For example - forms on web pages, applets, multimedia learning objects, chat services, virtual reality. Moving Image (Video) -- A series of visual representations that, when shown in succession, impart an impression of motion. Examples of moving images are: animations, movies, television programs, videos, zoetropes, or visual output from a simulation. Comment: Instances of the type "Moving Image" must also be describable as instances of the broader type "Image". Physical Object -- An inanimate, three-dimensional object or substance. For example -- a computer, the great pyramid, a sculpture. Note that digital representations of, or surrogates for, these things should use Image, Text or one of the other types. Service -- A service is a system that provides one or more functions of value to the end-user. Examples include: a photocopying service, a banking service, an authentication service, interlibrary loans, a Z39.50 or Web server. Software -- Software is a computer program in source or compiled form which may be available for installation non-transiently on another machine. For software which exists only to create an interactive environment, use interactive instead. Sound -- A sound is a resource whose content is primarily intended to be rendered as audio. For example - a music playback file format, an audio compact disc, and recorded speech or sounds. Still Image -- A static visual representation. Examples of still images are: paintings, drawings, graphic designs, plans and maps. Comment: Recommended best practice is to assign the type "text" to images of textual materials. Instances of the type "Still Image" must also be describable as instances of the broader type "Image". Text -- A text is a resource whose content is primarily words for reading. For example - books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre text. 
Text may contain embedded still image illustrations, e. g., formatted html pages.
Other -- Use this category if the resource does not seem to fit into any of the categories provided above.
Kind of phone number: voice, fax, mobile, pager, modem. These enumerated values are identical with vCard 3.0 flags (several of which can be added to a single phone number; to represent this in the UBIF interface duplicate the phone number itself!)
voice phone number
fax number
mobile phone number
modem number
pager number
Enumeration restricted to integer values from 0 to 5, indicating an arbitrary rating (0 = not yet rated; otherwise meaning, e. g., 1 = disagree strongly, 2 = rather disagree, 3 = neutral or undecided, 4 = rather agree, 5 = agree strongly). This enumeration is of limited usefulness and could be replaced by a restriction on integer, but using the enumeration the semantics of agreement/disagreement or positive/negative rating can be communicated in a culture-neutral way (in German 1 is generally considered best and 5 worst, in English 1 worst, 5 best...).
Aufzählung von ganzzahligen Werten zwischen 0 u. 5 für beliebige Bewertungsskalen. Beispiel für Interpretation der 5 Werte: 1 = ablehnend, 2 = eher ablehnend, 3 = neutral oder unentschieden, 4 = eher zustimmend, 5 = zustimmend.
0 -- Undecided (not yet rated).
1 -- For example, "disagree strongly", or "very poor".
2 -- For example, "rather disagree", or "poor".
3 -- For example, "neutral", "average", "undecided".
4 -- For example, "rather agree", or "good".
5 -- For example, "agree strongly", or "very good".
Values are ltr (left to right), rtl (right to left). Compare CSS2 and the XHTML 2.0 bi-directional text module. Note: A future UBIF version may also include lro/rlo = left-right-override/right-left-override, if this is found to be necessary.
[ltr] -- left-to-right text direction (e.g., English)
[rtl] -- right-to-left text direction (e.g., Arabic)
In statistical analysis it is often vital to know some basic properties of the values that are being analyzed. Some of these properties can be summarized in the form of a measurement scale. Higher scales can always be analyzed under the assumptions of a lower scale (ordinal data can be analyzed as nominal, ratio as interval).
Those values from StatisticalMeasurementScaleEnum addressing numerical data ('ratio' and 'interval'). Note: Occasionally "integer" or "cardinal" (versus real numbers) are also considered part of the measurement scale. This should be avoided because: a) All combinations of interval/ratio and discrete/continuous are possible. b) The important distinction is whether a measurement is based on a continuous or discrete scale. Although in most cases this is equivalent with integer versus real numbers, it is not necessarily so. An ANOVA will report false significance not only when values come from "1, 2, 3 and 4", but also when they come from "1.2, 2.4, 3.6 and 4.8".
interval -- real numeric (= floating point) values, where 0 is an arbitrarily defined point. As a consequence, ratios are undefined and only the intervals between values can be analyzed. Example: Temperature in °C or °F.
ratio -- real numeric (= floating point) values (DELTA: type 'RN'), where 0 is an objective point and ratios can thus be analyzed. Example: length measurements. Most measures belong in this category and it is acceptable to assume the 'ratio' scale when importing DELTA legacy data.
Those values from StatisticalMeasurementScaleEnum addressing categorical data ('nominal' and 'ordinal').
nominal -- unordered categorical states (DELTA character type 'UM')
ordinal -- ordered categorical states (DELTA character type 'OM').
Unless a separate tree defines more specific ordering, the order is assumed to be linear in the sequence in which the categories are enumerated in their definition. -- b) Statistical categories Note: No satisfying external ontology for statistical methods could be found; the statistics section of MathML 2.0 (statistics.xsd) seems strangely incomplete! An enumeration of univariate statistical measures supported by UBIF (esp. used by SDD). The list is intended to be more complete than normally necessary at least in biological morphometrics. Missing measures should be requested for addition in a future version of this schema. Compare also UnivarStatMeasureWithParamEnum, containing further statistical measures that use an additional parameter (for percentage of percentile or confidence interval, etc.). [-] -- Lower range limit (human estimate) -- Free estimate made by human observer for the lower range limit (no statistical sampling and calculation was used). This method is appropriate when it is known that the values are derived from experience with the described objects (perhaps from memory) or from scanning a sample of objects and measuring those objects considered 'typical'. This method is not appropriate for single measurements or for calculations based on statistical methods (which provide exact 'statistical estimates'). Compare also the 'UnknownMethod'-methods that are provided for legacy data. [-] -- Untere Grenze (Schätzwert, ohne Berechnung) -- Freie Schätzung ohne Verwendung statistischer Probenahme und Berechnung. ObserverEstimate LowerRange false [+] -- Upper range limit (human estimate) -- Free estimate made by human observer for the upper range limit (no statistical sampling and calculation was used). This method is appropriate when it is known that the values are derived from experience with the described objects (perhaps from memory) or from scanning a sample of objects and measuring those objects considered 'typical'. 
This method is not appropriate for single measurements or for calculations based on statistical methods (which provide exact 'statistical estimates'). Compare also the 'UnknownMethod'-methods that are provided for legacy data. [+] -- Obere Grenze (Schätzwert, ohne Berechnung) -- Freie Schätzung ohne Verwendung statistischer Probenahme und Berechnung. ObserverEstimate UpperRange false [centr.] -- Central or typical value (human estimate) -- Free estimate made by human observer for a single central or typical value (no statistical sampling and calculation was used). This method is appropriate when it is known that the values are derived from experience with the described objects (perhaps from memory) or from scanning a sample of objects and measuring those objects considered 'typical'. It is not appropriate for single measurements nor for calculations based on statistical methods (which provide exact 'statistical estimates'). Compare also the 'UnknownMethod'-methods that are provided for legacy data. [centr.] -- Mittlerer oder typischer Wert (Schätzwert, ohne Berechnung) -- Freie Schätzung ohne Verwendung statistischer Probenahme und Berechnung. ObserverEstimate CentralMeasure false [-(?)] -- Lower range limit (legacy data) -- Lower range limit obtained by an unknown method (e. g. human observer estimate or some kind of statistical estimate). The range may, e. g., be mean plus/minus standard deviation, or a range estimate. 'Unknown' is important for legacy data where the statistical measure used is not known. If it is known that a measure is a human observer estimate rather than a defined value, the 'ObserverEstimate' methods should be used instead. UnknownMethod LowerRange false [+(?)] -- Upper range limit (legacy data) -- Upper range limit obtained by an unknown method (e. g. human observer estimate or some kind of statistical estimate). The range may, e. g., be mean plus/minus standard deviation, or a range estimate. 
'Unknown' is important for legacy data where the statistical measure used is not known. If it is known that a measure is a human observer estimate rather than a defined value, the 'ObserverEstimate' methods should be used instead. UnknownMethod UpperRange false [centr.(?)] -- Central or typical value (legacy data) -- Central or typical value obtained by an unknown method (e. g. human observer estimate or some kind of statistical estimate). The central value may, e. g., be a single measurement, median, or arithmetic mean. 'Unknown' is important for legacy data where the statistical measure used is not known. If it is known that a measure is a human observer estimate rather than a defined value, the 'ObserverEstimate' methods should be used instead. [Mittl.(?)] -- Mittlerer/typischer Wert (genaue Definition unbekannt) UnknownMethod CentralMeasure false [Min] -- Minimum value -- Absolute smallest value [Min] -- Minimum -- Der kleinste beobachtete Wert StatisticalEstimate LowerExtreme false [Max] -- Maximum value -- Absolute largest value [Max] -- Maximum -- Der größte beobachtete Wert StatisticalEstimate UpperExtreme false [µ] -- Mean (= average) -- This is the normal, arithmetic mean. [µ] -- Mittelwert -- Dies ist der normale, arithmetische Mittelwert. StatisticalEstimate CentralMeasure false [hµ] -- Harmonic mean -- The harmonic mean (reciprocal of the arithmetic mean of reciprocals) is rarely used. Recommendation: if nothing specific is said about a "mean", it can safely be assumed to be an arithmetic mean. [hµ] -- Harmonischer Mittelwert -- Der harmonische Mittelwert (Inverses des Mittelwert der inverses Werte) wird nur selten verwendet. Empfehlung: wenn keine genauere Angabe zu einem Mittelwert gemacht wird kann man davon ausgehen, dass es sich um den arithmetischen Mittelwert handelt. StatisticalEstimate CentralMeasure false [gµ] -- Geometric mean -- The geometric mean (antilog of mean of logarithms) is relatively rarely used. 
Recommendation: if nothing specific is said about a "mean", it can safely be assumed to be an arithmetic mean. [gµ] -- Geometrischer Mittelwert -- Der geometrische Mittelwert (Umkehrlogarithmus des Mittelwertes der logarithmierten Werte) wird nur sehr selten verwendet. Empfehlung: Wenn keine genauere Angabe zu einem Mittelwert gemacht wird kann man davon ausgehen, dass es sich um den arithmetischen Mittelwert handelt. StatisticalEstimate CentralMeasure false [mode] -- Mode -- The value or value class with the highest frequency (most frequently occurring). Applicable only to unimodal distributions. [mod.] -- Modus StatisticalEstimate CentralMeasure false [med.] -- Median -- The median is the 50 % percentile, i.e. 50% of the sampled values are smaller and the rest is larger than this value. [med.] -- Median (Zentralwert) -- Der Median ist das 50% Quantil, d.h. 50% der Werte sind kleiner, und ebenso viele größer als dieser Wert. StatisticalEstimate CentralMeasure false [IQM] -- Interquartile mean (= average) -- A truncated arithmetic mean, calculated only from those values that lie between 25 and 75% of sample values. This reduces the dependency of the mean on outliers and measurement errors. [IQM] -- Interquartilsmittelwert -- Der arithmentische Mittelwert berechnet auf der Basis der in das symmetrische Intervall um den Median (= 50% der Beobachtungen) fallenden Werten. StatisticalEstimate CentralMeasure false [Var.] -- Variance (sample, df = n-1) -- Variance based on a sample; calculated with n-1 (n = sample size) degrees of freedom. This is the "normal" variance used in almost all cases. A variance is a standard deviation squared. [Var.] -- Varianz (Stichprobe, Freiheitsgrade = n-1) -- Streuung der Werte in einer Stichprobe, berechnet mit n-1 Freiheitsgraden (n = Stichprobenumfang). Dies ist die "normale" Varianz die in fast allen Fällen verwendet wird. StatisticalEstimate VarianceMeasure true [Var. (pop.)] -- Variance (population; df = n; rarely applicable!) 
-- Variance of population, calculated with n (= sample size) degrees of freedom. Use this if the entire population of objects has been studied. Normally conclusions about the population are based on a sample that has been studied; in this case the "normal" variance with df = n-1 is appropriate. [Var./G] -- Varianz (Grundgesamtheit, n Freiheitsgrade) -- Varianz einer Grundgesamtheit, berechnet mit n (= Stichprobenumfang) Freiheitsgraden. In den meisten Fällen macht man Rückschlüsse auf die Grundgesamtheit anhand einer Stichprobe; in diesem Fall ist die "normale" Standardabweichung mit n-1 Freiheitsgraden zu verwenden. StatisticalParameter VarianceMeasure true [s.d.] -- Standard deviation (sample) -- Standard deviation based on a sample, calculated with n-1 (n = sample size) degrees of freedom. This is the "normal" standard deviation used in almost all cases. [Std.Abw.] -- Standardabweichung (Stichprobe, Freiheitsgrade = n-1) -- Standardabweichung einer Stichprobe, berechnet mit n-1 Freiheitsgraden (n = Stichprobenumfang). Dies ist die "normale" Standardabweichung die in fast allen Fällen verwendet wird. StatisticalEstimate VarianceMeasure false [s.d. (pop.)] -- Standard deviation (population; df = n; rarely applicable!) -- Standard deviation based on the entire population; calculated with n (= sample size) degrees of freedom. Use this if the entire population of objects has been studied. Normally conclusions about the population are based on a sample that has been studied; in this case the "normal" std. dev. with df = n-1 is appropriate. [Std.Abw./G] -- Standardabweichung (Grundgesamtheit; n Freiheitsgrade) -- Standardabweichung einer Grundgesamtheit; berechnet mit n (= Stichprobenumfang) Freiheitsgraden. In den meisten Fällen macht man Rückschlüsse auf die Grundgesamtheit anhand einer Stichprobe; in diesem Fall ist die "normale" Standardabweichung mit n-1 Freiheitsgraden zu verwenden. StatisticalParameter VarianceMeasure false [m.d.] 
-- Mean deviation -- The mean of the absolute differences from the arithmetic mean of values. The absolute differences are the positive, unsquared differences from the mean.
[Mittl.Abw.] -- Mittlere Abweichung. StatisticalParameter VarianceMeasure false
[m.d.m.] -- Mean deviation from median -- The mean of the absolute differences from the median of values. The absolute differences are the positive, unsquared differences from the median.
[Mittl.Abw.] -- Mittlere Abweichung vom Median. StatisticalParameter VarianceMeasure false
[CV] -- Coefficient of variation (sample) -- Standard deviation (based on a sample), divided by the mean. The values entered should not be expressed as percent, but converted to a true value (use '0.3' for 30%). According to Sokal & Rohlf 1981:59 this is a biased estimate, which may be corrected, compare 'CVC'.
[VK] -- Varianzkoeffizient (Stichprobe, Freiheitsgrade = n-1) -- Standardabweichung einer Stichprobe (n-1 Freiheitsgrade) geteilt durch den Mittelwert. StatisticalEstimate VarianceMeasure true true
[CVC] -- Corrected coefficient of variation (sample) -- Coefficient of variation corrected by the factor (1 + (1/4n)). Compare, e. g., Sokal & Rohlf 1981:59.
[VK] -- Korrigierter Varianzkoeffizient (Stichprobe, Freiheitsgrade = n-1) -- Varianzkoeffizient korrigiert mit (1 + (1/4n)). Vergleiche Sokal & Rohlf 1981:59. StatisticalEstimate VarianceMeasure true
[TR] -- Total range -- The maximum value minus the minimum value. Also often called "Range" without further qualification like 'absolute', 'total'. This measure can normally be computed automatically based on minimum and maximum. It will be manually entered, if minimum and maximum are not separately cited in a publication.
[TR] -- Gesamtdifferenz -- Maximum minus Minimum. StatisticalEstimate VarianceMeasure false
[IQR] -- Interquartile range -- This is the length of a symmetric interval around median, containing 50% of observations.
[QD] -- Quartilsdifferenz -- Dies ist die Länge eines symmetrischen Intervalls um den Median welches 50% der Beobachtungen enthält. StatisticalEstimate VarianceMeasure false
[s.e.] -- Standard error of mean -- The standard error of mean is defined as: "std. dev. / square root(n)" (with n = sample size)
[SF] -- Standardfehler (Mittelwert) StatisticalEstimate VarianceMeasure true
[s.e.(var.)] -- Standard error of variance (of multiple samples) -- This is not a variance measure of the mean, but a measure of the variance of the variance estimates.
[SF (var.)] -- Standardfehler der Varianz (bei mehreren Stichproben) -- Der Standardfehler der Varianz kann nur bei mehreren Stichproben bestimmt werden. Dies ist kein Streuungsmaß für den Mittelwert, sondern ein Maß für die Streuung der Varianzschätzungen! StatisticalEstimate Other true
[Skw.] -- Skewness -- Coefficient of skewness of a distribution, a measure of the degree of asymmetry of a distribution around its mean: < 0 if the longer tail is on the left (typically mode > median), = 0 if symmetric, > 0 if the longer tail is on the right (typically mode < median).
[Sch.] -- Schiefe -- Schiefe der Verteilung: < 0 bei linksschiefer Verteilung (typischerweise Modus > Median), = 0 wenn symmetrisch, > 0 bei rechtsschiefer Verteilung (typischerweise Modus < Median). StatisticalEstimate Other true
[Kurt.] -- Kurtosis -- Coefficient of kurtosis of distribution, a measure of the "peakedness" of a distribution. A normal distribution has a value of 0.263, larger values indicate wider, smaller narrower distributions.
[Kurt.] -- Kurtosis -- Exzess (Kurtosis) der Verteilung. Eine Normalverteilung hat einen Wert von 0,263, größere Werte zeigen zu breite, kleinere Werte zu schmale Verteilungen an. StatisticalEstimate Other true
[n] -- Stichprobenumfang -- Die Anzahl der Beobachtungen auf denen die angegebenen statistischen Maße (Mittelwert, Standardabweichung, etc.) beruhen.
[n] -- Sample size -- The number of objects studied and on which the other reported statistical measures (mean, standard deviation, etc.) are based.
StatisticalParameter SampleSize true
An enumeration of parameterized univariate statistical measures supported by UBIF (esp. used by SDD).
[-CI{ParameterValue}] -- Lower limit of {ParameterValue}% confidence interval for mean. -- The confidence interval is defined as a range into which the true mean of the distribution falls with a certain probability. The parameter expresses the confidence level in percent. Typical values are: 99.9% (= 0.05% from left!), 99% (= 0.5% from left), 95% (= 2.5% from left), 90% (= 5% from left).
[-CI{ParameterValue}] -- Untere Grenze des {ParameterValue}%-Konfidenzintervalls für den Mittelwert StatisticalEstimate LowerRange false true
[+CI{ParameterValue}] -- Upper limit of {ParameterValue}% confidence interval for mean. -- The confidence interval is defined as a range into which the true mean of the distribution falls with a certain probability. The parameter expresses the confidence level in percent. Typical values are: 99.9 (= 99.95% from left!), 99 (= 99.5% from left), 95 (= 97.5% from left), 90 (= 95% from left).
[+CI{ParameterValue}] -- Obere Grenze des {ParameterValue}%-Konfidenzintervalls für den Mittelwert StatisticalEstimate UpperRange false true
[-P{ParameterValue}] -- {ParameterValue}% percentile -- The {ParameterValue}% percentile is defined such that {ParameterValue}% of the observations are smaller than this value. Typical parameter values are 2.5, 5, 10, 15, 20, 25 (= 1st quartile), 30. Do not use 50; use the 'Median' measure instead!
[-P{ParameterValue}] -- {ParameterValue}% Quantil -- Das {ParameterValue}%-Quantil ist der Wert der größer oder gleich {ParameterValue}% der Beobachtungen ist. StatisticalEstimate LowerRange false true
[+P{ParameterValue}] -- {ParameterValue}% percentile -- The {ParameterValue}% percentile is defined such that {ParameterValue}% of the observations are smaller than this value. Typical parameter values are 70, 75 (= 3rd quartile), 80, 90, 95, 97.5.
Do not use 50; use the 'Median' measure instead!
[+P{ParameterValue}] -- {ParameterValue}% Quantil -- Das {ParameterValue}%-Quantil ist der Wert der größer oder gleich {ParameterValue}% der Beobachtungen ist. StatisticalEstimate UpperRange false true
[+TM{ParameterValue}] -- {ParameterValue}% trim mean -- The arithmetic mean of the symmetric {ParameterValue}% interior portion of a set of data values. StatisticalEstimate UpperRange false true
[µ - {ParameterValue} s.d.] -- Mean minus {ParameterValue} stand. deviation(s) -- Lower limit of a range calculated as mean minus standard deviations. The parameter ParameterValue (here {ParameterValue}) defines a factor with which the s.d. is multiplied before it is subtracted from the mean. Typical parameter values are 1 or 2.
[µ - {ParameterValue} Std.abw.] -- Mittelwert minus {ParameterValue} Standardabweichung(en). StatisticalEstimate LowerRange false false
[µ + {ParameterValue} s.d.] -- Mean plus {ParameterValue} stand. deviation(s) -- Upper limit of a range calculated as mean plus standard deviations. The parameter ParameterValue (here {ParameterValue}) defines a factor with which the s.d. is multiplied before it is added to the mean. Typical parameter values are 1 or 2.
[µ + {ParameterValue} Std.abw.] -- Mittelwert plus {ParameterValue} Standardabweichung(en). StatisticalEstimate UpperRange false false
[Min\{ParameterValue} s.d.] -- Minimum; outlier corrected ({ParameterValue} std. dev.) -- Absolute minimum value of sample, excluding outlier values more than {ParameterValue} standard deviations distant from the mean. Typical parameter values are 3 or 4.
[Min\{ParameterValue} Std.abw.] -- Minimum; Outlier-korrigiert ({ParameterValue} std. dev.). StatisticalEstimate LowerExtreme false false
[Max\{ParameterValue} s.d.] -- Maximum; outlier corrected ({ParameterValue} std. dev.) -- Absolute maximum value of sample, excluding outlier values more than {ParameterValue} standard deviations distant from the mean.
Typical parameter values are 3 or 4.
[Max\{ParameterValue} Std.abw.] -- Maximum; Outlier-korrigiert ({ParameterValue} std. dev.). StatisticalEstimate UpperExtreme false false
Broad classification of the univariate statistical methods, used in "UnivarStatMeasureEnum": //xs:enumeration/xs:annotation/xs:appinfo/Specification/ReportingClass. A separate xslt script (UBIF_Enumerations.xsl) is provided that converts this from schema data to xml instance data. ReportingClasses are provided to simplify the creation of applications using UnivarStatMeasure values. They simplify the detailed information provided by the method values into a minimally extended version of the five basic measurement classes supported by DELTA. Most applications that report information for human consumption can rely on these reporting classes when deciding how to present the data. Whereas UnivarStatMeasureEnum must be implemented, these additional specifications are an offer to simplify implementations and increase compatibility with future UBIF versions. Implementors may choose different methods of handling the statistical measures, however. Compare also UnivarStatMeasureReportingClassEnum.
Any kind of central measure, like mean, mode, median, etc.
The lower value of any kind of range measure, like 'mean minus standard dev.', confidence interval, percentiles, etc.
The upper value of any kind of range measure, like 'mean + standard dev.', confidence interval, percentiles, etc.
The absolute minimum value.
The absolute maximum value.
Any kind of variance measure, like standard deviation, variance, etc.
The sample size.
Any other kind of statistical measure.
Broad classification of the univariate statistical methods, used in "UnivarStatMeasureEnum": //xs:enumeration/xs:annotation/xs:appinfo/Specification/MethodClass. A separate xslt script (UBIF_Enumerations.xsl) is provided that converts this from schema data to xml instance data.
MethodClasses inform about very general quality properties of measures. This is an optional feature. Whereas UnivarStatMeasureEnum must be implemented, these additional specifications are an offer to simplify implementations and increase compatibility with future UBIF version. Implementors may choose different methods of handling the statistical measures, however. Compare also UnivarStatMeasureReportingClassEnum. Statistical estimate -- Measures estimated by statistical methods. Examples: Sample mean, minimum, confidence interval, standard deviation (n-1 degrees of freedom). Statistical parameter -- Values calculated by statistical methods that are exact in relation to the population under study (statistical estimates are exact in relation to the sample, but estimates in relation to the population under study). Examples: Sample size, standard deviation (n degrees of freedom). Observer estimate -- Values estimated by humans without using mathematical/statistical methods. Unknown method -- Values obtained by an unknown method. This may be a statistical method or a human observer estimate. Many legacy data sets and data published in print fall into this category. -- c) Agent role codes Provides codes for roles like author, editor, photographer, advisor, or copyright holder. This type is implemented as a union of all AgentRole* enumerations. The roles and their codes used here are based on http://www.loc.gov/marc.relators/ (as of 2004/6 available at http://dublincore.org/usage/meetings/2004/03/Relator-codes.html). For example, the enumerated code "aut" for author corresponds to http://www.loc.gov/marc.relators/aut. The DublinCore Agents group is considering using the same codes (see e. g. http://www.loc.gov/marc/dc/Agent-roles.html), but as of 2004/6 the DublinCore Agents subgroup did not yet agree on a Creator/Contributor refinement as qualified DublinCore. Note that the roles selected here are a subset of the MARC roles. 
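The correspondence between role codes and MARC relator URIs described above can be sketched in Python as follows. This is only an illustration: the names `MARC_RELATOR_BASE`, `EXAMPLE_ROLE_CODES` and `relator_uri` are hypothetical, and the code table is limited to the four codes that these annotations cite explicitly (aut, edt, dnr, fmo); the AgentRole* enumerations themselves contain more roles.

```python
# Base URI taken from the annotation text; codes are appended directly to it.
MARC_RELATOR_BASE = "http://www.loc.gov/marc.relators/"

# Assumption: a deliberately incomplete table holding only the codes that the
# annotations mention by name. A real application would load the full
# enumeration from the schema instead.
EXAMPLE_ROLE_CODES = {
    "aut": "Author",
    "edt": "Editor",
    "dnr": "Donor",
    "fmo": "Former owner",
}


def relator_uri(code):
    """Return the MARC relator URI for a known role code."""
    if code not in EXAMPLE_ROLE_CODES:
        raise ValueError(f"unknown agent role code: {code!r}")
    return MARC_RELATOR_BASE + code


if __name__ == "__main__":
    for code, label in EXAMPLE_ROLE_CODES.items():
        print(f"{label}: {relator_uri(code)}")
```

Rejecting unknown codes rather than building a URI for any string reflects the note that the supported roles are a subset of the MARC roles.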
Creator or Contributor, but no Owner roles (union of AgentRole* enumerations). Enumeration of roles supported for creator agents. See AgentRoleEnum for information about the MARC relator codes. Author -- A person or corporate body chiefly responsible for the intellectual or artistic content of a work. This term may also be used when more than one person or body bears such responsibility. Editor -- A person who prepares for publication a work not primarily his/her own, such as by elucidating text, adding introductory or other critical matter, or technically directing an editorial staff. Creator in general -- A person or corporate body responsible for the intellectual or artistic content of a work. The more specific type Author [aut] or Editor [edt] should be preferred. Illustrator -- The person who conceives, and perhaps also implements, a design or illustration, usually to accompany a written text. Photographer -- The person or organization responsible for taking photographs, whether they are used in their original form or as reproductions. Enumeration of supported roles for contributor agents. See AgentRoleEnum for information about the MARC relator codes. Contributor in general -- Someone whose work has been contributed to a larger work, such as an anthology, serial publication, or other compilation of individual works. Do not use for someone whose sole function in relation to a work is as author, editor, compiler or translator. Translator -- A person who renders a text from one language into another, from an older form of a language into the modern form, or from one audience-specific representation to one appropriate for another audience. Transcriber -- A person who prepares a handwritten or typewritten copy from original material, including from dictated or orally recorded material. 
Collaborator -- A person or corporate body that takes a limited part in the elaboration of a work of another person or corporate body that brings complements (e.g., appendices, notes) to the work. Collector -- A person who has brought together material from various sources, which has been arranged, described, and catalogued as a collection. The collector is neither the creator of the material nor the person to whom manuscripts in the collection may have been addressed. Correspondent -- A person or organization who was either the writer or recipient of a letter or other communication. Programmer -- A person or corporate body responsible for the creation and/or maintenance of computer program design documents, source code, and machine-executable digital files and supporting documentation. Research team head -- The person or corporate body that directed or managed a research project. Research team member -- The person or corporate body that participated in a research project but whose role did not involve direction or management of it. Researcher -- The person or corporate body responsible for performing research. Scientific advisor -- A person who brings scientific, pedagogical, or historical competence to the conception and realization on a work, particularly in the case of audio-visual items. Proofreader -- A person who corrects text (orthography, grammar). Markup editor -- The person or organization performing the coding of SGML, HTML, or XML markup of metadata, text, etc. Commentator -- A person who provides interpretation, analysis, or a discussion of the subject matter on a recording, motion picture, or other audiovisual medium. Reviewer -- A person or corporate body responsible for the review of book, motion picture, performance, etc. Consultant -- The person called upon for professional advice or services in a specialized field of knowledge or training. Enumeration of supported roles for owner/copyright agents. 
See AgentRoleEnum for information about the MARC relator codes. Owner -- The person or organization that currently owns an item or collection. Former owner -- The person or organization who owned an item at any time in the past. Includes those to whom the material was once presented. The person or organization giving the item to the present owner is designated as Donor [dnr] Copyright holder -- A person or organization owning the copyright of the material. Copyright claimant -- The person listed as a copyright owner at the time of registration. Copyright can be granted or later transferred to another person or agent, at which time the claimant becomes the copyright holder. Donor -- The donor of a book, manuscript, etc., to its present owner. Donors to previous owners are designated as Former owner [fmo] Depositor -- A person or organization placing material in the physical custody of a library or repository without transferring the legal title. -- d) sexual status codes

Codes for sex value in humans (clinical status) or animals. The codes are largely based on those defined in DICOM (Digital Imaging and Communications in Medicine, http://medical.nema.org/, Coding Scheme Designator DCM Version 01, PS3.16 Annex B, CID 7455) and ASTM E1633 (= "Standard Specification for Coded Values Used in the Electronic Health Record. Document Number: ASTM E1633-02a. ASTM International, 10-Nov-2002, 76 pages"). Additional codes specific to biology have been added.

An alternative standard is ISO 5218, which provides only four codes: "0 = Not known, 1 = Male, 2 = Female, 9 = Not specified". The difference between 0 and 9 is: "(0) implies that the sex of the person is not provided in the personal details i.e. the data has not been supplied and sex cannot be ascertained from the data provided"; "(9) implies that the sex of the person cannot be determined for physical reasons, e. g. a new born baby". ISO 5218 contains fewer and less intuitive codes. For biological purposes many codes would have to be arbitrarily added. G. Hagedorn, 10. August 2004
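For data exchange with systems that use ISO 5218, the correspondences noted for the basic sex codes in this section (Male = ISO 1, Female = ISO 2, Unknown sex = ISO 0) could be applied as in the sketch below. It assumes the ASTM E1633 letters M, F, and U as code values; the function and table names are hypothetical, and ISO code 9 is deliberately left unmapped because the annotations give only a tentative equivalent for it.

```python
# ISO 5218 codes as quoted in the annotation above.
ISO_5218 = {0: "Not known", 1: "Male", 2: "Female", 9: "Not specified"}

# Assumed mapping to the basic letter codes (ASTM E1633: M, F, U).
# ISO 9 has no certain equivalent and is therefore not listed.
ISO_TO_BASIC = {1: "M", 2: "F", 0: "U"}


def iso5218_to_basic(code: int):
    """Return the basic letter code, or None if there is no direct equivalent."""
    if code not in ISO_5218:
        raise ValueError(f"not an ISO 5218 code: {code!r}")
    return ISO_TO_BASIC.get(code)


print(iso5218_to_basic(1), iso5218_to_basic(9))
```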

Contains basic sex type codes, sufficient for recording human sexes in most administrative contexts (used, e. g., in the Agent type data interface) Male -- [= ASTM E1633: M, = ISO 5218: 1] Female -- [= ASTM E1633: F, = ISO 5218: 2] Unknown sex -- No information regarding the sex is available (= "not recorded"). [= ASTM E1633: U, = ISO 5218: 0] Contains codes in addition to those defined in BasicSexCodeEnum that are necessary for animals and clinical sex descriptions of humans. Additional codes "S, I, HM, HF, HT" have been added to those defined in DICOM /ASTM E1633. On the other hand, the DICOM /ASTM E1633 codes "MP = male pseudohermaphrodite" and "FP = female pseudohermaphrodite" are omitted here because they are limited to human sex and express a politically contentious perspective (see http://en.wikipedia.org/wiki/Pseudohermaphrodite). See the UBIF type SexCodeEnum for a union of the enumerated values in this type and those in BasicSexCodeEnum. Hermaphrodite -- An organism having both male and female sexual organs at some time during adulthood. General term, not differentiating between simultaneous or sequential hermaphrodites. [= ASTM E1633: H] Simultaneous hermaphrodite -- An organism having both male and female sexual organs at the same time during adulthood. [Not in ASTM E1633] Male changing to Female -- The organism starts as a male, and changes sex to a female later in life (sequential hermaphrodite: protandry). Examples: seabasses (fish); many plant species; humans who underwent surgical sex change. This term does not identify a phase in which an individual may be. [= ASTM E1633: MC] Female changing to Male -- The organism starts as a female, and changes sex to a male later in life (sequential hermaphrodite: protogyny). Examples: wrasses (reef fishes); some plants; humans who underwent surgical sex change. This term does not identify a phase in which an individual may be. [= ASTM E1633: FC] Hermaphrodite, male phase -- Sequential hermaphrodite in male phase. 
[Not in ASTM E1633] Hermaphrodite, female phase -- Sequential hermaphrodite in female phase. [Not in ASTM E1633] Hermaphrodite, transitional phase -- Sequential hermaphrodite currently between sexes. [Not in ASTM E1633] Indeterminate sex -- The organism has been studied, but the sex could not be determined (e.g. in larval forms). Compare "ambiguous" and "unknown" sex. [perhaps = ISO 5218: 9; perhaps = DICOM: code 121103, 'Undetermined'] Ambiguous sex -- The sex organs have been studied, but the result was ambiguous. Includes abnormal mixed sex situations like "gynandromorph" (e. g. an insect is male on one side, female on the other). Compare "indeterminate" and "unknown" sex. [= ASTM E1633: A] -- e) Enumerations specific to the biological domain Identifications of an object/unit as belonging to a class concept may be uncertain. This is especially important in biology, where identification qualifiers like 'cf.' or 'aff.' are often used as part of the scientific name. The following enumerated list provides general categories not restricted to scientific organism names. Note: In biology additional information is often expressed through the placement of the certainty qualifier. For example, 'Echinonema ferruginea var. campestris' may be qualified as 'cf. Echinonema ferruginea var. campestris', 'Echinonema cf. ferruginea var. campestris', or 'Echinonema ferruginea cf. var. campestris'. The first presumably means that the entire name is uncertain, but the infraspecific name may be appropriate; the second indicates that the genus is certain and the species uncertain; and the last that the species is certain and only the infraspecific rank is uncertain. To achieve this level of expressiveness, it is recommended that an additional data element 'IdentificationUncertainTaxonomicRank' of type TaxonomicRankEnum be combined with an element of IdentificationCertaintyEnum. 
IdentificationUncertainTaxonomicRank should be optional and omitted to express that an identification is uncertain, but the rank to which the uncertainty applies is not known (e. g. in 'Echinonema ferruginea?'). In ABCD 1.44 a special rank with enumeration beforeName, beforeFirstEpithet, beforeSecondEpithet is used instead. The identification is certain. The identification is uncertain -- In biology this is often expressed with the Latin 'cf.' (confer). The identification names a similar object class. -- In this case the identified object is considered very similar to those objects classified under the given name. Note that in contrast to 'Uncertain' this implies that the object most likely does not belong to this class. In biology this is often expressed with the Latin 'aff.' or 'afin.' (affinis). The certainty of identification is unknown.
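How a certainty qualifier combined with an uncertain rank might be rendered back into a name string can be sketched as follows. This is an assumption-laden illustration, not a prescribed algorithm: the function `qualify_name`, the representation of a name as (rank, text) pairs, and the use of the rank codes "gen.", "sp.", and "var." as keys are all hypothetical choices made for this example.

```python
def qualify_name(parts, uncertain_rank=None, qualifier="cf."):
    """Place a certainty qualifier before the name part of the uncertain rank.

    parts: list of (rank code, text) pairs in name order (hypothetical format).
    uncertain_rank: rank code the uncertainty applies to, or None if unknown.
    """
    if uncertain_rank is None:
        # Rank of the uncertainty not known: append a question mark.
        return " ".join(text for _, text in parts) + "?"
    out, found = [], False
    for rank, text in parts:
        if rank == uncertain_rank:
            out.append(qualifier)
            found = True
        out.append(text)
    if not found:
        raise ValueError(f"rank {uncertain_rank!r} does not occur in the name")
    return " ".join(out)


NAME = [("gen.", "Echinonema"), ("sp.", "ferruginea"), ("var.", "var. campestris")]
for rank in ("gen.", "sp.", "var."):
    print(qualify_name(NAME, rank))
print(qualify_name(NAME[:2]))
```

The four printed strings correspond to the qualified forms of 'Echinonema ferruginea var. campestris' and to 'Echinonema ferruginea?' discussed in the annotation.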

This list is a first version of a constrained vocabulary to express typifying relations between taxonomic names and units (specimens or objects preserved in collections). Beyond those type categories explicitly governed by nomenclatural codes (Zoology, Botany, Bacteriology, Virology), the list also includes some additional type status terms. These categories may be helpful when interpreting the original circumscription (topotypes, ex-types), but do not have the same binding status as terms governed by the nomenclatural codes. The enumeration attempts to strike a balance between listing all possible terms and remaining comprehensible. In general, including too many terms was considered less problematic than omitting terms. Applications may easily select a subset for presentation in their user interface.

This list is intended as a first version and it is hoped that in the review process through TDWG it will achieve sufficient maturity to be truly useful. It is expected that over time revisions will have to be made. Please use the WIKI (http://efgblade.cs.umb.edu/twiki/bin/view/UBIF/NomenclaturalTypeStatusOfUnitsDiscussion) to discuss the current list and the lists of synonymous, doubtful, or excluded type terms provided therein.

Some background information: A type provides the objective standard of reference to determine the application of a taxon name. The type status of a unit (specimen) is only meaningful in combination with the name that is being typified (a unit may have been designated type for multiple names in different publications). The type status of an object may be designated in the original description of a scientific name (original designation), or - under rules laid out in the respective nomenclatural codes - at a later time (subsequent designation). -- For taxa above species rank the type is always a lower rank taxon (e. g., species for genus, genus for family). The type terms for this situation are not included in the enumeration. Ultimately, typification of all taxa goes back to physical type units, but this should not be recorded as such in data sets. The indirect type reference in higher taxa means that typification changes to the lower taxon automatically affect the higher taxon.

The exact definitions of type status differ between nomenclatural codes (ICBN, ICZN, ICNP/ICNB, etc.). The term definitions are intended to be informative and generally applicable across the different codes. They should not be interpreted as authoritative; in nomenclatural work the exact definitions in the respective codes have to be consulted. A duplication of status codes (bot-holo, zoo-holo, bact-holo, etc.) is not considered desirable or necessary. Since the application of the type status terms is constrained by the relationship of the typified name with a specific code, the exact definition can always be unambiguously retrieved.

The following publications have been consulted to determine the number of type terms that should be included and to prepare the semantic definitions:

  • Nomenclatural Glossary for Zoology (January 18 2000; ftp://ftp.york.biosis.org/sysgloss.txt; verified 17. June 2004)
  • ICBN St. Louis Code (http://www.bgbm.fu-berlin.de/iapt/nomenclature/code/SaintLouis/0013Ch2Sec2a009.htm; verified 17. June 2004)
  • Draft BioCode 4th version (Greuter et al., 1997; http://www.rom.on.ca/biodiversity/biocode/biocode1997.html)
  • Glossary of 'type' terminology (Ronald H. Petersen; http://fp.bio.utk.edu/mycology/Nomenclature/nom-type.htm)
  • Dictionary of Ichthyology (Brian W. Coad and Don E. McAllister, 2004; http://www.briancoad.com/Dictionary/introduction.htm)
  • A useful resource that was not available when writing this proposal might be: Hawksworth, D.L., W.G. Chaloner, O. Krauss, J. McNeill, M.A. Mayo, D.H. Nicolson, P.H.A. Sneath, R.P. Trehane and P.K. Tubbs. 1994. A draft Glossary of terms used in Bionomenclature. (IUBS Monogr. 9) International Union of Biological Sciences, Paris. 74 pp.

Many thanks for review and help to Dr. Miguel A. Alonso-Zarazaga and Dr. Walter Gams. Gregor Hagedorn, 13.7.2004

Allotype -- A paratype specimen designated from the type series by the original author that is the opposite sex of the holotype. The term is not regulated by the ICZN. [Zoo.] Allolectotype -- A paralectotype specimen that is the opposite sex of the lectotype. The term is not regulated by the ICZN. [Zoo.] Alloneotype -- A paraneotype specimen that is the opposite sex of the neotype. The term is not regulated by the ICZN. [Zoo.] Cotype -- A deprecated term no longer recognized in the ICZN; formerly used for either syntype or paratype [see ICZN Recommendation 73E]. [Zoo.] Epitype -- An epitype is a specimen or illustration selected to serve as an interpretative type when any kind of holotype, lectotype, etc. is demonstrably ambiguous and cannot be critically identified for purposes of the precise application of the name of a taxon (see Art. ICBN 9.7, 9.18). An epitype supplements, rather than replaces existing types. [Bot./Bio.] Ex-Type -- A strain or cultivation derived from some kind of type material. Ex-types are not regulated by the botanical or zoological code. [Zoo./Bot.] Ex-Epitype -- A strain or cultivation derived from epitype material. Ex-types are not regulated by the botanical or zoological code. [Bot.] Ex-Holotype -- A strain or cultivation derived from holotype material. Ex-types are not regulated by the botanical or zoological code. [Zoo./Bot.] Ex-Isotype -- A strain or cultivation derived from isotype material. Ex-types are not regulated by the botanical or zoological code. [Zoo./Bot.] Ex-Lectotype -- A strain or cultivation derived from lectotype material. Ex-types are not regulated by the botanical or zoological code. [Zoo./Bot.] Ex-Neotype -- A strain or cultivation derived from neotype material. Ex-types are not regulated by the botanical or zoological code. [Zoo./Bot.] Ex-Paratype -- A strain or cultivation derived from paratype material. Ex-types are not regulated by the botanical or zoological code. [Zoo./Bot.] 
Ex-Syntype -- A strain or cultivation derived from syntype material. Ex-types are not regulated by the botanical or zoological code. [Zoo./Bot.] Hapantotype -- One or more preparations of directly related individuals representing distinct stages in the life cycle, which together form the type in an extant species of protistan [ICZN Article 72.5.4]. A hapantotype, while a series of individuals, is a holotype that must not be restricted by lectotype selection. If a hapantotype is found to contain individuals of more than one species, however, components may be excluded until it contains individuals of only one species [ICZN Article 73.3.2]. [Zoo.] Holotype -- The one specimen or other element used or designated by the original author at the time of publication of the original description as the nomenclatural type of a species or infraspecific taxon. A holotype may be 'explicit' if it is clearly stated in the originating publication or 'implicit' if it is the single specimen proved to have been in the hands of the originating author when the description was published. [Zoo./Bot./Bio.] Iconotype -- A drawing or photograph (also called 'phototype') of a type specimen. Note: the term "iconotype" is not used in the ICBN, but implicit in, e. g., ICBN Art. 7 and 38. [Zoo./Bot.] Isotype -- An isotype is any duplicate of the holotype (i. e. part of a single gathering made by a collector at one time, from which the holotype was derived); it is always a specimen (ICBN Art. 7). [Bot.] Isolectotype -- A duplicate of a lectotype, compare lectotype. [Bot.] Isoneotype -- A duplicate of a neotype, compare neotype. [Bot.] Isosyntype -- A duplicate of a syntype, compare isotype = duplicate of holotype. [Bot.] Lectotype -- A specimen or other element designated subsequent to the publication of the original description from the original material (syntypes or paratypes) to serve as nomenclatural type. 
Lectotype designation can occur only where no holotype was designated at the time of publication or if it is missing (ICBN Art. 7, ICZN Art. 74). [Zoo./Bot.] -- Note: the BioCode defines lectotype as selection from holotype material in cases where the holotype material contains more than one taxon [Bio.]. Neotype -- A specimen designated as nomenclatural type subsequent to the publication of the original description in cases where the original holotype, lectotype, all paratypes and syntypes are lost or destroyed, or suppressed by the (botanical or zoological) commission on nomenclature. In zoology also called "Standard specimen" or "Representative specimen". [Zoo./Bot./Bio.] Paratype -- All of the specimens in the type series of a species or infraspecific taxon other than the holotype (and, in botany, isotypes). Paratypes must have been at the disposition of the author at the time when the original description was created and must have been designated and indicated in the publication. Judgment must be exercised on paratype status, for only rarely are specimens explicitly cited as paratypes, but usually as "specimens examined," "other material seen", etc. [Zoo./Bot.] Paralectotype -- All of the specimens in the syntype series of a species or infraspecific taxon other than the lectotype itself. Also called "lectoparatype". [Zoo.] Paraneotype -- All of the specimens in the syntype series of a species or infraspecific taxon other than the neotype itself. Also called "neoparatype". [Zoo.] Plastotype -- A copy or cast of type material, esp. relevant for fossil types. Not regulated by the botanical or zoological code (?). [Zoo./Bot.] Plastoholotype -- A copy or cast of holotype material (compare Plastotype). Plastoisotype -- A copy or cast of isotype material (compare Plastotype). Plastolectotype -- A copy or cast of lectotype material (compare Plastotype). Plastoneotype -- A copy or cast of neotype material (compare Plastotype). 
Plastoparatype -- A copy or cast of paratype material (compare Plastotype). Plastosyntype -- A copy or cast of syntype material (compare Plastotype). Secondary type -- A referred, described, measured or figured specimen in the original publication (including a neo/lectotypification publication) that is not a primary type. Supplementary type -- A referred, described, measured or figured specimen in a revision of a previously described taxon. Syntypes -- The series of specimens used to describe a species or infraspecific taxon when neither a single holotype by the original author, nor a lectotype in a subsequent publication has been designated. The syntypes collectively constitute the name-bearing type. [Zoo./Bot.] Topotype -- One or more specimens collected at the same location as the type series (type locality), regardless of whether they are part of the type series. Topotypes are not regulated by the botanical or zoological code. Also called "locotype". [Zoo./Bot.] Type -- a) A specimen designated or indicated as any kind of type of a species or infraspecific taxon. If possible, more specific type terms (holotype, syntype, etc.) should be applied. b) The type name of a name of higher rank, for taxa above the species rank. [General] not a type -- For specimens erroneously labelled as types an explicit negative statement may be desirable. [General]

Enumerated codes to express the rank of a taxon (scientific organism name) in a taxonomic hierarchy. The list is intended to be interoperable between name providers for bacteria, viruses, fungi, plants, and animals. It is not assumed that in each taxonomic group all ranks have to be used. Individual applications may select appropriate subsets (which may be based on information given inside the enumerated values, see Specifications/BioCode-, Botany-, Zoology-, and BacteriaStatus). The enumeration attempts to strike a balance between listing all possible rank terms and remaining comprehensible. For example, the "infra-" ranks specifically mentioned in BioCode have been included (although very rarely used), but the additional intermediate zoological ranks (micro, nano, pico, etc.) are not included. The selection of infraspecific ranks (some informal ranks, esp. from bacteriology, may be missing!) probably needs some discussion. However, it is believed that this list may help to start developing data sets that can easily be integrated across the barriers of language and taxonomic traditions.

The following publications have been consulted to determine the number of rank terms that should be included and to prepare the semantic definitions:

  • The Berlin Taxonomic Information Model, MoReTax view (Berendsohn & al., http://www.bgbm.org/scripts/ASP/BGBMModel/Catalogues.asp?Cat=MT)
  • DiversityTaxonomy model version 0.7 (G. Hagedorn & T. Gräfenhan 2002, http://160.45.63.11/Workbench/Taxonomy/Model/InformationModels.html)
  • ABCD version 1.44, types HigherTaxonRankType and RankAbbreviationType, by W. Berendsohn, reviewed by D. Hobern
  • TaxCat2 - Database of Botanical Taxonomic Categories by Jörg Ochsmann, IPK Gatersleben; http://mansfeld.ipk-gatersleben.de/TaxCat2/default.htm

Many thanks for review and help go to Dr. Walter Gams.

Note: the list of all ranks is implemented as a union of all following rank subsets. Note that although BioCode has been used to define the partition into subsets, the ranks are not limited to BioCode but should be an interoperable superset of ranks used in Virology, Bacteriology, Botany and Zoology.
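The idea of implementing the full rank list as a union of subsets can be illustrated with a small sketch. The rank codes below ("var.", "sp.", "gen.", etc.) are taken from the enumeration in this section, but only a few codes per subset are shown; the subset constants and helper names are hypothetical and do not correspond to type names in the schema.

```python
# Partial, illustrative subsets; each real subset contains more codes.
INFRASUBSPECIFIC = frozenset({"var.", "subvar.", "fm.", "subfm."})
SPECIES_GROUP = frozenset({"sp.", "ssp."})
GENUS_GROUP = frozenset({"gen.", "subgen.", "infragen."})

SUBSETS = {
    "infra-subspecific": INFRASUBSPECIFIC,
    "species group": SPECIES_GROUP,
    "genus group": GENUS_GROUP,
}

# The complete list is simply the union of all subsets.
ALL_RANKS = frozenset().union(*SUBSETS.values())


def subset_of(code: str) -> str:
    """Return the name of the subset that contains the given rank code."""
    for name, members in SUBSETS.items():
        if code in members:
            return name
    raise ValueError(f"unknown rank code: {code!r}")


print(sorted(ALL_RANKS))
print(subset_of("gen."))
```

An application that needs only some ranks could validate against a single subset instead of `ALL_RANKS`.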

Subset of ranks; equivalent to BioCode "infra-subspecfic", i.e. below the species group [cand.] -- candidate -- Candidatus' rank is proposed in bacteriology for putative taxa, which could not yet be studied sufficiently to warrant the creation of a name with a known rank. (Murray, R.G.E. & Schleifer, K.H.: Taxonomic notes: a proposal for recording the properties of putative taxa of procaryotes. Int. J. Syst. Bacteriol., 1994, 44, 174-176). cand. - - - - - - - Proposed - - [tax. infrasp.] -- infraspecific tax. of undefined rank -- Undefined ranks (using either no rank identifier in botany, or using greek letters or symbols like stars, crosses) occur in very old publications. Most frequently these are to be interpreted as varieties, but occasionally they are forms or subspecies (see Stearn, W.T. 1957: Species plantarum (Facsimile); Introduction. 1. London, p. 90-95, 160-161, 163). The interpretation of these cases requires taxonomic knowledge that may not be available at the time when data are parsed. Such lack of knowledge can be expressed using this rank identifier. tax. infrasp. - - - - - - - - - - [f. sp.] -- special form -- The ICBN does not formally cover formae specialis (art. 4, note 3). However, because of the economic importance of pathogenic f. sp., and since it is common practice to handle them as if the code would apply (i. e. priority usually observed, name quoted with author), they are included here. f. sp. forma sp.; fsp.; fm. sp.; f. spec.; fm. spec.; forma spec. - - - - - - used, but all ranks below subsp. are not covered by ICNP/ICNB, see Rules 5d and 14a. ##Check whether this rank is used indeed. none - [subsubfm.] -- subsubform subsubfm. subsubf. - - - - - - - additional(?) - [subfm.] -- subform subfm. subf. - - - - - additional - additional - [fm.] -- form -- Form, race, variety are not subject to regulation in zoology; see ICZN Article 1.3.4 fm. f. - - - - - secondary - secondary - [subvar.] -- sub-variety subvar. subv. 
- - - - - additional - additional - [var.] -- variety -- Form, race, variety are not subject to regulation in zoology; see ICZN Article 1.3.4 Examples: Pinus nigra var. caramanica (= "P. nigra subsp. nigra var. caramanica"; Taxus baccata var. variegata var. v. - - - - - secondary used, but all ranks below subsp. are not covered by ICNP/ICNB, see Rules 5d and 14a. ##Check whether this rank is used indeed. secondary - [pathovar.] -- patho-variety pathovar. pv. - - - - - - used, but all ranks below subsp. are not covered by ICNP/ICNB, see Rules 5d and 14a. ##Check whether this rank is used indeed. - - [biovar.] -- bio-variety biovar. bv. - - - - - - used, but all ranks below subsp. are not covered by ICNP/ICNB, see Rules 5d and 14a. ##Check whether this rank is used indeed. - - [cult.] -- cultivar -- The epithet is usually output in single quotes and may contain multiple words, see ICBN §28. Examples: Taxus baccata 'Variegata', Juniperus ×pfitzeriana 'Wilhelm Pfitzer'; Magnolia 'Elizabeth' (= a hybrid, no species epithet). cult. - - - - - - - - Reference to 'Internat. code of nomenclature for cultivated plants' - [convar.] -- convar -- Used in cultivated plants (ICNCP), but deprecated, see 'Some notes on problems of taxonomy and nomenclature of cultivated plants' by J. Ochsmann, http://www.genres.de/IGRREIHE/IGRREIHE/DDD/22-08.pdf convar. cv. - - - - - - - - - [cultivar. group] -- cultivar-group cultivar. group - - - - - - - - - - [graft-chimaera] -- graft-chimaera graft-chimaera - - - - - - - - - - [infrasp.] -- infraspecies infrasp. infrasp.; infraspec. - - - - - additional - - - Subset of ranks; equivalent to BioCode "species group", i.e. only species and subspecies [ssp.] -- subspecies -- Examples: Pinus nigra subsp. nigra Homo sapiens sapiens ssp. subsp.; subspec. - - - - - secondary covered additional additional [sp.] -- species -- Examples: Taxus baccata Homo sapiens sp. spec. 
- - - - - primary covered principal principal Subset of ranks; equivalent to BioCode ""subdivision of a genus" ", i.e. all ranks between genus and species group (i.e. not including subgenus and species) [sp. group] -- species group -- The Berlin/MoreTax model notes: "Aggregate and species group have been included although they aren't taxonomic ranks but cirumscriptions because on the one hand they are necessary for the concatenation of the fullname and on the other hand they are necessary for distinguishing the aggregate or species group from the microspecies." sp. group * - - - - - - - - - [aggr.] -- species aggregate -- A loosely defined group of species. Zoology: "Aggregate - a group of species, other than a subgenus, within a genus; or a group of subspecies within a species. An aggregate may be denoted by a group name interpolated in parentheses." -- The Berlin/MoreTax model notes: "Aggregate and species group have been included although they aren't taxonomic ranks but cirumscriptions because on the one hand they are necessary for the concatenation of the fullname and on the other hand they are necessary for distinguishing the aggregate or species group from the microspecies." aggr. * - - - - - - - - - [tax. infragen.] -- infrageneric tax. of undefined rank -- A name that appear between a genus name and a species epitheton and is not clearly marked as series or section, or other may be assigned to this rank until the true rank can be assigned by a taxonomic expert. tax. infragen. - - - - - - - - - - [subser.] -- subseries subser. * - - - - - additional - additional - [ser.] -- series ser. * - - - - - secondary - secondary - [subsect.] -- subsection subsect. * - - - - - additional - additional - [sect.] -- section sect. * - - - - - secondary - secondary - Subset of ranks; equivalent to BioCode "genus group", i.e. infragenus to genus [infragen.] -- infragenus infragen. * - - - - - additional - - - [subgen.] -- subgenus subgen. 
* - - - - - secondary covered additional additional [gen.] -- genus -- Examples: Magnolia Homo gen. - - - - - - primary covered principal principal Subset of ranks; equivalent to BioCode "subdivision of a family", i.e. ranks between genus group and family group [infratrib.] -- infratribe infratrib. - - - - - - additional - - - [subtrib.] -- subtribe subtrib. - -inae -inae -inae -inae -ina additional covered (but probably not in current use) additional - [trib.] -- tribe trib. - -eae -eae -eae -eae -ini secondary covered (but probably not in current use) secondary - [supertrib.] -- supertribe supertrib. - - - - - - additional - additional - Subset of ranks; equivalent to BioCode "family group", i.e. infrafamily to superfamily [infrafam.] -- infrafamily infrafam. - - - - - - additional - - - [subfam.] -- subfamily -- Examples: Magnolioideae subfam. - -oideae -oideae -oideae -oideae -inae secondary covered additional additional [fam.] -- family -- Examples: Magnoliaceae Hominidae fam. - -aceae -aceae -aceae -aceae -idae principal covered principal principal [superfam.] -- superfamily -- Examples: Magnoliacea superfam. - - -acea -acea -acea -oidea; -acea secondary - additional - Subset of ranks; equivalent to BioCode "suprafamilial". This rank group includes all ranks higher than superfamily (class, phylum/division, kingdom, domain) [infraord.] -- infraorder infraord. - - - - - - additional - - - [subord.] -- suborder -- Examples: Magnolineae Catarrhini subord. - -ineae -ineae -ineae -ineae - additional covered additional additional [ord.] -- order -- Examples: Magnoliales Primates ord. - -ales -ales -ales -ales - principal covered principal principal [superord.] -- superorder -- Examples: Magnolianae superord. - - -anae -anae -anae - additional - additional additional [infracl.] -- infraclass infracl. - - - - - - additional - - - [subcl.] -- subclass -- Examples: Magnoliidae Eutheria subcl. - -idae [proposed; Stackebrandt, E., Rainey, F.A. 
& Ward-Rainey, N.L.: Proposal for a new hierarchic classification system, Actinobacteria classis nov. Int. J. Syst. Bacteriol., 1997, 47, 479-491.] -idae -phycidae -mycetidae - additional covered additional additional [cl.] -- class -- Examples: Magnoliopsida Mammalia cl. - -ia [proposed; Stackebrandt, E., Rainey, F.A. & Ward-Rainey, N.L.: Proposal for a new hierarchic classification system, Actinobacteria classis nov. Int. J. Syst. Bacteriol., 1997, 47, 479-491.] -opsida -phyceae -mycetes - principal covered principal principal [supercl.] -- superclass supercl. - - - - - - additional - additional - [infraphyl./div.] -- infraphylum (= infradivision) infraphyl./div. - - - - - - additional - - - [subphyl./div.] -- subphylum (= subdivision) -- Examples: Magnoliophytina Vertebrata subphyl./div. - - -phytina -phytina -mycotina - additional - - additional [phyl./div.] -- phylum (= division) -- Examples: Magnoliophyta Chordata phyl./div. - - -phyta -phyta -mycota - principal used, but all ranks above class are not covered by ICNP/ICNB - principal [superphyl./div.] -- superphylum (= superdivision) superphyl./div. - - - - - - additional - - additional [infrareg.] -- infrakingdom infrareg. - - - - - - additional - - - [subreg.] -- subkingdom subreg. - - - - - - additional - additional additional [reg.] -- kingdom -- Examples: Plantae Animalia reg. - - - - - - principal used, but all ranks above class are not covered by ICNP/ICNB principal principal [superreg.] -- super kingdom -- Examples: Eucaryota superreg. - - - - - - additional - additional additional [dom.] -- domain (= empire) -- Examples: Archaea (= Archaeobacteria), Bacteria (= Eubacteria), Eukarya (= Eukaryota) dom. - - - - - - secondary used, but all ranks above class are not covered by ICNP/ICNB - - [tax. supragen.] -- suprageneric tax. of undefined rank -- This value indicates that the rank of a name is unknown. 
Compare "incertae sedis" which is commonly used as a replacement for a taxon to group all taxa whose position in the classification or phylogenetic tree is uncertain. tax. supragen. - - - - - - - - - - === Complex types referring to UnivarStatMeasureEnum (used e.g. by SDD): Reference to a univariate statistical measure (without parameter) Refers to an enumerated value in the UBIF type, declaring which kind of statistical measure has been used. Reference to a univariate statistical measure (with 1 parameter) Refers to an enumerated value in the UBIF type, declaring which kind of statistical measure has been used. Reference to a univariate statistical measure (without parameter) plus a numeric value Reference to a univariate statistical measure (with 1 parameter) plus a numeric value This is a parameter value that further defines the univariate statistical measure. Example: for a percentile (ref='PercLower'), '0.10' would define the 10%-percentile.
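A reference to a statistical measure with one parameter plus a numeric value could be modelled roughly as below. This is only a sketch under stated assumptions: the code 'PercLower' and the parameter value 0.10 come from the example above, while the class `MeasureValue`, its field names, the code 'Mean', and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MeasureValue:
    """A measure reference, its numeric value, and an optional parameter."""
    ref: str                           # enumerated measure code, e.g. 'PercLower'
    value: float                       # the reported numeric value
    parameter: Optional[float] = None  # further defines the measure, if needed

    def label(self) -> str:
        """Return a human-readable label for the measure."""
        if self.ref == "PercLower" and self.parameter is not None:
            # A parameter of 0.10 defines the 10%-percentile.
            return f"{self.parameter * 100:g}%-percentile"
        return self.ref


lower = MeasureValue(ref="PercLower", value=3.2, parameter=0.10)
mean = MeasureValue(ref="Mean", value=4.7)  # 'Mean' is an assumed code
print(lower.label(), lower.value)
print(mean.label(), mean.value)
```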