XML schema for encoding descriptive data in biology and other subjects. The primary goal of the design is to increase knowledge, and the availability of knowledge, about the diversity of life on Earth. However, the schema may be used in many other areas (including medicine, pathology, archeology, and anthropology), wherever objects or classes of objects are described for later reidentification.

The schema was designed by the Structure of Descriptive Data (SDD, http://160.45.63.11/Projects/TDWG-SDD/index.html) group. SDD was established in 1999 as a subgroup of the Taxonomic Databases Working Group (TDWG, www.tdwg.org) of the International Union of Biological Sciences (IUBS). The author of the current schema version and of all annotations is G. Hagedorn, Berlin. The requirements for an SDD schema were elaborated in 6 major meetings of the SDD group and in discussions on the SDD email list. Over 60 people contributed to these discussions. The help, criticism, and energy of Bob Morris, Kevin Thiele, Bryan Heidorn, Guillaume Rousse, Steve Shattuck, and Nicolas Bailly are especially acknowledged!

Copyright © TDWG, 1 December 2003. This is a preliminary version (0.9!) for testing purposes. Permission to use this schema is granted to all scientific or commercial projects for a testing period of up to 3 years. After this time, computer programs using this schema must either be discontinued or converted to the final version of this schema.

Conventions:
Element or attribute names starting with underscores (__) may be present in the schema for discussion purposes and should not be used. Annotations containing @ indicate unfinished points of discussion.

Notes:
- blockDefault="#all" in xs:schema prevents instance documents from using a derived type in an element typed to the base type (which would otherwise be possible using xsi:type="").
- finalDefault is not set; further type derivation is currently not considered problematic. Please contact me if you believe otherwise. Note that, according to the W3C discussion forum, the developers of XML Schema are considering dropping the final attribute in the upcoming Schema version 1.1.
- Nillable: xsi:nil is not supported in SDD documents (the schema declaration nillable="false" is the default and is not explicitly stated).

Document is the required root element:
Provides the root element. Note that the version of the SDD standard used is defined in the namespace declaration and needs no separate data element. Note: until XInclude is implemented widely enough to combine data from different documents, terminology, descriptions, and resources must be in the same document!
Describes the application or script that produced this document. The information is transient (it informs the import process, but is discarded after import). Intended for debugging purposes and to improve import quality (esp. if some generators are known to produce problematic code).
Required information defining the project itself. Refers to the entire document (terminology, descriptions, keys, etc.).
Defines the terminology (parts, characters, states, etc.) in which the descriptions are expressed.
The classes (biology: taxa) and objects (biology: specimens) that are being described.
Lists of external resources used in terminology or descriptions (persons, publications, media resources). This provides an interface as well as a cache.
Descriptions of either an abstract class concept (taxon, disease, etc.) or a physical object (individual specimen, part of individual, etc.).
Dichotomous or multifurcating authored keys (including legacy data).
The labels of modifier definitions are required and must be unique for a given audience definition.
The key of coding status values must be unique.
Statistical measures are uniquely identified by the combination of method name and the optional method value.
The labels of probability modifier definitions are required and must be unique for a given audience definition.
The labels of frequency definitions are required and must be unique for a given audience definition. The labels of modifier definitions are required and must be unique for a given audience definition. In addition to the modifier keys restricted to one type of modifier (probability, frequency, general) we currently require that modifier keys are unique across the different modifier types. This requirement is introduced as a precaution to later simplify changes in the understanding of modifiers (one or several collections?) Defines the key for a modifier set (which contains references to Frequency, General, Probability modifiers). A set can be referred to as a whole to enable all different modifiers in the set for a character. The labels of probability modifier definitions are required and must be unique for a given audience definition. The labels of character definitions are required and must be unique for a given audience definition. This provides a joint key for states either defined locally within a character (= StateDefinition), or referenced from GenericStates (StateReference; provides a new local key). This is the only key for them; no separate keys for locally defined/generic reference within the character are defined. Note that state keys are unique across all characters, not only within each character. Provides a new key for the reference object referring to the project-wide statistical measure definitions in the context of a single character. This key is referred to when statistical measures are used in Descriptions. This provides a combined key for states (local or generic references, compare CharacterStateKey) and statistical measure references. @@ Further discussion on the utility of doing this may be required. It is intended to simplify implementations that treat all state or measure objects within the character as a single supertype. The labels of concept tree definitions are required and must be unique for a given audience definition. 
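In the schema itself, the rule that labels must be unique for a given audience definition is expressed through identity constraints. Purely as an illustration, the same rule can be sketched in Python as a check over (audience, label) pairs. The function name find_duplicate_labels, the tuple layout, and the sample audience keys and labels are assumptions made for this sketch; none of them are part of SDD.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_duplicate_labels(labels: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (audience, label) pairs that occur more than once.

    Hypothetical helper: each input tuple stands for one label
    representation, given as (audience key, label text).
    """
    counts = Counter(labels)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Illustrative data only; the audience keys and labels are made up.
character_labels = [
    ("en5", "Leaf shape"),
    ("de5", "Blattform"),
    ("en5", "Leaf shape"),  # same label twice for one audience: a violation
    ("en1", "Leaf shape"),  # same text for a different audience: allowed
]
duplicates = find_duplicate_labels(character_labels)
```

A validating XML parser would report such a violation directly from the schema; a check like this is only useful where data are assembled before serialization.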
This collects all keys of ConceptTreeNode elements. Note that no UniqueLabelText constraint is defined for concept tree nodes; the labels on tree nodes are optional and not required to be unique. They are expected to be displayed together with their parents and thus obtain their uniqueness from the context (the path from the root should be unique, but this is not expressed in the schema). Also, the xpath selector selects all Concept elements anywhere in the document, which is more general (and therefore computation intensive) than necessary. A better xpath expression would be Terminology/ConceptTrees/ConceptTree//Concept, which includes all nodes regardless of their place in the tree structure. However, combining a defined path with an "all child" path is impossible under the restrictions imposed on xpath expressions in xml-schema identity constraints. Further, since keyed elements in a tree are collected from the document root rather than from the tree root, the node element names must be unique names in the entire schema. Especially, it is not possible to give both the tree definition nodes and the references to these nodes from within NaturalLanguageDescriptions the same name. Although the latter has a ref instead of a key, Schema will complain that the key is missing, rather than understand that only elements possessing a key are to be selected for the xs:key constraint. Please comment if you know better solutions! All generic state key values must be unique in an entire project. Compare ConceptTreeNodeKey. A joint key for all CodedDescription or NaturalLanguageDescription elements. Note: as any key, this may be problematic if Descriptions are federated! This identifies an entire designed key (i. e. not the nodes/steps in the key) The labels of designed key definitions (i. e. for the entire key) are required and must be unique for a given audience definition. This collects all keys of nodes in designed Keys (Lead elements). 
Compare the note on ConceptTreeNode about a potentially better xpath.
Root sections (typed only for modularization; each one used only a single time):
Describes the application or script that produced this document (whether it has been authored there or not). The information is transient (it informs the import process, but is discarded after import). Intended for debugging purposes and to improve import quality (esp. if some generators are known to produce problematic code). Furthermore, attributes describe whether the data contained in the document are complete or an excerpt of a larger data set.
Name of the application that has generated this document. The term 'application' should be understood in a loose sense; it may be a script that is not part of a larger application (compare the Routine attribute, which may provide the detailed name of scripts that are part of an application!).
Version of the application that has generated this document. The attribute should not be named 'Version' to avoid confusion with the version of the content (see ProjectDefinition).
Additional information about the generating application that is not part of the name or version. Documenting the copyright of the generating application is not recommended, but if desired, a copyright string may be placed here.
Optionally allows a generating application to identify which export routine created the document; some applications may have several alternative export routines. This attribute may also be used to identify different conditions under which the export routine may behave differently.
Scripts (e. g. XSL transformations) that modify existing xml documents in a relatively minor way should add their name to this (semicolon-separated) list of transforming scripts (rather than replacing the GenerationMetadata with their own information).
Date and time (UTC or local time with timezone information) at which the current document or data stream was created by the generator.
If this document is produced in response to a query and therefore only contains a subset of the terminology defined in the project, this optional element should be set to true to inform consumers that a more complete version can be found elsewhere.
If this document is produced in response to a query and therefore only contains a subset of the descriptions available, this optional element should be set to true to inform consumers that a more complete version can be found elsewhere.
If the document is a snapshot (complete or extract) of data held elsewhere, and the data are served through a URI, this attribute gives the point to query for up-to-date information. If possible this should be a complete web-query string.
Required information defining the project itself.
A globally unique ID string, distinguishing this project from all others. The value should never be changed once it has been introduced. To refer, e. g., to a character across projects, this value is combined with the key of the character. Without such an ID it will be difficult to compare versions of projects. Recommendation: Avoid choosing simple names that are likely to be created multiple times ('plants', 'French bees', etc.). Authors who work at research institutions and expect to continue to do so may use institutional-URI/personal or team name/project label (example: http://bba.de/hagedorn/coelomycetes). Note that this is only an identifier and does NOT help to locate any real resource on the web.
Number and date of the current version.
The major version number as defined by the project creators.
An optional minor version number ('2' in 1.2).
An optional incremental version number to distinguish each successive revision of a project.
Publication date of the current version (compare RevisionData/InitiationDate for the date of the first version). PublicationDate should be missing if the current version is not yet published.
Creators, revision status, and dates of the entire project.
The revision status refers to both terminology and descriptions. Note: Creators are optional, but within AudienceSpecificData at least a copyright statement for the project is required.
Audience-specific project header information [ATTR: audience]
The audience values must uniquely identify the Representations within the ProjectDefinition/AudienceSpecificData.
Many projects will have a limited geographical scope (or coverage).
Defines a reference if the information in the entire project came from one publication (printed or digital). [ATTR: ref]
URL pointing to the online source for the terminology or descriptive data contained in the current xml document. WebAddress may serve an updated version of the data.
@@ To be discussed. The idea is that a project may point to a web resource that gives details about the history of the data (previous versions or a detailed log of changes).
Optionally an image media resource containing an icon/logo symbolizing the project. [ATTR: ref]
A list of audiences addressed in the project. An Audience is a combination of language (including dialect) and expertise (pupil, beginner, expert). [ATTR: defaultaudience]
The labels of audience definitions are required and must be unique for a given language ('lang' attribute).
For natural language reporting some rules can be defined per language rather than per audience. If a rule for a language used in an audience definition is missing, applications may add a default language rule to the project data. [ATTR: lang, dir]
@@This whole sequence is not functional, just a bunch of ideas for discussion! @@
The extra Wording element is used to clarify that this is for natural language wordings (similar to Wording in Label types). Currently the only other Language rule is the dir attribute, but more language rules may follow.
@@This whole sequence is not functional, just a bunch of ideas for discussion! @@
@@ Should an entire delimiter group be defined for each of Or, And, etc.?
@@ Should only Or be defined and and etc. left to the override mechanism available anyways in the concept trees? @@ unclear whether used. DeltaAccess defines on the character level whether states are combined with or, and, to, or with. This has not yet been worked out for SDD! Instead originally SDD attempts to succeed just with delimiters. Combining delimiter rules with conditionally different operators is a problem, however!@@ @@ unclear whether this would be used For states that intergrade: 'red to orange' ltr = left-to-right direction rtl = right-to-left direction The labels of audience definitions are required and must be unique for a given language ('lang' attribute). The terminology is designed by the biological specialist(s). It defines the semantics of structural hierarchies of the organisms, methods, properties, characters, states, and modifiers. These terms are then used in the Descriptions (through references to their IDs ('key' attributes). Contains detailed terminological definitions (of object/part, object types, property, method, property state, etc.). These are referenced in character, state, etc. definitions, but may also be used independently. The Glossary entry for a single concept (object, method, property, state, etc.), which may be expressed in multiple audience-specific representations. [ATTR: key] @@unusually, in Glossary authors and revision status are given for EACH translation. This may be appropriate given complex material is presented in each Representation, but it is also inconsistent with the rest of the SDD schema. Please comment whether to move RevisionData here, and if so what to call the collection of Representation elements: Representations?? @@ The audience values must uniquely identify the Representations within each GlossaryEntry. Defines the semantics and labels of coding status values (e. g. unknown, not applicable, not interpretable). 
Coding status values (= 'missing data indicators', = 'special states') provide standardized reasons why data are missing. Unlike most elements in Terminology, these are constrained by the SDD model and can only be extended by revising the SDD standard (may be changed to user-definable in a later version of SDD). Labels are already user-definable to support multiple audiences. The labels and abbreviations given are only recommendations. They can be freely changed as long as the semantics are preserved. [ATTR: key] Defines the semantics and labels of statistical measures (e. g. mean, min, max, s. d., s. e., sample size). Unlike most other elements of the terminology, these definitions are constrained by the SDD model and cannot be extended by the designer of the terminology. However, future versions may allow this without requiring structural changes to SDD. (Note: generic categorical states are defined inside the concept trees, not here!) Each definition defines a fixed key value, multilingual label and glossary information (user extensible to new audiences) and attributes describing generalized semantics. [ATTR: key] Probability, Frequency, and general modifiers modify categorical states or statistical measures. Modifiers are defined for the entire project, but must be enabled on a per-character basis to be applicable to states in descriptions. This enabling occurs through the user-definable modifier sets. Probability modifiers are used to describe the probability of statements about categorical states and statistical measures (perhaps, probably, almost certainly, etc.). The true-by-misinterpretation modifiers are included as a special case (= 'certainly-false'). Probability modifier [ATTR: key] Frequency modifiers are used to describe state frequency (usually, rarely, etc.). 
The frequency range or estimate can be stated explicitly to Frequency modifier [ATTR: key] General modifiers include modifiers of degree, manner, time and location (strongly, at the tip, etc.). General modifiers convey their specific semantics only to human consumers (or processors able to parse and interpret the label). General modifier (degree, location, timing) [ATTR: key] @@ Necessary? To be discussed! See short doc. on SDD site. Multiple sets of modifiers (each identified by a key and a label) can be defined. A group of related general modifier definitions, allowing to enable a modifier set for a character in a single step. [ATTR: key] @@ Should we add this?@@ A list of additional character that are defined as templates for new characters, to provide combinations of statistical measures, modifiers, and other settings. Or should character have a boolean attribute whether it is a template or not? [ATTR: key] This provides a secondary key within each character to states. The main CharacterStateKey is unique across all characters, so the within character key is unique as well. However, this secondary key is used to constrain the identity of states mapped to states from within the current character. Characters are defined in a flat, unordered list. Multiple hierarchical views and ordered sequences are defined through the optional concept tree definitions below. [ATTR: key] This provides a secondary key within each character to states. The main CharacterStateKey is unique across all characters, so the within character key is unique as well. However, this secondary key is used to constrain the identity of states mapped to states from within the current character. Hierarchical trees of property, structure or part, methods or other concepts. The trees can be operationalized by inserting characters (only these allow scoring of data in the descriptions). 
Generic states can be defined at appropriate concepts (property or kind-of-part) and dependencies are expressed here as well. Finally, it is possible to express concepts in the form of flat subsets of characters for filtering purposes. @@DISCUSS: should concept tree hierarchies be recursively definable, as long as the resulting tree is acyclical?@@ [ATTR: key]
The term 'Entities' is used to refer to the things that are being described. These may be classes (biology: taxa) or objects ('instances'; biology: specimens). Note: Classes and Objects contain lists of external Resources, similar to the lists in the Resources section. Entities are separated from Resources because they have such a central role for the descriptive data.
Internal list of class (biology: taxon) names used in the project; each one may optionally link to an external data source. These connectors are reused multiple times in the description.
For biology: Object in a nomenclator [ATTR: key]
Optional hierarchy (= tree) of classes defined above (biology: taxonomy). A hierarchy may be incomplete, i. e. Classes defined above may be absent. (Collection currently restricted to a single member. In the first SDD version this should be restricted, but multiple hierarchies may later be supported.) [ATTR: key]
Internal list of objects (biology: specimens) used in the project; each object is either described in free form text, or refers to an external data source.
Biology: Object in a collection (= specimen) or observation [ATTR: key]
Global resource definitions containing URIs or actually embedded resources (e. g. encoded images).
Documentation of all persons or organisations that were involved in authoring, compiling, or editing the document. @@@ This is just a preliminary sketch that should probably be synchronized with TDWG ABCD! [ATTR: key]
Internal list of publications used in the project. Each publication is either described in free form text, or refers to an external data source.
Printed or digital publication (including database source) [ATTR: key]
Internal list of geographical locations (usually country names, but this may be on any level). Each one is either described in free form text, or refers to an external data source. The external gazetteer referred to may be the TDWG Geography standard. [ATTR: key]
Global resource definitions containing URIs or actually embedded resources (e. g. encoded images). [ATTR: key]
(Section within the SDD root) Contains an authored or auto-generated free-form description ('natural language description'). It may be completely or partially marked up with elements similar to those in coded descriptions. If all markup except the Text content is removed, the original description can be recovered without changes (lossless). [ATTR: key]
A strict and largely language-independent description entirely controlled by the terminology defined in the current project. [ATTR: key]
Root section containing authored guided keys (= dichotomous or multifurcating keys, polyclaves). Note that guided keys may also be created by applications 'on the fly' based on data in terminology and descriptions. This section is only intended to represent carefully manually designed keys. [ATTR: key]
Should audiences become a root section? Would a predefined set of audiences be included in multiple projects?
An Audience is a combination of language (including dialect) and expertise (pupil, beginner, expert). Multiple audiences can be defined for the same language and expertise, distinguished only by their label. [ATTR: defaultaudience]
The audiencekey attribute of this element is an arbitrary string. It is referenced in all audience-specific elements (labels, definitions) to specify the intended audience. Recommendation: audience keys should consist of the language code used in xml:lang plus the expertise level from 1-5 (plus a letter (a, b, ...) if a second audience for the same language and expertise level is defined).
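The recommended composition of audience keys (language code, then expertise level, then an optional distinguishing letter) can be sketched as a small helper. This is an illustration under stated assumptions, not part of the SDD standard: the function name make_audience_key and its parameters are hypothetical, and since the schema treats the key as an arbitrary string, the checks below reflect only the recommendation.

```python
import re


def make_audience_key(lang: str, expertise_level: int, suffix: str = "") -> str:
    """Compose an audience key as: language code + level (1-5) + optional letter.

    Hypothetical helper following the recommendation only; SDD itself
    accepts any string as an audience key.
    """
    if not 1 <= expertise_level <= 5:
        raise ValueError("expertise level must be in the range 1-5")
    if suffix and not re.fullmatch(r"[a-z]", suffix):
        raise ValueError("suffix must be a single lowercase letter")
    return f"{lang}{expertise_level}{suffix}"


# Illustrative calls; the language codes are examples only.
key_expert = make_audience_key("en", 5)        # English, experts
key_second = make_audience_key("de", 3, "b")   # second German level-3 audience
```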
[ATTR: audiencekey, lang, dir, ExpertiseLevel]
A concise label for the audience, expressed in the language and ability of the audience.
Further text beyond a short label, perhaps clarifying the definition of the audience. Expressed in the language of the audience.
The key value that is referenced whenever an audience="xxx" attribute is used in audience-specific elements.
ExpertiseLevel is restricted to values from 1-5. These categories allow expected expertise to be communicated between different applications using the SDD schema. The recommended interpretation is:
- 1 = elementary school (year 1 to 6)
- 2 = middle school (year 7 to 10)
- 3 = high school (year 11 and above) and general public (trying to avoid any specialized terminology or jargon)
- 4 = university students or (partly) trained personnel (using terminology, but avoiding or explaining problematic terminology)
- 5 = experts (using the full range of terminology)
The default audience is used whenever the setup of the consuming application has no other preference specified. The user interface of the application may then allow the user to choose a different available audience/language.
The following types are used in the Terminology section to define characters, states, trees/tree nodes etc.
Defines a character in the terminology.
Label includes abbreviations (e. g. for tabular reports) but no natural language wording. (Natural language wording for characters is available through concept trees!)
The audience values must uniquely identify the Representations within each Label (no duplicate audience allowed). Note that separate uniqueness constraints exist that label representations must be unique for each parent element (e. g. a character) so that the parent element can be uniquely identified in the user interface.
- cardinal data scale = integer (incl. negative values, although these are extremely rare in descriptive biological data; DELTA: type 'IN')
- interval = real numeric = floating point values (DELTA: type 'RN')
- nominal = unordered categorical states (DELTA: type 'UM')
- ordinal-discrete = ordered categorical states (DELTA: type 'OM')
@@ Should we make a distinction between ordinal-discrete and ordinal-interval (= ordered categorical states (DELTA: type 'OM'), like ordinal-discrete, but states can intergrade)? Example: no / few / many hairs, ovate / ellipsoid. However, states may also intergrade without order, e. g. color. @@ Introduce a separate datatype for color? Exact values are not very practical, but polygons in color space would be very useful!
Only applicable if the character type is cardinal or interval (not controlled by schema!).
Constrains which project-wide StatisticalMeasure definitions can appear in descriptions of this character. Note: Some statistical measure definitions (min, median, mode, etc.) could also apply to ordinal or even nominal types. This is, however, not yet supported in SDD.
The key attribute must be unique and is referred to in the descriptions. The ref refers to the semantics defined in the project-wide statistical measure definitions. [ATTR: key, ref]
References to project-wide defined generic StatisticalMeasures must be unique within each character. For example, it should not be possible to define a mean twice in a single character. (Different characters may contain the same statistical measure reference.)
Mappings of numerical ranges to categories (like DELTA Key States).
Each mapping defines a lower and an upper value to map numerical ranges to categorical states in the same character. A CompareWith attribute defines which kind of statistical measure (mean, confidence interval, or min/max) is used for the comparison.
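As a rough illustration of how such range mappings might be evaluated, the following Python sketch maps a single central value (for example a mean) to the categorical states whose inclusive range contains it. The class RangeMapping, the function states_for_value, and the sample state keys and numbers are all assumptions made for this sketch; comparison against ranges or extremes, which CompareWith also allows, is not covered here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RangeMapping:
    """Hypothetical stand-in for one mapping: an inclusive range and a state key."""
    lower_value: float
    upper_value: float
    state_key: str


def states_for_value(value: float, mappings: Sequence[RangeMapping]) -> List[str]:
    """Return the keys of all states whose inclusive range contains the value."""
    return [m.state_key for m in mappings if m.lower_value <= value <= m.upper_value]


# Illustrative mappings for a made-up 'leaf length (mm)' character.
mappings = [
    RangeMapping(0.0, 5.0, "short"),
    RangeMapping(5.0, 20.0, "medium"),
    RangeMapping(20.0, 100.0, "long"),
]
mean_length = 12.5
matched = states_for_value(mean_length, mappings)
```

Because both bounds are inclusive, a value lying exactly on a shared boundary (5.0 in the sample data) matches two states; whether adjacent ranges should share a boundary is left to the terminology designer.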
Defines a range through 2 attributes LowerValue and UpperValue (inclusive range) [ATTR: LowerValue, UpperValue, CompareWith] The type of statistical measure with which the mapping range defined through Lower/UpperValue is compared. This may be a central value (mean, median), the range (quantile, confidence interval, etc.) or the extremes (minimum/maximum). Currently only these three categories are defined. [ATTR: ref] Secondary ref to validate that the state is present in the current character. Unit like mm, µm, °C. The content allows some xhtml formatting to support e. g. "mm2". A Postfix attribute may be set to false to output string before a value (e. g. 'pH 7.0'). @@ Methods should ideally be defined in Glossary entries. Or should this become free-form text? [ATTR: ref] Free-form information about accuracy of measurement. @@ free-form is language and audience dependent and can not be included in analysis. Currently this is rather a specific InternalNote. Any way to improve this? Ideally a numeric value for a confidence interval of measurements would be desirable! Applicable to all character types; categorical states can be defined in addition to statistical measures! (States are defined outside the type specific tree, since categorical states may be present in addition to numerical data) (the sequence of states in the xml file is significant) Local definition of a state [ATTR: key] Reference to a single generic state (as defined project-wide at a concept tree node). [ATTR: key, ref] Refers to a project-wide definition of a generic categorical state and adds a new key to allow unique references to the generic state in the context of the current character. The key created here is the one referred to in the Descriptions. Refers to a generic character state (those defined within the concept tree, which may be used in multiple characters) References to project-wide defined GenericStates (defined at the nodes of concept trees) must be unique within each character. 
This is achieved by a uniqueness constraint (local to each character) on the ref attribute of StateReference. The key attribute is already unique through the general CharacterStateKey. The labels of character state definitions are required to be unique within each character and audience definition. Note that this includes both the locally defined states and the referenced generic states. A collection of references to sets of generic states (i. e. to tree nodes). If a generic state is added to any of these sets, it will be added to the states in the current character (as a StateReference). This occurs not through xml or schema mechanisms, but is a contract with SDD applications. Only applications modifying generic state sets are required to fulfill this contract. (this refers to a node (= group) in a concept tree, since generic states are defined at nodes) [ATTR: ref] References to project-wide defined GenericStates set (i. e. nodes within concept trees) must be unique within each character. Mappings between categorical states (e. g. subovate may be mapped to ovate to simplify identification choices). Each mapping defines a source and a destination state. Both From and To may point multiple times to the same state, but the combination From + To must be unique. Both state must be defined in the current character (validated through identity constraint!) [ATTR: ref] Secondary ref to validate that the state is present in the current character. [ATTR: ref] Secondary ref to validate that the state is present in the current character. A state may be mapped to multiple other states in the same character, or multiple states may be mapped to a single state, but the combination of From and To may only occur a single time. The project-wide modifier definitions become applicable to the current character only if a modifier set containing them is referenced here. Modifier usage in descriptions is not controlled by the schema, i. e. 
modifiers not present in any set may be present in descriptions for this character. Additional validations are, however, possible. Multiple modifier sets can be referenced in each character. The applicable modifiers are the union of all modifiers in the referenced sets (duplicates are ignored) [ATTR: ref] References to project-wide defined Modifier sets (Terminology/Modifiers/Sets) Defines an entire concept tree (which may be a single tree node containing a flat list) Label to identify the current object in the user interface The audience values must uniquely identify the Representations within each Label (no duplicate audience allowed). Note that separate uniqueness constraints exist that label representations must be unique for each parent element (e. g. a character) so that the parent element can be uniquely identified in the user interface. The type of a tree is constrained to an enumerated list to support application interoperability. Usage of concept tree intended by its designers; constrained to an enumerated list to support application interoperability. The designer of a concept tree defines it as 'complete' to declare that it is intended to include all characters of the terminology. A terminology editing application can use this information e. g. to warn the designer about missing characters, to display special dialog boxes after the creation of a new character, etc. MinimumExpertiseLevel: the designer of the subset expects the user to have a certain minimum expertise level. @@@ Needs discussion! @@@ The root node of the tree. Note that it has a label in addition to the tree label. The tree label uniquely identifies a tree when selecting it among a list of all trees, whereas the root node label can be very short and is shown when a single tree is displayed. [ATTR: key] A node in a concept tree Tree nodes may remain unlabeled! The audience values must uniquely identify the Representations within each Label. 
The availability in a given description of all characters within a Node may optionally be governed by rules depending on the presence of categorical states in the same description. Note that rules for individual characters can be defined in the terminal nodes. By default the characters below this node are inapplicable. They become applicable if any of the listed controlling character/state combinations is present in a description. Modifier references must be unique within each set (but different sets may contain the same modifier) By default the characters below this node are applicable. They become inapplicable if any of the listed controlling character/state combinations is present in a description. Modifier references must be unique within each set (but different sets may contain the same modifier) Project-wide state definitions tied to the part (e. g. for fruit: capsule, berry, nutlet, etc.), property (e. g. for color: red, green, etc., for shape: round, ovate, etc.), method, etc. described in the current tree. GenericStates become operational for descriptions only when binding or instantiating them in specific characters. The definition of generic states is identical to the local definition of states within a character. Using generic states simplifies the management of terminology and improves data analysis (states from different characters can be compared if they refer to generic states). [ATTR: key] The labels of generic state definitions are required to be unique within each GenericState set (i. e. at a node in the tree) and audience definition. A node either contains other nodes, or contains a single character reference. It may also be empty to decouple the definition of hierarchies (e. g. a complete part hierarchy) from characters defined at a given moment. Element may be missing, which results in the option to have empty nodes with neither a character nor further nodes. [ATTR: key] Characters are the 'leaves' of the tree. 
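The two applicability rules for nodes described above (inapplicable by default until a controlling state is present, and applicable by default until a controlling state is present) might be evaluated by an application roughly as follows. This is a hedged sketch: the function, its parameter names, and the way the two rule kinds are combined are assumptions, since the annotations do not define an evaluation order.

```python
def is_applicable(described_states, only_applicable_if=(), inapplicable_if=()):
    """Decide whether the characters below a node apply to a description.

    described_states: set of (character_key, state_key) pairs scored in
        the description.
    only_applicable_if: controlling pairs; when given, the characters are
        inapplicable by default and become applicable if any pair is present.
    inapplicable_if: controlling pairs; the characters are applicable by
        default and become inapplicable if any pair is present.
    """
    if only_applicable_if and not any(
        pair in described_states for pair in only_applicable_if
    ):
        return False
    if any(pair in described_states for pair in inapplicable_if):
        return False
    return True


# Hypothetical example: petal characters depend on "flowers: present"
description = {("c_flowers", "s_present")}
print(is_applicable(description, only_applicable_if=[("c_flowers", "s_present")]))
```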
Each character is embedded in a node providing labeling information in the context of the current tree (which is usually different from the default character label). A single character may appear in several places in the tree, if this is desired. [ATTR: ref] Abstract base type used to derive statistical measures, coding status values and categorical state definitions. The audience values must uniquely identify the Representations within each Label. Based on StateDefBaseType, for categorical states. Used in generic (= 'project-wide') and local character state definitions. Any use of a character state in descriptions is a reference to an object of this type or one of its derivations. If present and true, the current state/category allows unconstrained text not tied to a truly analytical state. Such states (which may be labeled: 'Text', 'Other:', 'none of the above, please specify:') avoid, especially while the terminology is still under development, that a potentially inappropriate category must be chosen during data entry. DELTA text characters are modeled using these states, but they also can occur in combination with categorical states. UnconstrainedText states are somewhat similar to the 'unknown' coding status, since the free-form text information is not available to most analytical processors (incl. identification programs). (This 2nd annotation contains detailed information not entered in the first annotation, which is visible in the standard schema diagrams.) The name for this data element was contentious. Proposals were: Bob: IsIsolatedState with default false. Gregor: IsAnalyticalState, StateComparisonIsRecommended, or IsWellDefinedState, all with default true. ImpreciseEquality with default false? Furthermore, one may want to make a distinction between a category saying "enter free form text here" and one explicitly saying "none of the above". 
However, the action of choosing a separate free form text state instead of scoring a category (if available) and adding free-form note text, implies that choosing free-form text is always of the type "none of the above", whether this is explicitly stated in the text state label or not. @@ Was present as attribute in previous version and overlooked, needs discussion! On states or on set? If present should be made required! Enumeration: Local/ Generic/ Special/ Computed Used inside the character definition, it refers to a generic statistical measure in Terminology/StatisticalMeasures. In addition to the ref it defines a new key and formatting information. Format rules as used in the xslt format-number function. # = significant digits; 0 (zero) = signif. digits or insignif. leading/trailing zeros; '.' = decimal point, ',' = group separator. Note that this is NOT culture sensitive in xslt!!! - Examples: "0,0#" formats 5 / 0.59 as 5,0 / 0.59. "# ###,#" formats 5000 / 0.59 as 5 000 / .6. (Rules for exponential formats or percent may be added in later versions of SDD!) @@ This or a format string ?@@ @@ This or a format string ?@@ @@ This or a format string ?@@ Note: How can we handle measures as well as values from repeated observations with the same mechanism? Refers to a generic statistical measure The key attribute provides a new key to uniquely refer to the statistical measure in the context of the current character. This key is the one referred to in the Descriptions. The ref present here is a reference to the project-wide definitions of statistical measures. CodingStatus and StatisticalMeasures are defined project-wide: Based on StateDefBaseType; for CodingStatus values Properties describing a coding status value. They are provided to support generic application code that continues to function if new codes are added. @@ Both proposals need elaboration and discussion! 
To be coded / Not to be coded / Cannot be coded / coded successfully NotEvaluated / CannotExist / DoesNotExist / Exists For StatisticalMeasures. Cannot be derived from StateDefBaseType by extension, since the nat. language wording requires TextBefore and TextAfter the value instead of only a single Text. @@ Does it make sense to derive this by restriction as it is currently done? @@ The audience values must uniquely identify the Representations within each Label. Properties describing a statistical measure. Provided to support generic application code that continues to function if additional indicators are defined. Classification of statistical measures into predefined categories like CentralValue, VarianceMeasure, Min, Max, Lower/UpperRangeLimit, etc. @@ add enumeration to finalize schema! Classification of statistical measures according to method, e. g. ConfInterval, Percentile @@ add enumeration to finalize schema! A value defining the method. For Method='ConfInterval' this would be 0.95 for a 95% confidence interval. Modifier definitions (probability, frequency, general) are grouped into sets for management purposes. Definition and sets are both derived from common base types. Abstract base type for all modifier definitions (probability, frequency, etc.) Label with abbreviations and wording for natural language reports. The audience values must uniquely identify the Representations within each Label. Definition of probability (= uncertainty) modifiers An estimate of a probability range for verbal modifiers, defined through two attributes. The upper/lower limits of probability modifiers may overlap. Note that it is possible to enter 0-1 to indicate that no estimate was possible. If present and true the current modifier indicates that the state to which it refers is present or true only due to a misinterpretation. The probability range should be 0 to 0 = certainly false. 
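The probability range of a modifier, as described above, lends itself to a simple consistency check in an application. The following Python sketch is an assumption-laden illustration: the class and field names (ProbabilityModifier, lower_estimate, upper_estimate, is_misinterpretation) are hypothetical and are not taken from the schema.

```python
from dataclasses import dataclass


@dataclass
class ProbabilityModifier:
    label: str                     # e.g. 'probably'
    lower_estimate: float          # lower limit of the probability range
    upper_estimate: float          # upper limit of the probability range
    is_misinterpretation: bool = False


def check_probability_modifier(modifier):
    """Return a list of problems; an empty list means the range is plausible."""
    problems = []
    if not (0.0 <= modifier.lower_estimate <= modifier.upper_estimate <= 1.0):
        problems.append("range must satisfy 0 <= lower <= upper <= 1")
    if modifier.is_misinterpretation and (
        modifier.lower_estimate,
        modifier.upper_estimate,
    ) != (0.0, 0.0):
        problems.append("misinterpretation modifiers should use the range 0 to 0")
    return problems


# A range of 0 to 1 expresses that no estimate was possible and is accepted.
print(check_probability_modifier(ProbabilityModifier("perhaps", 0.0, 1.0)))
```

Ranges of different modifiers are deliberately not compared with each other, because the limits of several modifiers may overlap.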
Definition of frequency modifiers An estimate of a frequency range for verbal modifiers, defined through two attributes. The upper and lower limits of several frequency modifiers may overlap. Note that it is possible to enter 0-1 to indicate that no estimate was possible. Definition of general modifiers Set of references to modifier definitions. A set has a key and can be referenced as a whole. Label to identify the current object in the user interface. The audience values must uniquely identify the Representations within each Label (no duplicate audience allowed). Note that separate uniqueness constraints exist that label representations must be unique for each parent element (e. g. a character) so that the parent element can be uniquely identified in the user interface. List of modifiers (Probability, Frequency, General) defined in the set. (Unenforced constraint: At least 1 modifier from 1 modifier category should be present!) Modifier references must be unique within each set (but different sets may contain the same modifier) Modifier references must be unique within each set (but different sets may contain the same modifier) Modifier references must be unique within each set (but different sets may contain the same modifier) Manually designed keys are a separate section: Defines a guided key (dichotomous or multifurcating key) that has been manually created with expert knowledge. Note that guided keys may also be automatically created by applications based on information in terminology and using shortest search criteria in the coded descriptions. Label to identify the current object in the user interface. The audience values must uniquely identify the Representations within each Label (no duplicate audience allowed). Note that separate uniqueness constraints exist that label representations must be unique for each parent element (e. g. a character) so that the parent element can be uniquely identified in the user interface. 
If the key is derived from a published data source this is cited here. If Citation is missing, it is assumed that the compiler or editor of the data is the original source of information. Creators, Revision status, and dates for this key. The root node of the designed key. Note: Applications will generally ignore the Statement element in the root node when the key is selected as a whole. However, if a key shall be used both as independent key and as a branch node in another key, Statement must be defined. In both cases CodedStatements may be used to define statements that are applicable to the entire key (i. e. they are implied in the selection of the key). [ATTR: key] A node in a designed key, containing the lead statement to follow and optionally the next question, or terminating at class identification, subkey, or node reference. The key attribute for nodes in a designed Key is required because an xs:key constraint exists on this attribute. It seems impossible in xml schema to make existence of keys optional but require those present to be unique and the target of keyrefs that point to these existing keys. If the user agrees with the statement (expressed as free-form text), then the node will be followed. (The audience-specific representations provide abbreviations, which in picture keys may be used as alt-text of the image. ExportToken will usually not be used, but a separate type seemed to be unnecessary.) The audience values must uniquely identify the Representations within each Statement. Statements in coded terminology equivalent to the text in Statement. This information is used when switching between guided and multiple entry keys. Each state listed in the collection is considered scored when the lead text is followed. Note that in the case of "A or B" statements in the key it is not possible to convert that into coded statements. Within CodedStatements, each state reference may occur only once. 
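The annotations above say that every state listed in CodedStatements is considered scored when the corresponding lead is followed. A minimal Python sketch of that idea is given here; the KeyNode class, its field names, and the example keys are hypothetical stand-ins for the designed key structure, not the schema's element names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KeyNode:
    key: str                                   # required, unique node key
    statement: str                             # free-form lead text
    coded_statements: List[str] = field(default_factory=list)  # state keys
    leads: List["KeyNode"] = field(default_factory=list)       # child nodes
    class_ref: Optional[str] = None            # identification result, if any


def scored_states(path):
    """Collect the state keys implied by following the given nodes in order.

    Each state key is reported once, even if several leads on the path
    list it in their coded statements.
    """
    states = []
    for node in path:
        for state_key in node.coded_statements:
            if state_key not in states:
                states.append(state_key)
    return states


# Hypothetical three-level key
leaf = KeyNode("n3", "Petals 5", ["s_petals_5"], class_ref="cls_rosa")
middle = KeyNode("n2", "Flowers red", ["s_red", "s_flowers_present"], leads=[leaf])
root = KeyNode("n1", "", ["s_flowers_present"], leads=[middle])

print(scored_states([root, middle, leaf]))
```

Such a list is what an application could hand over when the user switches from the guided key to a multiple entry key.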
A node contains either further nodes (= Leads), a single reference to another key or key node, or a class reference (biology: a taxon) as the result of an identification. Optional question that is answered by the Statement elements in each of the Leads below. Note that in most trad. keys the question is empty and only the alternative statements are written. The audience values must uniquely identify the Representations within each QuestionText. The set of alternative statements (which may be answers to QuestionText) At least two alternatives leading further on in the key must be provided. This element defines the tree recursively. Refers to a class name (in biology a taxon name) [ATTR: ref] Refers to another designed key in the Keys section. This feature allows cross references between keys. [ATTR: ref] Refers to arbitrary key nodes within the current or other keys, to allow building reticulations into the key. @@ This may need further discussion and testing! Allowing to jump into other keys requires the leads (=node) key to be unique across all keys, not only within a key!@@ [ATTR: ref] Start of types referencing the definitions. The first ones are already used in Terminology. Refers to a character (e. g. from within concept trees or from Descriptions). It consists only of a reference to a Character definition key. ref refers to a character definition key (Terminology/Characters/Character) Refers to a character state (e. g. from Descriptions). It consists only of a reference to a Character state definition key. ref refers to a character state key A collection of state references (CharacterStateRefType) [ATTR: ref] Refers to a node in a concept tree (e. g. to refer to a generic state set defined at this node) Refers to a node in a concept tree (Terminology/ConceptTrees/ConceptTree/...) Refers to a general modifier (e. g. 
from within character states) Refers to a general modifier (Terminology/Modifiers/General/Modifier) A collection of general modifiers (The sequence of elements in the collection is not informative.) [ATTR: ref] Refers to a frequency modifier (e. g. from within character states) Refers to a frequency modifier (Terminology/Modifiers/Frequency/Modifier) A collection of frequency modifiers (The sequence of elements in the collection is not informative.) [ATTR: ref] Refers to a probability modifier (e. g. from within character states) Refers to a probability modifier A collection of probability modifiers (The sequence of elements in the collection is not informative.) [ATTR: ref] Refers to a modifier set (e. g. from within a character, to enable a set of modifiers for this character) Refers to a modifier set (Terminology/Modifiers/Sets/Set) Refers to a Glossary entry (e. g. from tree nodes or character states) Refers to a GlossaryEntry (Terminology/Glossary/GlossaryEntry) Refers to an entire designed Key (e. g. if a key is referenced as a subkey from within another key) Refers to an entire designed Key definition Refers to a node in a DesignedKey (e. g. for reticulating keys) Refers to a node in a designed key Descriptions are a collection of natural language or coded description types. Both are derived from the same base type: Abstract base type for NaturalLanguageDescriptionType and CodedDescriptionType. The key attribute is currently not used in keyrefs from within this schema. However, it is considered generally useful to uniquely identify descriptions in federated situations. This is the description of either an abstract class (e. g. a biological species) or an individual object (e. g. a specimen). Refers to a class name (in biology a taxon name) [ATTR: ref] Refers to an individual object (e. g. a biological specimen). Objects may refer to observed objects as well as to collected and preserved objects. 
The identification of a specimen is stored in the resource section. [ATTR: ref] A description may have a limited geographical scope, if geographical variability is known to exist or is expected. @@Should we define additional scopes for the description, e. g. host plants for pathogens, or should we simply provide a free-form text element like this? A description may be further defined through a published data source for the nat. language or coded description. If Citation is missing, it is assumed that the compiler or editor of the data is the original source of information. Creators, Revision status, and dates of individual description (compare RevisionData in ProjectDefinition) Contains multiple resources (e. g. images). @@ In previous versions, a description could consist of resources alone; this is no longer possible after Paris - may need discussions! @@ @@ Also, it is no longer clear whether the images are also created by Creators, or who has the IPR to them! @@ Each media resource may occur only once in the collection. Descriptions entered as free-form text with optional (and potentially incomplete) markup referring to concepts (= char. tree nodes), characters, and states as defined in the terminology. Retains the full, unchanged original wording of the natural language description. Group, character, or state markup may be added (partial or complete), but these should not change the original wording sequence. Note that in contrast to CodedDescriptions, no uniqueness constraints are formulated, i. e. a character may occur multiple times, and even a state may occur multiple times within a character. The latter may possibly be constrained in the future (to be discussed). Concept tree markup is used to mark organism parts, methodological sections, etc. [ATTR: ref] In most cases states are initially recognized, but character markup can be deduced from the associations between char. and states defined in the terminology. 
[ATTR: ref] Text between character groups or characters is necessary if markup is incomplete. [ATTR: parsed] Descriptions entered as data referring to the terminology elements. CodedDescriptions must fulfill more rigorous consistency requirements than natural language descriptions and are more suitable for analysis. Furthermore, language-dependent annotations are minimized so that data can be easily reorganized and translated into multiple languages. The coded description is entirely controlled by the vocabulary and structures defined in the Terminology section. It contains keyrefs to descriptors and modifiers (plus numerical values for measurements). Free-form text is allowed in Notes or Annotation only. Separating data and terminology allows rearranging and refactoring the terminology, multilingual support through central terminology translations, and multiple hierarchical views. (The xml sequence of Character or ObservationsSet elements is not informative and may be changed at any time!) (a uniqueness constraint guarantees that (except in ObservationSets) a character may occur only once in each description and that each State, StatisticalMeasure, and CodingStatus occurs only once!) [ATTR: ref] Within a single coded description each character state reference may occur only once, i. e. it is not possible to state "flowers blue, or blue". (The uniqueness constraint must involve the key of the description, otherwise no two descriptions could use the same character! However, the character reference is not necessary, since state keys are unique across characters.) Note that this still allows repeated occurrence of character states in ObservationSets. Within a single coded description each character statistical measure reference may occur only once, i. e. it is not possible to state "mean=2.4; mean=2.6". Note that this still allows repeated occurrence of character statistical measures in ObservationSets. 
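The uniqueness rule above pairs the description key with the state reference, so the same state may be used in many descriptions but only once in each. An application that wants to report violations before schema validation could use a check like this Python sketch. The function name and the example keys are hypothetical, and the real constraint is enforced by the schema's identity constraints, not by code like this.

```python
from collections import Counter


def duplicate_state_refs(scored):
    """Return (description_key, state_key) pairs that occur more than once.

    scored: iterable of (description_key, state_key) pairs, one per state
        reference found directly in coded descriptions (references inside
        ObservationSets are assumed to be excluded by the caller).
    """
    counts = Counter(scored)
    return sorted(pair for pair, count in counts.items() if count > 1)


# "flowers blue, or blue" in description d1; d2 reuses the state legally
refs = [
    ("d1", "s_blue"),
    ("d1", "s_blue"),
    ("d1", "s_red"),
    ("d2", "s_blue"),
]
print(duplicate_state_refs(refs))
```

The character key is left out of the pair on purpose, following the remark that state keys are already unique across characters.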
Within a single coded description and within each character, a coding status reference may occur only once, i. e. it is not possible to state "not applicable, or not applicable" (it is possible to state "unknown, or not applicable"). (The uniqueness constraint must involve the key of the description, and the character key, since coding status values are not defined globally for all characters.) Observations form a container for repeated observations in a study. All observation objects are assumed to be obtained under identical conditions. A description may contain an unlimited number of observation sets. Each observation may contain several characters that have been observed together. An example is "leaf shape, length, and width". The sequence of Observation elements should be preserved (it has no analytical semantics, but it may be relevant if data entry is compared with the source) [ATTR: ref] Within a single coded description each character state reference may occur only once, i. e. it is not possible to state "flowers blue, or blue". (The uniqueness constraint must involve the key of the description, otherwise no two descriptions could use the same character! However, the character reference is not necessary, since state keys are unique across characters.) Note that this still allows repeated occurrence of character states in ObservationSets. Within a single Observation each character reference may occur only once. Multiple observations in an ObservationSet may and should use the same characters (the latter is not validated by the schema). Within a single coded description each character reference may occur only once, i. e. it is not possible to state "flowers blue, with 5 petals, flowers red". Note that this still allows repeated occurrence of characters in ObservationSets. The following types are used in the Descriptions section to code data by reference to characters, states, and modifiers defined in the Terminology. 
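The ObservationSet rules above (a character may occur only once within one Observation, while different observations may and should repeat the same characters) can be illustrated with a short Python sketch. The list-of-lists layout and the character keys are assumptions made for the example only.

```python
def observations_with_repeated_characters(observations):
    """Return the indexes of observations that use a character twice.

    observations: list of observations in document order; each observation
        is a list of (character_key, value) pairs observed together.
    """
    bad = []
    for index, observation in enumerate(observations):
        characters = [character_key for character_key, _ in observation]
        if len(characters) != len(set(characters)):
            bad.append(index)
    return bad


# Hypothetical repeated leaf measurements; the last one is invalid
observation_set = [
    [("c_leaf_length", 4.1), ("c_leaf_width", 2.0)],
    [("c_leaf_length", 3.8), ("c_leaf_width", 1.9)],
    [("c_leaf_length", 4.0), ("c_leaf_length", 4.2)],
]
print(observations_with_repeated_characters(observation_set))
```

A list is used for the set so that the original sequence of observations is preserved, as recommended for comparison with the data source.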
Abstract base type for character data in coded descriptions. It primarily contains a reference to a Character definition key, plus a set of references to character state definition keys. @@This base type may be redundant. Is Sequence really relevant both in coded synthetical data as well as in raw data?@@ Constrained to 'description' or 'terminology' (default). If Sequence = description the sequence of states in the xml document is considered to be meaningful and can be used to distinguish between, e. g. 'round or elliptic' and 'elliptic or round'. Used in coded descriptions to make statements covering a single character of a class or object. The type provides a ref to the definition of a character (it is derived from CharacterRefType) States are 'scored' in a description by referring to a state in the character definition. All notes and modifiers are applicable to this element. [ATTR: ref] Statistical measures contain synthetic information like distribution parameters, sample size, etc. Refers to a StatisticalMeasure defined for the current character. It may have associated Notes (public notes) and Probability modifiers, but no general or frequency modifiers. The value is stored in an attribute of type double. [ATTR: ref, Value] Inapplicable, unknown, etc. It may have associated Notes, but no modifiers. [ATTR: ref] Note: In an object (= specimen) description only a single indicator may occur per character. However, for a class (e. g. a genus) it is up to the aggregation/generalization process whether to create multiple coding status values or not. For example, an expression "unknown or not applicable" may be useful for analytical purposes. Media specific to the character and the current object or class described. Example: microscopic picture of spore shape in a specimen. Used in coded descriptions to make statements covering a single character inside the repeated Observation container. 
The type provides a ref to the definition of a character (it is derived from CharacterRefType) plus references to states and a single real numeric value. States are 'scored' in a description by referring to a state in the character definition. All notes and modifiers are applicable to this element. [ATTR: ref] Value is only applicable to numeric characters (currently not validated through the schema!). For each character only a single value may be stored (but see ObservationSet for repeated observations) [ATTR: Value] The abstract base type defines the common attributes that are used in both coding status (not modifiable) and normal categorical state (modifiable) references. Public notes or comments, for multiple audiences. Applications may, e. g., report the text in brackets after the character state. The audience values must uniquely identify the Representations within each Note element (no duplicate audience allowed). If a new description is created as a child of the current description (in the class hierarchy or through an object identification), the current state will be inserted. This may be a normal state or a coding status. The inserting mechanism is available in addition to the dataless inheritance mechanism in the class hierarchy. @@ Open issue: Name for this element needs to be decided@@ @@ To be discussed. Is a given state a cached result of an inference or deduction process in the class hierarchy, a calculated character, or is it an original statement? This could also be defined as an attribute, currently as element to avoid being overlooked! Like CharacterStateData_BaseType, but allows expression of state probability, frequency, and general modifiers. Expression of probability: 'probably', 'perhaps', etc. [ATTR: ref] Choice of numeric value, numeric range, or modifier reference Numeric statement, single Value attribute Numeric frequency range (Lower/UpperEstimate attributes) Reference to globally defined frequency modifier (ref attribute). 
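The frequency statement described above is a choice of three forms: a single numeric value, a numeric range, or a reference to a globally defined frequency modifier. One way an application might model and report this choice is sketched below in Python. All class names, the assumption that frequencies are fractions between 0 and 1, and the report wording are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FrequencyValue:
    value: float                 # single numeric statement


@dataclass(frozen=True)
class FrequencyRange:
    lower_estimate: float        # lower limit of the range
    upper_estimate: float        # upper limit of the range


@dataclass(frozen=True)
class FrequencyModifierRef:
    ref: str                     # key of a frequency modifier definition


Frequency = Union[FrequencyValue, FrequencyRange, FrequencyModifierRef]


def describe_frequency(frequency, modifier_labels):
    """Render one frequency statement as short report text."""
    if isinstance(frequency, FrequencyValue):
        return f"{frequency.value:.0%}"
    if isinstance(frequency, FrequencyRange):
        return f"{frequency.lower_estimate:.0%} to {frequency.upper_estimate:.0%}"
    return modifier_labels[frequency.ref]


labels = {"m_rarely": "rarely"}  # hypothetical modifier labels
print(describe_frequency(FrequencyRange(0.1, 0.5), labels))
```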
General modifiers of intensity ('very', 'weakly'), location ('at the tip'), timing ('spring', 'autumn'), etc. (The seq. of modifiers is informative!) [ATTR: ref] Similar to CharacterStateDataType, this one is intended for statistical measures. The ref attribute points to a statistical measure definition inside a character definition. @@Note: the necessity of Note inside statistical measures needs to be discussed. On measures like min, max, mean this will be difficult to support during natural language reporting! However, on measures like sample size they may be valuable. Expressions of probability: 'probably', 'perhaps', etc. are defined for numeric values and statistical measures. Frequency expressions are considered not applicable to statistical measures! [ATTR: ref] Similar to CharacterStateDataType, this one is intended for CodingStatus references. The ref attribute points to the key of Terminology/CodingStatusValues/CodingStatus @@Is it ok to inherit the ref attribute from the state base type, even though it points elsewhere? (single Value attribute of type xs:double in an otherwise empty element) This type is used in coded descriptions and as base type for natural language descriptions. To hide the English-formatted value from natural language descriptions using other numeric formats, the value must be stored in an attribute of type double rather than as element content! The NLD = Natural Language Description versions are difficult to derive using the type system. They have been formally derived through restriction, however this results almost in a complete redefinition! For NaturalLanguageDescriptions. Refers to groups (i. e. nodes defined in concept trees) Text between Group and Character elements is necessary if markup is incomplete. [ATTR: parsed] In most cases initially only the states are recognized. However, character markup can be deduced from the associations between characters and states defined in the terminology. 
[ATTR: ref] For NaturalLanguageDescriptions. The sequence and cardinality of elements is undefined and Text elements may be freely interspersed. Any text within a character not yet identified as one of the following elements. [ATTR: parsed] Character state data that permit Text elements within. [ATTR: ref] Statistical measure that permits Text elements within. [ATTR: ref, Value] Inapplicable, unknown, etc. It may have an associated Note, but no modifiers. [ATTR: ref] The value is stored in an attribute of type double. The original text of the value may follow inside in the optional Text element. Note that the string in text will usually use a different number format than the English format required by xml. [ATTR: Value] Like CharacterStateDataType, but for use in the NaturalLanguageDescription markup container. The sequence and cardinality of elements is unconstrained and Text elements are provided between all elements. Public notes or comments (for multiple audiences, esp. languages) associated with the state. The audience values must uniquely identify the Representations within each Note element (no duplicate audience allowed). [ATTR: ref] Choice of numeric value, numeric range, or modifier reference Numeric statement, single Value attribute Numeric frequency range (Lower/UpperEstimate attributes) Reference to globally defined frequency modifier (ref attribute). [ATTR: ref] Variant to be used inside the NaturalLanguageDescription markup container. Public notes or comments (for multiple audiences, esp. languages) associated with the state. The audience values must uniquely identify the Representations within each Note element (no duplicate audience allowed). Expressions of probability: 'probably', 'perhaps', etc. are defined for numeric values. Note that frequency expressions are not defined here! [ATTR: ref] Variant to be used inside the NaturalLanguageDescription markup container. Variant to be used inside the NaturalLanguageDescription markup container. 
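The annotations above require the numeric value to be stored in a Value attribute of type double, in the English number format, while the original wording may be kept in an optional Text element. The following Python sketch shows one way to produce such markup from a value written with a decimal comma. The element name "Measure", the helper names, and the separator defaults are hypothetical; only the Value attribute and the Text element are taken from the annotations.

```python
import xml.etree.ElementTree as ET


def parse_localized_number(text, decimal_sep=",", group_sep="."):
    """Convert a number written in a non-English format to a float."""
    cleaned = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


def value_element(original_text, tag="Measure"):
    """Build an element with an English-format Value attribute.

    The original wording is kept unchanged in a nested Text element.
    """
    number = parse_localized_number(original_text)
    element = ET.Element(tag, {"Value": repr(number)})
    text_child = ET.SubElement(element, "Text")
    text_child.text = original_text
    return element


print(ET.tostring(value_element("2,4"), encoding="unicode"))
```

Keeping the machine-readable number in the attribute means that a report in another language can show the Text content unchanged without ever exposing the English-formatted value.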
NLD modifier references (coded descriptions can use the simple modifier types, but NLD need additional Text inside): Variation of GeneralModifierRefType, with Text inside Variation of FrequencyModifierRefType, with Text inside Variation of ProbabilityModifierRefType, with Text inside ResourceConnectors and references to these objects: Abstract base type for connectors to resources (publications, class names, specimens, etc.). Provides either a simple free-form text, or a connection to an external resource. Defines a service used to resolve ExternalID. This could be the URI of a wsdl-file of a web service. Can be URI, but does not have to. Examples: "ref://x.y.fr/floras/smith/1998", "432787632", "SMI1998_DZT" Human readable representation; this may be the only data item if no machine readable ID exists. Example in the case of a publication resource: "Smith 1998. Flora of Erehwon, Fingers Publishers." If an external ID exists, this is considered cached information and required to be present. @@ Should this be multilingual? Difficult if external source does not inform about language! @@ Should this be called Label instead? Used for class names (taxon names). Provides either a simple free-form text, or a connection to an external resource The resource connector here may be changed to a derived type that also allows to enter a structured form of taxonomic names (Genus/Higher taxon, rank, optional specific/subspecific epithets, authors). However, note that simply splitting into taxon name and authors does not work, because authors may be in the middle of the parts of the taxon name (e. g. in botanical autonyms). Note that class is not restricted to accepted class names (compare Synonyms in ClassHierarchyNodeType) @@ For biological taxonomic names: order, family, species Needs discussion: should this be constrained vocabulary, or in any language? 
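The resource connector described above carries either a free-form text alone or a connection to an external resource, and in the latter case the human-readable text is treated as cached information that must still be present. A minimal Python sketch of that rule is shown here. The class and field names are hypothetical simplifications and do not reproduce the schema's element names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceConnector:
    label: Optional[str] = None        # human-readable representation
    external_id: Optional[str] = None  # e.g. "SMI1998_DZT"; need not be a URI
    service: Optional[str] = None      # service used to resolve external_id


def check_connector(connector):
    """Return a list of problems; an empty list means the connector is usable."""
    problems = []
    if not connector.label:
        if connector.external_id:
            problems.append(
                "cached human-readable text is required when an external ID exists"
            )
        else:
            problems.append("connector needs free-form text or an external ID")
    return problems


publication = ResourceConnector(
    label="Smith 1998. Flora of Erehwon, Fingers Publishers.",
    external_id="SMI1998_DZT",
)
print(check_connector(publication))
```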
Used for class hierarchies (taxonomies) A node in a class hierarchy tree (biology: taxonomical hierarchy) A node either contains a class reference (biology: taxon) and optionally (if it is a higher level class) further child Nodes, or it is anonymous and contains only further child Nodes. Nodes may not be empty. (The complex choice/sequence expresses the A, or B, or A and B constraint which is difficult to express in xml-Schema.) The class (biology: taxon; with optional synonyms) that identifies the node. Refers to a class name (in biology a taxon name) [ATTR: ref] (Synonymy is not a direct concern of SDD. However, the expression of synonymy may be essential for reports or to express the concept of a class to information consumers.) Refers to a class name (in biology a taxon name) [ATTR: ref] If class identification is present, further nodes are optional. The class identification may be missing, but then further Nodes are required. Used to define objects that are described (collected and preserved objects as well as objects that have only been observed). In biology a collected object is called a specimen. Provides either a simple free-form text, or a connection to an external resource. Identification of specimen object. The information may come from the service provider, but it must be converted to refer by ref to Resource/taxon names. If unidentified this may point to a higher taxon or a special class "unknown" introduced for that purpose. [ATTR: ref, IdentificationIsCertain] If present and false the name cited above is uncertain, e. g. as in 'Abies cf. alba' False = object has not been collected and preserved (it may still be databased in an observation database and have an ExternalID!). The default for this element is true, i. e. if the element is missing the object has been collected/preserved. Used for Agent documentation (an Agent is a person, project, organisation, or software agent). Currently used for authors, editors, contributors, and translators. 
Ideally it connects to an outside definition or documentation of the Agent. This may be a person as well as an organisation name Applicable only to persons This is an information URL pointing to a homepage with further information. If the person has a truly global URN representing its name, it is expected that this is used as the ExternalID above. Used for resources like publications, laboratory notes, speeches, etc. Provides either a simple free-form text, or a connection to an external resource. Used for resources like geographical names or places. Provides either a simple free-form text, or a connection to an external resource. Extends resource connector type with optional encoded data content (esp. images embedded in xml document) and with a Type (Image/Audio/Video, etc.). Type of medium @ To be discussed! @ An optional caption for a resource, esp. if it will be presented embedded in another document. Captions can be provided for multiple audiences. @@ Issue: captions, even in multiple languages, may be obtained from the service provider. Even then it may be desirable to override them! Do we need two collections: InheritedCaption and CaptionOverride? This seems to be awkward whenever there is no ServiceProvider! Also, FreeFormDescription can contain a "title" only in a single language! 
@@ Optionally the full resource data may be embedded (as an alternative or in addition to defining a URI) Defines an element with a ref attribute pointing to a Class in Resources (in biology: Class = Taxon) Refers to a class name (biology = 'taxon'; Entities/Classes/Class) Defines an element with a ref attribute pointing to a Specimen defined in Resources @GH@: Discuss whether to add a separate element for collection abbreviation (cached information from provider or from Refers to a described object identifier (biology = 'specimen'; Entities/Objects/Object) Defines an element with a ref attribute pointing to a Publication in Resources (Resources/Publications/Publication) Defines an element with a ref attribute pointing to a Locality in Resources (Resources/Geography/Locality) A collection of LocalityRefType elements Reference to a locality defined in Resources/Geography [ATTR: ref] Defines an element with a ref attribute pointing to a MediaResource defined in Resources (Resources/MediaResources/MediaResource) A collection of MediaResourceRef elements. (the sequence in instance is not informative!) [ATTR: ref] Defines an element with a ref attribute pointing to an Agent (Resources/Agents/Agent) Reference to an Agent (Resources/Agents/Agent) The first time a creator-agent has made a contribution to the object to which it was added by reference. The first/last contribution records are specific to the role of a creator-agent. If a creator has contributed both as an author and later as an editor of data, two references in two role containers will exist. Consequently, the dates for the two roles are recorded separately. A collection of AgentRefType elements, i. e. Agents forming a team like an author team. (The xml sequence of elements in this collection is informative!) [ATTR: ref] Metadata (application, revision, IPR; Creators and RevisionData are closely related to the AgentRefsType defined above): Creators = authors, editors or contributors.
At least one of Authors/Editors is required. (Reason for choice: one of Authors / Editors is required) Authors that have originated the content; in the sequence of importance. Editors (see below) if present in addition to Authors. Editors that have revised content generated by multiple authors or contributors; in the sequence of importance. In general Editors should co-occur with Authors or Contributors (which is, however, not enforced). The sequence of Contributor Agents must be preserved during processing, but the semantics of it are defined by the authors or editors of the project: either importance or alphabetical sequence. In addition to authors/editors, several people may have translated audience-specific texts. @@Request for discussion: Translators are currently not listed on individual Representation elements. Only a general statement about all translations together can be made. Should this be changed? Also: should one Representation be marked as 'Original/SourceForTranslation'? Will we have something like a 'normative' version? @@ [Unused!] Creators = authors, editors or contributors. It is generally desirable that at least one author/editor is named. In the case of legacy data, this may, however, not be feasible. Currently we are attempting to require the presence of at least one creator. If this should not be possible, we may have to go back to this type! Authors that have originated the content; in the sequence of importance. Editors that have revised content generated by multiple authors or contributors; in the sequence of importance. In general Editors should co-occur with Authors or Contributors (which is, however, not enforced). The sequence of Contributor Agents must be preserved during processing, but the semantics of it are defined by the authors or editors of the project: either importance or alphabetical sequence. In addition to authors/editors, several people may have translated audience-specific texts.
@@Request for discussion: These are currently not listed in the Representation elements, but a single one could easily be listed (however, not several!). Also: should one Representation be marked as Original/SourceForTranslation/Etc.? @@ RevisionData (creators, dates, revision) for project, character, glossary entry, and description data. Date/time when the object (project/terminological definition/description) was initiated. Applications may initially set this to the system date, but the project authors must be able to change it to an earlier date if necessary. Date/time when the last change was made (either in terminology or in descriptions). Enumerated categories, which are intended to be rough estimates by the authors/editors, not exact statements. RevisionStatus refers primarily to the correctness of existing data. This includes an estimate of completeness relative to the stated scope (e. g. taxonomic or geographic scopes in the project definition). However, if the project goal is to describe the frequent species of a taxon, the project status may be 'FullyRevised' even if many species are still missing. Application specific data, providing an extension mechanism to the SDD model. SDD conforming editing applications are expected to preserve the information of other applications when importing and later exporting data to support lossless round tripping. Recommendation: Each application may read out its own information. Any other target information present should be preserved and output when a new document is generated. This is designed to support idempotent round tripping of data between two applications. This implies that no dependency between the settings and the descriptions and the terminology setting should be relied upon. The Application element must contain application-defined element content (not further validated by SDD). It is not possible to directly store a text string (content model mixed="false").
[ATTR: name, version] Identifier chosen by the target application for which the current information is intended. The only purpose of this attribute is that the application generating data in the application container recognizes the target identifier as its own, while other applications just pass this through. Optional information about which version of the application generated these application-specific data. Annotations of objects occur together with labels or similar identifying objects. However, they are not audience-specific and are separate from the Label collection: = reuse of Annotation and ApplicationData, i. e. designer and application 'annotations' Internal notes/management comments (not multilingual). Annotations should be displayed only in a 'designer' or 'revision' mode and are expected to be invisible to users who only want to apply the data. They are appropriate for rough, unedited comments, but should not contain confidential information. Application-specific data (= extension mechanism) = AnnotationGroup + GlossaryEntry reference. Reference to the definition of a term or concept in the glossary; may be provided for multiple audiences and may include media resources like images. [ATTR: ref] (This identity constraint is placed on a global element!) Internal notes/management comments of the designer (not multilingual). Internal notes should not be displayed to consumers of a data set. Appropriate for rough, unedited comments, but should not contain confidential information. Application-specific data (= extension mechanism) Key/ref infrastructure: This allows defining (and redefining) the value type for keys and keyrefs (except for audience keys, which are xs:Name) Contains a key and a generic debugkey attribute. An optional attribute to add a human-readable equivalent to the numeric primary identity key, intended to simplify debugging SDD applications. The attribute can be discarded or updated at any time.
Applications should not produce exports containing this attribute; instead it can be generated using xslt (based on labels/abbreviations). Currently contains only the generic debugref attribute. The ref attribute could be defined here as well, but this would prevent adding annotations to clarify which key a ref is pointing to! An optional attribute to add a human-readable equivalent to the numeric ref to simplify debugging SDD applications. The attribute can be discarded or updated at any time. Applications should not produce exports containing this attribute; instead it can be generated using xslt (based on labels/abbreviations reached through key/ref). Basic simple types: normalized string required to be at least 1 character long (i. e. either element/attribute may be optional, but if they are required the content must not be an empty string) normalized string restricted to 1..255 character length (i. e. required, may not be empty string) [Unused!] Colors defined as RGB (red-green-blue), like in html. Example: #EE88FF [Unused!] Colors defined as HSV (hue-saturation-value (? value correct?)). @@ Unclear whether we should offer this@@ [Unused!] Valid states are true, false, and default. Currently no longer used, but preserved for future reuse. A name whose only value is "default", used for union definitions. String containing a format pattern of the type used in the xslt format-number function Restricted to integer values from 0 to 5, indicating expertise from schoolchildren to taxonomic expert. Recommendations for interpreting and choosing the expert level: 0 = unspecified; 1 = elementary school (year 1 to 6); 2 = middle school (year 7 to 10); 3 = high school (year 11 above) and general public (trying to avoid any specialized terminology or jargon); 4 = university students or (partly) trained personnel (using terminology, but avoiding or explaining problematic terminology); 5 = experts (using the full range of terminology). 0 = unspecified expertise level.
Use this if the expertise level cannot be assessed (e. g. when exporting data) or is considered irrelevant. elementary school (year 1 to 6); middle school (year 7 to 10); high school (year 11 above) and general public (trying to avoid any specialized terminology or jargon); university students or (partly) trained staff (using terminology, but avoiding or explaining problematic terminology); experts (using the full range of terminology). Enumerated list to improve application interoperability. It is unclear whether a simple SDD list (as presented here), or a generic MIME type support is more desirable. RevisionStatus is applied to the project as a whole as well as to individual descriptions. RevisionLevel 1 of 5, for example less than ca. 20 % of the data are revised. RevisionLevel 2 of 5, for example ca. 21-40 % of the data are revised. RevisionLevel 3 of 5, for example ca. 41-60 % of the data are revised. RevisionLevel 4 of 5, for example ca. 61-80 % of the data are revised. RevisionLevel 5 of 5, for example more than 80 % revised (but not yet completed). Revision completed. This does not necessarily imply that the data are complete in a scientific sense. They are completely revised only under the available time and the goals set for the project. Enumeration used in CodingStatus/Generalization Enumeration used in CodingStatus/Generalization Defines a specific method of univariate statistical measures supported by SDD. The combination of Method and MethodValue must be unique. MethodValue -1 = Minimum, MethodValue +1 = Maximum Confidence interval for the mean Undefined central value or lower or upper limit of mean, with MethodValue 0, -1 and +1 respectively. Important for legacy data where the statistical measure used is not known. If it is known that a range is a guessed rather than a calculated value, the method available for this should be chosen. Range limits calculated as mean plus/minus standard deviation. MethodValue defines a factor with which the s.d.
is multiplied before it is added to the mean. Thus, a range of 2 s.d. has method values of -2 and 2 for lower and upper limit, respectively. Defines an unspecific broad classification of the univariate statistical measures supported by SDD. Most applications reporting information for human consumption can rely on these reporting classes in their decision how to present the data. If a range as defined in the numerical mapping definition is to be compared with sta Defines the type of a character. It refers to the type of the underlying data of the character. For example, leaf length should be typed numeric, even if currently only represented by categorical range definitions (0-10 cm, > 10 cm). nominal = unordered categorical states (DELTA: type 'UM') ordinal = ordered categorical states (DELTA: type 'OM') @@ needs discussion:@@ ordinal-discrete = ordered categorical states (DELTA: type 'OM') @@ needs discussion:@@ ordinal-interval = ordered categorical states (DELTA: type 'OM'). Like ordinal-discrete but states can intergrade. Example: colors. integer = cardinal data scale (DELTA: type 'IN') interval = real numeric = floating point values (DELTA: type 'RN') Proposal to add color as a special color type, to provide special support for RGB/HSV polygons of color values, and to support special interaction. @@ The special data structures required for this are not yet supported in other areas of the schema and need further discussion! @@ Defines the type of a concept tree (list of enumerated values to support application interoperability). Categorizing characters into basic property types (e. g. color, 2-dim. shape, 3-dim. shape, surface texture, taste, smell, behavior, physiology, measurements, etc.) greatly improves the analysis and management of larger character sets and is therefore recommended. [@ Note: Only a single concept tree should have this hierarchy type. (not enforced in schema, how can it be enforced? Other types occur multiple, i. e. 
one cannot make a UNIQUE statement on attribute! @] A hierarchy that organizes characters by observation method, e. g. field observation, light microscopy, electron microscopy, molecular methods, culture techniques, etc. A hierarchy that organizes characters by a morphological "contains" or "part-of" hierarchy: plant = root/stem/leaf, leaf = base/stipules/petiole/lamina, etc. Used for concept trees that fall into none of the categories above. A concept tree of type "SubsetFilter" is intended only for the purpose of filtering characters. It will often be a flat list of characters. Applications should not offer it as a choice when the user selects a hierarchy for displaying or reporting purposes. Note that conversely, the filter selection dialog in applications should not be restricted to trees of type SubsetFilter. Any concept tree, including part, method or property hierarchies, may be used as a filter to define character subsets. PresentationTable concept trees are small sets of usually a few characters that allow data to be displayed in a tabular arrangement. It is possible to define tables in more than 2 dimensions. By default the innermost dimension is considered cells in a row, the next rows in a table. Any further dimension may be displayed as multiple 2-dimensional tables one below the other. However, applications may also offer a browser based on pivot tables. - Note: Trees of type PresentationTable should not be offered in the user interface when selecting a browsing tree. Defines the intended roles that a designer may assign to a concept tree (list of enumerated values to support application interoperability). Setting this purpose in a concept tree is a recommendation to applications with a user interface to use this as the default hierarchy for any editing or reporting purpose. The application may, however, enable the user to select any concept tree.
Setting this purpose in a concept tree is a recommendation to applications with a user interface to use this as the default hierarchy for editing the terminology. The application may, however, enable the user to select any concept tree. Setting this purpose in a concept tree is a recommendation to applications with a user interface to use this as the default hierarchy for editing the description data set. The application may, however, enable the user to select any concept tree. Setting this purpose in a concept tree is a recommendation to applications to use this as the default hierarchy for building designed keys (e. g. dichotomous keys). Setting this purpose in a concept tree is a recommendation to applications to use this as the default hierarchy for interactive identification. Setting this purpose in a concept tree is a recommendation to applications to use this as the default hierarchy for natural language reporting. Double precision numeric value in the range of [0..1] Basic generic complex types (date/time, author/editor/contributor, application-specific data etc.): (lower/upper estimate attributes; used both for probability and frequency!) @@ To be discussed, see Paris 2003 minutes. Information about whether the range is a rough estimate about frequency or probability values that has been guessed after coding occurred, or whether it is rather a normative definition that was known at the time of coding. Combines a publication resource reference with a detail location within that reference (esp. page number) Refers to a publication as defined under Resources/Publications [ATTR: ref] Location within publication where the cited data can be found : Page, table, figure number, database record, html document bookmark, etc. (not the inclusive pages of the article). Verbatim name as it appears in citation. @@ Do we need this? 
@@ Collection of terms (string 1-255) The following types are increasingly complex subtypes of elements from xhtml allowed in certain elements. Note: The necessary xhtml attributes are missing and need to be added to make them functional. Alternatively, appropriate elements from xhtml should be imported and encapsulated here. Allows basic character formatting using xhtml elements plus three semantic elements (citationauthor, taxonauthor, taxon; intended to be rendered formatted and for analysis). Note that no further formatting is supported within the semantic elements (taxon etc.). (Note that this is a mixed content model, allowing text between elements!) 'Emphasis' logical markup (phrase): usually rendered italic. 'Strong' logical markup (phrase): usually rendered bold. Logical markup: subscript. Logical markup: superscript. Font style markup: italic markup that could not be interpreted as (preferred) either emphasis or taxon. line break (empty element) markup for inserted/deleted text Recommended report rendering: italics Author of a referenced citation. Recommended report rendering: may be either ignored or rendered as small caps Author of a taxon. Recommended report rendering: see citationauthor [Unused!] Extends the FormattedSimpleTextType and allows in addition to basic character formatting also the use of <img> and <a> elements. This should probably be implemented through references to types from the real xhtml schema, if it is possible to refer to an appropriate subset. @@ Probably this does not work yet, since the type extension itself is not a choice but a sequence!!! anchor/hyperlink element image element [Unused!] Extends the FormattedInlineTextType and allows the following block level elements as well: p, ol, ul, li, h1-h6. This should be defined through reference to a full xhtml fragment definition, but it must be without html/header/body elements! Currently only p element added as an example for the discussion.
p would need attributes added to function properly!! If possible reuse xhtml modules! Formatted text with an additional attribute "parsed". Used for Text elements in the NaturalLanguageDescription container. The following 4 types define a base element (which may later carry BlankBefore/BlankAfter attributes if this should be necessary) and variants of wording definitions for natural language report rendering. These types are used exclusively in the Audience-specific LabelPlusWording1-3 container types. A text element used to define wordings for natural language output. Currently the handling of blanks is assumed to be through leading and trailing blanks present in character content. If this should not work due to automatic trimming, the type may require two optional attributes like BlankBefore / BlankAfter of type BooleanTripleState. Currently the type is a synonym of FormattedSimpleTextType, but this may later be changed. Natural language wording for elements without content (= 'SimpleWording'). Wording for elements that have no further children in the natural language wording tree, e. g. char. states. Natural language wording for container elements with non-repeated content (e. g. modifiers around states) (= 'ContainerWording') Wording output before the contained elements. For characters this is the main character wording that is output before the states. (Optionally both before and after may be present) Wording output after contained elements. In the case of a character this is the wording after all states, or after numerical data and after a measurement unit where present. Natural language wording for elements with repeated content like characters that contain multiple modifiers + states. (= 'Array-' or 'ContainerWording') Normally the delimiters defined in the language rules will be used. However, they can be overridden here. Natural language wording for operators (and, or, with, to, etc.).
Contains 2 attributes, containing blank-separated lists of multiple starting patterns for next element. Example: In Spanish 'y' becomes 'e' if next word starts with 'i' or 'hi', but not 'hie'. Use 'i hi'/'hie' for StartWith/ButNotWith to define this. Text used if condition is fulfilled. Text used for operator unless the condition in IfNextElement is fulfilled. @@ check later whether still necessary! This delimiter is used if only 2 elements are present. Examples: en: ' or ', de: ' oder ' If 3 or more elements are present, this delimiter is used between all elements, except before the last element. Examples: en: ', ', de: ', ' If 3 or more elements are present, this delimiter is used between the second-but-last and the last element. Examples: en: ', or ', de: ' oder ' The following types are audience-specific (i. e. they refer by a ref mechanism to audiencekey values). Note that some types are used only a single time, but it was thought more transparent to define all audience-specific collections and representations through types rather than make this dependent on the frequency of use. Base type; defines an element with a ref attribute pointing to Audience definitions (different data type from generic ref!) Audience-specific project header information A short, concise title. This does not support any formatting! Free-form text containing a longer description of the project. A free form text acknowledging support (e. g. grant money, help, permission to reuse published material, etc.) Disclaimer statement, e. g. concerning responsibility for data quality or legal implications. Not optional! At least a copyright statement is required! A concise copyright statement Optionally, an expanded copyright statement may include more detailed copyright information Free-form text defining conditions under which the data may be distributed or changed. To be used if data are placed under a public license (GPL, GFDL). Placing data under a public license is recommended. 
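The operator and delimiter rules for natural language wording described above (the StartWith/ButNotWith patterns, and the three delimiters for lists of 2 or of 3 and more elements) can be sketched in Python as follows. This is only an illustration under assumptions: the function and parameter names are hypothetical, and the patterns are matched here case-insensitively against the start of the next word, which the annotations do not specify.

```python
from typing import List


def choose_operator(default: str, alternative: str, next_word: str,
                    start_with: str = "", but_not_with: str = "") -> str:
    """Pick the operator wording depending on how the next element starts.

    start_with and but_not_with are blank-separated pattern lists, as in the
    StartWith/ButNotWith attributes described above.
    """
    word = next_word.lower()
    matches = any(word.startswith(p) for p in start_with.lower().split())
    excluded = any(word.startswith(p) for p in but_not_with.lower().split())
    return alternative if matches and not excluded else default


def join_elements(items: List[str], pair_delim: str = " or ",
                  list_delim: str = ", ", final_delim: str = ", or ") -> str:
    """Join elements using the two-element, in-list, and before-last delimiters."""
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return pair_delim.join(items)
    return list_delim.join(items[:-1]) + final_delim + items[-1]


# Spanish 'y' becomes 'e' before 'i' or 'hi', but not before 'hie'.
spanish_and = [choose_operator("y", "e", w, "i hi", "hie")
               for w in ("iglesia", "hijo", "hierro", "madre")]

english = join_elements(["red", "green", "blue"])
german = join_elements(["rot", "blau"], pair_delim=" oder ",
                       list_delim=", ", final_delim=" oder ")
```

The default delimiter values are the English examples given in the annotations; the German call passes the German examples explicitly.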
Free-form description of geographic coverage of descriptions available in the current project. Free-form text describing taxonomic groups covered by the project. A label = collection of audience-specific label representations (without abbreviations or natural language reporting wordings), used e. g. for concept trees or modifier sets. Audience-specific simple label representation (= without abbreviations or natural language reporting wordings) [ATTR: audience] Audience-specific label representations (without abbreviations or natural language reporting wordings); used e. g. for concept trees or modifier sets. Text of the normal label, intended for screen display or reports that accommodate unabbreviated labels. Label (incl. abbreviations) Audience-specific label representations (incl. abbreviations) [ATTR: audience] Audience-specific label representations (incl. abbreviations) Restricted to 20 characters maximum length, including blanks. Label abbreviations are especially important when displaying information in a tabular format. When missing, applications may abbreviate the label, which may lead to duplicate strings. Normalized string restricted to 1..20 character length. Highly constrained version of the label (max. 12 characters, only uppercase letters, no blanks). Defined to support exports to formats requiring very short and simple names or labels, especially phylogenetic or statistical analysis software like NEXUS or SAS. Small multimedia resource to be displayed in addition to the label. An icon should be quickly recognizable. It will usually not be informative enough to base decisions on it alone. Example: in a concept tree a leaf icon image is provided for the node containing leaf characters. [ATTR: ref] A set of multimedia resources to be displayed in addition to or instead of a label, e. g. to select a state of a character during identification.
If more than one resource is defined here, the assumption is that they will normally all be consumed before making a selection. Each resource should be sufficiently concise that ca. 6 selectors can be viewed at the same time, or ca. 6 audio extracts listened to, before making a selection. - Icon and Selectors are audience-specific (e. g. image with abbreviation, bird-call with spoken text). [ATTR: ref] Label (incl. abbreviations and a single wording) Audience-specific label representations (incl. abbreviations and a single wording for natural language reporting) [ATTR: audience] Extends LabelPlusAbbreviationRepresentationType with a single wording element. Label (incl. abbreviations and a wording before and after the contained elements) Audience-specific label representations (incl. abbreviations and wording for natural language reports) [ATTR: audience] Extends LabelPlusAbbreviationRepresentationType with a wording before and after the contained elements. Label (incl. abbreviations and a wording text before, after, and between the contained elements) Audience-specific label representations (incl. abbreviations and wording for natural language reporting) [ATTR: audience] Extends LabelPlusAbbreviationRepresentationType with a complex wording element. Used in concept tree nodes and character references. Allows defining a text before, after, and between elements; used during natural language reporting. An entry in the terminological glossary, providing an attribute "key" by which the entry can be referred to. Audience-specific representation of a glossary entry. All audience-specific versions must define the same concept. If, for example, a fructification would be considered a 'berry' in French but not in Chinese (i. e. the definitions have different widths), these definitions must be placed in different GlossaryEntry elements, not in different Representations.
[ATTR: audience] Audience-specific definitions primarily aimed at human consumption, but with the intent to be useful to computer linguistic ontological agents as well. The head term (one or several words) appears at the start of the definition and denotes the concept being defined. For characters and states the term is often identical to the Label, but this is not necessarily so, e. g. in tree nodes where the term needs to carry the context. A definition (glossary entry) of one or several paragraphs, explaining the concept (meaning, semantics) of a character, state, etc. Creators, Revision status, and dates of this audience-specific definition. Optional URI to an external definition in addition to the internal Definition above. ExternalReference may differ between different audiences. Multiple citations (publication + page number) If the Definition element is missing, the ExternalReference (URI to an external definition) is required. Audience-dependent resources used in the definition (e. g. images with text, videos with speech, or images intended for audiences of different expertise). Each media resource may occur only once in the collection. Kind-of or is-a relationship (class inheritance hierarchy) Each concept term may occur only once in the collection. Part-of or aggregation relationship (class composition hierarchy) Both KindOf and PartOf relationships define 'broader terms'. Each concept term may occur only once in the collection. @@ To be discussed: Do we need both adjacent and connected? Example: The thumb is adjacent to the index finger, connected to the palm of the hand, and part of the hand. Each concept term may occur only once in the collection. Each concept term may occur only once in the collection. Related concepts and terms. Used to express unspecific relations not yet expressed in the previous relationships. The list of related terms may also be viewed as a keywords list! Each concept term may occur only once in the collection.
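Two of the glossary rules stated above lend themselves to a small check: a Definition or an ExternalReference must be present, and a concept term may occur only once per relationship collection. The following Python sketch illustrates this under assumptions. The element names Definition, ExternalReference, KindOf and PartOf come from the annotations, but the nesting shown, the child element name ConceptRef, the sample key values and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Relationship collections checked for duplicates (names from the annotations).
RELATION_COLLECTIONS = ("KindOf", "PartOf")


def check_glossary_representation(rep: ET.Element) -> List[str]:
    """Return rule violations for one (assumed) glossary Representation element."""
    problems = []
    if rep.find("Definition") is None and rep.find("ExternalReference") is None:
        problems.append("either Definition or ExternalReference is required")
    for collection in rep:
        if collection.tag in RELATION_COLLECTIONS:
            refs = [term.get("ref") for term in collection]
            if len(refs) != len(set(refs)):
                problems.append(f"duplicate concept term in {collection.tag}")
    return problems


# Hypothetical instance fragments, for illustration only.
duplicate_term = ET.fromstring(
    '<Representation audience="en5">'
    '<Term>berry</Term>'
    '<ExternalReference>http://example.org/berry</ExternalReference>'
    '<KindOf><ConceptRef ref="12"/><ConceptRef ref="12"/></KindOf>'
    '</Representation>'
)
no_definition = ET.fromstring(
    '<Representation audience="en5"><Term>berry</Term></Representation>'
)
```

In a real SDD document these constraints would more naturally be expressed in the schema itself (for example through identity constraints); the sketch only makes the intended rules concrete.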
Container for multiple audience-specific representations of a (publicly reported) Note as text (optionally with basic formatting). Used, e. g., inside state, statistical measure, coding status, etc. references in descriptions. [ATTR: audience] Audience-specific representation of a (publicly reported) Note as text (optionally with basic formatting). The type provides an audience reference in an attribute. [The presence of the (seemingly superfluous) text element has two advantages: 1. Cleaner typing; adding an audience attribute directly to FormattedSimpleText type would require multiple inheritance. 2. In nat. language markup, Text surrounds all verbatim text. Retrieving all Text content retrieves the original text prior to markup.]
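The second advantage named above, that retrieving all Text content yields the original text prior to markup, can be shown with a short Python sketch. The instance fragment is hypothetical: the element names Text, taxon and em follow the annotations, while the enclosing Representation element, the audience value and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical Note representation with mixed content inside Text.
note = ET.fromstring(
    '<Representation audience="en5">'
    '<Text>Leaves of <taxon>Abies alba</taxon> are <em>flat</em> needles.</Text>'
    '</Representation>'
)


def verbatim_text(representation: ET.Element) -> str:
    """Concatenate all character data inside Text, dropping the inline markup."""
    text = representation.find("Text")
    if text is None:
        return ""
    # itertext() yields the element text, nested texts, and tails in document order.
    return "".join(text.itertext())


plain = verbatim_text(note)  # "Leaves of Abies alba are flat needles."
```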