Copyright © TDWG (Taxonomic Databases Working Group, www.tdwg.org), 2004. This file is a special version of the SDD XML schema, reduced to contain only the most important features of terminology and coded descriptions. It may be used only for viewing convenience and may not be distributed independently from the primary schema files (SDD.xsd, UBIF.xsd, UBIF_TypeLib.xsd, etc.). The inclusion of all parts starts below:

XML schema to encode descriptive data in biology and other subjects. The primary goal of the design is to increase knowledge about the diversity of life on earth and to make that knowledge more widely available. However, the schema may be used in many other areas (including medicine, pathology, archeology, anthropology) wherever objects or classes of objects are described for later reidentification.

The schema was designed by the Structure of Descriptive Data (SDD, http://160.45.63.11/Projects/TDWG-SDD/index.html) group. SDD was established in 1999 as a subgroup of the Taxonomic Databases Working Group (TDWG, www.tdwg.org) of the International Union of Biological Sciences (IUBS). The author of the current schema version and of all annotations is G. Hagedorn, Berlin. The requirements for an SDD schema were elaborated in 6 major meetings of the SDD group and in discussions over the SDD email list. Over 60 people contributed to these discussions. However, the help, criticism and energy of Bob Morris, Kevin Thiele, Bryan Heidorn, Guillaume Rousse, Steve Shattuck, Donald Hobern, Trevor Patterson and Nicolas Bailly are especially acknowledged!

Copyright © TDWG, 26. June 2004. Licensed under GNU GPL 2 (http://www.gnu.org/licenses/gpl.html) - with the following restriction: This is a preliminary version (0.91!) for testing purposes. Permission to use this schema is granted to all scientific or commercial projects for a testing period of up to 3 years. After this time computer programs using this schema must either be discontinued or converted to the final version of this schema.

Conventions:
Element or attribute names starting with underscores (__) are present in the schema for discussion purposes only and should be used only experimentally. Annotations containing @ indicate unfinished points of discussion.

Note: blockDefault="#all" in xs:schema prevents element substitution and prevents derived types from being used in instance documents in elements typed to the base type (which would otherwise be possible using xsi:type=""). finalDefault is not set; further type derivation is currently not considered problematic. Please contact us if you believe otherwise. Note that according to the W3C discussion forum, the developers of XML Schema are considering dropping the final attribute in the upcoming XML Schema version 1.1. Nillable: xsi:nil is not supported in SDD documents (the schema declaration nillable="false" is the default and is not explicitly stated).

This imports the UBIF schema file (SDD uses the same namespace as UBIF!). DescriptiveData must be placed inside the UBIF top-level Datasets/Dataset structure as the last element. Because of keyref constraints, this schema depends on the imported UBIF root!

Descriptive data that are specific to SDD, i. e. descriptive terminology, coded and natural language descriptions, and stored identification keys. (In UBIF this is xs:any to allow SDD, ABCD or other schemata here!) Within UBIF (Unified Biosciences Information Framework), this represents the node where the SDD-specific data start.

Defines the operational terminology (parts, characters, states, etc.) in which descriptions are expressed. The terms are defined by the biological specialist(s). They are used in the descriptions through references to their 'id' attributes.

General vocabulary lists and rules; applicable outside of the domain of descriptive data. In contrast to enumerated types in the schema, new values and labels for new languages/audiences can be defined.

Defines the semantics and labels of coding status values (e. g., unknown, not applicable, not interpretable). Coding status values (= 'missing data indicators', = 'special states') provide standardized reasons why data are missing. Unlike most elements in Terminology, these are constrained by the SDD model and can only be extended by revising the SDD standard (may be changed to user-definable in a later version of SDD). Labels are already user-definable to support multiple audiences. The labels and abbreviations given are only recommendations. They can be freely changed as long as the semantics are preserved. [ATTR: id]

Certainty, frequency, and other modifiers modify categorical states or statistical measures.
Modifiers are defined for the entire project, but must be enabled for characters in concept nodes to be available when editing descriptions. Modifiers within a group may be ranked/ordered; in descriptions, modifiers from the same group are combined with 'or'. All modifiers in a group must be of the same type (certainty, frequency, etc.).

Characters are the operationalized concepts used in descriptions. They define categorical states and quantitative values or statistical measures. Characters are defined in an unordered flat list. They can be used alone or in combination with modifiers or concept trees.

Non-abstract types derived from AbstractCharacter. In OO programming, a polymorphic collection of the base type may be used!

Hierarchies of property, object part, observation methods or other concepts. Concepts can be operationalized by referring to characters (only these allow scoring of data in the descriptions). Concept states (property or kind-of-part, reusable in multiple characters) and character dependencies are expressed here as well. Concept trees may also be used to define flat character subsets for filtering purposes. @@DISCUSS: should concept tree hierarchies be recursively definable, as long as the resulting tree is acyclic?@@ Importantly, this would make it possible to define generalization and part-of relations between parts/structures! [ATTR: id]

Authored or auto-generated free-form descriptions, which may be completely or partially marked up with elements similar to those in coded descriptions.

Largely language-independent descriptions entirely controlled by Terminology. Both coded and natural language descriptions may describe either abstract class concepts (taxon, disease, etc.) or physical objects (individual specimens). [ATTR: id]

The following types are used in the Terminology/General section. They define generic concepts not in principle restricted to use in descriptive data.
--- Coding status makes it possible to express reasons why data are missing (not coded)

Project-wide definition of CodingStatus values.

Properties describing machine-readable partial semantics for a coding status value. Provided to support generic application code that continues to function if additional codes are defined. @@ Both proposals need elaboration and discussion! To be coded / Not to be coded / Cannot be coded / Coded successfully; NotEvaluated / CannotExist / DoesNotExist / Exists

Enumeration used in CodingStatus/Specification. These required values enable applications to interpret user-defined coding status values. To Be Coded -- Information has not yet been entered, but it is planned to do so. Not To Be Coded -- Information has not yet been entered, and it is not planned to do so (esp. because resources are lacking and other characters should have priority). Cannot Be Coded -- Information cannot be entered due to objective (inapplicable character) or subjective (cannot interpret available data) reasons. Coded Successfully -- Information has been entered successfully.

Enumeration used in CodingStatus/Specification. These required values enable applications to interpret user-defined coding status values. Not Evaluated -- The presence of information has not yet been evaluated. Cannot Exist -- Information cannot exist for logical reasons (i. e. a character with a coding status having this value is inapplicable). Does Not Exist -- Information should exist, but extensive research has failed to find it. Exists -- Information has already been found, but may not yet have been entered.

Refers to CodingStatus values (e. g., from within descriptions). Refers to a CodingStatus value (Terminology/General/CodingStatusValues/Status/@id)

-------------------------------- START Modifiers (uses polymorphism!) ---------------------------

The modifier type system covers expressions of certainty, frequency, manner, degree, etc.
that can be added to existing character value or state expressions. The modifier system is complex and uses abstract base and derived types both for modifier definitions and for references applying these modifiers to statements in descriptions. Quick overview of the primary entry types: Modifiers are defined in ModifierSet elements. Recommended applicability of sets to characters is defined in the concept trees. Single modifiers are applied to descriptive statements using the PolymorphicModification/PolymorphicModificationMarkup groups.

1. --- Modifier definitions

a) Modifiers are grouped into sets because of ranking (ordering) within a set (and for management purposes). All modifiers in a set must be of the same modifier type (e. g., all are frequencies), else ranking would not be meaningful.

A set of modifiers of a single type that has a label and may define order/rank for the contained modifiers.

Label expressing the concept or scope of each modifier set.

= 'Modifiers are ranked'. If true, the sequence of modifier elements in instance documents is semantically meaningful (as in 'weakly' - 'moderately' - 'strongly'). If false, the sequence is intended for display purposes only.

(Unlike most similar polymorphisms, this is not a collection; each set may occur only once.)

Express the certainty of categorical or statistical statements ('perhaps', 'probably', 'almost certainly'). 'True-by-misinterpretation' modifiers are included as a special case of 'certainly false'.

Only predefined, no specifications yet! It is believed that specifications may be desirable here in the future. Defining them requires significant work, however.

Only predefined, no specifications yet! It is believed that specifications may be desirable here in the future. Defining them requires significant work, however.

Other, so far untyped modifiers of manner, degree, intensity (e. g., 'strongly', 'weakly').
These convey their specific semantics only to human consumers (or processors parsing and interpreting label text).

Used to describe state frequency (usually, rarely, etc.). In descriptions, frequency range estimates can also be stated numerically!

Modifiers of degree or manner, specific to categorical states (very, strongly, etc.).

Refers to a ModifierSet, used in ConceptTree//Concept to define recommended modifier sets. The ref attribute refers to a modifier set (Terminology/Modifiers/ModifierSet).

b) Single modifier definitions. Abstract base type and derived types to be used in instance documents. Note that 'FrequencyModifier', 'CertaintyModifier', etc. could have been named 'FrequencyModifierDef', etc.; they have been abbreviated to improve the readability of instance documents in case xsi:type is used.

Abstract base type for state or character modifier definitions (certainty, frequency, etc.)

-- Character modifiers:

Abstract base type for modifiers applicable to character types in principle

Definition of certainty modifiers (perhaps, probably, etc.)

An estimate of a probability range for verbal modifiers, defined through two attributes. The upper/lower limits of probability modifiers may overlap. The default values are 0-1, indicating that no estimate was possible.

If present and true, the current modifier indicates that the state to which it refers is present or true only due to a misinterpretation. The probability range should be 0 to 0 = certainly false.

Definition of spatial modifiers (proximal, distal, at base, at tip, etc.). In version SDD 1.0 this element is defined only to support forward compatibility; no specification details are defined for this modifier type yet.

Definition of temporal modifiers (earlier, later, in summer, in spring, etc.). In version SDD 1.0 this element is defined only to support forward compatibility; no specification details are defined for this modifier type yet.
Definition of character modifiers not yet covered by the categories above (open extension!)

-- Categorical state modifiers:

Abstract base type for modifiers applicable only to categorical states

Definition of frequency modifiers (rarely, usually, etc.)

An estimate of a frequency range for verbal modifiers, defined through two attributes. The upper and lower limits of several frequency modifiers may overlap. The default values are 0-1, indicating that no estimate was possible.

Definition of modifiers restricted to single categorical statements, esp. modifiers/adverbs of degree and manner (strongly, very, darkly, etc.). (Note: the grammatical concept of adverbs of manner often includes the certainty modifiers, which should not be included here!) - (It is expected that this list may have to be extended in future SDD versions, creating additional specific modifier types for those lumped in OtherModifiers) - (Open questions: a) can approximations ('ca.', 'roughly') be handled as CertaintyModifiers or is a separate type desirable? b) should manner, degree, intensity become separate types? c) Specification of spatial and temporal modifiers must be elaborated!)

c) Collections of modifier definitions. Abstract base type and derived types to be used in instance documents. The ModifierSet type refers to these collections in a polymorphic way. This makes it possible to define a collection of ModifierSet elements, each set containing multiple modifiers of a single modifier type.

Abstract base type of a collection of modifiers of a single type. In instance documents one of the following non-abstract types must be used. (This is an abstract type; specific derived types will be used in instance documents!)
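The probability and frequency ranges attached to verbal modifiers, as described above, can be pictured as closed intervals between 0 and 1. The following is only a minimal Python sketch of that idea and is not part of SDD: the class name `RangeModifier`, its field names, and the example labels and numeric limits are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeModifier:
    """Hypothetical stand-in for a frequency or certainty modifier definition."""
    label: str
    lower: float = 0.0  # the default range 0-1 means "no estimate was possible"
    upper: float = 1.0

    def __post_init__(self):
        # Both limits are inclusive and must lie between 0 and 1.
        if not (0.0 <= self.lower <= self.upper <= 1.0):
            raise ValueError(f"invalid range for {self.label!r}")

    def has_estimate(self) -> bool:
        """True unless the range is the uninformative default 0-1."""
        return (self.lower, self.upper) != (0.0, 1.0)

    def overlaps(self, other: "RangeModifier") -> bool:
        """Ranges of different modifiers are allowed to overlap."""
        return self.lower <= other.upper and other.lower <= self.upper


# Illustrative values only; real projects define their own estimates.
rarely = RangeModifier("rarely", 0.0, 0.25)
sometimes = RangeModifier("sometimes", 0.2, 0.6)
usually = RangeModifier("usually", 0.7, 1.0)
unspecified = RangeModifier("often")  # no estimate given
```

A modifier flagged as true-by-misinterpretation would, under the same assumptions, simply carry the range 0 to 0.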
[ATTR: id]

-- Character modifiers:

(Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type)
(Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type)
(Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type)
(Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type)

-- Categorical state modifiers:

(Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type)
(Collection; derived from ModifierDefs abstract base type, restricted to specific modifier type)

2. --- Simple Modifier references (used in coded descriptions).

a) Abstract base types

Abstract base type for an actual modification of a statement. In instance documents the following derived types will be used, either referring to a defined modifier category, or giving explicit numerical ranges/values.

Refers to any kind of modifier definition type (Terminology/Modifiers/ModifierSet/*/Modifier/@id)

Abstract base type including all references to CharacterModifierDef

Abstract base type including all references to StateModifierDef

Abstract base type, adding ProbRangeAttributeGroup. Currently used only for Frequency modifiers, where exact frequency values may optionally be given in descriptions.

Lower value of a probability range (values 0 to 1 inclusive). Note: to specify a single, exact value set both lower and upper attributes to this value!

Upper value of a probability range (values 0 to 1 inclusive). Note: to specify a single, exact value set both lower and upper attributes to this value!

(Attribute modeling group used in StateModificationPlusProbabilities/Markup. In theory the attributes could be inherited from UBIF complex type ProbabilityRange, but this would require multiple inheritance!)

Lower value of a probability range (values 0 to 1 inclusive).
Note: to specify a single, exact value set both lower and upper attributes to this value! Upper value of a probability range (values 0 to 1 inclusive). Note: to specify a single, exact value set both lower and upper attributes to this value! b) Derived types to be used in instance documents. Note that 'Frequency', 'Certainty', etc. may have been named 'FrequencyModifierRef', 'CertaintyModifierRef', etc.; they have been abbreviated to improve the readability of instance documents in case xsi:type would have been used. -- Reference to character modifiers: Refers to a certainty character modifier Refers to a certainty modifier (Terminology/Modifiers/ModifierSet/CertaintyModifiers/Modifier/@id) Refers to a spatial character modifier Refers to a spatial modifier (Terminology/Modifiers/ModifierSet/SpatialModifiers/Modifier/@id) Refers to a temporal character modifier Refers to a "Temporal" modifier (Terminology/Modifiers/ModifierSet/TemporalModifiers/Modifier/@id) Refers to a character modifier not covered by the types above Refers to an "OtherModifer" modifier (Terminology/Modifiers/ModifierSet/OtherModifers/Modifier/@id) -- Reference to categorical state modifiers: Refers to a frequency modifier (e. g., from within categorical character data) Refers to a Frequency modifier (Terminology/Modifiers/ModifierSet/FrequencyModifiers/Modifier/@id) Refers to a state modifier (e. g., from within categorical character data) Refers to an "StateModifer" modifier (Terminology/Modifiers/ModifierSet/StateModifiers/Modifier/@id) -------------------------------- START Characters and dependent objects (states, statistical measures) --------------------------- 1. --- Character definitions (characters = data recording and analysis variables, depending on observed part, property, and observation or measurement methodology) a) Abstract base type and derived types to be used in instance documents. Defines a character in the terminology. 
Abstract base type; one of the extensions below must be used in instance documents.

Only a simple label for presenting characters in a flat list is defined here. (Abbreviated char. labels for tabular reports, natural language wordings, etc. can be defined in concept trees!)

Meta information, rating characters under various aspects. Intended to guide a best-next character algorithm.

# Derived from AbstractCharacter to be used in instance documents (non-abstract type). Categorical data include nominal and ordinal data (DELTA types UM/OM and NEXUS types). Other terms for categorical data in statistics are 'qualitative data' or 'attributes'. The term 'attribute' has been avoided in SDD because it has different definitions in statistics, programming, databases, DELTA, etc. Both 'qualitative' and 'attribute' are ambiguous as to whether ordinal/ranked variables are included or excluded.

Extension of the common character properties with those specific to categorical data (= 'states').

An optional specification of the kind of categorical character variable. The available measurement scales are 'nominal' and 'ordinal'. The distinction between linear ordering and other kinds of ordering is made separately!

Any categorical variable can assume only a limited number of discrete values. Thus data recorded in a CategoricalCharacter are always discrete (= discontinuous or meristic). However, the measured property may either be naturally discrete ('male/female', 'aseptate/uniseptate/biseptate/muriform'), or it may be continuously varying and partitioned into discrete categories ('no/few/many hairs', 'orange to red'). Only in the latter case can the between-operator be used on neighboring states.

Mappings between categorical states (e. g., subovate may be mapped to ovate to simplify identification choices). Each mapping defines a source and a destination state. Both From and To may point multiple times to the same state, but the combination From + To must be unique.
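The uniqueness rule for state mappings (a state may appear repeatedly as From or as To, but each From + To pair only once) can be checked mechanically. The sketch below is a hypothetical Python illustration: the function name `duplicate_mappings` and the state ids are invented for the example, and the schema itself would enforce such rules through its own constraints rather than through code like this.

```python
from collections import Counter


def duplicate_mappings(mappings):
    """Return the (from_state, to_state) pairs that occur more than once."""
    counts = Counter(mappings)
    return [pair for pair, n in counts.items() if n > 1]


# Hypothetical state ids. Repeating a source or a destination is fine ...
valid = [
    ("s_subovate", "s_ovate"),
    ("s_subelliptic", "s_ovate"),   # same destination twice: allowed
    ("s_subovate", "s_elliptic"),   # same source twice: allowed
]

# ... but repeating the same combination is not.
invalid = valid + [("s_subovate", "s_ovate")]
```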
Both states must be defined in the current character (validated through identity constraint!) [ATTR: ref]

Both To and From should point to a different character than the current (not validated). No explicit character reference is required, since state references are unique within a dataset. [ATTR: ref]

(States are defined outside the type specific tree, since categorical states may be present in addition to numerical data) (The element sequence in instance documents is informative!)

Local definition of a state [ATTR: id]

Reference to a single concept state (as defined project-wide at a concept tree node); extended with an id definition so that the state in the context of the current character can be referred to from descriptions. [ATTR: id, ref]

# Derived from AbstractCharacter to be used in instance documents (non-abstract type). Quantitative data include data like the DELTA types IN/RN. They are not supported by NEXUS.

Extension of the common character properties with those specific to numerical measurements, especially including a more detailed measurement scale. --- Note: Unlike the states in categorical characters, the applicability of statistical measures to a character is not defined in the character. Any measure used in a description constitutes valid information. However, a list of recommended measures for sets of characters may be defined in concept nodes.

An optional specification of the kind of numerical character variable. The numeric scales are 'interval' and 'ratio'. Interval differs from ratio in that the 0-value is an arbitrary point (e.g. in °C/°F), so that ratios should not be calculated.

If true, an application may issue a warning if sample measurements are not integer. Note that most statistical measures are real values for integer data (min/max/TotalRange being exceptions).

Data are continuous if theoretically any value is possible with a sufficiently fine measurement method.
They are discrete if only certain values are possible and gaps between values exist. The value must be false for ValuesAreInteger= true. It may also be false for real numeric values (esp. for ratio data based on counts). An inclusive range defined through two attributes into which all measured values and most statistics (mean, extremes, ranges, etc.) should fall. Only dimensionless statistics (variance, sample size) are not to be tested against the plausibility range. This does not define a schema constraint; applications may ignore this, enforce it strictly, or issue warnings when violated. [ATTR: lower, upper] Circular data are a special kind of MeasurementScale='interval'. If this data element is present, lower and upper define the values joining the circle. Example: '0, 360' for compass values, '0, 24' for hours of day. Compare Zar 1984: 422ff. Mappings of numerical ranges to categories (like DELTA Key States) Each mapping defines a lower and an upper value to map numerical ranges to categorical states in the same character. A CompareWith attribute defines which kind of statistical measure (mean, confidence interval, or min/max) is used for the comparison. An inclusive range defined through two attributes ('lower', 'upper'), plus a 'comparewith' attribute defining the preferred kind of measure. [ATTR: lower, upper, comparewith] The type of statistical measure with which the mapping range defined through Lower/UpperValue is compared. This may be a central value (mean, median), the range (quantile, confidence interval, etc.) or the extremes (minimum/maximum). Currently only these three categories are defined. The categorical state corresponding to the range defined in From. [ATTR: ref] Refers to a measurement unit (like mm, µm, °C) defined in Terminology/General. To simplify integration of descriptions from different data sources, different (but compatible!) measurement units may be used in different descriptions. 
However, the unit set here is recommended for data input and reports. Further, this is the default measurement unit if numerical data in a description declare no unit. Measurement units apply only to those statistical measures not marked with IsDimensionless='true'.

The number of figures in normal (non-dimensionless) measures assumed to be significant for all data in this character. Note that in sample values the 'significant' attribute also records the number of significant figures for individual measurements.

# Derived from AbstractCharacter to be used in instance documents (non-abstract type). Extension of the common character properties with those specific to color measurements (i.e. color expressed as a color range/area, rather than as named categories). (Not yet used!)

Mappings of color polygon values to categorical states

An inclusive range defining a color range through color vertices forming a polygon in color space.

The categorical state corresponding to the range defined in From. [ATTR: ref]

Note: The ColorRangeCharacter above is only one example of future derivations expected, like algorithmically described shapes, molecular sequences (genome/proteome), or molecular patterns (RFLP, AFLP, etc.)

b) State definitions within CategoricalCharacter. Abstract base type and derived types to be used in instance documents.

For categorical states. Used in concept (= 'project-wide') and local character state definitions. Any use of a character state in descriptions is a reference to an object of this type or one of its derivations.

If present and true, the current state/category allows unconstrained text not tied to a truly analytical state. Such states (which may be labeled: 'Text', 'Other:', 'none of the above, please specify:') prevent, especially while the terminology is still under development, a potentially inappropriate category from having to be chosen during data entry.
DELTA text characters are modeled using these states, but they also can occur in combination with categorical states. UnconstrainedText states are somewhat similar to the 'unknown' coding status, since the free-form text information is not available to most analytical processors (incl. identification programs).

(This 2nd annotation contains detailed information not entered in the first annotation, which is visible in the standard schema diagrams.) The name for this data element was contentious. Proposals were: Bob: IsIsolatedState with default false. Gregor: IsAnalyticalState, StateComparisonIsRecommended, or IsWellDefinedState, all with default true. ImpreciseEquality with default false? Furthermore, one may want to make a distinction between a category saying "enter free form text here" and one explicitly saying "none of the above". However, the action of choosing a separate free form text state instead of scoring a category (if available) and adding free-form note text implies that choosing free-form text is always of the type "none of the above", whether this is explicitly stated in the text state label or not.

CharacterAbstractStateDef plus a new, character-local CharacterState id

CharacterAbstractStateDef plus ConceptState id, used to define generic states at concepts that can be re-used in multiple characters

c) Character and state references

Refers to a character (e. g., from within concept trees or from descriptions). It consists only of a reference to a Character definition id. The ref attribute refers to a character definition id (Terminology/Characters/Character).

Refers to a character state (e. g., from descriptions). It consists only of a reference to a Character state definition id. The ref attribute refers to a character state id.
A collection of state references (CharacterStateRef type) [ATTR: ref]

Refers to a project-wide definition of a categorical state at a concept node. Refers to a concept state (those defined within the concept tree, which may be used in multiple characters).

d) Statistical measures: The base semantics and labels are already available through UBIF. At concept nodes further elaboration may occur: a) wording and value formatting b) definition of recommended measure sets.

A kind of local extension of the base definition of a statistical measure; used inside concepts, adding, e. g., formatting information.

Properties describing machine-readable partial semantics for a statistical measure. Provided to support generic application code that continues to function if additional measures are defined.

Simple statistical measures not requiring a parameter (mean, variance, sample size).

Statistical measures with a parameter value like confidence interval, percentile, etc.

A default value for the parameter of the measure. Example: 0.95 for the upper limit and -0.95 for the lower limit of the 95% confidence interval.

Format rules as used in the XSLT format-number function. # = significant digits; 0 (zero) = signif. digits or insignif. leading/trailing zeros; '.' = decimal point, ',' = group separator. Note that this is NOT culture sensitive in XSLT!!! - Examples: "0,0#" formats 5 / 0.59 as 5,0 / 0.59. "# ###,#" formats 5000 / 0.59 as 5 000 / .6. (Rules for exponential formats or percent may be added in later versions of SDD!)

When mapping numerical ranges to categorical states (essentially creating a histogram), several methods are possible, differing in which statistical measures are used for the mapping. Using the central value compares a point with the mapping range, whereas using ranges or extremes results in a comparison of two kinds of ranges for overlap. Only the central value method can guarantee an unambiguous partitioning into categories.
However, the ranges or extremes methods may be desirable because of their improved error tolerance. Central measure -- The first central measure encountered (mean, median, mode) is used as the basis of comparison. If none is found, but ranges or extremes are present, a central value is calculated based on these. Ranges -- Any range that is not the extremes (quantile, percentile, confidence interval, mean plus/minus s.d., etc.) is used for comparison if available. If none is found, extreme values are used. Extremes -- The extreme range values (= minimum and maximum) are used as the basis of comparison.

--------- The following types are used in descriptions or identification keys to code descriptive data by reference to characters, states, and modifiers defined in the Terminology.

2. --- Character references in coded descriptions: SummaryData

a) abstract and non-abstract derived types used in coded descriptions. Note: The non-abstract derived types are to be used in instance documents. The type names have been shortened to simplify instance documents, especially if an xsi:type is used (Char xsi:type='CatSummaryData').

Abstract base type. Used in CodedDescription/CodedData/Char to make statements for a single character in a class or unit. [ATTR in CharSummaryData base type:] ref (= to char. definition) origin (= enumeration; data may be original data or derived from other sources like calculation, mapping, aggregation/generalization, inheritance) @@Is there a better name for 'origin'?

Character modifiers, modifying all categorical states, statistical measures, etc. collectively. A character may occur multiple times in a description with different modifiers ('in winter/summer', 'at base/tip', etc.) or origins (e. g. from samples). (The element sequence in instance documents is informative!)
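The three comparison methods for mapping numerical ranges to categorical states (central measure, ranges, extremes), together with their fallback rules described earlier in this section, can be sketched as follows. This is a hypothetical Python illustration under stated assumptions: the dictionary keys (`mean`, `median`, `mode`, `range`, `min`, `max`), the method names, and the function names are invented here, and the midpoint is only one possible way to calculate a central value from a range or from the extremes, since the annotation does not specify the calculation.

```python
def comparison_interval(measures, compare_with):
    """Return the (low, high) interval compared with mapping ranges, or None."""
    central = next(
        (measures[k] for k in ("mean", "median", "mode") if k in measures), None
    )
    rng = measures.get("range")  # a non-extreme range, e.g. a confidence interval
    ext = None
    if "min" in measures and "max" in measures:
        ext = (measures["min"], measures["max"])

    if compare_with == "central":
        if central is not None:
            return (central, central)
        fallback = rng if rng is not None else ext
        if fallback is None:
            return None
        mid = (fallback[0] + fallback[1]) / 2  # assumption: midpoint as central value
        return (mid, mid)
    if compare_with == "ranges":
        return rng if rng is not None else ext  # fall back to the extremes
    if compare_with == "extremes":
        return ext
    raise ValueError(f"unknown comparison method: {compare_with!r}")


def matching_states(measures, compare_with, mappings):
    """Return all states whose inclusive (lower, upper) range overlaps the interval."""
    interval = comparison_interval(measures, compare_with)
    if interval is None:
        return []
    low, high = interval
    return [state for lower, upper, state in mappings if low <= upper and lower <= high]


# Hypothetical mapping of a length character to three categorical states.
mappings = [(0.0, 4.9, "short"), (5.0, 19.9, "medium"), (20.0, 100.0, "long")]
measures = {"mean": 12.0, "min": 3.0, "max": 25.0}
```

With these example values the central method selects a single state, while the extremes method selects every state that the observed minimum-to-maximum span touches, which illustrates why only the central value method yields an unambiguous partitioning.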
[ATTR: ref (= for all elements above)]

If origin='Calculated' and data are based on a specific sample that is present within the description, this sample may be identified here. [ATTR: ref]

Media specific to the character and the current object or class described. Example: microscopic picture of spore shape in a specimen.

Coding status values like inapplicable, unknown, etc.; may have a free-form Note, but not modifiers. [ATTR: ref] Note: In a unit (= specimen) description this should be an alternative to categorical or numerical data and limited to 1 status value per character (not enforced by schema). However, for classes (e. g., a genus) it is up to the aggregation/generalization process whether to create multiple status values ("unknown or not applicable") or not.

Public notes or comments on the entire character statement, i. e. all status values and states, measures (depending on type), etc. together. Multiple languages are supported. Applications may, e. g., report the text in brackets after all other data.

Provenance of value/state. The current data may be original data or may be cached information derived from other sources. The origin of the derivation may be a calculation, a mapping, an aggregation/generalization (class hierarchy, from below), or an inheritance (class hierarchy, from above).

# Derived from abstract CharSummaryData to be used for categorical (char. state) data in instance documents (non-abstract type). Type-specific extension of the base character data type.

States are 'scored' in a description by referring to a state defined in the current character. [ATTR: ref]

Distinguishes different types of state collections. 'AndSet' and 'OrSet' define state distributions that are not explicitly ordered in instance documents. Applications may reorder states using the state order defined in Terminology or state frequency values/ranking.
For the corresponding 'AndSeq'/'OrSeq' the sequence of states in instance documents defines the preferred order of states (distinguishing, e. g., between 'round or elliptic' and 'elliptic or round'). WithSeq expresses a specially worded form of 'AndSeq'. With 'Between' the scored states form a range around the true value ('orange' to 'red'). # Derived from abstract CharSummaryData to be used for numerical (statistical measures) data in instance documents (non-abstract type) Type-specific extension of the base character data type. (The element sequence in instance documents is not informative!) Simple measures like mean, variance, or sample size. [ATTR: ref, value] Statistical measures like confidence interval or percentile, expressed using an additional parameter par. [ATTR: ref, par, value] --- The ref attributes in both types point directly to enumerations in UBIF (UnivarStatMeasureEnum/WithParam). An elaboration for measure definitions is supported at concept nodes but optional. --- Individual measures have no separate Modifiers/Notes. However, a numerical character may occur multiple times in coded descriptions, e. g., to separately express width at base and at center. Refers to a measurement unit like mm, µm, °C, defined in Terminology/General. If missing, the 'recommended measurement unit' declared in the character definition is to be assumed. Note: although each data item may use different units, they should all be compatible and convertible (like °F/°C/K, ml/mm3). This is not controlled by the schema! # Derived from abstract CharSummaryData to be used for numerical (statistical measures) data in instance documents (non-abstract type) An inclusive range defining a color range through color vertices forming a polygon in color space.
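The state collection models described above ('AndSet', 'OrSet', 'AndSeq', 'OrSeq', 'WithSeq', 'Between') mainly affect how an application orders and words the scored states. The following Python sketch shows one possible rendering. The function name, the `term_order` parameter (standing in for the state order defined in the Terminology), and the exact English wording are assumptions made for illustration; the schema itself does not prescribe them.

```python
from typing import List

# Connecting word per model; 'Between' is handled separately below.
JOINERS = {
    "AndSet": "and",
    "AndSeq": "and",
    "OrSet": "or",
    "OrSeq": "or",
    "WithSeq": "with",
}


def render_states(model: str, states: List[str], term_order: List[str]) -> str:
    """Word a list of scored states according to the collection model."""
    if model in ("AndSet", "OrSet"):
        # Unordered sets may be reordered; here the terminology order is used.
        # Assumes every scored state is listed in term_order.
        states = sorted(states, key=term_order.index)
    if model == "Between":
        # The scored states form a range around the true value.
        return f"{states[0]} to {states[-1]}"
    if len(states) <= 1:
        return "".join(states)
    if model == "WithSeq":
        # First state is primary, the rest are of secondary relevance.
        return f"{states[0]} with {' and '.join(states[1:])}"
    joiner = JOINERS[model]
    return ", ".join(states[:-1]) + f" {joiner} " + states[-1]
```

With a terminology order of round, elliptic, ovate, the states elliptic and round render as "round or elliptic" under 'OrSet' but keep their document order, "elliptic or round", under 'OrSeq'.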
b) types used inside the CharSummaryData-derived types A categorical state including frequency, state modifier, and Notes Currently limited to a single frequency and modifier per state, in an explicit attempt to simplify the SDD data model! [ATTR: ref (= for all elements above)] Public notes or comments, for multiple languages. Applications may, e. g., report the text in brackets after the character state. Similar to StateData, this one is intended for CodingStatus references. It supports notes, but no modifiers! [ATTR: ref] Public notes or comments, for multiple languages. Applications may, e. g., report the text in brackets after the status value wording. 3. --- Character references in coded descriptions: SampleData a) abstract and non-abstract derived types used in sample data Abstract base type. Used in CodedDescription/SampleData/Sample/SamplingUnit. [ATTR: ref (to def. of character)] # Derived from abstract CharSampleData to be used for categorical (char. state) data in instance documents (non-abstract type) States are 'scored' in a description by referring to a state in the character definition. All notes and modifiers are applicable to this element. [ATTR: ref] # Derived from abstract CharSampleData to be used for numerical data in instance documents (non-abstract type) in coded descriptions (Sample/SamplingUnit). [ATTR: value (xs:double, a directly measured/observed value. Not for statistical measures; these cannot occur in sampling units)] Public notes or comments, for multiple languages. Applications may, e. g., report the text in brackets after the value. A single value of a single measurement for a character in a sampling unit. This may not be used for ranges, minimum, mean, etc., which cannot possibly occur on sampling units. Significant figures. 1.300 has 4 significant figures, 72000 may have 2, 3, or more significant figures.
# Derived from abstract CharSampleData to be used for ColorRange data in instance documents (non-abstract type) An inclusive range defining a color range through color vertices forming a polygon in color space. -------------------------------- END Characters and dependent objects (states, statistical measures) --------------------------- Concept tree and node definitions Defines an entire concept tree (which may be a single tree node containing a flat list) Label to identify the current object in the user interface Concepts describing the entire tree using a constrained vocabulary to support application interoperability The type of a tree is constrained to an enumerated list to support application interoperability. Usage of concept tree that is intended by its designers; constrained to an enumerated list to support application interoperability. Important Roles are InteractiveIdentification, NaturalLanguageReporting, or Filtering. Trees that have no value are implicitly visible in applications only when designing the terminology! They are not normally shown to consumers of descriptive data. True if the intention of the designer of a concept tree is that all characters should be included in the tree. A terminology editing application may then warn about missing characters or directly offer to insert newly created characters in all such trees. The root node of the tree. Note that it has a label in addition to the tree label. The tree label uniquely identifies a tree when selecting it among a list of all trees, whereas the root node label can be very short and is shown when a single tree is displayed. [ATTR: id] A node in a concept tree. Concepts may be basic properties (color, shape, texture), structural types (fruit types), methods (naked eye, hand lens, microscope) or other hierarchical generalizations that can be applied to characters (e. g., relative region: tip versus base of structure) Tree nodes may remain unlabeled!
A set of project-wide state definitions tied to the part (e. g., for fruit: capsule, berry, nutlet, ...), property (e. g., for color: red, green, ..., for shape: round, ovate, ...), method, etc. described in the current concept tree. ConceptStates become operational for descriptions only when a StateReference has been added in a specific character. The definition of concept states is identical to the local definition of states within a character. Using concept states simplifies the management of terminology and improves data analysis (states from different characters can be compared if they refer to identical concept states). [ATTR: id] Inheritable def. automatically apply to all characters/concepts starting at this node. The modifiers contained in the listed modifier sets are considered applicable to all characters placed in the current branch of the concept tree. Note: In descriptions, all modifiers are in principle valid in all characters. However, editing tools are expected to offer only the recommended modifiers defined here. Reference to a set of modifiers. The element sequence in instance documents is not informative! [ATTR: ref] A set of univariate statistical measures (e. g., mean, min, max, s. d., sample size) considered applicable to all numerical (sic!) characters placed in the current branch of the concept tree. In descriptions, all measures are valid in all numerical characters. However, editing tools are expected to normally offer only the recommended measures defined here. In addition to listing measures, these elaborations provide for improved report generation (wording, value formatting). Note: Statistical measures applicable to ordinal or nominal data (min, median, mode, etc.) are not yet supported, since they can easily be calculated ad-hoc from the frequency distributions of categorical states that are supported.
Within each description, the applicability of all character references within the branch or the tree starting with the current concept may optionally be governed by rules depending on the presence of categorical states in the same description. Note: rules for individual characters (rather than a set) can be defined in the terminal nodes. By default the characters below this node are inapplicable. They become applicable if any of the listed controlling character/state combinations is present in a description. By default the characters below this node are applicable. They become inapplicable if any of the listed controlling character/state combinations is present in a description. Meta information, rating characters under various aspects. Intended to guide a best-next character algorithm. A node either contains other nodes, or contains a single character reference. It may also be empty to decouple the definition of hierarchies (e. g., a complete part hierarchy) from characters defined at a given moment. Element may be missing, which results in the option to have empty nodes with neither a character nor further nodes. [ATTR: id] Characters are the 'leaves' of the tree. Each character is embedded in a node providing labeling information in the context of the current tree (which is usually different from the default character label). A single character may appear in several places in the tree, if this is desired. [ATTR: ref] ==== TERMINOLOGY END === ==== DESCRIPTIONS START === Abstract base type for NaturalLanguageDescription and CodedDescription. The id attribute is currently not used in keyrefs from within this schema. However, it is considered generally useful to uniquely identify descriptions in federated situations. Subject of the description is either an abstract class (e. g., a biological species) or an individual object or unit (e. g., a specimen). Refers to a class name (= in biology a taxon name) [ATTR: ref, @@check classifier design: add. attributes?]
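The two applicability rules for concept nodes described above (inapplicable by default until a controlling state is present, and applicable by default until one is present) can be sketched as set tests. In this Python illustration, the parameter names `applicable_if` and `inapplicable_if` are hypothetical stand-ins for the two rule lists, and a controlling character/state combination is modelled as a (character id, state id) tuple. How the two rules interact when both are defined on one node is not stated in the text; requiring both to pass is an assumption of this sketch.

```python
from typing import AbstractSet, Tuple

CharState = Tuple[str, str]  # (character id, state id); ids are illustrative


def is_applicable(
    scored: AbstractSet[CharState],
    applicable_if: AbstractSet[CharState] = frozenset(),
    inapplicable_if: AbstractSet[CharState] = frozenset(),
) -> bool:
    """Decide whether the characters below a concept node are applicable.

    scored: character/state combinations present in the description.
    applicable_if: if non-empty, characters are inapplicable by default and
        become applicable when any listed combination is scored.
    inapplicable_if: characters are applicable by default and become
        inapplicable when any listed combination is scored.
    """
    if applicable_if and not (scored & applicable_if):
        return False
    if scored & inapplicable_if:
        return False
    return True
```

For example, petal characters could be guarded by `applicable_if={("petals", "present")}`, so that they are only offered once petals have been scored as present.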
Refers to an individual physical object (e. g., a biological specimen). This may refer to observed objects as well as to collected and preserved objects. The identification (= a class name) is defined in the ExternalDataInterface/Units list. [ATTR: ref] A description may be further defined through a published data source for the nat. language or coded description. If Citation is missing, it is assumed that the compiler or editor of the data is the original source of information. A description may have a limited geographical scope, if geographical variability is known to exist or is expected. Creators, Revision status, and dates of an individual description (compare RevisionData in Metadata) Contains resources like images that are not specific to a character (else add them to character elements below). Defines an id for a coded or natural language description Coded description data are highly controlled by the vocabulary and structures defined in the Terminology, using references to characters, states, modifiers, numerical values for measurements. They also support a limited amount of free-form text (in Notes or Annotation only). Separating data and terminology allows rearranging and refactoring the terminology, multilingual support through central terminology translations, and multiple hierarchical views. Coded descriptions must fulfill more rigorous consistency requirements than natural language descriptions and are more suitable for analysis. Furthermore, language-dependent annotations are minimized so that data can be easily reorganized and translated into multiple languages. Summary data for aggregated or summarized data (using statistical measures, state distributions, etc.). The element is optional to support descriptions containing only sample data or media resources. Note: Characters are NOT required to have unique ref attributes! Data for one character may be recorded with different modifications (in spring/autumn, at tip/base).
(The sequence of elements in instance documents is not informative and may be changed at any time) [ATTR: ref] [ATTR: ref] [ATTR: ref] Raw sample data are recorded here. The analysed and generalized (e. g. using statistical measures) results are normally also reflected under SummaryData (with origin='calculated' and BasedOnSample identifying a sample ID). (The sequence of Sample elements in instance documents is not informative and may be changed at any time!) A container for direct ('raw') measurement results in a study. All sampling observations are assumed to be made under identical conditions. Descriptions may contain an unlimited number of Samples. [ATTR: id, random (= is random sample, if false sample may or may not be biased)] A special subtype of CodedDescription are original sampling data, which are organized into referable Sample containers: A container for a sampling, with repeated sampling units, each of which may record multiple characters that are observed together. Public notes on the sample (circumstances, etc.) that are not already identified in the description header. Multiple languages are supported (although rarely required). Optionally a full or partial date/time of start and end of the sampling event may be recorded. A sampling unit may be an individual organism, a leaf of a tree, a piece of tissue, etc. In each sampling unit multiple characters may have been observed together ('paired observations'). Example: 'leaf shape, length, and width' of a single leaf. Value frequencies (e. g., '2.3': observed 4 x) are not supported; they are useful when only a single character variable is supported, but complicate paired observations unnecessarily. Char. values with a frequency should be entered in repeated SamplingUnits. (The sequence of SamplingUnit elements in instance documents should be preserved. It has no analytical semantics, but it may be relevant if data entry is compared with the source.)
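The relation between raw sample data and summary data described above can be illustrated with a small Python sketch. It assumes that each sampling unit is represented as a dict mapping a character reference to one directly measured value, and that the derived record carries the origin and a reference to the sample it is based on. The dict layout, the function name, and the measure names 'Mean', 'Min', 'Max', and 'N' are illustrative assumptions, not the schema's instance format.

```python
from statistics import mean
from typing import Dict, List


def summarize(sample_id: str, units: List[Dict[str, float]], char_ref: str) -> dict:
    """Derive summary measures for one character from the units of a sample.

    Each unit may record several characters observed together (paired
    observations); units that did not record char_ref are skipped.
    Assumes at least one unit recorded the character.
    """
    values = [unit[char_ref] for unit in units if char_ref in unit]
    return {
        "ref": char_ref,
        "origin": "Calculated",        # derived data, not original data
        "BasedOnSample": sample_id,    # points back to the raw sample
        "measures": {
            "Mean": mean(values),
            "Min": min(values),
            "Max": max(values),
            "N": len(values),
        },
    }


# A value observed several times is entered as repeated sampling units,
# since value frequencies are not supported.
units = [
    {"leaf_length": 2.0, "leaf_width": 1.0},
    {"leaf_length": 4.0},
    {"leaf_length": 4.0, "leaf_width": 3.0},
]
summary = summarize("sample1", units, "leaf_length")
```

Keeping the sample reference in the derived record lets a consumer recompute or verify the summary as long as the raw sample is still present in the description.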
(The sequence of elements in instance documents is not informative and may be changed at any time) [ATTR: ref] [ATTR: ref, value] [ATTR: ref] Defines an id for a sample. This is used when analysis data in coded descriptions are based on a specific sample. If true, the sample is a random sample. If false, the sample may or may not be biased. Refers to a specific sample inside CodedDescriptions Refers to a specific sample (//CodedDescription/SampleData/Sample) ==== DESCRIPTIONS END === ==== Other basic types used by SDD (compare also the types used by UBIF) Used in descriptive data (not in terminology): Collections of states in instance documents may be ordered (sequence) or unordered (set), and may be connected with 'and', 'or', 'with', or 'between'. Since set/sequence and operators are dependent on each other, the two aspects are combined into a 'model' enumeration. Unordered set of states, combined with 'or' -- Multiple states scored for a character in a description form a set. The order of states has no special meaning and may be changed. In natural language output the states should be combined with 'or' to express that in individual objects (that belong to the class that is being described), the states may occur together or alone. Ordered sequence of states, combined with 'or' -- Multiple states scored for a character in a description form a sequence, i. e. the state order carries some semantics and should be preserved in output. The sequence semantics is not explicitly defined, but intelligible to human consumers and presumably relates to some concept of relevance or importance. In natural language output the states should be combined with 'or' to express that in individual objects (that belong to the class that is being described), the states may occur together or alone. Unordered set of states, combined with 'and' -- Multiple states scored for a character in a description form a set. The order of states has no special meaning and may be changed.
In natural language output the states should be combined with 'and' to express that in any individual object (that belongs to the class that is being described), the states will always occur together. Example: two colors that occur together in a pattern. Ordered sequence of states, combined with 'and' -- Multiple states scored for a character in a description form a sequence, i. e. the state order carries some semantics and should be preserved in output. The sequence semantics is not explicitly defined, but intelligible to human consumers and presumably relates to some concept of relevance or importance. In natural language output the states should be combined with 'and' to express that in any individual object (that belongs to the class that is being described), the states will always occur together. Example: a black part with small red markings, is more appropriately described as 'black and red' than 'red and black'. One state occurring together with others of secondary relevance. -- This is a special case of AndSeq, and in many circumstances (except natural language generation) may be treated as AndSeq. Example: "Green with brown" (often this may be two characters, e. g. base color and dot color). True value lying between (usually two) states -- Example: "Between oval and elliptic" = "Oval to elliptic". Defines the type of a concept tree (list of enumerated values to support application interoperability). Categorizing characters into basic property types (e. g., color, 2-dim. shape, 3-dim. shape, surface texture, taste, smell, behavior, physiology, measurements, etc.) greatly improves the analysis and management of larger character sets and is therefore recommended. [@@Note: Only a single concept tree should have this hierarchy type. (not enforced in schema, how can it be enforced? Other types occur multiple, i. e. one cannot make a UNIQUE statement on attribute! @@] A hierarchy that organizes characters by observation method or instrumentation, e.
g., field observation, light microscopy, electron microscopy, molecular methods, culture techniques, etc. A hierarchy that organizes characters by a morphological or anatomical "contains" or "part-of" hierarchy: plant = root/stem/leaf, leaf = base/stipules/petiole/lamina, etc. A hierarchy that organizes structural parts in a kind-of hierarchy (e. g., a 'teliospore' is a kind of 'spore') Used for concept trees that fall into none of the categories property, method, part. Such trees may be intended only for internal purposes (e. g., defining dependency rules) or for browsing by the user. PresentationTable concept trees are small sets of usually a few characters that allow data to be displayed in a tabular arrangement. It is possible to define tables in more than 2 dimensions. By default the innermost dimension is considered cells in a row, the next rows in a table. Any further dimension may be displayed as multiple 2-dimensional tables one below the other. However, applications may also offer a browser based on pivot tables. - Note: Trees of type PresentationTable should not be offered in the user interface when selecting a browsing tree. A concept tree of type "SubsetFilter" is intended only for the purpose of filtering characters. It will often be a flat list of characters. Applications should not offer it as a choice when the user selects a hierarchy for displaying or reporting purposes. Note that conversely, the filter selection dialog in applications should not be restricted to trees of type SubsetFilter. Any concept tree, including part, method or property hierarchies may be used as a filter to define character subsets. Defines the intended roles that a designer may assign to a concept tree (list of enumerated values to support application interoperability).
Setting this value in a concept tree is a recommendation to applications with a user interface to offer this tree for editing the description data set (the application may, however, enable the user to select any concept tree). Setting this value in a concept tree is a recommendation to applications with a user interface to offer this tree for building stored identification keys (e. g., dichotomous keys). Setting this value in a concept tree is a recommendation to applications with a user interface to offer this tree for interactive identification. Setting this value in a concept tree is a recommendation to applications to use this for creating a report of the character terminology. (Note that no TerminologyEditing value is defined; all concept trees should be available when designing the terminology. However, the tree marked as TerminologyReporting may be used as the initial editing view.) Setting this value in a concept tree is a recommendation to applications to offer this tree for natural language reporting. Setting this value in a concept tree is a recommendation to applications to offer this tree for filtering purposes. Some trees are explicitly (separately) typed as being intended exclusively for filtering/subset definition; but many trees are useful for filtering purposes. Defines the origin of data that may have been entered, calculated, aggregated or inherited The data are directly entered by a machine or human agent. These are the original data all other cached data (Origin unequal 'OriginalData') are based upon. The data are calculated from other data using a calculation rule. Examples: a ratio calculated from other characters, a mean calculated from a sample that is available under SampleData/Sample (if a mean is calculated from data no longer available, it would be recorded as 'OriginalData'). 
The data are calculated from other data based on a mapping definition (either from numeric to categorical, or from fine-grained categorical to coarse-grained categorical). The data are derived from data in classes placed below the current class in the class hierarchy. This applies both to aggregating data from objects to classes, and to generalizing lower classes to higher classes. Note: BioLink calls this 'Compile from below'. The data are derived from data in classes placed above the current class in the class hierarchy. Defines the origin of concept/character ratings. Similar to DataOriginEnum, but fewer enumerated values. The data are directly entered by a machine or human agent. Concept ratings may inherit from ratings at higher concept nodes, and character ratings may inherit from all concept nodes they belong to (possibly in multiple concept trees). A rating of 1 (low) to 5 (high), with 3 as central value, plus indication whether inherited (= calculated based on related definitions) or defined directly. inherited = inherited from a concept parent. Concept ratings are inherited to all further concepts and to all characters in a branch of the concept tree. A collection of ratings to rate the consistency, etc. of a character or concept. Relevant during interactive identification to rank the remaining characters for discriminative power and convenience. How convenient is the character or concept for identification? [ATTR: rating, origin] How available is the character or concept for identification? The rating should be low if it is only available for a short time during the life of an object, or only expressed with low frequency in populations. [ATTR: rating, origin] How reliable is the character or concept for identification? This should include both variability of values and variability in scoring the objects. [ATTR: rating, origin] How convenient is the character or concept for identification?
[ATTR: rating, origin] The following types define variants of wording definitions for natural language report rendering. These types are used exclusively in the Audience-specific LabelPlusWording1-3 container types. Natural language wording for elements without content (= 'SimpleWording'). Wording for elements that have no further children in the natural language wording tree, e. g., char. states. Natural language wording for container elements with non-repeated content (e. g., modifiers around states) (= 'ContainerWording') Wording output before the contained elements. For characters this is the main character wording that is output before the states. (Optionally both before and after may be present) Wording output after contained elements. In the case of a character this is the wording after all states, or after numerical data and after a measurement unit where present. Natural language wording for elements with repeated content like characters that contain multiple modifiers + states. (= 'Array-' or 'ContainerWording') The following types are audience-specific (i. e. they refer by a ref mechanism to audiencekey values). Note that some types are used only a single time, but it was thought more transparent to define all audience-specific collections and representations through types rather than make this dependent on the frequency of use. A label = collection of audience-specific label representations (without abbreviations or natural language reporting wordings). Used, e. g., for concept trees or modifier sets. Audience-specific simple label representation (= without abbreviations or natural language reporting wordings) [ATTR: audience] Audience-specific label representations (without abbreviations or natural language reporting wordings). Used, e. g., for concept trees or modifier sets. Text of the normal label, intended for screen display or reports that accommodate unabbreviated labels. Audience-specific label representations (incl.
abbreviations) Restricted to 50 characters maximum length, including blanks (recommended to be much shorter!). Label abbreviations are especially important when displaying information in a tabular format. When missing, applications may abbreviate the label (this may create duplicates, however). Small multimedia resource to be displayed in addition to the label. It should be quickly recognized and will usually not be informative enough to base decisions on it alone. Example: in a concept tree a leaf icon image is provided for the node containing leaf characters. [ATTR: ref] A set of representative multimedia resources that convey the meaning even when used without a Text representation (but applications may choose to combine text and media). Example: display shape images to select a state during identification. If more than one resource is defined here, the assumption is that they will normally all be consumed before making a selection. The size of the resource should be sufficiently concise to view ca. 6 images from different labels concurrently on the intended output device, or listened to ca. 6 audio extracts before making a selection. - Both Icon and these selectors resources are audience-specific (e. g., image with abbreviation, bird-call with spoken text). [ATTR: ref] Label (incl. abbreviations and a single wording) Language/audience-specific label representations (incl. abbreviations and a single nat. lang. reporting wording) [ATTR: audience] Extends LabelPlusAbbreviationRepr with a single wording element. Label (incl. abbreviations and a wording before and after the contained elements) Audience-specific label representations (incl. abbreviations and wording for natural language reports) [ATTR: audience] Extends LabelPlusAbbreviationRepr with a wording before and after the contained elements. Label (incl. abbreviations and a wording text before, after, and between the contained elements) Audience-specific label representations (incl.
abbreviations and wording for natural language reporting) [ATTR: audience] Extends LabelPlusAbbreviationRepr with a complex wording element. Used in concept tree nodes and character references. Allows to define a text before, after, and between elements; used during natural language reporting. Container for multiple audience-specific representations of a (publicly reported) Note as text (optionally with basic formatting). Used, e. g., inside state, statistical measure, coding status, etc. references in descriptions. [ATTR: audience] Audience-specific representation of a (publicly reported) Note as text (optionally with basic formatting). The type provides an audience reference in an attribute. [The presence of the (seemingly superfluous) text element has two advantages: 1. Cleaner typing; adding an audience attribute directly to FormattedSimpleText type would require multiple inheritance. 2. In nat. language markup, Text surrounds all verbatim text. Retrieving all Text content retrieves the original text prior to markup.] --- Abstract base type for some vocabulary definitions: Abstract base type used to derive concepts in Terminology/General and Terminology that require only a single label and wording (states, coding status, etc.); the label natural language wording has only a single text element. Audience-specific labels, abbreviations, media icons/selectors & wording Abstract base type used for stat. measures and modifier definitions (certainty, frequency, etc.); the label natural language wording has text before and after! Label with abbreviations and wording for natural language reports. Note: It would be possible to define a VocabularyW3Base abstract base type, but this would be used only for concept nodes. === EXTENSIONS of UBIF (Unified Biosciences Information Framework) elements ProxyData objects: Include file for the main SDD schema. 
This file isolates a number of derived simple types used to define ID-based relations between object definitions and object references. For each kind of relation in SDD a specific type is used. The use of the type is intended to clarify the relations, which otherwise are hidden in the xml schema identity constraints that are difficult to study. Bob Morris proposed using this to help when working with tools like Castor. Clearly, these types are technically redundant, and the semantics could also be documented separately (and are already in the identity constraints), but they hurt very little either. They are isolated in this include file so that they do not clutter the type list in the main SDD schema file. ---- Relation types used in general declarations (defined to help in type-safe programming; this duplicates information also defined in schema identity constraints): Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. ---- Relation types used in terminology (defined to help in type-safe programming; this duplicates information also defined in schema identity constraints): Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes.
Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. ---- Relation types used in descriptions (defined to help in type-safe programming; this duplicates information also defined in schema identity constraints): Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. Derived from RelationID simple type without changes. Declares a unique type to clarify relations between key definition and key references and supports type-safe programming. ### Copyright © TDWG (Taxonomic Databases Working Group, www. tdwg.org), 2004. This file is a special version of the Unified Biosciences Information Framework (UBIF) XML schema. It may be used only for viewing convenience and may not be distributed independently from the primary schema files (UBIF.xsd, UBIF_TypeLib.xsd, etc.). The inclusion of all parts starts below: !###

Unified Biosciences Information Framework (UBIF) XML schema for data exchange and integration across knowledge domains. The schema has been designed for biological data, but is applicable to other knowledge areas as well. It is based on work of the TDWG SDD and ABCD subgroups and is currently jointly authored by the SDD, ABCD, and TaxonName subgroups and by GBIF (Global Biodiversity Information Facility). The framework may be used without changes for new schemata; no registration is necessary. Its main features are:
* A foundation of shared simple and complex types, including some enumerations to simplify world-wide data integration and interoperability across language barriers.
* A top-level structure of Datasets collections containing independent Dataset objects. The collection is purposely semantically neutral; relations between Dataset objects have to be discovered by the data consumer or are assumed to be implicit in the protocol requesting the data.
* Derivation metadata that support tracing and debugging the online transformation history of the data. They provide important technical information about access providers and the path of potentially multiple portals involved.
* Metadata describing the principal data collection from which the dataset was derived. The dataset may represent the entire source dataset or it may be filtered, normalized, or enriched with secondary information. A dataset is never an aggregation of multiple data collection sources with different authorship, copyright, or other IPR; these are assumed to be delivered as separate datasets. Note: Derivation and content/source metadata together provide all necessary information for UDDI support.
* External data interface (EDI) providing a standard mechanism to link to external data providers for knowledge domains outside of the scope of the current dataset. This includes a collection of supported object linking mechanisms involving globally unique identifiers and resolving mechanisms. Proxy objects can replace a link in cases where a specific object is not (or not yet) available in an external data source, and they cache a minimalized data interface on the assumption that access is asynchronous, slow, or may be temporarily unavailable. Furthermore, these cached data provide semantic information to human readers, preserving the semantics of a link even if it has become permanently broken.
* A single "payload" element which must come from a different namespace. Note that within a Datasets collection each Dataset object may have a payload from a different external schema. It is the responsibility of the consumer to decide which dataset payload it is interested in or can process.
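The fallback role of proxy objects described in the list above can be sketched roughly as follows. This is a minimal Python illustration and not part of UBIF: the names `Proxy`, `display_text`, and the `resolve` callback are hypothetical, and the sketch assumes that a failed external lookup is signalled by a `LookupError`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Proxy:
    """Hypothetical stand-in for a UBIF proxy object."""
    label: str                  # cached, human-readable representation
    link: Optional[str] = None  # identifier of the external object, if any


def display_text(proxy: Proxy, resolve: Callable[[str], str]) -> str:
    """Prefer the external record; fall back to the cached label.

    The label is used when no link exists or when the lookup fails
    (assumed here to raise LookupError, e.g. KeyError).
    """
    if proxy.link is not None:
        try:
            return resolve(proxy.link)
        except LookupError:
            pass  # provider unavailable or link broken
    return proxy.label
```

The point of the design is that a consumer can always show something meaningful to a human reader, even when the external provider is slow, unavailable, or gone.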

Conventions: Element or attribute names starting with two underscores (__) are present in the schema for discussion purposes only and should be used only experimentally. Annotations containing @ indicate unfinished points of discussion.
Note: blockDefault="#all" in xs:schema prevents derived types from being used in instance documents in elements typed to the base type (which would otherwise be possible using xsi:type=""). - finalDefault is not set; further type derivation is currently not considered problematic. Please contact us if you believe otherwise. Note that according to the W3C discussion forum, the developers of XML Schema are considering dropping the final attribute in the upcoming XML Schema version 1.1. - Nillable: xsi:nil is not supported in UBIF documents (the schema declaration nillable="false" is the default and is not explicitly stated).

Copyright © TDWG (Taxonomic Databases Working Group, www. tdwg.org), 20. July 2004. Licensed under the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version (http://www.gnu.org/licenses/gpl.html). Schema designed and annotations authored by G. Hagedorn & W. Berendsohn, Berlin with help from members of the SDD, ABCD, TaxonName subgroups.

The Datasets collection is the only root element allowed in UBIF: Root element for files or data streams. Multiple Dataset objects are completely independent. Potential relationship may be detected by the consumer, but are not expressed in the UBIF format. The sequence of Dataset objects has no semantics and does not have to be preserved. The version of the UBIF standard used is defined in the namespace declaration and needs no separate data element. A single file or data stream may consist of multiple data sets ##This is a highly simplified version to concentrate discussion on SDD! Please look at UBIF itself for a discussion of this structure itself! Data from other knowledge domains to which the data set refers may be represented by collections of proxy data objects. In the absence of available external databases a proxy object may be used as a local placeholder. The data inside the proxy object usually provide a reduced interface data model that abstracts from a potentially more complex external data model. Examples: persons, publications, geographical localities, media resources, but also class names (biology: taxa) and objects/units (biology: specimens). ##To concentrate discussion on SDD some interface extensions are removed! Metadata referring to the principal source of the entire data collection (the metadata scope may be wider than the objects actually contained in the data set). ##To concentrate discussion on SDD some parts are removed! === Data derivation, Meta data about the entire data collection from which the data set was derived: Describes the providers and application/ script(s) that produced the current data set, plus a derivation history of all automatic or semi-automatic transformation with negligable or automated content changes.[ATTR: datetime (= When was it done?)] = Date and time (UTC or local time with timezone information) at which the current document or data stream was created by the generator. 
Using UTC (Universal time coordinates = Greenwich mean time) is recommended. ##This is a highly simplified version to concentrate discussion on SDD! Please look at UBIF itself for a discussion of this structure itself! Which tool did it? Metadata about the software (application, script, etc.) that performed the derivation/transformation. [ATTR: name, version, notes, routine] (Detailed attribute annotations exist, but are not visible in graphical schema view!) Name of the application performing the transformation. The term 'application' should be understood in a loose sense; it may be a script that is not part of a larger application (compare the Routine attribute, which may provide the detailed name of scripts that are part of an application!). Version of the application that has generated this document. The attribute should not be named 'Version' to avoid confusion with the version of the content (see content Metadata). Additional information about the generating application that is not part of the name or version. If the copyright of the generating application is specified, it should be understood that this does not affect the content copyright of the data. Optionally allows a generating application to identify which of possibly multiple transforming routines (database code, xslt, etc.) was used. This attribute may also be used, to identify different conditions under which the export routine may behave differently. [ATTR: datetime] When did it occur? Date and time (UTC or local time with timezone information) at which the current document or data stream was created by the generator. Using UTC (Universal time coordinates = Greenwich mean time) is recommended. Metadata referring to the principal source of the entire data collection (thus the metadata scope may be wider than the objects actually contained in the data set). 
If a history of the data collection (revised or expanded in various projects or at different institutions) exist, this must be reflected in the IPR statements and possibly in the list of Owners. Language-specific header information [ATTR: language] Number and date of current version The major version number ('1' in 1.2) as defined by the content creators. An optional minor version number ('2' in 1.2) Unconstrained text specifying status + optional number, e. g., 'beta', 'alpha', 'rc/release candidate', 'internal'. If missing, release status is assumed. Citable 'publication date' of the current version (comp. RevisionData/ Initiation- and LastRevisionDate for version- independent dates). This date must be missing if the current version is not yet published! (= DC.Date.issued; http://purl.org/dc/terms/issued) Note: currently no mechanism exist to record the date of the first version release. Is this needed? Creators, Revision status, and dates of the entire data collection from which the current dataset is derived. Entities having legal possession of the data collection content. Owners are defined only for the entire data collection, not for individual descriptions etc. (= http://www.loc.gov/ marc.relators/own) Copyright, terms of use, license and other IPR-related statements like disclaimer or acknowledgement. Giving a copyright statement and a (if possible public) licence is highly recommended! (=DC.Rights) [ATTR: language] Language-specific content metadata (title, description, etc.) with *required* Language attribute added. A short, concise title. Does not support any formatting! (= DC.Title) General Note on DublinCore translation: In addition to those that can bee transformed from UBIF metadata, an additional DC.Type='dataset' should be added. Free-form text containing a longer description of the project. (= DC.Description) Free-form text describing geographic, taxonomic, or other coverage aspects of terminology or descriptions available in the current project. 
(= DC.Coverage) Optionally an image media resource containing an icon/logo symbolizing the project. [ATTR: ref] URL pointing to an online source related to the current project, which may or may not serve an updated version of the terminology or descriptions. === Proxy data objects (representing external resources) and references to these objects: Collections of non-abstract data proxy elements, forming an interface to potentially existing more object representations Class (biology: taxon) names used in the project. Each proxy object contains a name - either locally defined or representing an external resource defined in a linking mechanism and defines a local id attribute that may be referred to multiple times from within the project. Biology: Object in a nomenclator [ATTR: id] Optional hierarchy (= tree, biology: taxonomy) of classes defined above. A hierarchy may be incomplete, i. e. some ClassName object may not be in the hierarchy. ClassHierarchies may be locally defined or represent an external source. Biology: Taxonomic hierarchy, or arbitrary set of taxa. [ATTR: id] Units are physical objects (biology: specimens) that are collected, described, or observed. In biology a collected object is often called a specimen. Biology: Object in a collection (= specimen) or an observation. Units may be identified or assigned to a Class name. [ATTR: id] Documentation of persons/organizations involved in the authoring, compiling, editing, etc. of the data set. @@ The specific elements are only a preliminary sketch, this should be synchronized with TDWG ABCD! [ATTR: id] Publications used in the project, defined through proxy objects (= local or external link, see under Agents). Printed or digital publication (including database source) [ATTR: id] Geographical locations (often country names, but potentially on any level), defined through proxy objects (= local or external link, see under Agents). An example of an external gazetteer referred to is the TDWG Geography standard. 
[ATTR: id] Resource definitions containing links like URLs or actually embedding the resource (e. g. encoded images). These are proxy objects (= local or external link, see under Agents). [ATTR: id] Measurement units like mm-square, °C, ml, pH, and dimensionless scaling factors like %, promille. [ATTR: id] Abstract base type for proxy objects representing external resource objects (publications, class names, specimens, etc.). Provides a free-form label (this may be locally defined and the only data item if no external object is available) plus an ID-based link to an external object. Human readable representation. This may be the only data item if no machine readable ObjectLink exists. Example for a publication: "Smith 1998. Flora of Erehwon, XY Publishers." Even if an external ID exist, the Label is required. It preserves the semantics of the proxy object (= keep interpretable by humans) even if the machine-readable object links are broken. Label should be updated automatically (without human control) only after a human decided that the semantic management of an external object provider can be fully trusted. Some Labels like scientific taxon names or publication references can be expressed more or less language-independent, others like geographic names are always language dependent. @@Discussion neccessary: language type is currently extended with neutral and unknown codes ('-', '?'), is this necessary?@@ The Abbreviation element provided is not necessary for all proxies, but especially useful for class names (e. g., for tabular reports) and publication abbreviations (author/year style). Defines an ID of an external object or one to several services providing it. The format in which the object is returned is undefined and needs to be interpreted by the receiving application. Ideally, common standards (TDWG, MARC, etc.) should be used. === Class names (biology: taxon names): Used for class names (biology: taxon names). 
Provides a locally defined simple free-form text plus an optional link to an external resource object. This may be changed to allow entering a structured form of taxonomic names (Genus/Higher taxon, rank, optional specific/infraspecific epithets, authors). However, note that simply splitting into taxon name and authors does not work, because authors may be in the middle of the parts of the taxon name (e. g. in botanical autonyms). Currently the development of the TDWG taxon names standard should be awaited first. Note that Class names are not restricted to accepted names (also referred to by Synonyms in ClassHierarchyNode type) Defines an element with a ref attribute pointing to a ClassName in ExternalDataInterface (in biology: Class = Taxon) Refers to a class name (biology = 'taxon'; ExternalDataInterface/ClassNames/ClassName) A collection of ClassRef type elements Reference to a class name (in biology = taxon name) defined in ExternalDataInterface/ClassNames [ATTR: ref] === Class hierarchy (biology: taxon concepts): Used for class hierarchies (taxonomies) For example, SDD supports taxonomic (order/family/genus etc.) and non-taxonomic (weed species, diseases, herb/shrub/tree) hierarchies. For many analytical purposes it is relevant whether a hierarchy is based on phylogenetic (= evolutionary) relatedness or whether it is an operational categorization. Note: a conventional taxonomic hierarchy should be considered phylogenetic until proven to be not. Root of the recursive tree A node in a class hierarchy tree (biology: taxonomical hierarchy) A node either contains a class reference (biology: taxon) and optionally (if it is a higher level class) further child Nodes, or it is anonymous and contains only further child Nodes. Nodes may not be empty. (The complex choice/sequence expresses the A, or B, or A and B constraint which is difficult to express in xml-Schema.) The class (biology: taxon; with optional synonyms) that identifies the node. 
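The node rule stated above (a class reference, child nodes, or both, but never an empty node) can be checked recursively. The following Python sketch is only an illustration under assumed names: `HierarchyNode`, `class_ref`, and `is_valid` are hypothetical and do not appear in the schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HierarchyNode:
    """Hypothetical in-memory form of a class hierarchy node."""
    class_ref: Optional[str] = None  # id of a ClassName proxy, if any
    children: List["HierarchyNode"] = field(default_factory=list)


def is_valid(node: HierarchyNode) -> bool:
    """Return True if no node in the tree is empty.

    A node must carry a class reference, child nodes, or both.
    """
    if node.class_ref is None and not node.children:
        return False
    return all(is_valid(child) for child in node.children)
```

An anonymous node with children and a named leaf both pass; a node with neither a class reference nor children fails, as does any tree containing such a node.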
Refers to a class name (in biology a taxon name) [ATTR: ref] Rather specific to biology: Taxa above rank of species have a lower taxon by which they are typified. Rather specific to biology: Taxa of species rank or below have a physical unit (specimen) by which they are typified. Collected and preserved unit(s) (biology: specimens) by which the name is typified. (The expression of synonyms may be essential for reports and to convey the concept of a class to information consumers.) If class identification is present, further nodes are optional. The class identification may be missing, but then further Nodes are required. A collection of objects with ClassHierarchyNode type === Units (biology: specimen, 'Objects' in earlier versions of SDD): Used to define objects that are collected, described, or observed (collected objects may be preserved permanently in a specimen collection). In biology a collected object is often called a specimen. Provides either a simple free-form descriptive label ('so-and-so in freezer 14, with tag 1233'), or a link to an external collection unit. Note that the term 'Unit' as used here has no relation to 'measurement units' or 'organization units'. @@ SomeElementsAnalyzedBySDD: These are just the preliminary elements identified by SDD to be necessary as local extensions. A decision needs to be made, compare the DWC-based present in an alternative interface group! @@ Identification of specimen object. The information may come from the service provider. If the service provider only provides a name, this must be compared with and if necessary added to the list of ClassNames so that a ClassName reference may be used here. This may point to a higher taxon (family, order, or even "plantae") to indicate incomplete, broad identifications. [ATTR: ref] Default is 'certain'; 'Abies cf. alba' would be recorded as 'uncertain'. 
False = object has not been collected and preserved (it may still be databased in an observation database and have an ExternalID!). The default for this element is true, i. e. if the element is missing the object has been collected/preserved. Defines an element with a ref attribute pointing to a Unit (biology: observation or specimen) defined in ExternalDataInterface. Refers to a Unit object identifier (biology = 'specimen') Extension of UnitRef with a required type status attribute (NomenclaturalTypeStatusOfUnitsEnum) The type status of a unit (biology: specimen). See the enumerated type for further information. === Publications, references, and citations: Used for resources like publications, laboratory notes, speeches, etc. Provides either a simple free-form text, or a connection to an external resource. Defines an element with a ref attribute pointing to a Publication (ExternalDataInterface/Publications/Publication) --- The following types build on the PublicationProxy infrastructure: Combines a publication resource reference with a detail location within that reference (esp. page number) Refers to a publication as defined under ExternalDataInterface/Publications [ATTR: ref] Location within publication where the cited data can be found: Page, table, figure number, database record, html document bookmark, etc. (Note: this is not the page range of the entire article!). If publication is a non-persistent web resource that may change or disappear, the date at which the citation was verified to be appropriate should be recorded. It may later be updated, but not through a link checker verifying only technical access: the semantics of the citation have to be verified! If publication is a non-persistent web resource that can not longer be verified, the date it was found to have disappeared (or became semantically inappropriate) may be recorded. 
=== Agents (persons, organization, software agent): Used for Agent documentation (an Agent is a person, project, organization, or software agent). Currently used for authors, editors, contributors, and translators. Ideally it connects to an outside definition or documentation of the Agent. Abstract base type for AgentRef and MicroAgent. The ref attribute is optional here! Reference to a Agents (ExternalDataInterface/Agents/Agent) Defines an element with a required ref attribute pointing to an Agent (ExternalDataInterface/Agents/Agent) Makes the optional base type attribute required. --- The following types build on the AgentProxy infrastructure: Extension of AgentRef with a role attribute and three attributes recording object-specific contributions. The first time an agent (creator or contributor) has edited/made a contribution to an object. If a creator has contributed both as an author and later as an editor of data, two references in these two roles will exist and the contribution dates will be recorded separately. The number of contributions by a specific agent (editing, revising, adding to an object). A collection of RichAgentRef elements. (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] Restriction of RichAgentRef to Creator roles only. Collection (sequence) of Agent elements of type CreatorRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] Restriction of RichAgentRef to Contributor roles only. Collection (sequence) of Agent elements of type ContributorRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] 
Restriction of RichAgentRef to Owner roles only (contribution attributes prohibited). Collection (sequence) of Agent elements of type OwnerRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role] RevisionData (creators, dates, revision) for the entire project/data set or individual objects. If RevisionData exist at all, at least one creator(author or editor) is required. (= DC.Creators) General contributors, or translators. (= DC.Contributors) @@Request for discussion: Translator-Contributors are currently not listed on individual Representation elements. Only a general statement about all translations together can be made. Should this be changed? Also: should one Representation be marked as 'Original/ SourceForTranslation'? @@ Date/time when the intellectual content (project, term, description, etc.) was created. Applications may initially set this to the system date for new data objects, but authors must be able to change it to an earlier date if necessary. If for legacy data this is imprecisely known, it may be missing here. Earlier versions in other data formats should then be mentioned in the copyright or acknowl. statements. (= DC.Date.Created) Date/time when the last modification of the object was made. If in online data sources the provider can not assess this, the current date/time may be substituted. For legacy data this may be set to the file date of imported data, or estimated. (= DC.Date.Modified) === Geography: Used for resources like geographical names or places. Provides either a simple free-form text, or a connection to an external resource. Defines an element with a ref attribute pointing to a Locality (ExternalDataInterface/Geography/Locality) A collection of LocalityRef-type elements. The sequence of elements in instance documents is semantically irrelevant and may be changed. 
Reference to a locality defined in ExternalDataInterface/Geography/Locality [ATTR: ref] === Media (especially images, audio/video): Extends resource proxy type with optional encoded data content (esp. images embedded in xml document) and with a Type (Image/Audio/Video, etc.). Type of medium, based on DCMI Type vocabulary (= DC.Type) An optional caption for a resource, esp. if it will be presented embedded in another document. Captions can be provided in multiple languages. Differs from the resource Label, wihich is closer related to a 'title'. @@ Issue: captions, even in multiple languages, may be obtained from the service provider. Even then it may be desirable to override them! Do we need two collections: InheritedCaption and CaptionOverride? This seems to be awkward whenever there is no ServiceProvider! Also, Label can contain a "title" only in a single language! @@ Creators, Revision status, and dates for the media resource Entities having legal possession of the data collection content. Owners are defined only for the entire data collection, not for individual descriptions etc. (= http://www.loc.gov/ marc.relators/own) Copyright, terms of use, license and other IPR-related statements like disclaimer or acknowledgement. Giving a copyright statement and a (if possible public) licence is highly recommended! (=DC.Rights) [ATTR: language] Optionally the full resource data may be embedded (as an alternative or in addition to defining a URI). Note: A resource like an image should be directly encoded, i.e. not wrapped into a MIME object first. Defines an element with a ref attribute pointing to a MediaResource (ExternalDataInterface/MediaResources/MediaResource) A collection of MediaResourceRef elements. The sequence of elements in instance documents is semantically relevant and should be preserved. (the sequence in instances is informative!) 
[ATTR: ref]

=== Measurement units: Provides an extensible definition mechanism for measurement units like meter, mm, µm, liter/litre, °C, m/s, etc. May also be used for dimensionless scaling factors like %! Label contains a language/culture-specific long form of the measurement unit, e. g., 'liter' (en-us) or 'litre' (en-uk) for 'L.' Label and InternationalAbbreviation text allow some xhtml formatting to support, e. g., "mm²". Note: International Standard ISO 31 (Quantities and units), 1992 may be relevant here, but it does not seem to be available online. Printed version: ISO Standards Handbook: Quantities and units. 3rd ed., International Organization for Standardization, Geneva, 1993, 345 p., ISBN 92-67-10185-4, 182.00 CHF. A useful online resource is http://hem.fyristorg.com/ojarnef/fys/metric-units-comp.txt

A scientific abbreviation considered language and audience independent. It may contain formatting to express "mm²". Note that the Abbreviation element available in most label types does not support formatting!

True if the unit is an SI unit or a derived unit acceptable in scientific publications. False for local/historical units like feet and velocity in fathoms per fortnight :-).

True indicates that the unit should be output before the value (as in 'pH 7.0'). Default is false.

Describes relations to other units that can be expressed through a simple multiplication factor (i. e. not cubic meter = meter * meter * meter, or Celsius to Fahrenheit). @@ Do we really need multiple relations or is a single relation to the base unit sufficient? @@ Ideally the relation should always be defined towards the base unit, e. g., km, cm, mm, µm all to meter.

Multiply a value in the current unit by this factor to obtain the value in the related unit referenced above.

Refers to a MeasurementUnit (attribute ref is required).

Abstract base type for MeasurementUnitRef and MicroMeasurementUnit. Here the ref attribute is optional!
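The factor-based unit relation described above can be illustrated with a small conversion routine. This Python sketch assumes that every unit is related to a single base unit, as the annotation recommends; the `RELATIONS` table, its factors, and the function names are hypothetical examples, not data from the schema.

```python
from fractions import Fraction

# Hypothetical relation table: unit -> (related base unit, factor).
# Multiplying a value in the unit by the factor gives the value in the base unit.
RELATIONS = {
    "km": ("m", Fraction(1000)),
    "cm": ("m", Fraction(1, 100)),
    "mm": ("m", Fraction(1, 1000)),
}


def to_base(value, unit):
    """Return (value in base unit, base unit); a unit without a relation is its own base."""
    base, factor = RELATIONS.get(unit, (unit, Fraction(1)))
    return Fraction(value) * factor, base


def convert(value, source, target):
    """Convert between two units that share the same base unit."""
    source_value, source_base = to_base(value, source)
    per_target, target_base = to_base(1, target)
    if source_base != target_base:
        raise ValueError(f"no factor relation between {source} and {target}")
    return source_value / per_target
```

For example, `convert(3, "km", "cm")` goes through the base unit meter and yields 300000. Exact fractions are used here only to keep the arithmetic free of rounding; relations that are not a plain factor (such as Celsius to Fahrenheit) are deliberately out of scope, as in the annotation.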
ref refers to a measurement unit id (Terminology/General/MeasurementUnits) Provides a minimalized measurement unit identified through a local (and presumably international) abbreviation - together with an optional Measurement Unit proxy reference (ref attribute). A scientific abbreviation considered language and audience independent. It may contain formatting to express "mm2". === Public objects carrying a key also generally provide for developer annotations/comments (undefined language), version extensions for future versions of UBIF, and custom extensions (= "application annotations"). === Key/ref infrastructure for linking within a data set: This allows to define (and redefine) the value type for keys and keyrefs Note: the use of attribute groups instead of globally defined and referred attributes is a work-around for problems occurring with attribute definitions in included library schemata. The use of global attributes by ref caused validation or namespace problems, even though this library has no target namespace (chameleon pattern); Spy 2004.4 says, e. g., ... attributes that need to be qualified because your schema uses attributeForm = qualified or global attributes. You must specify a prefix for your schema namespace. === Options to link using URLs or GUID + resolving mechanisms (used especially for UBIF data proxies): The object linking mechanisms used by the ProxyBase type may also be used by other objects! LifeScience ID (without the constant prefix 'urn:lsid:'). 3 to 4 parts separated by colon, the 1st part is the url of a life science authority service that provides metadata on how to obtain the object references in part 2 (namespace = data collection), 3 (object ID) and 4 (optional object version). Example: lsid.gbif.org:DataCollectionID:ID/1§31~b+:v2 Digital object identifier (an ID scheme advanced by the library community). A URL directly providing an object representation. In contrast to the URN types LSID or DOI this should resolve directly. 
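The LSID layout described above (three or four colon-separated parts, with the 'urn:lsid:' prefix omitted) can be split as in the following Python sketch. The `Lsid` tuple and `parse_lsid` function are hypothetical names, and the checks are an assumption based only on the prose; the restricting pattern actually used by the schema may differ.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str          # part 1: life science authority service
    namespace: str          # part 2: data collection
    object_id: str          # part 3: object ID
    version: Optional[str]  # part 4: optional object version


def parse_lsid(text: str) -> Lsid:
    """Split a prefix-less LSID into its 3 or 4 colon-separated parts."""
    if text.lower().startswith("urn:lsid:"):
        raise ValueError("the constant prefix 'urn:lsid:' must be omitted")
    parts = text.split(":")
    if len(parts) not in (3, 4) or any(part == "" for part in parts):
        raise ValueError("expected 3 or 4 non-empty parts separated by ':'")
    version = parts[3] if len(parts) == 4 else None
    return Lsid(parts[0], parts[1], parts[2], version)
```

Applied to the example from the annotation, the parts come out as authority `lsid.gbif.org`, namespace `DataCollectionID`, object ID `ID/1§31~b+`, and version `v2`.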
The URL may be a query string (with ID embedded), for example: "http://x.y.fr/pub/au=smith?yr=1998". In the case of URLs multiple definitions may be given to reduce the likelihood of failure. [The element sequence in instance documents is informative and should be preserved.]

=== Basic type library:

=== Basic generic types

Normalized string required to contain at least 1 character (this removes the XML string anomaly, i. e. an element or attribute may be optional, but if it is required its content may not be an empty string)

Normalized string restricted to 1..50 character length to be used for abbreviations (the recommended length of abbreviations is usually much shorter, but 50 characters should be

A normalized string restricted to 1..255 character length (i. e. required, may not be empty string)

Double precision numeric value in the range of [0..1]

Colors defined as RGB (red-green-blue) values, hex-encoded and combined into a string, as in HTML. Example: #EE88FF. Colors may also be expressed as HSV (hue-saturation-value), but this is convertible to RGB. RGB is preferred because it is used in HTML.

Derived string type with restricting patterns. Compare LSID; this omits the prefix 'urn:lsid:'

Digital Object Identifier (standalone, not embedded into URI syntax)

String containing a format pattern of the type used in the XSLT format-number function

=== The following Range, Date, and Coordinate types describe frequently recurring simple type combinations in an element with attributes

-- Element with 2 attributes to define a range: Lower and upper value as required attributes (no default values)

Contains lower/upper estimate attributes; used, e. g., for certainty and frequency! The default values are 0 and 1, indicating that no estimate was possible.

-- RGB color polygon expressed as a list of RGB values (these should form a single polygon when connected, which is not validated in the schema!)
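Several of the basic generic types above amount to simple value constraints. The Python sketch below restates them as checks; it is an illustration only. The function names are hypothetical, and the exact RGB pattern (a '#' followed by six hex digits in either case) is an assumption, since the schema pattern itself is not quoted in the annotations.

```python
import re

# Assumption: '#' followed by exactly six hexadecimal digits, upper or lower case.
RGB_PATTERN = re.compile(r"#[0-9A-Fa-f]{6}")


def is_nonempty_string(text, max_length=None):
    """At least 1 character and, if max_length is given, at most that many."""
    if len(text) < 1:
        return False
    return max_length is None or len(text) <= max_length


def is_unit_interval(value):
    """Numeric value in the closed range [0..1]."""
    return 0.0 <= value <= 1.0


def is_rgb_color(text):
    """True for strings such as '#EE88FF'."""
    return RGB_PATTERN.fullmatch(text) is not None


def rgb_components(text):
    """Decode a hex color string into (red, green, blue) integers 0..255."""
    if not is_rgb_color(text):
        raise ValueError(f"not an RGB hex color: {text!r}")
    return tuple(int(text[i:i + 2], 16) for i in (1, 3, 5))
```

With these helpers the abbreviation and 255-character types correspond to `is_nonempty_string(text, 50)` and `is_nonempty_string(text, 255)`, and the example color #EE88FF decodes to (238, 136, 255).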
A single color value or a color polygon defining an area in color space (i. e. not a spatial polygon having a color!) A single point in color space, or multiple points forming vertices of a polygon area in color space. When using a polygon this defines an estimated color range into which the single or variable true color values of the object fall. -- Types for composite gregorian calendar date/time (points in time where parts may be missing; following the seven property model described, e. g., in xml Schema 1.1 (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel). Instead of gYear, gMonth, gDay integer types with constraining facets are used for two reasons: a) each of them may have a timezone, which may lead to inconsistent data with multiple timezones; b) the lexical representation seems to be occasionally poorly implemented (e.g. where '31', or '---5' are accepted, whereas valid examples are '---31', '---05', and '---05+02:00'). In addition to the seven property model additional text attributes for either unsharp additions or complete verbatim dates are added. Note that incomplete dates in most cases are calendar specific and incomplete non-gregorian dates can not be expressed. Furthermore, for complete dates it may be unclear whether a reformed or unreformed date has been used (e.g. in Russia in the 19th century). Date separated into attributes so that any part of the date may be missing [ATTR: year = four digit year; month = two digit month of year; day = two digit day of month verbatim = unparsed textual date representation supplement = text additional or modifying the exact dates, e. g., 'end of summer', 'first half or year', 'first decade of month', '1888-1892'. 
timezone = expressed as an integer according to the XML Schema seven property model]
The four digit year in the Gregorian calendar (in Western cultures usually without a suffix or with 'AD/Anno Domini', 'CE/Common Era'; negative years with 'BC/Before Christ', 'BCE/Before Common Era'). Whether a year 0 is used or not differs between a true Gregorian calendar and recent astronomic usage; XML Schema is likely to change its position, see the XML Schema 1.1 draft. Thus database designers should not use 0 as a missing value representation for year.
two digit day
Text in addition to or modifying the exact date components, e. g., 'end of summer', 'first half of year', 'first decade (of month)', '1888-1892'.
An uninterpreted text representation of the original date information (date range, 'summer', perhaps unreformed Russian dates, etc.); as close as possible to the (digital/printed/handwritten) information source.
Timezone expressed in minutes. In the seven property model (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel) the timezone has a range of +/- 14 hours (14 * 60 = 840 minutes).
Date + Time separated into attributes so that any part of the date may be missing. [ATTR: see CompositeDate type, plus: time]
'24' may only occur if both minute and second are zero (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel).
The normal range should be 0-59, but 60 may occur for UTC leap-seconds (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel). An additional validator may choose to validate this. The simplest validation would attempt to convert those composite date instances that contain all seven elements to an xs:dateTime value.

=== Extension of xs:language and a reference element using Language
Union of xs:language with '-' for language-neutral (e.g. scientific names) and '?' for unknown.
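A check for the union type just described can be sketched in Python as follows. The regular expression is the lexical pattern that XML Schema 1.0 (Second Edition) gives for xs:language; the function name and the two constants are hypothetical. The sketch only tests the lexical shape of a value, not whether its subtags are registered language or country codes.

```python
import re

# Lexical pattern of xs:language in XML Schema 1.0 (Second Edition).
_XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

LANGUAGE_NEUTRAL = "-"  # e.g. scientific names
LANGUAGE_UNKNOWN = "?"


def is_valid_language_value(value):
    """True if value is '-', '?', or has the lexical shape of xs:language."""
    if value in (LANGUAGE_NEUTRAL, LANGUAGE_UNKNOWN):
        return True
    return _XS_LANGUAGE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("en", "de-DE", "-", "?", "en_US", ""):
        print(repr(candidate), is_valid_language_value(candidate))
```

Handling the two special values before the pattern test keeps the union explicit, so a consumer can still distinguish "language-neutral" from "unknown" after validation.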
Language follows RFC 3066 'Tags for the Identification of Languages': a two-letter code taken from ISO 639 part 1 or a three-letter code taken from ISO 639 part 2, optionally followed by a two-letter country code taken from ISO 3166. (Notes: When a language has both a two-letter and three-letter code, use the two-letter code. RFC 3066 replaces RFC 1766.)
Defines an element with a required 'language' attribute.
Attribute for Language, used by-reference.
Complex types that add attributes 'language' or 'preferred' to the simple types String, String255, anyURI. Note: the use of attribute groups instead of globally defined and referred attributes is a work-around for problems occurring with attribute definitions in included library schemata.
(single 'language' attribute) Attribute for Language, used by-reference.
(single 'language' attribute) Attribute for Language, used by-reference.
(single 'preferred' attribute) Elements with preferred = true indicate recommendation by the data provider. The consumer may have reasons to make a different choice.
Note on current usage: these types are used by ABCD and UBIF, but not by SDD (which uses mostly audiences instead of language).

=== Some text data support limited XHTML. (Could appropriate elements from XHTML be imported and encapsulated here?)
Collection of language-specific label representations.
Language-specific label representation [ATTR: language]
Language-specific simple label, using simple formatted text.
Label text in a specific language. Restricted to 50 characters maximum length, including blanks (recommended to be shorter!). Label abbreviations are especially important when displaying information in a tabular format.
Collection of language-specific label representations.
Language-specific label representation [ATTR: language]
LabelRepr with short inherited Text extended with longer Details text.
Optional text of unconstrained length, elaborating details of the ShortText.

=== Statements are a special form of complex text expressions
Text, optional Details (both free-form text) and optional URI.
A concise representation of a statement (copyright, acknowledgement, etc.). Recommended to be as short as possible, but actual length is unconstrained.
Optional text of unconstrained length, elaborating details of the ShortText.
An optional resource on the net providing details on the statement (may be used as an alternative to the long text).
A sequence of various intellectual property right (= IPR) statements, with a language attribute on the entire sequence.
Other forms of IPR declaration not yet covered (e.g., database rights); also used in cases where an automatic converter cannot decide whether a statement is copyright, licence, etc.
Copyright may include the information that the data have been released to the public domain.
To be used if data are placed under a public license (GPL, GFDL, OpenDocument). Placing data under a public license while maintaining copyright is recommended! (= DC.Rights.Licence; new 2004)
Defines conditions under which the data may be analyzed, distributed or changed. "Terms of use" includes concepts like "Usage conditions" and "Specific Restrictions".
Disclaimer statement, e. g. concerning responsibility for data quality or legal implications.
A free form text acknowledging support (e. g. grant money, help, permission to reuse published material, etc.).

=== The following types are currently unused (August 2004), but may be used in the future or by other standards.

=== Enumerations to support interoperability. THE ANNOTATIONS ARE HERE REMOVED, please open the full files or see the provided html documentation!
Internal formatting note: Annotations of individual enumerated values should be written as ^"short label" + " -- " + "detailed information".
An XSLT transformation converts such schema annotations into a data document that can directly be used in user interfaces.

=== Complex types referring to UnivarStatMeasureEnum (used e.g. by SDD):
Reference to a univariate statistical measure (without parameter).
Refers to an enumerated value in the UBIF type, declaring which kind of statistical measure has been used.
Reference to a univariate statistical measure (with 1 parameter).
Refers to an enumerated value in the UBIF type, declaring which kind of statistical measure has been used.
Reference to a univariate statistical measure (without parameter) plus a numeric value.
Reference to a univariate statistical measure (with 1 parameter) plus a numeric value.
This is a parameter value that further defines the univariate statistical measure. Example: for a percentile (ref='PercLower'), '0.10' would define the 10%-percentile.
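To make the role of the parameter concrete, the Python sketch below computes a lower percentile from a sample and pairs it with the measure key and its parameter, loosely mirroring the "measure with 1 parameter plus a numeric value" type. The class, its field names, the function names, and the linear-interpolation definition of a percentile are all assumptions made for illustration; the schema does not prescribe how the value is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasureWithParameterAndValue:
    ref: str          # key of the enumerated measure, e.g. 'PercLower'
    parameter: float  # fraction in [0, 1], e.g. 0.10 for the 10% percentile
    value: float      # numeric value of the measure


def percentile(values, fraction):
    """Percentile by linear interpolation between closest ranks (an assumed definition)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one value is required")
    position = fraction * (len(ordered) - 1)
    lower = int(position)  # floor, since position >= 0
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def lower_percentile_measure(values, fraction):
    """Bundle the measure key, its parameter, and the computed value."""
    return MeasureWithParameterAndValue("PercLower", fraction, percentile(values, fraction))


if __name__ == "__main__":
    print(lower_percentile_measure(range(1, 12), 0.10))
```

Keeping the parameter next to the value means a consumer does not have to guess which percentile a number refers to.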