XML schema for encoding descriptive data in biology and other subjects. The primary goal of the design is to increase knowledge, and the availability of knowledge, about the diversity of life on earth. However, the schema may be used in many other areas (including medicine, pathology, archeology, and anthropology), wherever objects or classes of objects are described for later reidentification.
The schema was designed by the Structure of Descriptive Data (SDD, http://160.45.63.11/Projects/TDWG-SDD/index.html) group. SDD was established in 1999 as a subgroup of the Taxonomic Databases Working Group (TDWG, www.tdwg.org) of the International Union of Biological Sciences (IUBS). The author of the current schema version and of all annotations is G. Hagedorn, Berlin. The requirements for an SDD schema were elaborated in six major meetings of the SDD group and in discussions on the SDD email list. Over 60 people contributed to these discussions. The help, criticism, and energy of Bob Morris, Kevin Thiele, Bryan Heidorn, Guillaume Rousse, Steve Shattuck, Donald Hobern, Trevor Patterson, and Nicolas Bailly are especially acknowledged.
Copyright © TDWG, 26. June 2004. Licensed under GNU GPL 2 (http://www.gnu.org/licenses/gpl.html) - with the following restriction: This is a preliminary version (0.91!) for testing purposes. Permission to use this schema is granted to all scientific or commercial projects for a testing period of up to 3 years. After this time computer programs using this schema must either be discontinued or converted to the final version of this schema.
Conventions:
Element or attribute names starting with underscores (__) are present in the schema for discussion purposes only and should be used only experimentally. Annotations containing @ indicate unfinished points of discussion.
Note: blockDefault="#all" in xs:schema blocks element substitution and prevents derived types from being used in instance documents in elements typed to the base type (which would otherwise be possible using xsi:type=""). finalDefault is not set; further type derivation is currently not considered problematic. Please contact us if you believe otherwise. Note that, according to the W3C discussion forum, the developers of XML Schema are considering dropping the final attribute in the upcoming XML Schema version 1.1. Nillable: xsi:nil is not supported in SDD documents (the schema declaration nillable="false" is the default and is not explicitly stated).
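The note above can be turned into a quick check of instance documents. The following is a minimal sketch and not part of the schema: it reports every xsi:type and xsi:nil attribute (xsi:nil is the instance attribute that XML Schema 1.0 pairs with nillable). It is deliberately stricter than the schema itself, because an xsi:type that names the declared type would still be valid. The function name find_xsi_overrides and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
FLAGGED = (f"{{{XSI_NS}}}type", f"{{{XSI_NS}}}nil")


def find_xsi_overrides(xml_text):
    """Return (element tag, attribute local name) pairs for each xsi:type or xsi:nil."""
    root = ET.fromstring(xml_text)
    hits = []
    for element in root.iter():
        for attribute in FLAGGED:
            if attribute in element.attrib:
                hits.append((element.tag, attribute.split("}", 1)[1]))
    return hits


# Hypothetical instance fragment, for illustration only.
SAMPLE = (
    '<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<Item xsi:type="DerivedType"/>'
    '<Other xsi:nil="true"/>'
    '<Plain/>'
    '</Root>'
)

overrides = find_xsi_overrides(SAMPLE)
```

A full check would still require validation against the schema with a schema-aware processor; this sketch only locates the attributes that the note rules out.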
Unified Biosciences Information Framework (UBIF) XML schema for data exchange and integration across knowledge domains. The schema has been designed for biological data, but is applicable to other knowledge areas as well. It is based on work of the TDWG SDD and ABCD subgroups and is currently jointly authored by the SDD, ABCD, and TaxonName subgroups and by GBIF (Global Biodiversity Information Facility). The framework may be used without changes for new schemata; no registration is necessary. Its main features are:
* A foundation of shared simple and complex types, including some enumerations to simplify world-wide data integration and interoperability across language barriers.
* A top-level structure consisting of a Datasets collection that contains independent Dataset objects. The collection is purposely semantically neutral; relations between Dataset objects have to be discovered by the data consumer or are assumed to be implicit in the protocol requesting the data.
* Derivation metadata that support tracing and debugging the online transformation history of the data. They provide important technical information about access providers and the path through the potentially multiple portals involved.
* Metadata describing the principal data collection from which the dataset was derived. The dataset may represent the entire source dataset or it may be filtered, normalized, or enriched with secondary information. A dataset is never an aggregation of multiple data collection sources with different authorship, copyright, or other IPR; these are assumed to be delivered as separate datasets. Note: Derivation and content/source metadata together provide all necessary information for UDDI support.
* External data interface (EDI) providing a standard mechanism to link to external data providers for knowledge domains outside of the scope of the current dataset. This includes a collection of supported object linking mechanisms involving globally unique identifiers and resolving mechanisms. Proxy objects can replace a link in cases where a specific object is not (or perhaps not yet) available in an external data source, and they cache a minimalized data interface on the assumption that access is asynchronous, slow, or may be temporarily unavailable. Furthermore, these cached data provide semantic information to human readers, preserving the semantics of a link even if it has become permanently broken.
* A single "payload" element which must come from a different namespace. Note that within a Datasets collection each Dataset object may have a payload from a different external schema. It is the responsibility of the consumer to decide which dataset payload it is interested in or can process.
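The consumer-side choice of payload described in the last point can be sketched as follows. This is an illustration under stated assumptions, not the UBIF structure itself: only the names Datasets and Dataset come from the text above, while the namespace URIs, the Payload and Note element names, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used for illustration only.
UBIF_NS = "http://example.org/ubif"
SUPPORTED_PAYLOAD_NS = {"http://example.org/sdd"}

# Two independent Dataset objects, each with a payload from a different schema.
SAMPLE = (
    '<Datasets xmlns="http://example.org/ubif">'
    '<Dataset>'
    '<Payload xmlns="http://example.org/sdd"><Note>descriptive data</Note></Payload>'
    '</Dataset>'
    '<Dataset>'
    '<Payload xmlns="http://example.org/abcd"><Note>collection data</Note></Payload>'
    '</Dataset>'
    '</Datasets>'
)


def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def select_payloads(xml_text, supported):
    """Yield the foreign-namespace child of each Dataset that the consumer supports."""
    root = ET.fromstring(xml_text)
    for dataset in root.findall(f"{{{UBIF_NS}}}Dataset"):
        for child in dataset:
            ns = namespace_of(child)
            if ns != UBIF_NS and ns in supported:
                yield child


payloads = list(select_payloads(SAMPLE, SUPPORTED_PAYLOAD_NS))
```

Datasets whose payload namespace is not in the supported set are skipped silently, which mirrors the statement that the consumer decides which payloads it is interested in or can process.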
Conventions: Element or attribute names starting with underscores (__) are present in the schema for discussion purposes only and should be used only experimentally. Annotations containing @ indicate unfinished points of discussion.
Note: blockDefault="#all" in xs:schema prevents derived types from being used in instance documents in elements typed to the base type (which would otherwise be possible using xsi:type=""). finalDefault is not set; further type derivation is currently not considered problematic. Please contact us if you believe otherwise. Note that, according to the W3C discussion forum, the developers of XML Schema are considering dropping the final attribute in the upcoming XML Schema version 1.1. Nillable: xsi:nil is not supported in UBIF documents (the schema declaration nillable="false" is the default and is not explicitly stated).
Copyright © TDWG (Taxonomic Databases Working Group, www.tdwg.org), 20. July 2004. Licensed under the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version (http://www.gnu.org/licenses/gpl.html). Schema designed and annotations authored by G. Hagedorn & W. Berendsohn, Berlin, with help from members of the SDD, ABCD, and TaxonName subgroups.
Codes for sex values in humans (clinical status) or animals. The codes are largely based on those defined in DICOM (Digital Imaging and Communications in Medicine, http://medical.nema.org/, Coding Scheme Designator DCM Version 01, PS3.16 Annex B, CID 7455) and ASTM E1633 (= "Standard Specification for Coded Values Used in the Electronic Health Record. Document Number: ASTM E1633-02a. ASTM International, 10-Nov-2002, 76 pages"). Additional codes specific to biology have been added.
An alternative standard is ISO 5218, which provides only four codes: "0 = Not known, 1 = Male, 2 = Female, 9 = Not specified". The difference between 0 and 9 is: "(0) implies that the sex of the person is not provided in the personal details i.e. the data has not been supplied and sex cannot be ascertained from the data provided"; "(9) implies that the sex of the person cannot be determined for physical reasons, e. g. a new born baby". ISO 5218 contains fewer and less intuitive codes. For biological purposes many codes would have to be arbitrarily added. G. Hagedorn, 10. August 2004
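For comparison, the four ISO 5218 codes quoted above fit in a small lookup table. This is a minimal sketch; the labels are copied from the text above, and the names ISO_5218_LABELS, iso5218_label, and sex_is_recorded are hypothetical.

```python
# ISO 5218 codes and labels as quoted in the text above.
ISO_5218_LABELS = {0: "Not known", 1: "Male", 2: "Female", 9: "Not specified"}


def iso5218_label(code):
    """Return the label for an ISO 5218 code; raise ValueError for any other value."""
    try:
        return ISO_5218_LABELS[code]
    except KeyError:
        raise ValueError(f"{code!r} is not an ISO 5218 code") from None


def sex_is_recorded(code):
    """True only for codes 1 and 2; codes 0 and 9 carry no male/female value."""
    iso5218_label(code)  # rejects values outside the code list
    return code in (1, 2)
```

The sketch shows why the list is too small for biological use: any further category would need a code outside the four defined values.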
This list is a first version of a constrained vocabulary to express typifying relations between taxonomic names and units (specimens or objects preserved in collections). Beyond those type categories explicitly governed by nomenclatural codes (Zoology, Botany, Bacteriology, Virology), the list also includes some additional type status terms. These categories may be helpful when interpreting the original circumscription (topotypes, ex-types), but do not have the same binding status as terms governed by the nomenclatural codes. The enumeration attempts to strike a balance between listing all possible terms and remaining comprehensible. In general, including too many terms was considered less problematic than omitting terms. Applications may easily select a subset for presentation in their user interface.
This list is intended as a first version and it is hoped that in the review process through TDWG it will achieve sufficient maturity to be truly useful. It is expected that over time revisions will have to be made. Please use the WIKI (http://efgblade.cs.umb.edu/twiki/bin/view/UBIF/NomenclaturalTypeStatusOfUnitsDiscussion) to discuss the current list and the lists of synonymous, doubtful, or excluded type terms provided therein.
Some background information: A type provides the objective standard of reference to determine the application of a taxon name. The type status of a unit (specimen) is only meaningful in combination with the name that is being typified (a unit may have been designated type for multiple names in different publications). The type status of an object may be designated in the original description of a scientific name (original designation) or, under rules laid out in the respective nomenclatural codes, at a later time (subsequent designation). For taxa above species rank the type is always a lower rank taxon (e.g., species for genus, genus for family). The type terms for this situation are not included in the enumeration. Ultimately, typification of all taxa goes back to physical type units, but this should not be recorded as such in data sets. The indirect type reference in higher taxa means that typification changes to the lower taxon automatically affect the higher taxon.
The exact definitions of type status differ between nomenclatural codes (ICBN, ICZN, ICNP/ICNB, etc.). The term definitions are intended to be informative and generally applicable across the different codes. They should not be interpreted as authoritative; in nomenclatural work the exact definitions in the respective codes have to be consulted. A duplication of status codes (bot-holo, zoo-holo, bact-holo, etc.) is not considered desirable or necessary. Since the application of the type status terms is constrained by the relationship of the typified name with a specific code, the exact definition can always be unambiguously retrieved.
The following publications have been consulted to determine the number of type terms that should be included and to prepare the semantic definitions:
Many thanks for review and help to Dr. Miguel A. Alonso-Zarazaga and Dr. Walter Gams. Gregor Hagedorn, 13.7.2004
Enumerated codes to express the rank of a taxon (scientific organism name) in a taxonomic hierarchy. The list is intended to be interoperable between name providers for bacteria, viruses, fungi, plants, and animals. It is not assumed that in each taxonomic group all ranks have to be used. Individual applications may select appropriate subsets (which may be based on information given inside the enumerated values, see Specifications/BioCode-, Botany-, Zoology-, and BacteriaStatus). The enumeration attempts to strike a balance between listing all possible rank terms and remaining comprehensible. For example, the "infra-" ranks specifically mentioned in BioCode have been included (although very rarely used), but the additional intermediate zoological ranks (micro, nano, pico, etc.) are not included. The selection of infraspecific ranks probably needs some discussion (some informal ranks, especially from bacteriology, may be missing). However, it is believed that this list may help to start developing data sets that can easily be integrated across the barriers of language and taxonomic traditions.
The following publications have been consulted to determine the number of rank terms that should be included and to prepare the semantic definitions:
Many thanks for review and help go to Dr. Walter Gams.
Note: the list of all ranks is implemented as a union of all the following rank subsets. Although BioCode has been used to define the partition into subsets, the ranks are not limited to BioCode but should form an interoperable superset of the ranks used in Virology, Bacteriology, Botany, and Zoology.
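The union construction can be illustrated outside the schema with plain sets. This is a minimal sketch under stated assumptions: the subset names and their contents are hypothetical and much shorter than the actual enumerations, and the helper names are invented for the example.

```python
# Hypothetical rank subsets; the real schema defines its own, longer partitions.
RANK_SUBSETS = {
    "family_group": ("family", "subfamily", "tribe"),
    "genus_group": ("genus", "subgenus"),
    "species_group": ("species", "subspecies"),
}

# The full rank list is the union of all subsets.
ALL_RANKS = frozenset().union(*RANK_SUBSETS.values())


def is_valid_rank(value):
    """True if the value belongs to any of the rank subsets."""
    return value in ALL_RANKS


def subset_of(value):
    """Return the name of the subset containing the rank, or None if there is none."""
    for name, ranks in RANK_SUBSETS.items():
        if value in ranks:
            return name
    return None
```

An application that only needs, say, the genus and species groups can build its own list from the corresponding subsets instead of the full union.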