### Copyright © TDWG (Taxonomic Databases Working Group, www.tdwg.org), 2004. This file is a special version of the Unified Biosciences Information Framework (UBIF) XML schema. It may be used only for viewing convenience and may not be distributed independently from the primary schema files (UBIF.xsd, UBIF_TypeLib.xsd, etc.). The inclusion of all parts starts below: !###

Unified Biosciences Information Framework (UBIF) XML schema for data exchange and integration across knowledge domains. The schema has been designed for biological data, but is applicable to other knowledge areas as well. It is based on work of the TDWG SDD and ABCD subgroups and is currently jointly authored by the SDD, ABCD, and TaxonName subgroups and by GBIF (Global Biodiversity Information Facility). The framework may be used without changes for new schemata; no registration is necessary. Its main features are:
* A foundation of shared simple and complex types, including some enumerations to simplify world-wide data integration and interoperability across language barriers.
* A top-level Datasets collection containing independent Dataset objects. The collection is purposely semantically neutral; relations between Dataset objects have to be discovered by the data consumer or are assumed to be implicit in the protocol requesting the data.
* Derivation metadata that support tracing and debugging the online transformation history of the data. They provide important technical information about access providers and the path through the potentially multiple portals involved.
* Metadata describing the principal data collection from which the dataset was derived. The dataset may represent the entire source dataset or it may be filtered, normalized, or enriched with secondary information. A dataset is never an aggregation of multiple data collection sources with different authorship, copyright, or other IPR; these are assumed to be delivered as separate datasets. Note: Derivation and content/source metadata together provide all necessary information for UDDI support.
* External data interface (EDI) providing a standard mechanism to link to external data providers for knowledge domains outside of the scope of the current dataset. This includes a collection of supported object linking mechanisms involving globally unique identifiers and resolving mechanisms. Proxy objects can replace a link in cases where a specific object is not (or not yet) available in an external data source, and they cache a minimalized data interface on the assumption that access is asynchronous, slow, or may be temporarily unavailable. Furthermore, these cached data provide semantic information to human readers, preserving the semantics of a link even if it has become permanently broken.
* A single "payload" element which must come from a different namespace. Note that within a Datasets collection each Dataset object may have a payload from a different external schema. It is the responsibility of the consumer to decide which dataset payload it is interested in or can process.
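The last point, that each Dataset may carry a payload from a different external schema and that the consumer decides which payloads it can process, can be sketched as follows. This is only an illustration under stated assumptions: the UBIF namespace URI `urn:example:ubif`, the `OtherPayload` element, and its namespace are placeholders invented for the example; only the SDD namespace appears in the schema annotations.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for the UBIF wrapper (assumption, not from the schema).
UBIF_NS = "urn:example:ubif"
SDD_NS = "http://www.tdwg.org/2004/SDD"

# Hypothetical instance document with two independent Dataset objects.
DOC = """\
<Datasets xmlns="urn:example:ubif">
  <Dataset>
    <DescriptiveData xmlns="http://www.tdwg.org/2004/SDD"/>
  </Dataset>
  <Dataset>
    <OtherPayload xmlns="urn:example:other"/>
  </Dataset>
</Datasets>
"""


def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def payload_namespaces(xml_text):
    """List the namespace of every non-UBIF child (payload) of each Dataset."""
    root = ET.fromstring(xml_text)
    found = []
    for dataset in root.findall(f"{{{UBIF_NS}}}Dataset"):
        for child in dataset:
            if namespace_of(child) != UBIF_NS:
                found.append(namespace_of(child))
    return found


def select_payloads(xml_text, wanted_ns):
    """Return only the payload elements the consumer is able to process."""
    root = ET.fromstring(xml_text)
    return [
        child
        for dataset in root.findall(f"{{{UBIF_NS}}}Dataset")
        for child in dataset
        if namespace_of(child) == wanted_ns
    ]
```

A consumer that only understands SDD would call `select_payloads(DOC, SDD_NS)` and ignore the remaining Dataset objects.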

Conventions: Element or attribute names starting with underscores (__) are present in the schema for discussion purposes only and should be used only experimentally. Annotations containing @ indicate unfinished points of discussion.
Notes:
* blockDefault="#all" in xs:schema prevents derived types from being used in instance documents in elements typed to the base type (which would otherwise be possible using xsi:type="").
* finalDefault is not set; further type derivation is currently not considered problematic. Please contact us if you believe otherwise. Note that according to the W3C discussion forum, the developers of XML Schema are considering dropping the final attribute in the upcoming XML Schema version 1.1.
* Nillable: xsi:nil is not supported in UBIF documents (the schema declaration nillable="false" is the default and is not stated explicitly).

Copyright © TDWG (Taxonomic Databases Working Group, www.tdwg.org), 20 July 2004. Licensed under the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version (http://www.gnu.org/licenses/gpl.html). Schema designed and annotations authored by G. Hagedorn & W. Berendsohn, Berlin, with help from members of the SDD, ABCD, and TaxonName subgroups.

The Datasets collection is the only root element allowed in UBIF: Root element for files or data streams. Multiple Dataset objects are completely independent. Potential relationship may be detected by the consumer, but are not expressed in the UBIF format. The sequence of Dataset objects has no semantics and does not have to be preserved. The version of the UBIF standard used is defined in the namespace declaration and needs no separate data element. A single file or data stream may consist of multiple data sets A history (tree) of all automatic or semi-automatic data derivations (transformations) through computer programs: database export, filtering, merging, or unmodified data provision through portals. The elements immediately in this element describe the process that created the current xml document. [ATTR: datetime (= When was it done?), gooduntil (= caching interval)] Data from other knowledge domains to which the data set refers may be represented by collections of proxy data objects. In the absence of available external databases a proxy object may be used as a local placeholder. The data inside the proxy object usually provide a reduced interface data model that abstracts from a potentially more complex external data model. Examples: persons, publications, geographical localities, media resources, but also class names (biology: taxa) and objects/units (biology: specimens). Metadata referring to the principal source of the entire data collection (the metadata scope may be wider than the objects actually contained in the data set). The 'payload' of the dataset exchanged using UBIF. At this point a new namespace is defined (and usually the default namespace is redefined). Note that if an xsi:schemaLocation is desired, it should not be defined here but added to an xsi:schemaLocation attribute in the Datasets root element. 
Example from SDD instance document: <DescriptiveData xmlns="http://www.tdwg.org/2004/SDD">

=== Data derivation, transformation, and derivation history:
Describes the providers and application/script(s) that produced the current data set, plus a derivation history of all automatic or semi-automatic transformations with negligible or automated content changes. Derivation examples: a) Generation of a file from a database, b) Adding/removing data to/from an existing UBIF XML file, c) Passing data through a portal without intentionally changing any data. The information provided here is intended to a) facilitate debugging, b) react to known deficits of generators, esp. if generators produce syntactically correct but semantically faulty data (misapplication of data elements, etc.), c) evaluate the quality and scope of archived data, especially whether the data contained in the document are complete or an excerpt from a larger data set, and d) inform about options to update/refresh data. [ATTR: datetime (= When was it done?)] = Date and time (UTC or local time with timezone information) at which the current document or data stream was created by the generator. Using UTC (Coordinated Universal Time = Greenwich Mean Time) is recommended. [ATTR: gooduntil (= information about expiration of validity for caching purposes)] Which tool did it? Metadata about the software (application, script, etc.) that performed the derivation/transformation. [ATTR: name, version, notes, routine] (Detailed attribute annotations exist, but are not visible in graphical schema view!) Name of the application performing the transformation. The term 'application' should be understood in a loose sense; it may be a script that is not part of a larger application (compare the Routine attribute, which may provide the detailed name of scripts that are part of an application!). Version of the application that has generated this document. 
The attribute should not be named 'Version' to avoid confusion with the version of the content (see content Metadata). Additional information about the generating application that is not part of the name or version. If the copyright of the generating application is specified, it should be understood that this does not affect the content copyright of the data. Optionally allows a generating application to identify which of possibly multiple transforming routines (database code, XSLT, etc.) was used. This attribute may also be used to identify different conditions under which the export routine may behave differently. Who did it? Technical contacts are those to whom questions about accessibility of a provider or resource should be directed. Who did it? Administrative contacts [= Content contacts] are those to whom questions and feedback about data, or restrictions on use of the data, should be directed. Optional description of the derivation actions, acknowledgement, copyright, etc. statements. The statement should be complete and identify the speaker (Technical/AdministrativeContact should not be expected to be displayed). - This is the only item in Derivation expected to be displayed on web reports addressing the general public. All other items in Derivation are normally displayed only on technical pages. -- Note: Claiming copyright/database rights on derivations may interfere with the usability of data and is not recommended. Care must be taken to avoid violating the rights of holders of the original content copyright! The derivation history includes all automatic or semi-automatic transformations with negligible or automated content changes. It does NOT include the history of content revisions and expansions, possibly combined with changes of copyright or ownership; this history must be acknowledged in the Description, Owner and IPR statements in Metadata. 
Whenever a data provider receives a dataset already containing derivation data, it will put these unchanged into previous derivations and add its own data as a new outer layer. Thus the outermost Derivation is the most recent (immediate) one, the innermost the first. Usually this contains only a single node! The history is not an array, but the recursion of Derivation within Derivation! However, multiple earlier derivations may be present if information has been merged. Example: SDD descriptions are enriched with images created by a geography server and based on ABCD collection data. Datasets should be kept separate wherever possible, e.g. in the case of specimen data from multiple collections. [ATTR: datetime] When did it occur? Date and time (UTC or local time with timezone information) at which the current document or data stream was created by the generator. Using UTC (Coordinated Universal Time = Greenwich Mean Time) is recommended. The data in this Dataset are guaranteed not to change until this date. No guarantee is given after this date and a cache should be refreshed. If the provider cannot guarantee that the data will not be changed until a future date, this attribute should be omitted.

=== Metadata about the entire data collection from which the data set was derived:
Metadata referring to the principal source of the entire data collection (thus the metadata scope may be wider than the objects actually contained in the data set). If a history of the data collection (revised or expanded in various projects or at different institutions) exists, this must be reflected in the IPR statements and possibly in the list of Owners. Language-specific header information [ATTR: language] Language-independent expressions of limited geographical, taxonomic, etc. scopes. In the case of projects in progress, 'scope' may define the planned or intended, rather than the achieved scope (or coverage). Compare also Coverage in Description (which is language-specific). 
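Returning to the derivation history described above (Derivation nested within Derivation, outermost first) and the gooduntil caching rule, a consumer might walk the history and decide whether a cached copy is still usable roughly as follows. This is a minimal sketch, not the schema's actual structure: the element names `Generator` and `PreviousDerivations` and the omission of namespaces are assumptions made for the example; only the attribute names datetime, gooduntil, name, and version are taken from the annotations.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical Derivation fragment; element names other than Derivation
# are illustrative assumptions.
DOC = """\
<Derivation datetime="2004-07-20T12:00:00+00:00"
            gooduntil="2004-08-20T00:00:00+00:00">
  <Generator name="PortalX" version="2.0"/>
  <PreviousDerivations>
    <Derivation datetime="2004-07-19T08:30:00+00:00">
      <Generator name="DbExport" version="1.1"/>
    </Derivation>
  </PreviousDerivations>
</Derivation>
"""


def derivation_chain(derivation):
    """Yield (generator name, datetime) pairs, most recent derivation first."""
    generator = derivation.find("Generator")
    name = generator.get("name") if generator is not None else None
    yield (name, derivation.get("datetime"))
    previous = derivation.find("PreviousDerivations")
    if previous is not None:
        # Usually a single node; several only if information was merged.
        for inner in previous.findall("Derivation"):
            yield from derivation_chain(inner)


def cache_is_fresh(derivation, now):
    """True only while the provider's gooduntil guarantee still holds."""
    good_until = derivation.get("gooduntil")
    if good_until is None:
        return False  # attribute omitted: no guarantee was given
    return now <= datetime.fromisoformat(good_until)
```

Treating a missing gooduntil as "not fresh" is a conservative design choice for the sketch; a real consumer might apply its own default caching interval instead.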
(Items from Scope may be added to DC.Coverage) A data collection may have a limited geographical scope. Example: 'Germany', 'Austria'. A data collection may have a limited class scope (biology: taxonomic scope). Example: 'Hymenoptera' Information in the entire dataset may come from these (printed or digital) publications. Note that if data are not just copied from publications into independent descriptions, but revised and combined with expert knowledge, SourcePublications should not be used. Such a process creates an independent new work and the publications are only cited in the descriptions). Number and date of current version The major version number ('1' in 1.2) as defined by the content creators. An optional minor version number ('2' in 1.2) Unconstrained text specifying status + optional number, e. g., 'beta', 'alpha', 'rc/release candidate', 'internal'. If missing, release status is assumed. Citable 'publication date' of the current version (comp. RevisionData/ Initiation- and LastRevisionDate for version- independent dates). This date must be missing if the current version is not yet published! (= DC.Date.issued; http://purl.org/dc/terms/issued) Note: currently no mechanism exist to record the date of the first version release. Is this needed? Creators, Revision status, and dates of the entire data collection from which the current dataset is derived. Entities having legal possession of the data collection content. Owners are defined only for the entire data collection, not for individual descriptions etc. (= http://www.loc.gov/ marc.relators/own) Copyright, terms of use, license and other IPR-related statements like disclaimer or acknowledgement. Giving a copyright statement and a (if possible public) licence is highly recommended! (=DC.Rights) [ATTR: language] A globally unique ID-string, distinguishing a data collection (which may be identical or larger than the current dataset) from all others. 
The value should never be changed once it has been introduced. To refer to objects within the dataset from elsewhere, this value is combined with the object. If you don't have this, it will be difficult to compare versions of data collections. Recommendation: Avoid choosing simple names that are likely to be used multiple times ('plants', 'French bees', etc.). Authors working at research institutions that allow their names to be used as permanent identifiers (even if the author stops working there) may use institutional-URI/personal-or-team-name/data-collection-label (example: xyz.de/hagedorn/coelomycetes). Note that this is only an identifier and does NOT help to locate real web resources. Language-specific content metadata (title, description, etc.) with *required* Language attribute added. A short, concise title. Does not support any formatting! (= DC.Title) General note on DublinCore translation: In addition to the elements that can be transformed from UBIF metadata, an additional DC.Type='dataset' should be added. Free-form text containing a longer description of the project. (= DC.Description) Free-form text describing geographic, taxonomic, or other coverage aspects of terminology or descriptions available in the current project. (= DC.Coverage) Optionally an image media resource containing an icon/logo symbolizing the project. [ATTR: ref] URL pointing to an online source related to the current project, which may or may not serve an updated version of the terminology or descriptions.

=== Proxy data objects (representing external resources) and references to these objects:
Collections of non-abstract data proxy elements, forming an interface to potentially existing more complex object representations. Class (biology: taxon) names used in the project. Each proxy object contains a name, either locally defined or representing an external resource defined in a linking mechanism, and defines a local id attribute that may be referred to multiple times from within the project. 
Biology: Object in a nomenclator [ATTR: id] Optional hierarchy (= tree, biology: taxonomy) of classes defined above. A hierarchy may be incomplete, i. e. some ClassName object may not be in the hierarchy. ClassHierarchies may be locally defined or represent an external source. Biology: Taxonomic hierarchy, or arbitrary set of taxa. [ATTR: id] Units are physical objects (biology: specimens) that are collected, described, or observed. In biology a collected object is often called a specimen. Biology: Object in a collection (= specimen) or an observation. Units may be identified or assigned to a Class name. [ATTR: id] Documentation of persons/organizations involved in the authoring, compiling, editing, etc. of the data set. @@ The specific elements are only a preliminary sketch, this should be synchronized with TDWG ABCD! [ATTR: id] Publications used in the project, defined through proxy objects (= local or external link, see under Agents). Printed or digital publication (including database source) [ATTR: id] Geographical locations (often country names, but potentially on any level), defined through proxy objects (= local or external link, see under Agents). An example of an external gazetteer referred to is the TDWG Geography standard. [ATTR: id] Resource definitions containing links like URLs or actually embedding the resource (e. g. encoded images). These are proxy objects (= local or external link, see under Agents). [ATTR: id] Measurement units like mm-square, °C, ml, pH, and dimensionless scaling factors like %, promille. [ATTR: id] Abstract base type for proxy objects representing external resource objects (publications, class names, specimens, etc.). Provides a free-form label (this may be locally defined and the only data item if no external object is available) plus an ID-based link to an external object. Human readable representation. This may be the only data item if no machine readable ObjectLink exists. Example for a publication: "Smith 1998. 
Flora of Erehwon, XY Publishers." Even if an external ID exists, the Label is required. It preserves the semantics of the proxy object (= keeps it interpretable by humans) even if the machine-readable object links are broken. Label should be updated automatically (without human control) only after a human has decided that the semantic management of an external object provider can be fully trusted. Some Labels like scientific taxon names or publication references can be expressed more or less language-independently, others like geographic names are always language dependent. @@Discussion necessary: language type is currently extended with neutral and unknown codes ('-', '?'), is this necessary?@@ The Abbreviation element provided is not necessary for all proxies, but especially useful for class names (e.g., for tabular reports) and publication abbreviations (author/year style). Defines an ID of an external object or one to several services providing it. The format in which the object is returned is undefined and needs to be interpreted by the receiving application. Ideally, common standards (TDWG, MARC, etc.) should be used.

=== Class names (biology: taxon names):
Used for class names (biology: taxon names). Provides a locally defined simple free-form text plus an optional link to an external resource object. This may be changed to allow entering a structured form of taxonomic names (Genus/Higher taxon, rank, optional specific/infraspecific epithets, authors). However, note that simply splitting into taxon name and authors does not work, because authors may be in the middle of the parts of the taxon name (e.g. in botanical autonyms). Currently the development of the TDWG taxon names standard should be awaited first. Note that Class names are not restricted to accepted names (also referred to by Synonyms in ClassHierarchyNode type). Extensions of ProxyBase specific to ClassNameProxy. For biological taxonomic names: order, family, species, etc. 
Derived from an enumerated value list. This element needs to be interoperable; formatting often depends on specific ranks rather than on relative place in the hierarchy alone. Defines an element with a ref attribute pointing to a ClassName in ExternalDataInterface (in biology: Class = Taxon) Refers to a class name (biology = 'taxon'; ExternalDataInterface/ClassNames/ClassName) A collection of ClassRef type elements Reference to a class name (in biology = taxon name) defined in ExternalDataInterface/ClassNames [ATTR: ref] === Class hierarchy (biology: taxon concepts): Used for class hierarchies (taxonomies) Extensions of ProxyBase specific to ClassHierarchyProxy For example, SDD supports taxonomic (order/family/genus etc.) and non-taxonomic (weed species, diseases, herb/shrub/tree) hierarchies. For many analytical purposes it is relevant whether a hierarchy is based on phylogenetic (= evolutionary) relatedness or whether it is an operational categorization. Note: a conventional taxonomic hierarchy should be considered phylogenetic until proven to be not. Root of the recursive tree A node in a class hierarchy tree (biology: taxonomical hierarchy) A node either contains a class reference (biology: taxon) and optionally (if it is a higher level class) further child Nodes, or it is anonymous and contains only further child Nodes. Nodes may not be empty. (The complex choice/sequence expresses the A, or B, or A and B constraint which is difficult to express in xml-Schema.) The class (biology: taxon; with optional synonyms) that identifies the node. Refers to a class name (in biology a taxon name) [ATTR: ref] Rather specific to biology: Taxa above rank of species have a lower taxon by which they are typified. Rather specific to biology: Taxa of species rank or below have a physical unit (specimen) by which they are typified. Collected and preserved unit(s) (biology: specimens) by which the name is typified. 
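The node rule stated above (a node carries a class reference, or child Nodes, or both, but is never empty) can be checked recursively by a consumer. The sketch below is illustrative only: the element names `Node`, `Class`, and `Nodes` are assumptions chosen for the example and namespaces are omitted; the schema itself expresses the constraint through a choice/sequence construct.

```python
import xml.etree.ElementTree as ET


def node_is_valid(node):
    """Check the 'A, or B, or A and B' rule for a hierarchy node.

    A = the node has a class reference (hypothetical <Class ref="..."/>).
    B = the node has at least one child node (hypothetical <Nodes><Node/>).
    """
    has_class = node.find("Class") is not None
    children = node.findall("Nodes/Node")
    if not has_class and not children:
        return False  # empty nodes are not allowed
    # Every child must itself satisfy the rule.
    return all(node_is_valid(child) for child in children)


# A named higher-level node with one named child: valid.
NAMED = '<Node><Class ref="c1"/><Nodes><Node><Class ref="c2"/></Node></Nodes></Node>'
# An anonymous node that only groups children: valid.
ANONYMOUS = '<Node><Nodes><Node><Class ref="c3"/></Node></Nodes></Node>'
# A node with neither class reference nor children: invalid.
EMPTY = "<Node/>"
```

For instance, `node_is_valid(ET.fromstring(ANONYMOUS))` accepts the anonymous grouping node, while the empty node is rejected.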
(The expression of synonyms may be essential for reports and to convey the concept of a class to information consumers.) If class identification is present, further nodes are optional. The class identification may be missing, but then further Nodes are required. A collection of objects with ClassHierarchyNode type Defines an element with a ref attribute pointing to a ClassHierarchy. === Units (biology: specimen, 'Objects' in earlier versions of SDD): Used to define objects that are collected, described, or observed (collected objects may be preserved permanently in a specimen collection). In biology a collected object is often called a specimen. Provides either a simple free-form descriptive label ('so-and-so in freezer 14, with tag 1233'), or a link to an external collection unit. Note that the term 'Unit' as used here has no relation to 'measurement units' or 'organization units'. Extensions of ProxyBase specific to UnitProxy @@ SomeElementsAnalyzedBySDD: These are just the preliminary elements identified by SDD to be necessary as local extensions. A decision needs to be made, compare the DWC-based present in an alternative interface group! @@ Identification of specimen object. The information may come from the service provider. If the service provider only provides a name, this must be compared with and if necessary added to the list of ClassNames so that a ClassName reference may be used here. This may point to a higher taxon (family, order, or even "plantae") to indicate incomplete, broad identifications. [ATTR: ref] Default is 'certain'; 'Abies cf. alba' would be recorded as 'uncertain'. False = object has not been collected and preserved (it may still be databased in an observation database and have an ExternalID!). The default for this element is true, i. e. if the element is missing the object has been collected/preserved. ### To be decided! 
Extensions of ProxyBase specific to UnitProxy. This is derived from DarwinCore, "version 1.25 2003/05/24 11:14:24 John Wieczorek", reworked in a first attempt into structures compatible with UBIF usage. The following is not yet a serious proposal, just a basis for further work. Most likely this is too rich at the moment for a simplified interface... DarwinCore 'core' fields. A description indicating whether the record represents an object or observation (e.g., tissue sample, living organism, voucher specimen, germplasm/seed, genetic information, etc.). The code (or acronym) identifying the institution administering the collection in which the object or observation record is cataloged. No global registry exists for institutional codes; use the code that is "standard" in your discipline. This attribute must contain no spaces. The code (or acronym) identifying the collection within the institution in which the object or observation record is cataloged. This attribute must contain no spaces. The alphanumeric value identifying an individual object or observation record within the collection. It is highly recommended that each record is uniquely identified within a collection by this value. It is also recommended that each record is universally uniquely identified by the combination of InstitutionCode, CollectionCode and CatalogNumberText. The name(s) of the collector(s) of the original data for the object or observation. Date on which the object or observation was collected from the field. Each part of the date may be missing. (= DarwinCore: YearCollected, MonthCollected, DayCollected, VerbatimCollectingDate) [ATTR: year = four digit year; month = two digit month of year; day = two digit day of month] An identifying string applied to the object or observation at the time of collection. Serves as a link between field notes and the object or observations. An identifying string applied to a set of objects or observations resulting from a single collecting event. 
Notes taken in the field for the object or observation, or a reference to such notes. The combination of all geographic elements less specific than locality. "Like" query operations on this element will search for a substring that might be in any of the higher geography elements. The full, unabbreviated name of the continent or ocean from which the object or observation was collected. The full, unabbreviated name of the island group from which the object or observation was collected. The full, unabbreviated name of the island from which the object or observation was collected. The full, unabbreviated name of the country or major political unit from which the object or observation was collected. The full, unabbreviated name of the state, province, or region (i.e., the next smaller political region than Country) from which the object or observation was collected. The full, unabbreviated name of the county, shire, or municipality (i.e., the next smaller political region than StateProvince) from which the object or observation was collected. The description of the locality from which the object or observation was collected. Need not contain geographic information provided in other geographic fields. Geographical coordinates (decimal longitude/latitude) of the location from which the object or observation was collected. Includes geodetic datum and an optional verbatim text representation. The upper limit of the distance (in meters) from the given latitude and longitude describing a circle within which the whole of the described locality must lie. Use NULL where the uncertainty is unknown, cannot be estimated, or is not applicable (e.g., because there are no coordinates). The minimum and maximum altitude in meters above (positive) or below (negative) sea level of the collecting locality. (= DarwinCore.MinimumElevationInMeters / MaximumElevationInMeters A text representation of the altitude in its original format in the source database. 
The minimum distance in meters below the surface of the water at which the collection was made; all material collected was in this range. (= DarwinCore.MinimumDepthInMeters / MaximumDepthInMeters A text representation of the depth in its original format in the source database. A reference to the methods used for determining the coordinates and uncertainties. This includes DarwinCore GeoreferencingMethod and GeoreferencingReferences The extent to which the georeference has been verified to represent the location where a Cataloged Item was collected. The name(s) of the person(s) who applied the currently accepted ScientificName to the object or observation. The date in which the unit (specimen, observations, strain, culture, animal) was identified as having the ScientificName. A standard term to qualify the identification of the object or observation when doubts have arisen as to its identity(e.g., "cf.", "aff.", "subspecies in question", etc.). The name of the phylogenetic kingdom in which the object or observation is classified. The name of the phylogenetic phylum (or division) in which the object or observation is classified. The name of the phylogenetic class in which the object or observation is classified. The name of the phylogenetic order in which the object or observation is classified. The name of the phylogenetic family in which the object or observation is classified. The full name of the lowest level taxon to which the object or observation can be identified (e.g., Family, Genus, Genus+" "+SpecificEpithet, Genus+" "+SpecificEpithet+" "+SubspecificEpithet, etc.). The name of the genus in which the object or observation is classified. The specific epithet of the scientific name applied to the object or observation. The subspecific epithet of the scientific name applied to the object or observation. The author of the ScientificName. Can be more than one author in a concatenated string. 
Should be formatted according to the conventions of the applicable taxonomic discipline. A list of one or more nomenclatural types (including type status and typified taxonomic name) represented by the object (e.g., "holotype of Ctenomys sociabilis. Pearson O. P., and M. I. Christie. 1985. Historia Natural, 5(37):388."). Does not apply to observations. Free text references to information not covered elsewhere (e.g., URLs to specimen details, photographs, publications, etc.). DarwinCore Curatorial The sex of a biological individual represented by the cataloged object or observation (e.g., male, female, hermaphrodite, gynandromorph, not recorded, indeterminate, transitional - between sexes, for sequential hermaphrodites). The age class, reproductive stage, or life stage of the biological individual (e.g., juvenile, adult, eft, nymph, etc.) referred to by the catalog number. A concatenated list of preparations and preservation methods (skin, skull, skeleton, whole animal (Ethanol), slide, etc.) for the object. Includes tissue preparations (frozen, EDTA, etc.). Does not apply to observations. GenBank Accession number(s) associated with the biological individual(s) referred to by the cataloged object. A list of previous or alternative fully qualified catalog numbers for the same object or observation, whether in the current collection or in any other. The fully qualified identifier (InstitutionCode+" "+CollectionCode+" "+CatalogNumberText) of the related object or observation, preceded by the nature of the relationship (e.g., "(sibling of) MVZ Mamm 1234"). The current disposition of the cataloged item (e.g., "in collection", "lost", "voucher elsewhere", etc. Free text comments accompanying the object or observation record. DarwinCore Microbial Fate of the isolate between isolation and deposit in the present collection. The backward sequence of deposits is used separated by "<" meaning "received from". 
Each entry may contain the name of the collection and the (month and) year of the acquisition. In parentheses may be entered: strain designation or collection numbers (only when confusion is possible between two or more numbers from the same collection) and/or a name when a name change has occurred. Example: [in Bacillus sphaericus DSM 488] NCTC, Nov. 1973 (Bacillus loehnisii) < T. Gibson, 1935 < Kral Collection (Bacillus probatus). Name of the depositor. The date on which the unit (strain, culture, animal) was deposited in the collection. Substrate from which the strain was isolated (soil, water, blood, leaf, etc.). Name of the person performing the isolation into pure culture. Method used to isolate the strain. Any specific conditions related to cultivation and maintenance of the strain such as culture medium, atmospheric and light conditions, temperature, etc. Names of chromosomal markers of the strain. Type and parent of mutant if strain is a mutant strain. Name of the race of the strain and authors of the race. Name of the alternate state of the strain and authors of the alternate state. Any specific properties of the strain (enzyme production, metabolites production, degradation, etc.). Any specific applications that the strain may have, such as in bioremediation, inoculants, biologic control, etc. Hazard group, pathogen class, plague type. Any specific disease that the strain may cause. Defines an element with a ref attribute pointing to a Unit (biology: observation or specimen) defined in ExternalDataInterface. @GH@: Discuss whether to add a separate element for collection abbreviation (cached information from provider or from Refers to a Unit object identifier (biology = 'specimen'). Extension of UnitRef with a required type status attribute (NomenclaturalTypeStatusOfUnitsEnum). The type status of a unit (biology: specimen). See the enumerated type for further information. 
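Two of the DarwinCore-derived conventions described above lend themselves to a short illustration: the fully qualified identifier (InstitutionCode+" "+CollectionCode+" "+CatalogNumberText, where the two codes must contain no spaces) and the deposit history whose entries are separated by "<" meaning "received from". The function names below are hypothetical helpers, not part of UBIF or DarwinCore, and the sketch makes no attempt to parse the contents of individual history entries.

```python
def qualified_identifier(institution_code, collection_code, catalog_number_text):
    """Join the three parts with single spaces, as in 'MVZ Mamm 1234'."""
    for label, code in (("InstitutionCode", institution_code),
                        ("CollectionCode", collection_code)):
        # The annotations require these codes to contain no spaces.
        if not code or any(ch.isspace() for ch in code):
            raise ValueError(f"{label} must be non-empty and contain no spaces")
    return f"{institution_code} {collection_code} {catalog_number_text}"


def split_deposit_history(history):
    """Split a backward deposit sequence on '<' ('received from').

    The first entry is the most recent holder, the last the earliest one.
    """
    return [entry.strip() for entry in history.split("<")]


# Example history taken from the annotation text.
HISTORY = ("NCTC, Nov. 1973 (Bacillus loehnisii) < T. Gibson, 1935 "
           "< Kral Collection (Bacillus probatus)")
```

Because the first two parts contain no spaces, a consumer can split a qualified identifier back into its parts at the first two spaces, even when the catalog number text itself contains spaces.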
=== Publications, references, and citations: Used for resources like publications, laboratory notes, speeches, etc. Provides either a simple free-form text, or a connection to an external resource. Extensions of ProxyBase specific to PublicationProxy @@GH: Two proposals for publication-specific extensions of the proxy base data. Both have advantages and I can imagine either solution. The important thing would be to select a common solution for SDD, ABCD, TaxonNames, LinneanCore, etc.! GENERAL Note: Some parts of publication representations are already available as proxy base data. These are: - the unconstrained text form as commonly found in published references (i.e. not atomized) belongs in the Label; - the URL location of the article on the web and the DOI (digital object identifier) can be found in ObjectLinks. Extensions of ProxyBase specific to PublicationProxy This structure is based on the Linnean Core proposal and checked against the DiversityReferences and ReferenceManager(TM) data structures. It would provide a relatively satisfying full structure usable in the absence of other literature management systems. Note: Many aspects of reference managers such as keywords, abstracts, availability, or reference types are not supported in the current data interface. However, they may be added and managed inside the generic extension mechanism, see "CustomExtensions" above. @@Open question: How to reference software? Year as appearing on the publication. Compare TruePublicationDate below. Effective date of publication; may be different from year stated on/in the publication. Important for taxonomic or other priority. [ATTR: year (required), month, day (optional). Typed as gYear, gMonth, gDay; note that gMonth requires '01' instead of '1' for 'Jan.'] Series of books or articles (the latter may be published in edited books, journals, or on the web).
Series title Series editors Printed book: monograph or edited book with articles Book title (monograph or edited book) Book creators are authors if Book is used alone or in combination with Series or Chapter, but editors if used in combination with article Volume or part in a series Total range of pages, including foreword, appendices, index and plates/figures. International Standard Book Number Number of the edition of a book. Publisher, reprint year, note, etc. for historical books that are reissued. Periodical/magazine /journal information We really need BPH and TL2 as standard dictionaries to drive these titles Standardized abbreviated form of title International Standard Serial Number Publishers of a book, periodical, or independently published article. The name of the publisher (publishing company or institution, including universities or scientific societies). The location where the item being referenced was published, such as a city and state. Articles may, e. g., be published in periodicals, edited books, the internet. Volume of periodical (empty if article appears in edited book) Part or issue of a periodical volume (empty if article appears in edited book) Pages of article. This may include table, or figure numbers for the reference. Examples: '23-41', '341 pp.', or '20, 22-24, 32' (for non-consecutive pages). Optional information about a chapter, section, etc. that has the same authors as the publication in which it is contained. Compare Article for authored chapters in edited books. Number of chapter, section, etc. as used in the publication. Pages of current part ('22-34') Extensions of ProxyBase specific to PublicationProxy This structure is less satisfying in the absence of a literature management system, but it provides some atomization helpful in finding or filtering local proxy data and in associating locally recorded data with external databases at a later time. 
For article, chapter, or monographic book the authors, for an entire edited book the editors. The editors of the book in which a chapter appears are not listed here, but as part of the Source text string. Title of the immediate publication (i.e. title of authored chapter, but not of source book or journal). Year as appearing on the publication. Compare TruePublicationDate below. True date of publication, especially if different from stated year. Important for taxonomic or other priority. All remaining information, including Periodical/Volume for articles, or edited book for articles and chapters in a book, with the exception of the separate Pages (see below). Pages of article. This may include table or figure numbers for the reference. Examples: '23-41', '341 pp.', or '20, 22-24, 32' (for non-consecutive pages). International Standard Book Number. @@Although this is an ideal key, this element may be dropped from the selective structure! Only very few references are covered by entire books with ISBN. Articles in journals are far more frequent and it would be more valuable to better support those. Defines an element with a ref attribute pointing to a Publication (ExternalDataInterface/Publications/Publication) A collection of elements of PublicationRef type. [ATTR: ref] --- The following types build on the PublicationProxy infrastructure: Combines a publication resource reference with a detail location within that reference (esp. page number) Refers to a publication as defined under ExternalDataInterface/Publications [ATTR: ref] Location within publication where the cited data can be found: Page, table, figure number, database record, html document bookmark, etc. (Note: this is not the page range of the entire article!). If publication is a non-persistent web resource that may change or disappear, the date at which the citation was verified to be appropriate should be recorded.
It may later be updated, but not through a link checker verifying only technical access: the semantics of the citation have to be verified! If publication is a non-persistent web resource that can no longer be verified, the date it was found to have disappeared (or become semantically inappropriate) may be recorded. A collection of Citation-type elements === Agents (persons, organization, software agent): Used for Agent documentation (an Agent is a person, project, organization, or software agent). Currently used for authors, editors, contributors, and translators. Ideally it connects to an outside definition or documentation of the Agent. Extensions of ProxyBase specific to AgentProxy (The Agent-specific proxy extension is partly modeled after elements defined in vCard 3.0 and Jabber, see http://www.jabber.org/jeps/jep-0054.html.) (Mostly vCard:Org) Full organization or corporate name in multiple languages (en: 'Botanical Garden of ...', de: 'Botanischer Garten von ...'). (vCard:Org.OrgName) The standard Label mechanism also supports acronyms/abbreviations (no vCard equivalent!). For collections, the organisation abbreviation maps to Darwin Core 2: Institution Code. If Agent contains no person definition: the unit within the organization the agent represents, else a list of the various organisational units to which a person may belong. (vCard:OrgUnit) (vCard:OrgUnit) (There is no equivalent to vCard:FN/full name here, this is already covered by proxy Label above). For the problems involved in atomizing names from different cultures compare http://dublincore.org/documents/1998/02/03/name-representation/ See also http://efgblade.cs.umb.edu/twiki/bin/view/SDD/ProxyDataAgentProxy on our own WIKI. @@ To be decided before schema can be published! @@ The full name in preferred sorting sequence, i. e. with main name first. Use case: sorting, reporting in sorted lists. Examples: 'Duarte, Amália Mourinha' (pt), 'Pina de Morales, Ana Maria' (es).
(vCard:Sort-String) Professional or academic title of individual person (prefer using Role for job titles!) (vCard:Title) Enumeration of male, female, unknown (vCard no equivalent) Birthday of person. (vCard:BDay, may include time) Death date of a deceased person. (vCard: not surprisingly no equivalent) (Software agents are not handled by vCard!) (Software agents probably need to be extended in future versions.) Role of Person or Organization in context. This element can be used to provide a title such as "Database Administrator" or "Curator" even when no individual person is named. (vCard:Role) @@Note gh: I see a problem with the unparsed address proposals in the original ABCD model and in two of the alternatives presented here, in that the Label for the Agent often requires the addition of city/country to disambiguate multiple agents with the same name (vCard:Adr) @@ To be decided before schema can be published! @@ Telephone/fax/modem numbers (vCard:Tel) [ATTR: number = should be provided in the ITU Recommendation E.164 international format ("+CountryCode AreaCode Number") (vCard:Tel.Number) ATTR: devicetype = voice, fax, mobile, pager, modem (identical with vCard:Tel.Voice etc.; if several are on a single phone number list the phone number with each device type!) ATTR: usagenote = free-form text for constraints on use e. g. "weekdays only" or "home number" (partly: vCard:Tel.Home/Work flags) ATTR: preferred = preferred number, may occur multiple times for different device types (vCard:Tel.Pref)] E-mail addresses (vCard:Email) E-mail address for contact (vCard:Email.UserID; this also has Home/Work flags not supported here) [ATTR: preferred (vCard:Email.Pref)] URI pointing to a homepage with further information. Note: If the Agent has a permanent URN representation, it is expected in ObjectLink in the base type. 
(vCard:URL, vCard supports only 1 URL) URL for person or organization [ATTR: preferred] URL of logo or icon image; usually of organization but may also be used by a person. (vCard:Logo) (Note: vCard:Note maps to Annotation in the base type!) ### To be decided! PROPOSAL 1: Atomized structure Family names, generational names, clan name, parents/grandparents personal names, etc. This (= last name in western cultures) may be compound ('Fischer von Waldheim', 'da Selva', 'Silvano Morales'). Depending on culture it is not necessarily the name of the parents nor common to the married couple and children, thus 'family name' should be avoided even though used in vCard. (vCard:N.Family) Prefix to name that should be output before name, but is usually not included in sorting. Examples: 'Prof.', 'Dr.', 'von', 'Lord'. (vCard:N.Prefix) Suffix to name that should be output after name, regardless whether it is in sorting sequence (Inherited, Given) or not. Examples: 'Jun.', 'III.'. (vCard:N.Suffix) The name given to a person as a personal name (= first or christian name in western cultures, including 'middle initials') may contain several words ('Ana Maria', 'Jerry B.'). Applicable only to persons. (vCard:N.Given + vCard:N.Middle) May differ from the first given name: second given name, nickname ('Bob' for 'Robert'), etc. (vCard:Nickname) ### To be decided! PROPOSAL 2: Name-variant structure @@ Seq. temporarily made optional @@ Preferred version of complete name in forward sequence as defined by the culture of the name-bearer. Use case: reporting. Examples: 'Maria Amália Mourinha Duarte' (pt), 'Ana Maria Pina de Morales' (es), A version of the name in forward sequence used in informal usage. Use case: reporting. Example: 'Bob Morris' for 'Prof. Dr. Robert Morris', 'Amália Mourinha Duarte' (pt), 'Ana Pina de Morales' (es). ### To be decided! Proposal 1: ABCD-style single string Contact address. Each element should be one address; do not use multiple elements for each line! 
(vCard:Adr.POBox + .ExtAdr + .Street + .Locality + .Region + .PCode + .Ctry) [ATTR: language, preferred (vCard:Pref)] @@vCard defines further attributes: Home/Work, Postal/Parcel, Dom/Intl. Also, vCard atomizes the address, see proposal 3 below. Perhaps at least the country should be specified in ISO 2-letter codes? ### To be decided! Proposal 2: Similar to ABCD-style, but using UDDI-style address lines Contact address. (vCard:Adr.POBox + .ExtAdr + .Street + .Locality + .Region + .PCode + .Ctry) [ATTR: language, preferred (vCard:Pref)] Address line ### To be decided! Proposal 3: model following the atomized vCard fields 1:1. (vCard:Adr.POBox) (vCard:Adr.ExtAdr) (vCard:Adr.Street) (vCard:Adr.Locality) (vCard:Adr.Region) (vCard:Adr.PCode) (vCard:Adr.Ctry) Abstract base type for AgentRef and MicroAgent. The ref attribute is optional here! Reference to an Agent (ExternalDataInterface/Agents/Agent) Provides a minimalized local Agent definition together with an optional Agent reference (ref attribute). In principle this is derived from AgentRef, but to properly do it Person or role name (e. g., 'head of department') (voice phone) Defines an element with a required ref attribute pointing to an Agent (ExternalDataInterface/Agents/Agent) Makes the optional base type attribute required. A collection of AgentRef-type elements, i. e. Agents forming a team like an author team. (The sequence of elements in instance documents is informative!) [ATTR: ref] --- The following types build on the AgentProxy infrastructure: Extension of AgentRef with a role attribute and three attributes recording object-specific contributions. The first time an agent (creator or contributor) has edited/made a contribution to an object. If a creator has contributed both as an author and later as an editor of data, two references in these two roles will exist and the contribution dates will be recorded separately. The number of contributions by a specific agent (editing, revising, adding to an object).
A collection of RichAgentRef elements. (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] Restriction of RichAgentRef to Creator roles only. Collection (sequence) of Agent elements of type CreatorRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] Restriction of RichAgentRef to either Creator or Contributor (but not Owner) roles. Collection (sequence) of Agent elements of type CreatorContributorRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] Restriction of RichAgentRef to Contributor roles only. Collection (sequence) of Agent elements of type ContributorRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role, ...] Restriction of RichAgentRef to Owner roles only (contribution attributes prohibited). Collection (sequence) of Agent elements of type OwnerRef (The sequence of elements in instance documents should be preserved. Within each role it is mandatory. Different roles may, however, be reported in separate sequences.) [ATTR: ref, role] --- Note: A modeling problem is that in instance documents Agents within a role are usually ordered (sequence), but different roles not (authors+editors = editors+authors). UBIF 1.0 until beta 14 (available on WIKI!), attempted to solve the problem by introducing a 2-layer collection with Creators/AgentRole[@role='aut']/Agent[@ref]. Now this has been abandoned because it introduced too much complexity. 
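The ordering rule in the note above (the sequence of agents matters within a role, while the order in which different roles are reported does not) can be sketched as a comparison of two flat agent lists. The tuple layout, the function names, and the role code 'edt' are assumptions for illustration; only the 'aut' code appears in the text above.

```python
from collections import defaultdict


def by_role(agent_refs):
    """Group (ref, role) pairs by role, keeping the order within each role."""
    grouped = defaultdict(list)
    for ref, role in agent_refs:
        grouped[role].append(ref)
    return dict(grouped)


def same_agents(first, second):
    """True if both lists carry the same per-role sequences.

    The interleaving of roles is ignored (authors+editors equals
    editors+authors), but the order inside a role is significant.
    """
    return by_role(first) == by_role(second)


authors_first = [("a1", "aut"), ("a2", "aut"), ("e1", "edt")]
editors_first = [("e1", "edt"), ("a1", "aut"), ("a2", "aut")]
authors_swapped = [("a2", "aut"), ("a1", "aut"), ("e1", "edt")]

print(same_agents(authors_first, editors_first))    # roles reordered only
print(same_agents(authors_first, authors_swapped))  # author order changed
```

A flat list with a role attribute on each reference keeps this information without the extra nesting level that the abandoned two-layer design required.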
--- types related to Agent references: A collection (seq) of name strings, used for publication authors or editors and for collectors, i. e. whenever the identity of an Agent is doubtful and the name cannot be associated with an Agent without doubt. Authors or Editors expressed only as string, e.g. in publications where the identity of creators can often not be discovered. Optionally, the ref attribute may refer to an agent if the relation between string and Agent can be assessed. (The sequence of elements in instance documents is informative!) [ATTR: ref] Reference to an Agent (ExternalDataInterface/Agents/Agent) RevisionData (creators, dates, revision) for the entire project/data set or individual objects. If RevisionData exist at all, at least one creator (author or editor) is required. (= DC.Creators) General contributors, or translators. (= DC.Contributors) @@Request for discussion: Translator-Contributors are currently not listed on individual Representation elements. Only a general statement about all translations together can be made. Should this be changed? Also: should one Representation be marked as 'Original/SourceForTranslation'? @@ Date/time when the intellectual content (project, term, description, etc.) was created. Applications may initially set this to the system date for new data objects, but authors must be able to change it to an earlier date if necessary. If for legacy data this is imprecisely known, it may be missing here. Earlier versions in other data formats should then be mentioned in the copyright or acknowledgement statements. (= DC.Date.Created) Date/time when the last modification of the object was made. If in online data sources the provider can not assess this, the current date/time may be substituted. For legacy data this may be set to the file date of imported data, or estimated. (= DC.Date.Modified) === Geography: Used for resources like geographical names or places. Provides either a simple free-form text, or a connection to an external resource.
@@ Problem: in contrast to class names, and even publications, locality names are necessarily language-specific! Extensions of ProxyBase specific to LocalityProxy Geographical coordinates (decimal longitude/latitude) of the location from which the object or observation was collected. Includes geodetic datum and an optional verbatim text representation. Defines an element with a ref attribute pointing to a Locality (ExternalDataInterface/Geography/Locality) A collection of LocalityRef-type elements. The sequence of elements in instance documents is semantically irrelevant and may be changed. Reference to a locality defined in ExternalDataInterface/Geography/Locality [ATTR: ref] === Media (especially images, audio/video): Extends resource proxy type with optional encoded data content (esp. images embedded in xml document) and with a Type (Image/Audio/Video, etc.). Extensions of ProxyBase specific to MediaResourceProxy Type of medium, based on DCMI Type vocabulary (= DC.Type) An optional caption for a resource, esp. if it will be presented embedded in another document. Captions can be provided in multiple languages. Differs from the resource Label, which is more closely related to a 'title'. @@ Issue: captions, even in multiple languages, may be obtained from the service provider. Even then it may be desirable to override them! Do we need two collections: InheritedCaption and CaptionOverride? This seems to be awkward whenever there is no ServiceProvider! Also, Label can contain a "title" only in a single language! @@ Creators, Revision status, and dates for the media resource Entities having legal possession of the data collection content. Owners are defined only for the entire data collection, not for individual descriptions etc. (= http://www.loc.gov/marc.relators/own) Copyright, terms of use, license and other IPR-related statements like disclaimer or acknowledgement. Giving a copyright statement and a licence (if possible a public one) is highly recommended!
(=DC.Rights) [ATTR: language] Optionally the full resource data may be embedded (as an alternative or in addition to defining a URI). Note: A resource like an image should be directly encoded, i.e. not wrapped into a MIME object first. Defines an element with a ref attribute pointing to a MediaResource (ExternalDataInterface/MediaResources/MediaResource) A collection of MediaResourceRef elements. The sequence of elements in instance documents is semantically relevant and should be preserved. (the sequence in instances is informative!) [ATTR: ref] [Not yet used] A media resource element embedded in a group is provided solely to allow reuse together with the necessary identity constraints for the ref attribute. Limitations of xml Schema prevent the definition of identity constraints on the MediaResourceRef type itself. (the sequence in instances is informative!) [ATTR: ref] === Measurement units: Provides an extensible definition mechanism for measurement units like meter, mm, µm, liter/litre, °C, m/s, etc. May also be used for dimensionless scaling factors like %! Label contains a language/culture-specific long form of the measurement unit, e. g., 'liter' (en-us) or 'litre' (en-gb) for 'L.' Label and InternationalAbbreviation text allow some xhtml formatting to support, e. g., "mm2". Note: "International Standard ISO 31 (Quantities and units), 1992" may be relevant here, but it seems not to be available online. Printed version: ISO Standards Handbook: Quantities and units. 3rd ed., International Organization for Standardization, Geneva, 1993, 345 p., ISBN 92-67-10185-4, 182.00 CHF. A useful online resource is http://hem.fyristorg.com/ojarnef/fys/metric-units-comp.txt Extensions of ProxyBase specific to MeasurementUnitProxy A scientific abbreviation considered language and audience independent. It may contain formatting to express "mm2". Note that the Abbreviation element available in most label types does not support formatting!
True if unit is SI unit or a derived unit acceptable in scientific publications. False for local/historical units like feet and velocity in fathoms per fortnight :-). True indicates that unit should be output before the value (as in 'pH 7.0'). Default is false. Describes relations to other units that can be expressed through a simple multiplication factor (i. e. not cubic meter = meter * meter * meter, or Celsius to Fahrenheit) @@ Do we really need multiple relations or is a single relation to the base unit sufficient? @@ Ideally the relation should always be defined towards the base unit, e. g., km, cm, mm, µm all to meter. Multiply current unit with this factor to obtain related unit referenced above. Refers to a MeasurementUnit (attribute ref is required) Abstract base type for MeasurementUnitRef and MicroMeasurementUnit. Here the ref attribute is optional! ref refers to a measurement unit id (Terminology/General/MeasurementUnits) Provides a minimalized measurement unit identified through a local (and presumably international) abbreviation - together with an optional Measurement Unit proxy reference (ref attribute). A scientific abbreviation considered language and audience independent. It may contain formatting to express "mm2". === Public objects carrying a key also generally provide for developer annotations/comments (undefined language), version extensions for future versions of UBIF, and custom extensions (= "application annotations"). === Key/ref infrastructure for linking within a data set: This allows to define (and redefine) the value type for keys and keyrefs Note: the use of attribute groups instead of globally defined and referred attributes is a work-around for problems occurring with attribute definitions in included library schemata. The use of global attributes by ref caused validation or namespace problems, even though this library has no target namespace (chameleon pattern); Spy 2004.4 says, e. g., ... 
attributes that need to be qualified because your schema uses attributeForm = qualified or global attributes. You must specify a prefix for your schema namespace. === Options to link using URLs or GUID + resolving mechanisms (used especially for UBIF data proxies): The object linking mechanisms used by the ProxyBase type may also be used by other objects! LifeScience ID (without the constant prefix 'urn:lsid:'). 3 to 4 parts separated by colon, the 1st part is the url of a life science authority service that provides metadata on how to obtain the object references in part 2 (namespace = data collection), 3 (object ID) and 4 (optional object version). Example: lsid.gbif.org:DataCollectionID:ID/1§31~b+:v2 Digital object identifier (an ID scheme advanced by the library community). A URL directly providing an object representation. In contrast to the URN types LSID or DOI this should resolve directly. The URL may be a query string (with ID embedded), for example: "http://x.y.fr/pub/au=smith?yr=1998". In the case of URLs multiple definitions may be defined to reduce the likelihood of failure. [The element sequence in instance documents is informative and should be preserved.] === Basic type library: === Basic generic types normalized string required to contain at least 1 character (this removes the xml string anomaly, i. e. either element/attribute may be optional, but if they are required the content may not be an empty string) normalized string restricted to 1..50 character length to be used for abbreviations (the recommended length of abbreviations is usually much shorter, but 50 characters should be a normalized string restricted to 1..255 character length (i. e. 
required, may not be empty string) Double precision numeric value in the range of [0..1] Derived string type with restricting patterns Compare LSID, this omits the prefix 'urn:lsid:' Digital Object Identifier (standalone, not embedded into URI syntax) === The following Range, Date, and Coordinate types describe frequently recurring simple type combinations in an element with attributes -- Element with 2 attributes to define a range: Lower and upper value as required attributes (no default values) Lower and upper probability value as required attributes (no default values) Contains lower/upper estimate attributes; used, e. g., for certainty and frequency! The default values are 0 and 1, indicating that no estimate was possible. -- Types for composite Gregorian calendar date/time (points in time where parts may be missing), following the seven property model described, e. g., in xml Schema 1.1 (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel). Instead of gYear, gMonth, and gDay, integer types with constraining facets are used for two reasons: a) each of them may have a timezone, which may lead to inconsistent data with multiple timezones; b) the lexical representation seems to be occasionally poorly implemented (e.g. where '31', or '---5' are accepted, whereas valid examples are '---31', '---05', and '---05+02:00'). In addition to the seven property model, text attributes for either imprecise supplements or complete verbatim dates are added. Note that incomplete dates in most cases are calendar specific and incomplete non-Gregorian dates can not be expressed. Furthermore, for complete dates it may be unclear whether a reformed or unreformed date has been used (e.g. in Russia in the 19th century).
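The lexical problem mentioned under reason b) can be made concrete with a small check of the xs:gDay lexical form ('---DD' with an optional timezone). This is a sketch only: the regular expression is written here from the examples given above and the usual xs:gDay shape, and the function name is hypothetical.

```python
import re

# Optional timezone: "Z" or a signed hh:mm offset of at most 14:00.
_TIMEZONE = r"(?:Z|[+-](?:(?:0\d|1[0-3]):[0-5]\d|14:00))?"

# xs:gDay lexical form: three hyphens, a two-digit day, optional timezone.
_G_DAY = re.compile(r"---(?:0[1-9]|[12]\d|3[01])" + _TIMEZONE)


def is_valid_gday(text):
    """Return True if text is a lexically valid xs:gDay value."""
    return _G_DAY.fullmatch(text) is not None


for candidate in ["---31", "---05", "---05+02:00", "31", "---5"]:
    print(candidate, is_valid_gday(candidate))
```

With plain integer attributes, as chosen in this schema, the equivalent check reduces to a simple range facet (1 to 31) and no timezone can be attached to an individual date part.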
Date separated into attributes so that any part of the date may be missing [ATTR: year = four digit year; month = two digit month of year; day = two digit day of month; verbatim = unparsed textual date representation; supplement = text adding to or modifying the exact dates, e. g., 'end of summer', 'first half of year', 'first decade of month', '1888-1892'; timezone = expressed as integer according to the xml schema seven property model] The four digit year in the Gregorian calendar (in Western cultures usually without a suffix or with 'AD/Anno Domini', 'CE/Common Era'; negative years with 'BC/Before Christ', 'BCE/Before Common Era'). Whether a year 0 is used or not differs between a true Gregorian calendar and recent astronomic usage; xml schema is likely to change its position, see xml schema draft 1.1. Thus database designers should not use 0 as a missing value representation for year. Two digit day. Text in addition to or modifying the exact date components, e. g., 'end of summer', 'first half of year', 'first decade (of month)', '1888-1892'. An uninterpreted text representation of the original date information (date range, 'summer', perhaps unreformed Russian dates, etc.); as close as possible to the (digital/printed/handwritten) information source. Timezone expressed in minutes. In the seven property model (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel) the timezone has a range of +/- 14 hours (14 * 60 = 840 minutes). Date + Time separated into attributes so that any part of the date may be missing. [ATTR: see CompositeDate type, plus: time] '24' may only occur if both minute and second are zero (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel). The normal range should be 0-59, but 60 may occur for UTC leap-seconds (http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/#theSevenPropertyModel). An additional validator may choose to validate this.
The simplest validation would attempt to convert those CompositeDate instances that contain all seven elements to an xs:dateTime value. -- Types for geographical coordinates Latitude of geographical coordinates in decimal degrees (i.e. 30° 30' would be expressed as 30.5) Longitude of geographical coordinates in decimal degrees (i.e. 30° 30' would be expressed as 30.5) ATTR: latitude, longitude (in decimal degrees), geodeticdatum (esp. if different from a Greenwich-based datum). Longitude is expressed from -180 to 180°, East longitude being plus and West longitude being minus. Where knowledge of the geodetic datum is readily available it should be passed on. However, in most situations no undue resources should be invested into researching the geodetic datum when this is unknown. Many geodetic datum systems result in differences of only up to 100 m, some up to several hundred meters. For many purposes in biodiversity sciences these differences are acceptable. The 'World Geodetic System 1984 (WGS-84)' is the most commonly used geodetic datum. It is used, e. g., by the 'Global Positioning System (GPS)'. Other important systems are in use (e. g., ITRF, ETRS89, NZGD2000, OSGB36, ED50, see also http://www.ncgia.ucsb.edu/education/curricula/giscc/units/u015/tables/table03.html or http://www.colorado.edu/geography/gcraft/notes/datum/edlist.html). The differences between WGS-84 and International Terrestrial Reference Frame (ITRF) are in the centimeter range worldwide, and ETRF 89 and NAD 83 are identical to WGS84 for Europe and North America, respectively. -- An exception to what has been said above are historical coordinates, which (for most countries up to ca. 1900, much later for France) may be based on a prime meridian other than Greenwich/Airy (e. g., the NTF datum uses Paris as its prime meridian, 2.33723° east of Greenwich).
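The coordinate conventions above (decimal degrees, longitude from -180 to 180 with East positive, and a possible non-Greenwich prime meridian) can be sketched in a few helper functions. All function names are hypothetical, and the latitude range of -90 to 90 is a standard assumption that the text does not state explicitly. The meridian shift only moves the longitude origin; it is not a full datum transformation.

```python
PARIS_EAST_OF_GREENWICH = 2.33723  # degrees, value quoted in the text above


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, negative=False):
    """Convert degrees/minutes/seconds to decimal degrees.

    Set negative=True for West longitudes or South latitudes.
    """
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if negative else value


def in_range(latitude, longitude):
    """Check the value ranges: longitude -180..180, latitude -90..90."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


def paris_to_greenwich(longitude_paris):
    """Shift a Paris-based longitude to a Greenwich-based one.

    Only the prime meridian offset is applied; ellipsoid and datum
    differences are ignored in this sketch.
    """
    shifted = longitude_paris + PARIS_EAST_OF_GREENWICH
    return (shifted + 180.0) % 360.0 - 180.0  # wrap into -180..180


print(dms_to_decimal(30, 30))
print(in_range(30.5, -120.25))
print(paris_to_greenwich(0.0))
```

Whenever such a shift is applied, keeping the original value in the verbatim coordinate text preserves the source information.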
An uninterpreted text representation of the coordinate data (latitude/longitude, UTM, TRS, etc.), as close as possible to the (digital/printed/handwritten) information source. === Various complex types Three attributes provide options to express sex as code (enumerated vocabulary), free-form text (perhaps interpreted), or verbatim (uninterpreted original version). At least one attribute should be present; this can not be validated by the schema. Controlled vocabulary to express sex status for clinical human or biological purposes. The string present in the source database, either in addition to or instead of code (especially if no mapping to the controlled vocabulary has been implemented yet, or if a specific value can not be mapped). This differs from verbatim in that it claims no special status and may contain any amount of interpretation relative to the original source (e. g., a specimen label) An uninterpreted text representation of the original sex information; as close as possible to the (digital/printed/handwritten) information source. Telephone, fax, etc. number ATTR: number = should be provided in the ITU Recommendation E.164 international format ("+CountryCode AreaCode Number") (vCard:Tel.Number) ATTR: devicetype = voice, fax, mobile, pager, modem (identical with vCard:Tel.Voice etc.; if several flags apply to a single phone number list the phone number multiple times!) ATTR: usagenote = free-form text for constraints on use e. g. "weekdays only" or "home number" (partly: vCard:Tel.Home/Work flags) ATTR: preferred = preferred number, may occur multiple times for different device types (vCard:Tel.Pref) Numbers should be provided in the ITU Recommendation E.164 international format ("+CountryCode AreaCode Number"). Note that telephone device types are not necessarily exclusive (voice/fax, mobile/modem, etc.) and vCard 3.0 allows multiple types for a single number. However, in UBIF this can be represented by adding a single number multiple times, once for each device type.
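The rule that one number with several device types is listed once per device type can be sketched as follows. The class, the function, and the loose pattern for the "+CountryCode AreaCode Number" layout are assumptions for illustration; the pattern is not a complete E.164 validator.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Loose check for "+CountryCode AreaCode Number" (space-separated digit groups).
PHONE_LAYOUT = re.compile(r"\+\d{1,3}(?: \d+)+")


@dataclass(frozen=True)
class Telephone:
    number: str
    devicetype: Optional[str] = None  # no default of "voice", see the text
    usagenote: Optional[str] = None
    preferred: bool = False


def expand_device_types(number, devicetypes, usagenote=None):
    # type: (str, List[str], Optional[str]) -> List[Telephone]
    """Return one Telephone entry per device type of a single number."""
    if not devicetypes:
        # Device type unknown: keep a single entry without a type.
        return [Telephone(number, None, usagenote)]
    return [Telephone(number, kind, usagenote) for kind in devicetypes]


entries = expand_device_types("+49 30 1234567", ["voice", "fax"], "weekdays only")
for entry in entries:
    print(entry, bool(PHONE_LAYOUT.fullmatch(entry.number)))
```

The sample number is invented; the device type names are those listed for the devicetype attribute above.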
This attribute should not default to "voice", even though this is the most likely case: an exporting database may not have properly reported the type, or the type may be indicated only in the usage note.
Free-form text for constraints on use, e.g., "weekdays only" or "home number" (partly: vCard:Home/Work flags)

=== Extension of xs:language and a reference element using Language
Union of xs:language with '-' for language-neutral (e.g., scientific names) and '?' for unknown. Language follows RFC 3066 'Tags for the Identification of Languages': a two-letter code taken from ISO 639 part 1 or a three-letter code taken from ISO 639 part 2, optionally followed by a two-letter country code taken from ISO 3166. (Notes: When a language has both a two-letter and a three-letter code, use the two-letter code. RFC 3066 replaces RFC 1766.)
Defines an element with a required 'language' attribute
Complex types that add attributes 'language' or 'preferred' to the simple types String, String255, anyURI. Note: the use of attribute groups instead of globally defined and referenced attributes is a work-around for problems occurring with attribute definitions in included library schemata.
(single 'language' attribute) Attribute for Language, used by reference
(single 'language' attribute) Attribute for Language, used by reference
(single 'preferred' attribute) Elements with preferred = true indicate a recommendation by the data provider. The consumer may have reasons to make a different choice.
Note on current usage: these types are used by ABCD and UBIF, but not by SDD (which mostly uses audiences instead of language)
String (i.e., xs:string with minimum length=1) extended with *optional* language attribute
String255 (i.e., xs:string with length 1-255) extended with *optional* language attribute
String (i.e., xs:string with minimum length=1) extended with *optional* preferred attribute
String255 (i.e., xs:string with length 1-255) extended with *optional* preferred attribute
String (i.e., xs:string with minimum length=1) extended with *optional* language and preferred attributes
String255 (i.e., xs:string with length 1-255) extended with *optional* language and preferred attributes
xs:anyURI extended with *optional* preferred attribute

=== Some text data support limited xhtml. (Could appropriate elements from xhtml be imported and encapsulated here?)
Collection of language-specific label representations
Language-specific label representation [ATTR: language]
Language-specific simple label, using simple formatted text
Label text in a specific language. Restricted to 50 characters maximum length, including blanks (recommended to be shorter!). Label abbreviations are especially important when displaying information in a tabular format.
Collection of language-specific label representations
Language-specific label representation [ATTR: language]
LabelRepr with short inherited Text extended with longer Details text.
Optional text of unconstrained length, elaborating details of the ShortText
Text with primary language plus multiple optional translations; used, e.g., in the PublicationProxy type.
A string, e.g., the title of a publication, having a single primary language. [ATTR: language]
Translations from the primary language [ATTR: language]

=== Statements are a special form of complex text expressions
Text, optional Details (both free-form text) and optional URI.
A concise representation of a statement (copyright, acknowledgement, etc.). Recommended to be as short as possible, but the actual length is unconstrained.
Optional text of unconstrained length, elaborating details of the ShortText
An optional resource on the net providing details on the statement (may be used as an alternative to the long text).
A sequence of various intellectual property right (= IPR) statements, with a language attribute on the entire sequence.
Other forms of IPR declaration not yet covered (e.g., database rights); also used in cases where an automatic converter cannot decide whether a statement is copyright, licence, etc.
Copyright may include the information that the data have been released to the public domain.
To be used if data are placed under a public license (GPL, GFDL, OpenDocument). Placing data under a public license while maintaining copyright is recommended! (= DC.Rights.Licence; new 2004)
Defines conditions under which the data may be analyzed, distributed or changed. "Terms of use" includes concepts like "Usage conditions" and "Specific Restrictions".
Disclaimer statement, e.g., concerning responsibility for data quality or legal implications.
Free-form text acknowledging support (e.g., grant money, help, permission to reuse published material, etc.)

=== The following types are currently unused (August 2004), but may be used in the future or by other standards.

=== Enumerations to support interoperability. The annotations have been removed here; please open the full files or see the provided HTML documentation.
Internal formatting note: Annotations of individual enumerated values should be written as ^"short label" + " -- " + "detailed information". An XSLT stylesheet transforms such schema annotations into a data document that can be used directly in user interfaces.
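The annotation convention in the formatting note can be sketched in a few lines of Python. This is not the XSLT mentioned above, only an illustration of how a consumer might split an annotation into its short label and detailed information; the function name and the sample annotation text are hypothetical.

```python
SEPARATOR = " -- "  # separator between short label and detailed information


def split_annotation(annotation):
    """Split an enumeration annotation into (short label, details).

    The text before the first separator is taken as the short label.
    If no separator is present, the whole text is treated as the label.
    """
    label, found, details = annotation.partition(SEPARATOR)
    if not found:
        return annotation.strip(), ""
    return label.strip(), details.strip()


# Hypothetical annotation text, used for illustration only.
label, details = split_annotation("Female -- Individual of female sex")
```

Splitting on the first separator only means that a detail text may itself contain " -- " without being truncated.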