TDWG    

Overview of available schema versions and documentation (UBIF and SDD 1.0)

TDWG working group: Structure of Descriptive Data (SDD)

Introduction

In preparation for the SDD meeting in Christchurch, New Zealand (TDWG, October 2004), versions of the UBIF and SDD schemas (see footnote 1) were released 60 days before the meeting (UBIF 1.0 beta 14 and SDD 1.0 beta 1). In the days after the release, several minor corrections and one major correction were made, so the versions documented here are UBIF 1.0 beta 18 and SDD 1.0 beta 2. It is recommended to base the discussion on these versions. The purpose of this phase is to invite thorough review, ideally so that fewer changes than in earlier rounds are needed and the final 1.0 is useful to all parties planning implementations. A final version of 1.0 should be presented after the meeting in Christchurch.

A major problem of SDD is its complexity. First it needs some metadata on the level of the data collection or project. Then it provides a cross-domain linking infrastructure (to refer to publications, taxon names, taxon hierarchies, specimen units, geographical names, or measurement units). Finally it deals with four separate but closely related descriptive tasks: Definition of terminology, application of terminology to descriptions in a strictly data-oriented way (coded descriptions), application of terminology as markup for natural language descriptions, and stored/manually authored identification keys.

An ongoing process is trying to unify the cross-cutting requirements of various biodiversity schemata (SDD, ABCD, Taxon concept) into a Unified Biosciences Information Framework (UBIF). SDD uses the UBIF schema for its metadata and infrastructure parts. The UBIF files are included when you download any version of SDD (the UBIF schema can also be downloaded separately, see "Material available for UBIF" below). The SDD schema discussions should concentrate on the SDD parts, but naturally any comment on the UBIF parts is welcome.

To further focus the discussion, "reduced" versions of both UBIF and SDD have been created. Some of the more technical schema aspects have been removed from these. Any part whose name starts with "__" (indicating ongoing discussion processes or undecided proposals that are not vital for a first version) has also been removed. In the reduced SDD schema, only the Coded Description method (including the DELTA- and NEXUS-like SummaryData and the new SampleData) has been kept; both NaturalLanguageDescriptions and IdentificationKeys have been removed. Anybody interested in these parts is nevertheless most welcome to comment on them!

How you can help depends on your past experience and technical expertise:

Material available for UBIF 1.0 (beta 18):

W3C XML schema zip packages
  Full development version (footnote 2): UBIF10b18.zip (ca. 150 kB)
  Single-file version (footnote 3): UBIF10b18_CompleteSingleFile.zip (ca. 63 kB)
  Simplified version (footnote 4): UBIF10b18_Reduced.zip (ca. 46 kB)

W3C XML schema xsd files (also contained in zip above)
  Full development version: UBIF.xsd, UBIF_EnumLib.xsd, UBIF_TypeLib.xsd (ca. 372 kB)
  Single-file version: UBIF_CompleteSingleFile.xsd (ca. 372 kB)
  Simplified version: UBIF_Reduced.xsd (ca. 188 kB)

Example instance documents (also contained in zip above): UBIF-TestFiles.zip

Enumerated values (footnote 5; please review!): Html tables and data XML

XML Spy documentation (elements and annotations displayed as diagrams): 1.2 MB html + 380 images = 2.7 MB total; 1.0 MB html + 360 images = 2.2 MB total

Titanium xs3p documentation (footnote 6): (not generated); 795 kB html

Supplementary schema report (useful for experts/implementers)
  Full development version: (not created)
  Single-file version: UBIF complete (96 kB html)
  Simplified version: UBIF reduced (78 kB html)

Material available for SDD 1.0 (beta 2):

W3C XML schema zip packages
  Full development version (footnote 2): SDD10b02_with_UBIF10b18.zip (ca. 274 kB)
  Single-file version (footnote 3): SDD10b02_CompleteSingleFileWithUBIF10b18.zip (ca. 107 kB)
  Simplified version (footnote 4): SDD10b02_ReducedWithUBIF10b18.zip (ca. 64 kB)

W3C XML schema xsd files (also contained in zip above)
  Full development version: SDD.xsd, SDD_RelationIDs.xsd (ca. 299 kB)
  Single-file version: SDD_CompleteSingleFileWithUBIF.xsd (ca. 669 kB)
  Simplified version: SDD_Reduced.xsd (ca. 250 kB)

Example instance documents (also contained in zip above): SDD-TestFiles.zip (= SDD-Test-Min1.xml, SDD-Test-Tech.xml, SDD-Test-X-ID.xml)

Enumerated values (footnote 5; please review!): Html tables and data XML

XML Spy documentation (elements and annotations displayed as diagrams): 2.6 MB html + 835 images = 6.3 MB total; 1.1 MB html + 401 images = 2.5 MB total

Titanium xs3p documentation (footnote 6): (not generated); 944 kB html

Supplementary schema report (useful for experts/implementers)
  Full development version: SDD base (no UBIF) (123 kB html)
  Single-file version: SDD complete (212 kB html)
  Simplified version: SDD reduced (92 kB html)

General hints

If you use the static HTML documentation, start browsing at the root element "UBIF:Datasets" or "SDD:DescriptiveData". In the documentation generated by XML Spy you can click on the elements and types, both in the text and in the diagrams, to jump to the next level of detail.

In the full development version some type or element names start with a double underscore, e. g. "__OOP_PolymorphicModifierDefs". These are not considered functional and are present only for the purpose of reminding us that we need further discussion. They may be ignored, unless you want to raise a discussion of the issue, which will be most welcome.
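For readers without a schema tool, the double-underscore convention can be checked with a few lines of script. The following is only a minimal sketch in Python: the schema fragment in SAMPLE_XSD and the function name undecided_names are hypothetical illustrations, not part of UBIF or SDD (only the name "__OOP_PolymorphicModifierDefs" is taken from the text above).

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment for illustration only; it is NOT taken from
# the real UBIF or SDD schema files.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Datasets" type="DatasetsType"/>
  <xs:complexType name="DatasetsType">
    <xs:sequence>
      <xs:element name="Dataset" type="xs:string"/>
      <xs:element name="__Proposal" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="__OOP_PolymorphicModifierDefs"/>
</xs:schema>"""


def undecided_names(xsd_text):
    """Return (component kind, name) pairs whose name starts with '__'."""
    root = ET.fromstring(xsd_text)
    found = []
    for node in root.iter():
        name = node.get("name", "")
        if name.startswith("__"):
            # Tags look like '{namespace}element'; keep only the local part.
            kind = node.tag.split("}", 1)[-1]
            found.append((kind, name))
    return found


if __name__ == "__main__":
    for kind, name in undecided_names(SAMPLE_XSD):
        print(kind, name)
```

To try this on a downloaded schema, the text of a file such as UBIF.xsd could be read and passed to the function in place of the sample string.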

Please also note that double '@@' characters have been used in annotations to mark problematic points that need further discussion. At the end of the "Supplementary schema report" listed above you will find a table of the places where such annotations occur in the schema. The list has been generated automatically using XSLT, so it may not be entirely convenient to read, but it is usable.
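The same kind of list can be approximated outside XSLT. The sketch below is an assumption-laden illustration in Python, not the XSLT actually used for the report: the schema fragment, the element names "Alpha" and "Beta", and the function name flagged_annotations are all hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment for illustration only; the element names and
# annotation texts are invented, not copied from UBIF or SDD.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Alpha" type="xs:string">
    <xs:annotation>
      <xs:documentation>A settled element.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="Beta" type="xs:string">
    <xs:annotation>
      <xs:documentation>@@ Should this be optional?</xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>"""


def flagged_annotations(xsd_text):
    """Return (owner name, documentation text) for annotations containing '@@'."""
    root = ET.fromstring(xsd_text)
    hits = []
    for owner in root.iter():
        # Only annotations that are direct children of this component.
        for doc in owner.findall(f"{XS}annotation/{XS}documentation"):
            text = "".join(doc.itertext()).strip()
            if "@@" in text:
                hits.append((owner.get("name", "(unnamed)"), text))
    return hits


if __name__ == "__main__":
    for name, text in flagged_annotations(SAMPLE_XSD):
        print(name, "->", text)
```

Reporting the name of the annotated component alongside the text makes it easier to locate the point under discussion in a schema browser.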

How to contribute

UBIF: Please add to existing topics or start a new topic on the UBIF Wiki.

SDD: Please send your criticism or suggestions to the SDD mailing list or start a topic on the SDD Wiki.

Gregor Hagedorn; Vers. 1.0; 14. August 2004


Footnotes

1 What is a schema? Although it is written in XML, the SDD schema is not about the things that SDD describes, but about grammatical constraints on how those descriptions must be structured. In general, an XML schema describes the syntax of XML instance documents. A document conforming to a schema is said to be "valid" under the schema. Opening schema files as plain text makes no more sense than mastering Latin grammar before reading any Latin texts. However, if the schema is opened in a schema modeling tool like Altova XML Spy (or viewed in a report formatted for web browsers generated by such a tool), it can be viewed as an information model (similar to an ER diagram in database modeling). We have tried to find intuitive names for the various data elements we want to include in the descriptive data, and we use annotations to clarify their meaning. Occasionally the annotations even include examples. Browsing through the schema can therefore give you insight into how we propose to structure descriptive data in biology.

2 Full development version: The complete version includes all features proposed or under discussion (marked with leading underscores). This is the preferred version to study if you have participated in the discussions and have a schema tool such as XML Spy. All others should also download this version to get the sample files and other accessory information that are contained only in this version.

3 Single-file version: These files combine all parts of the respective schema into a single file. Thus the file "UBIF_CompleteSingleFile.xsd" contains "UBIF.xsd", "UBIF_EnumLib.xsd", and "UBIF_TypeLib.xsd". The file "SDD_CompleteWithUBIF.xsd" contains "SDD.xsd", "SDD_RelationIDs.xsd", and the "UBIF_CompleteSingleFile.xsd". Viewing these huge combined files in a schema browser may make it difficult to find the …

… Only the GenerationMetadata, the ProjectDefinition, and the Resource interface remain. For the most part, these are very general definitions that may be useful to all kinds of projects. If you are experienced in data exchange management, the federation or modularization of data sources, etc. and are willing to help us, please take a look at these issues!

4 Simplified / reduced version: Please see the file "HowReducedXSDsDiffer.txt" that is included in the zip file for a list of features that have been removed. Note: If you are trying to convert DELTA, NEXUS or Lucid LIF data, you may try to base your program on this version alone.

5 Schema enumeration type tools: ("Data xml" is created from schema, "Html tables" from data xml; see UBIF-EnumerationTools.zip for the xslt code.)

6 Titanium xs3p documentation: In contrast to the Altova XML Spy documentation, this schema documentation contains no diagrams. However, it has a good short overview of types with example instance code and provides better documentation of type inheritance. Furthermore, only this documentation provides the links to external documentation present in annotation "source" attributes (which unfortunately are ignored by XML Spy).



First published 2004-08-11, last updated: 2004-08-19.