TDWG working group:
Structure of Descriptive Data (SDD)

Minutes of the SDD meeting in Paris, France, 13-16. February 2003

(Version 1.0)


Summary

The meeting in Paris (organized by Nicolas Bailly and Guillaume Rousse: many thanks to them!) was originally planned for three days. However, a small discussion with those still present was added on Sunday. A total of 11 people participated; among them were Yde de Jong and Yuri Roskov with the aim of exchanging ideas with the projects they represent (ENBI, Fauna Europaea, EuroCat, and ILDIS).

During the previous SDD meeting in Brazil (2002), the SDD group moved from example documents with autogenerated schemata (as in Canberra/Sydney 2002) to the design of an xml-schema itself. The work on this "straw man" in Brazil continued throughout the TDWG and GBIF meetings in Brazil, and many tentative decisions were discussed with only two or three people. The current meeting in Paris offered a welcome opportunity to discuss and revise the schema draft. Consequently, the focus of the meeting in Paris was not so much to expand the scope of SDD to new, problematic areas as to remove conceptual or technical xml problems that were present in the Brazil schema. The result is a much improved and considerably changed "straw man" schema (version 0.61) with which to start experimental coding. Note that no example instance document is available for this version of the schema and that all experimental xml identity constraints have been removed because they were no longer functioning.

[Editorial note: they are fully reintroduced and hopefully all tested in the post-Lisbon schema, version 0.9, which also has an accompanying example instance document.]

The next steps should be:

The following list gives an overview of open issues and a road map of what future versions of the SDD schema are expected to contain:

We will try to meet in the autumn at TDWG in Portugal again [editorial note: minutes of that meeting are available], and hope that the developers of major applications for descriptive data will be able to come. The plan is to meet for 3 days before the main TDWG meeting, and reserve an extra day after the meeting to discuss points that may still have been left open after the material originally discussed has been reviewed and some time was available to make changes to the schema.


Table of Contents

Thursday, 13. February 2003
1. Overview of model, IPR attribution issues
  1.1 "Expert knowledge"
  1.2 Anonymous data
  1.3 Item container with multiple descriptions and meta data
  1.4 Definition of the item container
  1.5 A simpler model
2. Other Resources
3. Attribution, credit or acknowledgment for contributions and work on the project ("meta data")
4. Issues of versioning and recording user actions
5. Interoperability
6. Aggregating ("compiling", "collating", "summarizing") data for species by geographic parameters
7. Repeated measurement/observations, fact versus knowledge

Friday, 14. February 2003
1. Shared ("global") versus local state definitions
2. Relations between characters
  2.1 Calculated characters
  2.2 Character transformations
  2.3 Relative statements
  2.4 Homology
3. Use term "Measures" or "Statistics"?
4. Measurement method and accuracy
5. Presentation by Jean-Marc Vanel
6. Presentation by Régine Vignes-Lebbe about modifiers
  6.1 Frequency modifiers
  6.2 Probability and likelihood
  6.3 Reliability: uncertainty as a result of trust in the researcher
7. Complex challenges for character hierarchy

Saturday, 15. February 2003
1. Definition of descriptive data
2. Frequency modifier (continued)
3. Revision of item definition (= what is being described) and meta data

Sunday, 16. February 2003
1. Technical schema discussion: Type derivation
2. Naming and structure of the audience-dependent elements (label and wording)
3. Position and typing of keyref attributes


Thursday, 13. February 2003

Participants

Nicolas Bailly (MNHN Paris, ENBI, EuroCat, FishBase)
Robert Bossy (LIS, Univ. Paris 6)
Yde de Jong (ENBI, EuroCat, Fauna Europ.)
Cyril Gallut (LIS, Univ. Paris 6)
Gregor Hagedorn (Berlin)
Robert Morris (Univ. Boston, USA)
Yuri Roskov (ENBI, EuroCat, ILDIS, Reading, UK)
Guillaume Rousse (LIS, Univ. Paris 6)
Guillaume Sauvenay (LIS, Univ. Paris 6)
Jean-Marc Vanel (France)
Régine Vignes-Lebbe (LIS, Univ. Paris 6)

Agenda for the following days

Overview of model, IPR attribution issues

The discussion started with a short presentation by G. Hagedorn of the current model. It soon got caught at the point of attributing the work to information sources and persons compiling, editing, or authoring data.

In LIS (Laboratoire Informatique et Systématique, Paris) the bibliographic information, specimen information, and pers. observations/expert knowledge are handled as three different databases. This is similar to the design of the DiversityWorkbench with DiversityReferences, DiversityCollection, DiversityDescriptions. This structure was chosen for practicality and management purposes; reference data sources can also cite specimens, so that these cases can co-occur. The linking between the different data sources is not formalized at the LIS.

"Expert knowledge"

The information source "expert knowledge" or "personal opinion" was discussed. Currently it is proposed to treat this as a reflexive relation to the xml document itself, i. e. identifying that the SDD document is the manuscript itself. It is unclear, however, whether this is explicit (filling data in the bibliographic reference part) or implicit (implied if the reference part is empty) [unresolved question].

Anonymous data

Also it was discussed whether it should be possible to enter or provide data anonymously. Two variants exist: The author is truly unknown, or the author prefers to be unnamed. In the first case it should be possible to manage the data. The latter case is more difficult, given the current changes in IPR (which are friendly to industry and unfriendly to science). Without an identified data source, any person distributing the data may become the target of a database copyright lawsuit. Anonymous data are therefore not recommended. Anonymity does not have to be built into the schema design. Even if the current schema requires project authors or editors, it is always possible to enter the string "anonymous" to provide anonymity. More important than requiring an author or editor is to require a copyright statement and to recommend using a public license like GPL for documents.

Item container with multiple descriptions and meta data

The straw man 0.5 from Brazil distinguishes between an Item container and its ItemDefinition, and authoring information contained in multiple MetaData elements. The ItemDefinition defines the relations to taxonomic identification, bibliographic information sources, and specimen definitions if a single specimen (or other object) is being described. The MetaData sections are contained within each of the multiple envelopes within the item (NaturalLanguageDescription, CodedDescription, Resource) and attribute the work of compilation, interpretation, or authoring of data in the coded or natural language descriptions.

This setup was considered difficult to understand. The implications of the structure were explored in detail.

We first noted that descriptions that refer to observations are currently inadequately defined. The intention is that they should be handled like specimen references, but the current structure does not make this clear. The element names should perhaps be clarified (Brazil: "CollectionSpecimenID" and "CollectionSpecimenFreeDescription"). Observations may be databased, e. g. in floristic databases (with or without voucher specimens), or may be identifiable only by observation date and observer.

[No solution was proposed and no discussion on this in Lisbon 2003. However, I have added a tentative "IsPreservedInCollection" boolean element to the DescribedObjectConnectorType in the schema version 0.9 published after Lisbon.]

Definition of the item container

Returning to the general structure of "Item/ItemDefinition/multiple Metadata", we discussed that most people would find it intuitive to have a taxon reference as the uppermost "envelopes", followed by further more detailed containers or envelopes. However, we confirmed that it is difficult to turn the various information sources cited and the identification/name reference into any hierarchy, since any combination is possible:

Case  Name  Specimen Ref.  Publ. Ref.  Comment
 1     x         -             -       Capturing Expert knowledge about a taxon
 2     x         x             -       Describing a specimen
 3     x         -             x       Recoding (and interpreting!) data from a publication
 4     x         x             x       As above, the publication cites a specimen for this description
 5     -         -             -       This should be prevented (not possible with xml schema?)
 6     -         x             -       As above, unidentified specimen (frequent case if used for ongoing work!)
 7     -         -             x       As above, rare case, e. g. happens in phytopathology, where information
                                       must be distributed even if identification not yet possible.
 8     -         x             x       As above, rare case
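The eight cases above reduce to one rule: a definition is acceptable as long as at least one of the three references is present. The following is a minimal sketch of that rule in Python. The names `DescriptionDefinition` and `is_valid` are assumptions made for illustration and are not taken from the SDD schema.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class DescriptionDefinition:
    """Hypothetical stand-in for the name/specimen/publication references."""
    taxon_name: Optional[str] = None
    specimen_ref: Optional[str] = None
    publication_ref: Optional[str] = None

    def is_valid(self) -> bool:
        # Only case 5 (no reference at all) is rejected.
        return any(
            ref is not None
            for ref in (self.taxon_name, self.specimen_ref, self.publication_ref)
        )


# Enumerate all 2^3 combinations, mirroring the table of cases.
all_cases = [
    DescriptionDefinition(
        taxon_name="name" if n else None,
        specimen_ref="specimen" if s else None,
        publication_ref="publication" if p else None,
    )
    for n, s, p in product([True, False], repeat=3)
]
valid_cases = [case for case in all_cases if case.is_valid()]
```

Whether such a rule can be enforced by the xml schema itself remains the open question noted for case 5; an application could apply a check of this kind on import.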

Putting name/identification in a primary envelope makes descriptions of unidentified objects difficult to manage. Furthermore, the name may change based on information from the specimen reference. That is, the specimen may have been reidentified in the specimen management system; this information is polled by the descriptive database system, and the manager accepts the decision. All descriptive information should now be treated as belonging to the other name.

It should be noted that Taxon Name, Specimen reference and Publication reference have overlapping purposes. Two identify the object that is being described, and two have a function equivalent to citations of information sources in a scientific article:

                   Definition of object        Citation of source                
                   that is being described       of information
  Taxon Name                 x                       -
  Specimen Reference         x                       x
  Publication Ref.           -                       x

Specimens are preserved in natural history museums like publications are preserved in libraries. However, the citation of a publication and the citation of a specimen as information sources are slightly different, since in the first case interpreted data are reinterpreted, whereas for the specimen the interpretation is done by the current authors themselves. However, both kinds of "citation" are important if a verification of scientific conclusions or results is necessary, e. g. during a revision of a taxonomic group.

Unclear cases: [unresolved questions]

In Brazil and Australia the "item" was designed as a virtual object, combining a unique combination of taxon name reference, specimen reference and publication reference. The item is a container for multiple descriptions (both multiple coded and multiple natural language descriptions). The advantage of this construction as perceived at the previous meetings is that these multiple descriptions are held together structurally, rather than by value comparison. For example, this makes it easier to know that an original natural language description and a coded description derived from the same publication are related. Similarly, multiple studies of the same specimen by different researchers are kept together by this model.

In contrast, the information for a single taxon name will in most cases be present in multiple items. To see all descriptive information for a single taxon (e. g. all specimens described, or descriptions from multiple cited sources) the application has to aggregate/generalize (= "summarize", = "collate") the data through value comparison based on the taxon name or ID.

The different strategy regarding taxon names and specimens/publications has been guided by practical considerations. It is already (or will soon be, with the advances in GBIF) reasonably possible to make automated value comparisons for taxon names (or preferably for unchanging numeric IDs for them). In contrast, comparing two specimen references or two publications, entered by multiple collaborators in a descriptive data project, or even by a single person at different times, is very difficult and highly likely to fail.

However, besides the unintuitiveness perceived throughout the discussion in Paris, the Brazil model has further problems:

A simpler model

The consequence is that although it may be useful to combine related descriptive data in an item container, this cannot be guaranteed and is in fact likely to be often not the case. A system that detects the relation between descriptions in all three dimensions (taxon name, specimen or publication reference) using value comparison would be more consistent and ultimately easier to understand.

Within the coming years we consider it unrealistic to rely on external systems to uniquely identify specimens or publications. Unique specimen IDs provided by collections (e. g. where specimens are bar coded or unique accession numbers are available) are still rare, and free-form specimen descriptions are not safe to compare using value comparisons. Similarly, databases of publications are limited in scope, and usually only modern literature is contained in commercial reference databases. A functional system for descriptive data, however, must be able to handle all possible specimens and publications. The option to include fully structured specimen or publication databases within the SDD standard was not seriously considered ...

The solution we are proposing now is to provide an interface within the SDD schema that can buffer the relations to external data sources, but can provide locally unique identifiers usable for value comparisons.

As a first step, the references to taxon name, specimen, and publication are structurally very similar tuples of: "ProviderForID", "ID", and "human readable free description". The latter can be provided through ID (thus being a cache of a human readable description provided by a service), or, if no machine readable ID is present, can be used directly as free-form text. The tuples were restructured into a separate data type "Resource connector" that serves as an intermediate layer between information in the SDD document and information in other information areas.
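The tuple described above can be sketched as a small data type. This is only an illustration: the class and field names are assumptions and are not the element names of the schema. The `local_key` field stands for the locally unique identifier mentioned in the preceding paragraph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceConnector:
    """Illustrative tuple; field names are assumptions, not schema names."""
    local_key: str                   # locally unique identifier within the document
    provider_for_id: Optional[str]   # service that issued the ID, if any
    external_id: Optional[str]       # machine-readable ID, if any
    free_description: str            # human-readable text (cache or primary data)

    def description_is_cache(self) -> bool:
        # With a machine-readable ID, the text is only a cached copy of what
        # the provider would return; without one, it is the only information.
        return self.provider_for_id is not None and self.external_id is not None
```

The same type would serve taxon names, specimens, and publications alike, which is what allows the connectors to be handled as one intermediate layer.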

(The exact mechanism through which the interaction with providers (through database access, direct http get/post, or xml-SOAP over http requests) will happen must be explored separately. The current ResourceConnectorType should then be modified as required! Open Issue!)

In a next step, the resource connectors were moved from the item definition to a document-wide resource section (in the root of the xml document) which provides resource connectors for taxon names, specimens, publication, and media resources. (Note: this decision was discussed and finalized on Saturday, but is largely reported here.)

The resulting model could now be simplified. The "Descriptions" element contains an unlimited number of two types of description container elements: either a "NaturalLanguageDescription" or a "CodedDescription". Each "Description" relates to a tuple of taxon name, publication and specimen references. It is the decision of an application whether it tries to keep these tuples unique or not (that also depends on the way the application tracks editing changes!). In a federated system multiple Description elements with identical definition may exist, and aggregated/summarized information can be reported without further problems.

The abstract object "item" is no longer used. In previous versions of the SDD model we kept "item" because we did not have a better name and because it has been used for many years in the DELTA standard. However, the term suggests a thing, and this "reification" (treating a concept like a real thing) caused confusion in understanding previous SDD versions.

Gregor: If a natural language description is digitized, partial markup is added, a copy is then converted into a fully coded description, and from this a new natural language description in multiple languages is generated, then each of these descriptions is an independent object that may potentially be stored in different federated databases. The relation between these descriptions must be discovered by the application through value comparisons of the resource connector keys. To make the system functional in a federated context, the resource connector lists would have to be exchanged frequently and would profit from being implemented as replicated data/databases.
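Discovering related descriptions by value comparison of the connector keys might look like the following sketch. The `Description` record and the key values are hypothetical; the only point is that descriptions are grouped by comparing keys in any of the three dimensions rather than by sharing a container.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Description(NamedTuple):
    """Hypothetical description record; keys point at resource connectors."""
    kind: str                       # "natural language" or "coded"
    taxon_key: Optional[str]
    specimen_key: Optional[str]
    publication_key: Optional[str]


def group_by(descriptions, dimension):
    """Group descriptions that share the same key in one dimension."""
    groups = defaultdict(list)
    for d in descriptions:
        key = getattr(d, dimension)
        if key is not None:         # missing references cannot be compared
            groups[key].append(d)
    return dict(groups)


descriptions = [
    Description("natural language", "t1", None, "p1"),
    Description("coded", "t1", None, "p1"),
    Description("coded", "t1", "s7", None),
]
by_publication = group_by(descriptions, "publication_key")
by_taxon = group_by(descriptions, "taxon_key")
```

Here the original text and its coded derivative are found together under publication key `p1`, while all three descriptions are collated under taxon key `t1`.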

Other Resources

After the decision to centralize the taxon names, specimen and publication resources, two more resources were added to the Resources sections (reported here, decisions were made on Saturday):

As a result, resources now contains all objects or concepts that are not truly descriptive terminology.

A note regarding Contributors: it is important to distinguish between user data for access verification and access control, and user documentation. Contributors is not a list of active users of a system (although an application may extend the contributor list using application specific properties for such purposes), but a documentation of the intellectual contributions of persons to the data set.

Finally, we discussed the problem that a single state may have multiple contributors. The problem of importing non-SDD sources could be dealt with by citing a bibliographic source instead of always providing multiple authors/workers at each data item. Gregor was against multiple contributors for a single atomic item; most others were in favor of it. Gregor was concerned about the burden for system developers and the need for the biologist to deal with it, where it should be hidden from her or him and handled by the application.

Attribution, credit or acknowledgment for contributions and work on the project ("meta data")

For the data of the "item definition" (= new: Description) as a whole, somebody is the creator, author, or reviser. One description is one action or "activity" of describing. Meta data should be recorded to document who did something and when. It is unclear whether the description as a whole is a natural metadata container or not. Part of the information is the definition of the description ("header information", i. e. Internal Notes plus the relations to taxa, specimen, publication); another part consists of the statements made in the coded or natural language description itself ("body information"). Do we need separate IPR-Metadata on these two parts? IPR could further be recorded on an additional level, e. g. sets of characters, where each set could cover one to all characters in a description.

Different types of authorship exist for coded and natural language descriptions.

Bob and Guillaume proposed to allow metadata on any object and let the application decide what to do with them. Alternatively, Bob proposed that a property may be present on objects which decides whether they have metadata or not. In Gregor's opinion this only works with an xml document view, but not if the SDD format is intended for loss-less data transfer between applications that have their own, independent data storage system.

Bob and Guillaume's proposal for a mechanism for acknowledgments such as IPR and Contributor:

In case 2, we might also define (perhaps) all elements to derive from a type that has an optional id.

We discussed the use of versioning systems for computer software (CVS) to record contributions. CVS uses document/file level versioning, which is of intermediate granularity between project and atomic changes (e. g. on character or states). Most participants present felt that IPR should be recorded at the character level (rather than on the more atomic state/measure/modifier level). Gregor thinks that the state level may even be more suitable, because it avoids problems when the terminology is later revised, perhaps splitting a character into two characters and moving subsets of states into each. The lowest level of IPR recording could also be application specific, i. e. not covered by interoperability.

An important aspect of IPR recording, versioning, and the question what is internal project management and what is an interoperability issue, is the editing metaphor used when thinking about projects.

But: We agreed that we do need mechanisms to revise data and perhaps to mark data as to be ignored. Revision is important, although it should be an integrated, decentralized and collaborative process, similar to the collaborative contribution and accumulation process. Also, the "final editing envelope" should not be neglected, and for many types of project it is the most important or perhaps even the only relevant IPR record. As a consequence, the ProjectDefinition should be appropriate to contain sufficient information to serve as a "final editing envelope".

See also the separate document "Acknowledgment and documentation of intellectual contributions".


As an aside: What if somebody provides a photograph for a project for which she or he is not the author? One option would be a bibliographic source citation for direct contribution. Other mechanisms necessary?

Note by Bob: Resource definition has no IPR in it (= Problem)!


Issues of versioning and recording user actions

(Gregor attempted to clarify the discussion on the next day in the following proposal. This material from Friday is shown here to keep it in the context:)

  1. Collaboration is an issue of SDD, but on-line collaboration or concurrent multi-user working techniques are not. Issues like locking documents or parts of documents, checking data in or out of a versioning system (e. g. a CVS), transaction management or implementation of database replication are not issues for SDD data exchange. To perform/conduct interoperable data exchange using the SDD format, a descriptive data project must be in an exclusive state without open locks or transactions.
  2. Securing and encrypting data to prevent unauthorized access or data manipulation is not an issue to SDD. Manipulating data will always be possible in a non-encrypted xml document.
  3. The documentation of the intellectual contributions of users working on the system is an issue for interoperability. This does not imply that security IDs need to be transferred, but the basic documentation of who made which intellectual contributions should be preserved when the data are moved from one application to another.
  4. The date and time of contributions and alterations by the same person are currently not preserved in the general SDD format. A fully logged system, allowing the history of any changes of the descriptive data to be traced over time, may be desirable for some applications. In the current SDD process it is, however, considered an application specific property not subject to interoperability demands. If demand becomes greater and multiple implementations of this feature exist, a future version of SDD may deal with this problem. Checking xml-based SDD documents into a versioning system (e. g. CVS) may be a good implementation independent solution for tracking changes over time.
  5. To some extent, management considerations are so basic that it is desirable to keep some management data interoperable rather than application specific. This refers, for example, to revising and checking descriptive data. Also, this is both a management item and a piece of information to the end user, helping in establishing the level of trust in a particular piece of information. Other management tools may be left to application specific data.
  6. Versioning of the entire project is provided in the project definition.

The current project-based attribution schemes (as present in DELTA) make it possible to work in small teams where all members know each other well, and where the relative share of the work is agreed upon in advance. However, even middle sized projects need to observe some management practices.

With multiple users, a scheme for appending and adding to each other's information is relatively easy, since the mechanism is identical to summarizing information from several specimens for a taxon description. However, at the moment the question of how to contradict each other is an unresolved issue!


Interoperability

Definition of interoperability (Bob):
  A Exports to B
  B makes no changes
  B Exports to A
  State of A after import is identical
Fulfilling this is trivial if B is only a read-only application (implementing export by returning the import file again) but it should also hold if B is a potential editor.
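Bob's definition can be read as a round-trip test. The sketch below makes a simplifying assumption: the "state of A" is approximated by the canonical form of the exported xml document, so that insignificant differences such as attribute order do not count as changes. The element names in the sample and the toy application B are invented for illustration.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text: str) -> str:
    """Normalize insignificant differences (attribute order, whitespace)."""
    return ET.canonicalize(xml_data=xml_text, strip_text=True)


def application_b_round_trip(xml_text: str) -> str:
    """Toy application B: imports into its own model, then exports again."""
    root = ET.fromstring(xml_text)                  # import
    return ET.tostring(root, encoding="unicode")    # export, no changes made


def is_interoperable(export_from_a: str) -> bool:
    """True if A would see an identical document after B's re-export."""
    return canonical(export_from_a) == canonical(
        application_b_round_trip(export_from_a)
    )


# Element names below are invented for illustration only.
sample = '<Project version="1"><Description id="d1">oval</Description></Project>'
round_trip_ok = is_interoperable(sample)
```

A real application B would map the document into its own storage model before exporting, and that mapping is where information is most likely to be lost. `ET.canonicalize` requires Python 3.8 or later.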


Aggregating ("compiling", "collating", "summarizing") data for species by geographic parameters

A specimen from Mexico may be different from one from Florida. If the specimens are identified through a specimen resource reference, they can be easily differentiated. However, there is currently no built-in mechanism to report their different provenance. It is possible to document differences in the InternalNotes, but this is not well suited for data summary/collation purposes.

Notes G. Hagedorn: The geographic provenance problem is not unique. It may be useful to group descriptive data by the altitude at which they are collected, by the soil type on which a plant has grown, or by the time of year at which the observation has been made (e. g. for seasonal variants like in Araschnia levana L., "map butterfly"). In my opinion these data should either be handled as "character" (although geographic distribution is a "pseudo-character"), or if possible through direct reference to a collection database that stores all specimen related data and allows grouping by them. It may be possible to provide additional "buffering-elements" in the resource connector for specimens to store these data. The design should then, however, be as limited as possible.


Repeated measurement/observations, fact versus knowledge

Régine Vignes-Lebbe differentiates between "fact" = single observation and "knowledge" = already synthetic description, taxonomic results already integrated. These kinds of data could be in two sets of items, or they could be in two projects, the structures of which are otherwise identical.

What does "synthetic" mean? Methodological summaries occur between (body length of multiple individual flies) and within organism (length of spores from fungus, leaves of a grass). This is often difficult to distinguish: The flight pattern of a flock of birds is impossible to obtain non-synthetically.

The original assumption was that this distinction in the data also relates to the distinction between individual objects and classes of objects. If the single spore or the single hair on a fly is considered an object, this is in fact true. However, there is no fundamental relation between individual organisms and "fact" versus "knowledge".


Friday, 14. February 2003

Participants

Nicolas Bailly (Paris, ENBI, EuroCat, FishBase)
Robert Bossy (LIS, Univ. Paris 6)
Yde de Jong (ENBI, EuroCat, Fauna Europ.)
Cyril Gallut (LIS, Univ. Paris 6)
Gregor Hagedorn (Berlin)
Robert Morris (Univ. Boston, USA)
Yuri Roskov (ENBI, EuroCat, ILDIS, Reading, UK)
Guillaume Rousse (LIS, Univ. Paris 6, object oriented modeling)
Guillaume Sauvenay (LIS, Univ. Paris 6)
Jean-Marc Vanel (France)
Régine Vignes-Lebbe (LIS, Univ. Paris 6)

Shared ("global") versus local state definitions

It is desirable that SDD should have a consistent policy whether to use keyref mechanisms or allow in-place-definitions: It was agreed that since a reference/keyref. mechanism to a central place (e. g. images, references) is already implemented, it should be used preferentially. If an in-place definition is desired, it is still possible to add it to the central place (even if this creates a duplicate) rather than to allow two alternatives: "by reference" or "in place" objects. This policy was implemented in several places in the new schema version, e. g. by moving all person references in the project definition to resources (Contributors/Agents) and using key-references.

However, this policy was not followed for the definition of local states within a character. The reason for this is that while duplicate references to the same person or publication may be combined during a revision of a project, this should not be done for states. Combining states has serious implications. For example, if two states "oval" are present, one may really mean 3-dimensional rather than 2-dimensional shapes. The decision whether to use global states or not therefore depends on an evaluation of the character context and whether the terminology used is well defined or preliminary. This must be transparent to the user.

The local state definitions may yet be treated analogously to shared/global state definitions, if an attribute "local" or "preliminary definition" were added, which indicates that this state should only be used in a single character. Although it is globally defined, it would then not be available for multiple reuse.

This should be considered. It has, however, not been decided or introduced into the straw man.

Irrespective of this discussion, the implementation may treat the local and shared sets of states analogously. A local state set definition differs from a shared definition in three aspects:

  1. it is used only once
  2. it has no id/key to identify it to machines (= its key is identical with the character key)
  3. it has no label to identify it in the user interface

These aspects can be controlled in code, using for example in a relational database a single table for state definitions. However, the simplest way to express these conditions in an xml document seems to be to define the local state sets immediately within the character.
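As an illustration of the relational variant, the following sketch keeps local and shared state sets in one table and checks the three aspects listed above in code. All table names, column names and keys are assumptions made for this example; the in-memory SQLite database is used only to keep the sketch self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state_set (
        set_key   TEXT PRIMARY KEY,  -- for local sets: equals the character key
        label     TEXT,              -- NULL for local sets (not shown in the UI)
        is_shared INTEGER NOT NULL   -- 0 = local, 1 = shared ("global")
    );
    CREATE TABLE character_def (
        character_key TEXT PRIMARY KEY,
        set_key       TEXT NOT NULL REFERENCES state_set(set_key)
    );
    """
)
conn.executemany(
    "INSERT INTO state_set VALUES (?, ?, ?)",
    [("shape2d", "Two-dimensional shapes", 1), ("c3", None, 0)],
)
conn.executemany(
    "INSERT INTO character_def VALUES (?, ?)",
    [("c1", "shape2d"), ("c2", "shape2d"), ("c3", "c3")],
)


def find_violations(connection):
    """Local sets that carry a label or are used by a foreign character."""
    return connection.execute(
        """
        SELECT DISTINCT s.set_key
        FROM state_set AS s
        LEFT JOIN character_def AS c ON c.set_key = s.set_key
        WHERE s.is_shared = 0
          AND (s.label IS NOT NULL OR c.character_key <> s.set_key)
        """
    ).fetchall()


violations = find_violations(conn)
```

In an xml document the same conditions hold by construction when the local state set is nested inside its character, which is the reason given above for preferring that form.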


Relations between characters

Calculated characters

Often data within a description have logical or algebraic relations between characters. One example is statistics (mean, standard deviation, etc.) based on simple numeric values. Another example (length/width ratios) is discussed in the data challenge "Repeated observations of spore measurements". Many more examples exist, e. g. "length of petals equal/small/shorter than sepals" and petal/sepal measurements. How can knowledge about such relationships be expressed?

A minimal provision would be a recommendation that relations between characters are expressed in plain language. This would at least allow designers revising terminology which they have not developed themselves, to be alerted to the problems.

However, if the information is kept in the normal definitional text:

An improved solution would therefore be to define for each character a structured list of references to all characters it relates to. Each relation could then be explained in plain language, and ultimately also in some algebraic notation.

Note: The calculated character problem is somewhat confounded by the fact that certain modifiers, esp. frequency statements may also be the result of calculations as soon as raw observation data are present...

What could an algebraic notation be? Jean-Marc Vanel proposes to use C++ syntax or to use MathML as expression language. Alternatives could be SQL operators, or basic Java operators. MathML is primarily used as a presentation language for mathematical formulas but also seems to be usable as a symbolic notation language. Gregor was concerned, however, about immediately specifying a complex and complete algorithmic notation language, which would be very demanding for applications to support. Currently no existing application supports calculated characters, and most biological calculations could be handled with a very limited set of operators and functions.
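To illustrate what "a very limited set of operators and functions" could mean in practice, here is a minimal sketch in Python. The operator names, the function names and the sample measurements are all assumptions chosen for the example; no notation was decided at the meeting.

```python
import operator
import statistics

# Deliberately small, closed set of operators and functions. Which ones a
# basic application must support is exactly the open question in the text.
OPERATORS = {
    "ratio": operator.truediv,
    "sum": operator.add,
    "difference": operator.sub,
}
AGGREGATES = {
    "mean": statistics.mean,
    "stdev": statistics.stdev,   # sample standard deviation
    "min": min,
    "max": max,
}


def calculate_binary(op_name, left, right):
    """Apply a binary operator to paired measurements, e.g. length/width."""
    op = OPERATORS[op_name]
    return [op(a, b) for a, b in zip(left, right)]


def calculate_aggregate(func_name, values):
    """Summarize repeated measurements into a single statistic."""
    return AGGREGATES[func_name](values)


# Invented measurements, for illustration only.
lengths = [10.0, 12.0, 14.0]
widths = [5.0, 6.0, 7.0]
ratios = calculate_binary("ratio", lengths, widths)
mean_length = calculate_aggregate("mean", lengths)
```

A closed lookup table of this kind is easy for an application to support and to validate, at the price of being less expressive than a general expression language.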

Conclusion: We recommended evaluating MathML. Jean-Marc Vanel will research the state of the art regarding calculated characters and perhaps make a proposal. SDD should define the most important cases of calculated characters. If MathML is found suitable, SDD should give a recommendation on which MathML operators should be supported in basic applications.

Note: the problem that some applications may not be able to compute had already been discussed in Brazil, and special computed states are provided for this purpose.

Character transformations

Related to the issue of calculated characters using truly mathematical operations is the issue of character transformations from interval scale to categorical (DELTA: "Key states"), or mapping between sets of categorical states at different levels of detail (proposed in DeltaAccess). Example for the first case: compare length in mm with categorical states "small" and "large". Example for the second case: code specimen data in a detailed categorical schema ("globose, subglobose, ovoid, subovoid, etc.") but present these data for identification purposes in a simplified way: "globose or ovoid". Note that it should perhaps be possible to map a state to multiple states, i. e. create n : m relations.

Both these issues are probably handled better by special mechanisms than a generic "calculated characters" feature.
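
A minimal sketch of the two transformation types, assuming a made-up threshold of 5 mm and a made-up intermediate state "globose-ovoid" to show an n : m mapping. None of the names below (key_state, DETAILED_TO_SIMPLE, simplify) come from DELTA, DeltaAccess, or the SDD schema.

```python
# Hypothetical threshold: lengths below 5 mm are "small", otherwise "large".
def key_state(length_mm, threshold_mm=5.0):
    """Map an interval-scale measurement to a categorical 'key state'."""
    return "small" if length_mm < threshold_mm else "large"


# Mapping from a detailed state set to a simplified one. A detailed state
# may map to several simplified states (n : m relation), hence sets.
DETAILED_TO_SIMPLE = {
    "globose": {"globose"},
    "subglobose": {"globose"},
    "ovoid": {"ovoid"},
    "subovoid": {"ovoid"},
    # Assumed example of a state that falls between two simplified states.
    "globose-ovoid": {"globose", "ovoid"},
}


def simplify(detailed_states):
    """Return the union of simplified states for the coded detailed states."""
    result = set()
    for state in detailed_states:
        result |= DETAILED_TO_SIMPLE[state]
    return result
```

Both functions are plain lookups rather than formulas, which may be one reason why a dedicated mapping mechanism fits these cases better than a generic expression language.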

Note: In Brazil the set of states that can be used by several characters had been called "global" state definitions. In Paris we tended to call them "shared" state definitions. The terminology remains somewhat undecided and is open for further suggestions!

Note: An important issue that needs further discussion is that for most calculated characters (including statistics and transformations) both situations occur: in one case the character can be calculated from information entered elsewhere, whereas in another the base information is missing (e. g. the repeated measurements needed to calculate a mean or a length/width ratio), but the "calculated" character information is available from an external calculation. This "duality" probably requires a modifier or other marker similar to those discussed for inheritance. Calculated characters are a case of aggregation or compilation of information within a description, which is similar to the case of compiling/aggregating information from several descriptions into a hierarchically higher one. These discussions need to be continued!

Relative statements

Another related issue: Relative statements between characters may refer to the same property (e. g. color) and use the same observation method, but may relate to different parts of an object. Example: "mushroom poisoning, patient is unable to describe the mushroom, but explains that the cap was light brown and the stem was darker than the cap."

If both characters use the same set of states (shared/global state definitions), comparisons are possible in principle. However, in this example "darker" and "lighter" are perhaps not the order in which the color states are arranged. Color can be ordered in at least three dimensions: hue, saturation, and brightness. Nor does the use of modifiers (colors are pure colors, and color variations are described using "dark" and "light" modifiers) necessarily help: "dark yellow" is not necessarily darker than "light brown"! (Data challenge!)

Knowledge about the comparison in the example is not trivial to define and cannot be inferred by an application. Nobody present had an idea how to structure descriptive data so that queries for this class of relative statements would be possible.

Homology

Where can homology statements be made in SDD? To use descriptive character data for phylogeny one needs to make statements about relations between characters, esp. whether two characters are homologous or not. Methods to deal with these problems have been developed in the NEXUS format. Bringing the NEXUS community into the SDD process has failed so far. Any proposal on how to deal with homology (and thus support features supported in NEXUS) is greatly appreciated.


Use term "Measures" or "Statistics"?

The term numerical "measure" as introduced in Brazil was found unintuitive or misleading by several members present in Paris. It seems that "Measurement theory" includes all types of measuring, scaling, and comparing, especially including categorical data. Importantly, we also realized that some statistics apply to nominal or ordinal categorical data as well (mode and min/max/median, respectively).

Revision of the short list of terms from Brazil:

From the Collins English Dictionary:

statistic (n.)

  1. any function of a number of random variables, usually identically distributed, that may be used to estimate a population parameter. See also sampling statistic, estimator (sense 2), parameter (sense 3).

measure (n.)

  1. the extent, quantity, amount, or degree of something, as determined by measurement or calculation.
  2. a device for measuring distance, volume, etc., such as a graduated scale or container.
  3. a system of measurement: give the size in metric measure.
  4. a standard used in a system of measurements: the international prototype kilogram is the measure of mass in SI units.
  5. a specific or standard amount of something: a measure of grain; short measure; full measure.
  6. a basis or standard for comparison: his work was the measure of all subsequent attempts.
  7. reasonable or permissible limit or bounds: we must keep it within measure.
  8. degree or extent (often in phrases such as in some measure, in a measure, etc.): they gave him a measure of freedom.

function (n.)

  1. the natural action or intended purpose of a person or thing in a specific role: the function of a hammer is to hit nails into wood.
  2. an official or formal social gathering or ceremony.
  3. a factor dependent upon another or other factors: the length of the flight is a function of the weather.
  4. Also called: map, mapping. Maths, logic. a relation between two sets that associates a unique element (the value) of the second (the range) with each element (the argument) of the first (the domain): a many-one relation. Symbol: f(x) The value of f(x) for x = 2 is f(2).
Notes on specific statistics/parameters/measures:
Notes on potential terms:

The preference in Paris was clearly to use statistic as a summary term. However, a value is a statistic only if conclusions from a sample to the population are drawn (parameter estimate). Also, currently the individual data are handled by the same method. In Paris we could not decide on an appropriate model for this; no final decision was taken.

[Editorial note: Discussed in Lisbon 2003, value moved exclusively into ObservationSet = repeated raw data]


Proposal GH: Repeated observations / raw data. Available terms:
  Study/Experiment
  Observations/Observation?
(No decision achieved, discussion moved on.)


Measurement method and accuracy

Guillaume Sauvenay: For morphometric data it is important to know the method used to obtain the data, the measurement error, the scale, etc. This is so far not adequately defined in SDD ([OPEN ISSUE]). Gregor: How to express, e. g., measurement accuracy in an analytically accessible way? For example, in a microscopic measurement of spore sizes the lowest resolution is 0.2 with 100 x, or 0.5 with 40 x. However, is this really value plus/minus 0.2, or is it plus/minus 0.1? Also, accuracy and repeatability are often two separate issues, and usually little known. We do need a proposal for this. Some elements have already been defined in the CODATA-TDWG-BioCASE-BioCol_1.38.xsd schema; perhaps it will help us to review them and then continue the discussion.


A discussion was held on whether function definitions should be allowable as states of characters. The discussion could not clarify how the functions were to be defined in the terminology and what the use cases for this were. If possible, a data challenge should be submitted to clarify these points.


Afterwards, the discussion about arrays was picked up again. [Editor's apologies: I have insufficient notes to complete the report on this topic...]


Presentation by Jean-Marc Vanel

(Jean Marc wanted to provide a summary)


Presentation on modifiers by Régine Vignes-Lebbe

1. Frequency modifiers

If the object described is a class, it is possible to have a distribution of values because of polymorphism. (Discussion: if object described is an individual specimen object it is still possible to have a distribution, because many objects have multiple "subobjects", e. g. leaves on a plant, where multiple states can co-occur.) The distribution can be described by recording a set of statistics. For nominal data the mode can be calculated, for ordinal and higher (cardinal/interval) scales: Minimum, maximum, median to record distribution. More information is available if frequencies are given, either by wording ("rare", "frequent"), or as relative or absolute values ("10%", "80%").

Note Gregor: we currently only support relative frequencies. Do we need absolute frequencies ("24 x red, 19 x orange") as well?

Frequencies are directly possible for nominal, ordinal, and cardinal data. The result is a histogram. Data on the interval scale ("23.45") can be analyzed only after partitioning (= conversion of interval to ordered categorical, which can have frequency).

Interval scale measurements are often represented as min, typical range, max. This presentation can be translated into a frequency statement for partitions: "2 (5-10) 15" relates to:

 2- 5 less frequent  (A)
 5-10 more frequent  (B)
10-15 less frequent  (C)

A ranking (order in which frequency terms are more or less frequent) is partly possible: A < B, B > C, but A ? C (with ? = unknown-relation-to). Even more unknown relations occur if a mean is given. "2 (5-) 7 (-10) 15" relates to:

 2- 5 less frequent  (A)
 5- 7 more frequent  (B)
 7-10 more frequent  (C)
10-15 less frequent  (D)

where only A < B, A < C, D < B, D < C, are known, but A ? D and B ? C.
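
The derivation of the known and unknown relations can be sketched in Python as follows. This is only an illustration under stated assumptions: values are non-negative, the first and last partition are the "less frequent" extremes, and everything in between is "more frequent". The function names partitions and known_relations are hypothetical.

```python
import re
from itertools import combinations


def partitions(range_statement):
    """Split e.g. '2 (5-10) 15' into labelled partitions.

    Assumes non-negative numbers, with the first and last partition being
    the 'less frequent' extremes and everything in between 'more frequent'.
    """
    bounds = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", range_statement)]
    parts = list(zip(bounds, bounds[1:]))
    last = len(parts) - 1
    return [(lo, hi, "less" if i in (0, last) else "more")
            for i, (lo, hi) in enumerate(parts)]


def known_relations(parts):
    """Return pairs (i, j) meaning 'partition i is less frequent than j'.

    Pairs with the same label are left out: their relation is unknown.
    """
    known = set()
    for i, j in combinations(range(len(parts)), 2):
        if parts[i][2] == "less" and parts[j][2] == "more":
            known.add((i, j))
        elif parts[i][2] == "more" and parts[j][2] == "less":
            known.add((j, i))
    return known
```

With partitions numbered from 0, the second example yields the pairs (0, 1), (0, 2), (3, 1) and (3, 2), i. e. A < B, A < C, D < B and D < C, while the pairs A/D and B/C are absent because their relation is unknown.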

Régine considers a ranking of frequency information (from modifiers, frequency values, or implicit frequency information in statistical range information, as shown above) the most useful method to analyze descriptive data containing frequency values. She proposes that verbal frequency modifiers should be ranked by the designers of the terminology.

A discussion about this ensued. The SDD Brazil model has an indirect method of defining a ranking order of verbal frequency statements (e. g. "occasionally" versus "sometimes"), namely by defining frequency estimates for modifiers. Explicit ranking is currently not supported in the SDD model.

People may have different lists:
  impossible
  very rare
  rare
  frequent

These different lists may partially overlap, but have independent rankings. However, a similar problem exists with estimating the LowerLimit/UpperLimit estimates for frequency in the current SDD proposal: Ranking is an automatic consequence, but the basis of frequency estimation is often very weak, and consequently so is the resulting ranking. Estimating frequency ranges is fairly easily possible with new data, but is a problem with legacy data transcribed from literature.

Currently there is no way in SDD to express how good the LowerLimit/UpperLimit estimates are, i. e. whether they are crudely guessed or based on some good documentation. Informed guesses or wild guesses :-) ?

Agreed that ranking also needs partial order, so ranks may be tied: sometimes and rare may have the same rank. In any case, ranking cannot be restricted to within a set of frequency modifiers, but must be global within a project!

It is important to know what the situation at the time of coding was.

Problem: imported or online coding may occur without knowing the LowerLimit/UpperLimit -> not known at time of coding. New data / old data is then not the difference, but informed coding versus uninformed coding.

Nicolas: What is old situation (DELTA), where do we want to go? We want improved terminology.

A special problem is posed by the modifiers "typical" and "atypical": is this a frequency statement? Probably yes, but it is more difficult to interpret. The absolute frequency values are difficult to estimate: in a character with 2 or 3 states, typical may mean more than 50%; in a character with 20 states, 15 may be called typical (each with a low frequency) and 5 which are very rare called "atypical".

What about: "most frequent", "least frequent"? This seems to be similar to "typical" or "atypical". These are relative statements which may change with the polymorphism/distribution recorded. Currently these statements can only be captured in SDD as general modifiers, i. e. analysis software is not informed about their special meaning.

Are these frequency statements at all? Can they be interpreted as frequency? It is perhaps rather a statement that some state is the mode!

Conclusion: Ranking could either be added if Limits are optional, or automatically deduced if limits are required. Guessing needs to be documented.
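
How a ranking could be deduced automatically from required limits can be sketched as follows. The modifier names are taken from the discussion, but the numeric limits are illustrative guesses only, and the rule used (one range must end before the other begins) is one possible interpretation, not an agreed SDD rule.

```python
# Hypothetical frequency modifiers with (LowerLimit, UpperLimit) estimates
# given as fractions of 1; the numbers are illustrative guesses only.
LIMITS = {
    "very rare": (0.0, 0.05),
    "rare": (0.05, 0.2),
    "sometimes": (0.1, 0.4),
    "frequent": (0.5, 0.9),
}


def compare(a, b, limits=LIMITS):
    """Return '<', '>' or '?' for the frequency relation of modifiers a and b.

    a < b is deduced only if the range of a ends no later than the range
    of b begins; overlapping ranges give '?' (unknown relation).
    """
    a_lo, a_hi = limits[a]
    b_lo, b_hi = limits[b]
    if a_hi <= b_lo:
        return "<"
    if b_hi <= a_lo:
        return ">"
    return "?"
```

With these guesses "rare" and "sometimes" overlap and remain unranked, which corresponds to the partial order agreed above. The sketch does not record how well founded the limits are, so the need to document guessing remains.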

2. Probability and likelihood

Either:
doubts in descriptions, not sure which of several alternatives is correct, or
different hypotheses, each hypothesis is a distribution.
= probability of likelihood level.

[Editorial note: Already available in SDD are probability/uncertainty modifiers:
- flower red, perhaps occasionally blue (combination of frequency and probability)
- shape ovate or perhaps obovate (because skill to distinguish may be lacking in person coding)
"Perhaps" could be added because observation was made, but species identification was lacking.
However, the following from Régine's presentation is not possible in SDD:]

(Not possible in SDD:)

  probably:
  (     a       b        c      )
  (   rare  frequent    rare    )
  (    10%      80 %     10%    )
  but perhaps:
  (     a       b        c      )
  (   rare  frequent    rare    )
  (     5%     90 %      5%     )

This is a probability/uncertainty hypothesis, where certainty does not relate to states, but to frequency sets associated with states.

Are these completely different cases or can these be converted? Example from above:
- flower red, perhaps occasionally blue
which could be expressed as either:

  (   red  blue   )
  (   100%    0%  )
or
  (   red  blue   )
  (   95%    5%   )

However, is this equivalent also if no frequency statement was made?
- flower red, perhaps blue
which could be expressed as either:

  (   red  blue   )
  (   100%    0%  )
or
  (   red  blue   )
  (  <100%  >0%   )

This second case could be

  (   99%   1%   )
or
  (   50%   50%  )
or
  (    1%   99%  )

This is unlikely for red and blue, but quite possible if the uncertainty refers to terminological difficulties, e. g. in the ovate/obovate example.

Conclusion: Probability/uncertainty of hypotheses about a frequency distribution is separate from uncertainty regarding the presence of a character state, which does not necessarily imply a frequency statement.

Gregor: What happens with "perhaps unknown"? Should this be allowed?


  (     a       b        c      )
  (   rare  frequent    rare    )
  (    10%      80 %     10%    )
or:
  (     a       b        c      )
  (   rare  frequent    rare    )
  (     5%     90 %      5%     )

Example: Two different tests run on the same specimen. One option would be to record the average (as if it had been a single sample), i. e.:

  (     a       b        c      )
  (   rare  frequent    rare    )
  (    7.5%     85 %     7.5%   )

Another option would be to attempt to record the data twice, as above.
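
The first option (recording the average) amounts to the following calculation. The sketch gives both tests equal weight, which is an assumption; the discussion did not address how different sample sizes should be handled. The function name average_distribution is hypothetical.

```python
def average_distribution(*distributions):
    """Average several frequency distributions state by state.

    Each distribution maps a state name to a relative frequency in percent;
    all distributions are assumed to cover the same states.
    """
    states = distributions[0].keys()
    n = len(distributions)
    return {s: sum(d[s] for d in distributions) / n for s in states}


test_1 = {"a": 10, "b": 80, "c": 10}
test_2 = {"a": 5, "b": 90, "c": 5}
combined = average_distribution(test_1, test_2)
```

The combined result no longer shows that two separate tests existed, which is exactly the information the second option tries to preserve.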

Similar "problem" with numeric data, where such multiple data sets may also exist: Spores measured on different occasions, researcher would like to keep the sets separate, perhaps because some annotation exists. This is unlikely for the manuscript of a monograph, where synthesis is required, but quite possible for work in progress. Or perhaps one is at the moment unable to make a decision, while work is in progress.

Why can probability/certainty statements on distributions (which require repeated work) arise, rather than combining these?
probably:
...
but perhaps:
...

Example: in the literature description of a fly there is a problem of pro parte synonymy regarding the length of the spermatheca. The literature differs from one's own work; one considers the own work more probable, but not the other. These would be two items with certainty, which need to be qualified when they are combined during aggregation (= "collation"). Is this worth a separate mechanism? Or is it sufficient to handle it with an annotation mechanism?

Conclusion: Régine will make a proposal on this.

3. Reliability: uncertainty as a result of trust in the researcher

Expert may be a good one or not: Do we need some "Reliability" for each information source? Perhaps as a modifier of everything else, i. e. like a bracket around entire statement sets?

Gregor:
Character may be reliable or not (on average for all items)
Character may be reliable for some items but not for others:
Character scoring in an item may be unreliable because expert is considered unreliable...

Régine intends to discuss the last scenario.

The skill or competence of the user coding the data is a reliability issue. This translates into uncertainty. However, this should happen behind the scenes in the application, and should not be stored in the data or output during export.

Probably best at application level, because it is politically undesirable to ever publish this information or send it around. What is the meaning of integrating two data sets, each of whose authors trusts himself most? Web of trust: If I trust you and you trust me...

Manual decisions can perhaps be handled with probability/uncertainty modifiers, to express doubt about the coding of another person.

Complex challenges for character hierarchy

Cyril: A simple hierarchy is not enough to represent anatomy. Example: gut of fly goes through thorax and abdomen.

Currently available through character tree view, but no explicit anatomy available. Interesting especially for character dependency (which so far is postponed). For character dependency, a special hierarchy is not enough. It also requires defining presence/absence characters for the nodes.

Flowers: petals are white -> flower is white. But: if flower is white, petals are not necessarily white (Cornus florida!).

If leg is brown, coxa, trochanter, tibia, tarsus may be brown as well. But if leg is hairy, only coxa, trochanter, tibia may be hairy, tarsus may not be.

Conclusion: Inheritance of character states along structural hierarchies (or perhaps other hierarchies as well?) needs some extra definition; it is not automatically and always implied.


Still to do:
Revisit modifiers (probability, frequency, general, etc.) in the light of Régine's presentation.
Lessons from XPer, which has a very elaborate concept of modifiers: frequency, certainty, availability.
Discuss natural language generation with Régine, who uses a custom program; how much of this can be put into a data-driven program?


Saturday, 15. February 2003

Participants

Nicolas Bailly (Paris)
Gregor Hagedorn (Berlin)
Robert Morris (Boston)
Yuri Roskov (Reading)
Guillaume Rousse (Paris)
Jean-Marc Vanel (France)

Definition of descriptive data

Yuri asked some valuable general questions about "What are descriptive data?" and "Why are people doing it?". We discussed these and the following document is an attempt to summarize our current definitions: Definition of the scope of descriptive data.

Frequency modifier (continued)

The discussion from Friday afternoon was continued and revisited.

Frequency modifier defined through frequency range estimates: if uncomfortable with guesses the designer is free to define the range as 0-1 (= 0-100%). This would still work, although it has no analytical value. It could be recommended best practice if it is not possible to make any better decision, or for designers who refuse to give any estimates.

The recommendation is to use reasonable range estimates; e. g. 2-30% and 10-40% would rank the modifiers.

Pure ranking of verbal frequency modifiers:
1. very rare
2. rare
3. occasionally
3. sometimes
5. frequent
6. always

Then there are other data with frequency values, which can also be ranked:
1. 5%
2. 6-8%
2. 7%
4. 20%
5. 80%

However, the data are now split into two sets, each with a relative within-set ranking; there is no ranking between the sets.

If, in contrast, there is some frequency range estimate for the verbal frequency modifiers, a joint ranking is possible.

For analysis it is quite reasonable to convert both data into a ranking, but only giving frequency estimates allows the joint ranking.
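
A joint ranking of verbal modifiers and numeric frequency values might be computed as in the following sketch. The range estimates attached to the verbal modifiers are invented for illustration, and ranking by range midpoint is only one possible choice, not something agreed in Paris. The name joint_ranking is hypothetical.

```python
# Verbal modifiers with assumed frequency range estimates (in percent),
# and directly recorded frequency values; a single value is a range of
# zero width. All numbers are illustrative only.
entries = {
    "very rare": (0, 2),
    "rare": (2, 10),
    "5%": (5, 5),
    "6-8%": (6, 8),
    "7%": (7, 7),
    "frequent": (60, 90),
    "80%": (80, 80),
}


def joint_ranking(ranges):
    """Rank entries by range midpoint; equal midpoints share a rank.

    Ties use the same convention as the lists above (1, 2, 2, 4, ...).
    Ranking by midpoint is one possible choice, not an SDD rule.
    """
    midpoint = {name: (lo + hi) / 2 for name, (lo, hi) in ranges.items()}
    ordered = sorted(midpoint, key=midpoint.get)
    ranks = {}
    for position, name in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and midpoint[previous] == midpoint[name]:
            ranks[name] = ranks[previous]
        else:
            ranks[name] = position
    return ranks


ranks = joint_ranking(entries)
```

Here "5%" is ranked below "rare" and "80%" above "frequent", a comparison across the two sets that is impossible with pure within-set rankings.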

It would be desirable to give some indication of how "informed" or "wild" the frequency estimate/guess is.


Revision of item definition (= what is being described) and meta data

Next, the discussion on item definitions from Thursday was continued. (Note that some of the results of this discussion have already been worked into the minutes for Thursday to simplify the presentation.)

A central problem in redesigning the structure of item/description is the lack of unique keys for publication and specimen references. To avoid this problem, in Brazil multiple coded/natural language descriptions were kept together by an "IPR envelope". This is considered confusing and unintuitive. Furthermore, it does not scale very well if data are kept in federated systems.

A problem with this is that author, editor, and contributors do not have natural keys. Without envelopes keeping them together, it is difficult to regroup them. A person's name may be spelled and typed in multiple ways, even without any typing error: Bob Morris = R. Morris = Mr. Morris = Dr. Morris = Prof. Morris = Professor Morris, University of Boston = ... This can be avoided if person names are defined once (in a separate list) and then referenced by unique identifiers (key/keyref mechanism). Person names can and should be external resources, adding links where more information about a creator of descriptive data can be found.

Decision: use an indirection mechanism to store the connectors to external data sources centrally in the resource section, and provide internal keys that allow referring to them unambiguously several times. As a result, we reworked the entire concept of resources. We now provide a new type, "resource connector", for publications and specimens; the media resource was also reworked to fit into the same concept (needs revision). (These changes were already presented in the minutes for Thursday, see above.)
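
The effect of the indirection can be sketched with plain Python dictionaries, using persons as the example. The keys, the URL, and the function names are made up for illustration and do not reflect the actual structure of the resource section.

```python
# Persons are defined once, each under an internal key (keys and the URL
# below are made up for illustration).
persons = {
    "p1": {"name": "Robert Morris", "link": "https://example.org/people/p1"},
}

# Descriptions refer to contributors by key only, never by spelled-out name.
descriptions = [
    {"id": "D1", "author": "p1"},
    {"id": "D2", "author": "p1"},
]


def dangling_references(descriptions, persons):
    """Return ids of descriptions whose author key is not defined.

    This mimics what a key/keyref identity constraint would check.
    """
    return [d["id"] for d in descriptions if d["author"] not in persons]


def descriptions_by(person_key, descriptions):
    """Regroup descriptions by contributor without comparing name strings."""
    return [d["id"] for d in descriptions if d["author"] == person_key]
```

Because regrouping relies on the key alone, the spelling variants of a name no longer matter once the person has been defined.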

Continuing with the main discussion thread of item definition, we reconsidered the case of federated descriptive data, to obtain information about how the description containers need to be structured.

Federated database:
Inst. 1, Researcher 1: D1 coded descr. from publication 1 refers to t1
Inst. 1, Researcher 1: D2 coded descr. from publication 2 refers to t1
Inst. 2, Researcher 2: D3 coded descr. from publication 3 refers to t1
Inst. 2, Researcher 2: ?4 image from publication 4 refers to t1


Jean Marc showed a proposal to use polymorphic elements with multiple types (esp. the either-date-or-year elements), which were ok in xml spy, but not in xsv. Possible solution: define the simple type as a union.


IPR example: a state is added to a description. The following may be recorded:
creation: who and when
updatelast: who and when

Should we add a who/when of data "checking"? Perhaps call this verification? If the update date is later than the verification date, the description needs to be rechecked!

Or alternatively a boolean attribute "checked" is set to true if it is being verified by an editor that is part of the project editor team (which is a web of trust), must be set to false if data are updated.
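
Both alternatives can be sketched in a few lines of Python. The record layout and the function names needs_recheck and apply_update are hypothetical; only the field name "updatelast" and the flag "checked" are taken from the discussion.

```python
from datetime import date


def needs_recheck(updated, verified):
    """True if the data were never verified or changed after verification."""
    return verified is None or updated > verified


def apply_update(record, who, when):
    """Record an update and reset the boolean 'checked' flag."""
    record["updatelast"] = (who, when)
    record["checked"] = False
    return record
```

The date comparison needs no extra bookkeeping on update, whereas the boolean variant relies on every updating application resetting the flag.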

Problem: deletion of states. who did this, where to record it? Do we want to have a full history/tracing system?

Documents with different contributions. Important aspect: reward the contributors by making them visible. Encourage contributions by documenting them adequately, so that someone contributing much is rewarded much.

Aspect of intellectual history of the data: enable applications that can choose which contributors' data they trust.

Perhaps in project definition need data whether a user is an editor and may check the data? Perhaps in contributor documentation? (= Agents?)

Gregor: Project management is not our consideration, but some SDD decisions place restrictions on project management. SDD should allow basic project management data to be interoperable if data are transferred. Rationale: Ideally a document should be fully consistent and checked, which will be possible for small projects. However, many larger projects can only extremely rarely reach such a fully checked state. These data sets are nevertheless valuable. In this case the consumer must decide which level of checking she or he is satisfied with.

Does project definition provide appropriate "Project wide checked" mechanism? -> RevisionStatus added!

Validity of document. Technical versus scientific. Guillaume: Cohesion: non-overlapping states. Validation checking has poor semantics. Could be a plausibility plus a comparison checking like the mechanism proposed in Workbench: DiversityReferences.

Agreement:
- created who and when
- checked yes/no and who did it
- also should be on the top level: entire project checked

Additional information about updates was proposed by Gregor; others were unwilling to follow...

Can two users make the same statement:

What about deletion of states?

Problem is that data are both management data and IP contribution records. This duality is confusing.


Rights: Copyright/Usage conditions should perhaps not be possible except on project level. Having them on more atomic levels can complicate the management of data very much.

However: Natural language descriptions and images can only be integrated if they are acknowledged individually. They may be provided under a usage condition stating, e. g., that the natural language description text may be distributed only in its context and not reused elsewhere. This needs to go somewhere!

Xml document root was renamed from Document to Project.

Technical discussions: Figure out how to do xml keyrefs correctly (Jean-Marc: the xsv validator finds a problem with the current version, which validates ok with xml spy). Even in comparison with the normative xml schema documents from w3c, we could not identify the problem and were in the end not certain whether xsv or xml spy validation is correct.

[Retested November 2003, G. Hagedorn: xsv reports some errors that referenced schemas cannot be found (which is unclear, since the namespace names should be URIs, not schema locations), but otherwise accepts our schema as valid.]

Finally we had a discussion whether to use polymorphism of elements or substitution group mechanism, how much to rely on xml type derivation, and the problems with derivation by extension and restriction. Also: how to use external schemata (multiple files that can be reused by other GBIF standards)? Esp.: Formatted text, ContributorDocumentation? How do xml-schema type libraries work? These discussions could not be finished and were partially completed the next day.


Sunday, 16. February 2003

Participants

Nicolas Bailly (Paris)
Gregor Hagedorn (Berlin)
Robert Morris (Boston)
Guillaume Rousse (Paris)

(On Sunday some brave souls met for a few hours to use the available time to clarify open points of the discussion.)

Technical schema discussion: Type derivation

We started with a discussion about type derivation, extension and restriction, and changes within the structure (e. g. deriving the element types within a collection type). Gregor expects extension to provide more mechanisms than adding new elements and attributes to the end of the sequence. Most object-oriented languages offer such functionality. In xml schema a similar thing is possible using derivation by restriction. However, restriction of complex types works rather like a redefinition and is an inheritance in name only. The entire structure and constraints are copied and then redefined in the code. If the complex base type is changed, nothing is updated in types derived by restriction. This makes derivation by restriction useful only to signal to other programs that the types have a relationship; it is no longer an operational tool during schema design. We found that schema design guidelines in fact recommend against widespread use of restriction of complex types. (This is different for simple types, where derivation by restriction is a very useful tool.)

Gregor will try to rework the type inheritance of the schema to make as much use as possible of derivation by extension and limit the use of restriction of complex types.

Naming and structure of the audience-dependent elements (label and wording)

General

In Brazil we tentatively proposed to use an element name "LinguisticSet" for the container of textual elements that belong to a single audience definition, and the term "LinguisticSets" for the collection of these "LinguisticSet" containers (each for a different audience definition).

In the ensuing discussions it turned out that these names (or perhaps the structure used?) lead to much confusion and misunderstanding. One problem is that the term "linguistic set" is about structure on an abstract level, but does not indicate the content or "pay load" of an element. However, the content is often more relevant if one tries to understand the composition of a certain type. In the case of, e. g., a character, it is more relevant to know that an element ultimately contains label information than that it uses the generic LinguisticSets/LinguisticSet structure.

Search for alternative terms for Label + Glossary + Abbreviation container:

No satisfactory solution was found.

[Editor's note: after the SDD meeting in Lisbon (October 2003) we now propose to use singular Label + repeated audience-specific Representation elements.]

Various

(Note: Images may be audience specific! This could be due to language embedded in image, or different images may be designed for different expertise levels. As a consequence, we need both language/audience dependent and independent resources.)

InternalNotes versus ReportedNotes: this may be implemented through a generic Notes element with attribute "renderable" or "publiclyVisible". Typing may be a better solution, since reported notes can be constrained as to where they are available. For example, characters, nodes in the character/concept tree, states have no reported notes feature. Instead, it is intended that all additional definitions and explanations should be collected in the reusable Terminology/Glossary/GlossaryEntry feature!

Wording for natural language reporting

If the wording is defined as a tree, as proposed by Gregor as an alternative to the DELTA model (one level of headings, special directives for paragraphs, sentences, semicolon- and comma-groups), three different wording types are required to implement this:

  1. Wording innermost (singular, not around something else)
  2. Wording around something else
  3. Wording around something else that may be repeated (with delimiters between repeated content)

A tree would look like:

                            node1

             node2                       node3            
             
     char1         char2          char3          char4
     
  state 1 2 3   state 1 2 3    state 1 2 3     state 1 2 3

The relation between state and modifiers is then modeled "reversely". Rather than modifiers attached to states:

                      ----------------- State --------------------
                    /               |                \             \
                   |                |                 |             |            
      UncertaintyModifier / FrequencyModifier / GeneralModifier, GeneralModifier, ...

It is modeled as modifiers around the state:
    UncertaintyModifier( FrequencyModifier ( GeneralModifier ( GeneralModifier ( state ) ) ) )
because modifier wording may be required before or after the states (see examples in General modifiers).

Consequently nodes and characters have wording around something else and delimiters between repeated contained elements (further nodes and states, respectively), modifiers have wording around the state, and states have only singular wording.
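
The "modifiers around the state" model can be sketched as nested function calls, each adding optional wording before and after what it contains. The wordings and the helper name wrap are invented examples, not taken from an SDD terminology.

```python
def wrap(inner, before="", after=""):
    """Put optional wording before and after already rendered text."""
    return " ".join(part for part in (before, inner, after) if part)


# State wording is singular; each modifier wraps whatever it contains.
# The wordings are invented examples, not taken from an SDD terminology.
state = "red"
general = wrap(state, before="dark")                 # GeneralModifier
frequency = wrap(general, before="occasionally")     # FrequencyModifier
uncertain = wrap(frequency, before="perhaps")        # UncertaintyModifier
with_after = wrap("red", after="at the base")        # wording after a state
```

The nesting order of the calls fixes the order of the wording, which a flat list of modifiers attached to a state would not.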

(Structure of tree according to Bob:)
wb (Element-1) [NTD (Element-2-to-n-minus-1)]* TD (Element-n) wa
Example: if content has 4 elements, the output should be:
wb Element1 NTD Element2 NTD Element3 TD Element4 wa
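
A minimal Python sketch of this output rule follows. The abbreviations are not expanded in the minutes; the sketch assumes that wb and wa are the wording before and after the group, NTD is the delimiter between non-final elements, and TD is the delimiter before the last element. The function name, the default delimiters, and the handling of zero or one element are likewise assumptions.

```python
def render_group(elements, wb="", wa="", ntd=", ", td=" and "):
    """Join elements with ntd, use td before the last one, and wrap in wb/wa.

    Behaviour for an empty list (empty string) and for a single element
    (no delimiter at all) is an assumption; the minutes do not cover it.
    """
    if not elements:
        return ""
    if len(elements) == 1:
        body = elements[0]
    else:
        body = ntd.join(elements[:-1]) + td + elements[-1]
    return f"{wb}{body}{wa}"


# Reproduces the four-element example from the minutes literally
literal = render_group(
    ["Element1", "Element2", "Element3", "Element4"],
    wb="wb ", wa=" wa", ntd=" NTD ", td=" TD ",
)
print(literal)

# The same rule with invented natural-language wording
print(render_group(["red", "green", "blue"], wb="Petals ", wa="."))
```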

Available terms for the different types:

We tentatively agreed to use Set 3 from above.


Position and typing of keyref attributes

Proposal Guillaume: The key attribute should always be on the highest container element of the definition, but keyref attributes should be placed in a separate empty element, not in the container element. This would allow the keyref reference always to be added as a complex type with a single empty element containing the keyref attribute (i.e. it allows composition rather than inheritance). The ReferenceType would define an empty element with a single required attribute "keyref".

Example: Audience should not be a keyref attribute on the language container, but rather the container should contain an empty Audience element carrying the keyref attribute (e.g. <Audience keyref="en5"/>). Advantage: reusable AudienceReference type. Problem: in XML Schema the keyref identity constraint can never be defined on types, only on elements!

[Editorial note Gregor: I tried to follow this, and it does make sense in the audience case. However, in most other cases it leads to confusing structures. For example, the character, state and modifier references in coded descriptions currently look like:

<Character keyref="123">
  <State keyref="20"/>
  <State keyref="21">
    <Modifier keyref="333"/>
  </State>
</Character>

Following the proposal above, this would become:

<Character>
  <Character keyref="123"/>
  <State>
    <State keyref="20"/>
  </State>
  <State>
    <State keyref="21"/>
    <Modifier>
      <Modifier keyref="333"/>
    </Modifier>
  </State>
</Character>

When preparing a new discussion schema for the Lisbon 2003 meeting, I therefore removed the empty audience elements again and instead used type derivation by extension from keyref base types for the container elements themselves.]
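
To make concrete what a keyref constraint has to guarantee for the compact form (keyref attribute directly on the container), the following Python sketch checks that every keyref resolves to a key. It is only an illustration under stated assumptions: the Dataset, Definitions and Description wrapper elements are hypothetical, and matching keys to references by element name is a simplification of the selector/field XPath mechanism that XML Schema identity constraints actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the wrapper elements are invented; the coded
# description reuses the compact form from the editorial note.
DOC = """
<Dataset>
  <Definitions>
    <Character key="123"/>
    <State key="20"/>
    <State key="21"/>
    <Modifier key="333"/>
  </Definitions>
  <Description>
    <Character keyref="123">
      <State keyref="20"/>
      <State keyref="21">
        <Modifier keyref="333"/>
      </State>
    </Character>
  </Description>
</Dataset>
"""


def unresolved_keyrefs(xml_text):
    """Return (element name, keyref value) pairs that match no key."""
    root = ET.fromstring(xml_text)
    # Collect every defined key, scoped by element name (a simplification).
    keys = {
        (el.tag, el.get("key"))
        for el in root.iter()
        if el.get("key") is not None
    }
    # Report each reference whose (name, value) pair was never defined.
    return [
        (el.tag, el.get("keyref"))
        for el in root.iter()
        if el.get("keyref") is not None
        and (el.tag, el.get("keyref")) not in keys
    ]


print(unresolved_keyrefs(DOC))
print(unresolved_keyrefs(DOC.replace('keyref="333"', 'keyref="999"')))
```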


Please send any necessary corrections to G.Hagedorn@bba.de
(Gregor Hagedorn, Convener)




First published 2003-02-19, last update: 2003-12-05.
