SDD proposal: Character dependency

TDWG working group: Structure of Descriptive Data (SDD)

Note on the current version of this document

This document DOES NOT YET discuss character dependency properly. However, it already contains information about the relationship between coding status values (= "missing data indicators"), esp. the not-applicable indicator, and declarative character dependency rules.


Introduction

[...]

Relations between declarative character dependency and missing data indicators

As discussed in the proposal on missing data indicators, it seems desirable to support both declarative character dependency and explicit "not applicable" missing data indicators.

On the one hand, declarative character dependency has several advantages: a) it generally constrains the list of available characters in identification or data entry (thus increasing identification or data input efficiency), b) it validates data input (reducing errors), and c) it can be used to explain why something is inapplicable.

On the other hand, a) situations exist that cannot be expressed through character dependency (e.g. method-dependent inapplicability, where some values cannot be observed by the method), and b) it seems desirable to decouple the revision of the terminology (and the character dependency rules declared therein) from the recording of descriptive data. Understanding character dependency is a process that often continues during the entire project duration. Workflow management becomes considerably easier if scoring of data and refining the character dependency declarations are decoupled. The separate "not applicable" state serves as a marker, bringing necessary revisions of the character dependency rules to the attention of the editor responsible for terminology. This is especially important in federated databases or large collaborative projects, where the terminology can only be changed in a consensus process.

Providing both "not applicable" states and declarative dependency is similar to the support of free-form notes within descriptions:

Terminology defines:             | Object descriptions allow in addition:
Character and state definitions  | Reported notes (free-form text)
Character dependency rules       | "Not applicable" missing data indicator

It is necessary to explore the interactions between dependency rules and missing data indicators (especially the "not applicable" state). If a character is covered by a character dependency rule and existing data invoke the dependency, it is still possible that either normal states or missing data indicators are present in that character. Applications should report the presence of normal states in inapplicable characters as an error. Should missing data indicators be handled differently?

A statement that data are "unknown" or "not interpretable" clearly implies the potential presence of scorable data and therefore violates the character dependency rules just like normal states. The presence of these missing data indicators in inapplicable characters should be reported as an error.

However, explicit "not applicable" statements are redundant rather than logically inconsistent. The presence of such a redundant "not applicable" state is currently possible in at least some DELTA-based applications.
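The distinction drawn in the last three paragraphs can be summarized in a minimal sketch. It is not part of the SDD schema: the class name `Rule`, the function `check_description`, the coding status labels, and the convention that state identifiers are globally unique strings (such as "wings/absent") are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical coding status labels; the SDD vocabulary may differ.
NOT_APPLICABLE = "not applicable"
UNKNOWN = "unknown"
NOT_INTERPRETABLE = "not interpretable"


@dataclass(frozen=True)
class Rule:
    """Flat rule: if controlling_state is scored, dependent_character is inapplicable."""
    controlling_state: str
    dependent_character: str


def check_description(scores, rules):
    """Return (character, severity, message) findings for one description.

    scores maps a character name to the list of values recorded for it.
    A value is either a state identifier or one of the labels above.
    """
    # All values recorded anywhere in the description.
    present = {value for values in scores.values() for value in values}
    findings = []
    for rule in rules:
        if rule.controlling_state not in present:
            continue  # the rule is not invoked by this description
        character = rule.dependent_character
        for value in scores.get(character, []):
            if value == NOT_APPLICABLE:
                findings.append((character, "redundant",
                                 "explicit 'not applicable' duplicates the rule"))
            elif value in (UNKNOWN, NOT_INTERPRETABLE):
                findings.append((character, "error",
                                 "missing data indicator implies scorable data"))
            else:
                findings.append((character, "error",
                                 "normal state in inapplicable character"))
    return findings
```

With a rule such as `Rule("wings/absent", "wing colour")`, a description that scores "wings/absent" and records "unknown" for "wing colour" yields an error, whereas an explicit "not applicable" is only reported as redundant.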

Consider a character that is inapplicable through a character dependency rule and for which the controlling state is present in the description. Three options to deal with such duplication of character dependency and explicit "not applicable" states in the description are conceivable:

Conclusion (see Footnote 1): The explicit "not applicable" missing data indicator should be allowed for characters inapplicable through a character dependency rule. The rules and recommendations outlined above under "optionally permit explicit 'not applicable' scoring" should be followed.

Temporary Note 1: In the Paris schema I have added a section "CharacterDependencyRules" in terminology and added a simple controlling state/dependent character logic. This needs further discussion, however! In particular, it is unclear whether inapplicability and applicability dependency can both be expressed as inapplicability, and whether a special mechanism to directly deduce dependency from character hierarchies should be added in addition to the DELTA-like flat model expressed here.

Temporary Note 2: The use of inapplicable as a taxon-specific default state and character dependency are parallel mechanisms to declare inapplicability. It is unclear at the moment whether structurally similar solutions are possible. This applies not to flat character dependency declarations, but to those dependent on some hierarchy (e.g. part hierarchy: if some part is missing, all dependent parts, and thus all characters depending on the entire tree, should be inapplicable as well). Part-hierarchy-driven character dependency is another open issue!
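To make the part-hierarchy case in Temporary Note 2 concrete, the following sketch shows one conceivable way to derive inapplicable characters from a missing part. It is only an illustration of the open issue, not a proposed mechanism: the function name `inapplicable_characters` and the two dictionaries describing the part hierarchy and the characters attached to each part are hypothetical.

```python
def inapplicable_characters(missing_part, children, characters_of):
    """Collect the characters of a missing part and of all parts below it.

    children maps a part to its direct sub-parts (hypothetical structure).
    characters_of maps a part to the characters describing that part.
    """
    result = set()
    seen = set()
    stack = [missing_part]
    while stack:
        part = stack.pop()
        if part in seen:
            continue  # guard against accidental cycles in the hierarchy
        seen.add(part)
        result.update(characters_of.get(part, ()))
        stack.extend(children.get(part, ()))
    return result
```

If a leaf blade is missing, for example, the characters of the blade and of its margin would be collected, while characters of the leaf as a whole would remain applicable.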


Responsibility for validation

Character dependency rules and the combination of explicit "not applicable" missing data indicators with other data are not validated by the SDD schema or by XSLT rules directly derived from it. The XML validator should allow any character state in inapplicable characters. It is not possible to detect whether the error is due to mis-scoring or to an inappropriate declaration of character dependency. In the second case, requiring data to be always valid in this respect would demand an immediate revision of the character dependency rule in the terminology, which is not always possible in collaborative projects. (However, generic XSLT code reporting violations of inapplicability in SDD data sets would be very helpful.)

Note: An exception to the rule that validating code should report states in inapplicable characters as errors is the case of data marked with a modifier as "present by misinterpretation". In such data the states are truly absent (and are analyzed as such), but are coded to achieve some degree of error tolerance in identification. Such data can validly appear in inapplicable characters. The exception must be made both for characters inapplicable through declarative character dependency and for characters containing a "not applicable" indicator.
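A reporting tool could implement this exception as a simple filter. The sketch below is an assumption-laden illustration: the `Score` class, the `violations` function, and the modifier label are hypothetical, and the set of inapplicable characters is taken as given (whether it results from a dependency rule or from an explicit "not applicable" indicator).

```python
from dataclasses import dataclass, field

# Hypothetical modifier label; the SDD modifier vocabulary is not fixed here.
MISINTERPRETATION = "present by misinterpretation"
NOT_APPLICABLE = "not applicable"


@dataclass
class Score:
    character: str
    value: str
    modifiers: set = field(default_factory=set)


def violations(scores, inapplicable):
    """Return the scores that should be reported as violating inapplicability.

    inapplicable is the set of character names that are inapplicable in this
    description, whether through a rule or through an explicit indicator.
    """
    return [
        s for s in scores
        if s.character in inapplicable
        and s.value != NOT_APPLICABLE               # the indicator itself is no violation
        and MISINTERPRETATION not in s.modifiers    # tolerated by the exception
    ]
```

A state scored with the "present by misinterpretation" modifier in an inapplicable character is thus passed over, while the same state without the modifier is reported.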

Note: this implies that the "present by misinterpretation" modifier must be applicable to numeric characters. The only other modifiers definable at numeric states are the certainty modifiers. Also: Should the set of "present by misinterpretation" modifiers be handled as "Certainty modifiers" rather than normal modifiers? (I think yes, since probability modifiers are possible in numerical characters).


To do: Check logic of DELTA *Nonautomatic Controlling Characters directive! See user guide 2000

Footnotes

Footnote 1: These conclusions were agreed upon on 17. Oct. 2002 at the TDWG-SDD meeting in Brazil. However, any conclusion at the current state is open to revision and new discussions!


Request for discussion

Please send your criticism or suggestions to the SDD mailing list or to the author.

Gregor Hagedorn; Vers. 1; 28. August 2003




First published 2003-03-07, last update: 2003-08-28.
