SDD proposal: Adding misinterpretation hints
TDWG working group: Structure of Descriptive Data (SDD)
Introduction
To achieve good identification rates, common misinterpretations of characters or states must be accounted for. If the designer of a data set adds false statements to preempt misinterpretations by users without distinguishing them from true statements, the data set degenerates and becomes difficult to manage and revise. Furthermore, it can then be used only for identification and not, for example, for phylogenetic analysis.
Among the current identification packages, only LucID fully supports a separate markup for misinterpretations. In DELTA the information itself can be stored as a comment, but it is then not available for further analysis. DeltaAccess supports misinterpretation markers through modifiers, but the initial versions had the same problem as DELTA, i.e. a misinterpretation preemption could not be recognized during analysis. Starting with version 1.8, DeltaAccess has a new attribute to identify these modifiers.
Several types of misinterpretation can be distinguished:
- The organism part (= structure) is generally likely to be misinterpreted
Examples: a phylloclade (= cladode) is interpreted as a leaf, or a rhizome as a root.
- The organism part is likely to be misinterpreted within a given taxonomic group
Examples: the inflorescence is often interpreted as a flower in Euphorbia, and the bracts of Cornus florida (dogwood) are interpreted as petals.
- The property state is generally likely to be misinterpreted.
Example: a spore surface that is visibly rough under a good microscope may be interpreted as smooth because of insufficient optics or inappropriate handling of the microscope.
- The property state is likely to be misinterpreted within a given taxonomic group.
Example: leaves of Lotus corniculatus are palmate with 3 leaflets plus 2 leaflet-like stipules, but are often misinterpreted as pinnate with 5 leaflets.
The cases differ in how they ideally would be handled:
- In the first case, a generic misinterpretation tolerance mechanism (e.g. a mapping) for the organism part would be desirable. For example, if a root character is specified during identification, the identification application could search under both "root" and "rhizome" characters. This general tolerance mechanism would then automatically apply to any character in the descriptions.
- In the second case, a taxon-specific mapping of the organism part would be desirable.
- The third case could be solved by a state mapping that adds error tolerance to the identification for all taxa.
- In the last case, a special attribute could be added to the state for each item where the misinterpretation arises.
All cases can in principle be managed with the single mechanism of description x state-specific misinterpretation attributes, although this would require extensive work by the designer (e.g. adding potentially misinterpreted statements to a number of characters when the confusion is really about a misinterpretation of the part that is being described).
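The part mappings described for the first two cases can be sketched as follows. This is only an illustration under stated assumptions: the names `GENERIC_PART_CONFUSIONS`, `TAXON_PART_CONFUSIONS` and `parts_to_search` are hypothetical and not part of SDD or of any identification package, and the entries are taken from the examples above.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Hypothetical generic mapping (first case): the part a user names during
# identification -> parts that are commonly mistaken for it.
GENERIC_PART_CONFUSIONS: Dict[str, FrozenSet[str]] = {
    "root": frozenset({"rhizome"}),
    "leaf": frozenset({"phylloclade"}),
}

# Hypothetical taxon-specific mapping (second case), keyed by (taxon, part).
TAXON_PART_CONFUSIONS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("Euphorbia", "flower"): frozenset({"inflorescence"}),
    ("Cornus florida", "petal"): frozenset({"bract"}),
}


def parts_to_search(part: str, taxon: Optional[str] = None) -> Set[str]:
    """Return every organism part whose characters should be searched
    when the user specifies a character of `part`."""
    candidates = {part}
    candidates |= GENERIC_PART_CONFUSIONS.get(part, frozenset())
    if taxon is not None:
        candidates |= TAXON_PART_CONFUSIONS.get((taxon, part), frozenset())
    return candidates
```

With such a mapping, `parts_to_search("root")` would return both "root" and "rhizome", so the tolerance applies to every character of the part without the designer having to add misinterpreted statements character by character.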
Proposal
- Add state-specific misinterpretation attributes in the descriptions. These can either be implemented through project-wide modifiers that have a misinterpretation attribute set to true, or through item-and-state-specific attributes plus appropriate wording.
- The wording for the reporting of misinterpretations, both in generated natural language descriptions and in tabular or other data reports should be customizable. (If an application does not support customized wording for misinterpretations of states, the association of a given state with a given wording should at least be transparently stored so that it reappears when a new version of the descriptor data set is output in the SDD standard.)
- In version 0.9 of SDD we attempt to handle this with Certainty modifiers, i.e. a misinterpretation is interpreted as "certainly not". Probability modifiers have an extra attribute, IsTrueByMisinterpretation. If this is set to true, the modifier is identified as a misinterpretation modifier.
- The interpretation of state-specific misinterpretation modifiers as certainty modifiers is supported by the following structural similarities:
  - Probability modifiers cannot logically occur together with a misinterpretation modifier: "flowers probably white (by misinterpretation)" does not make sense.
  - Only probability and misinterpretation modifiers (but not frequency or general modifiers) are applicable to numeric statistical measures.
- The question of misinterpretation of organism parts or structures (which can only be solved unsatisfactorily with state-specific misinterpretation modifiers) should be reconsidered together with the generic aggregation/generalization mechanisms (generating new descriptions for species (= Class) based on specimen (= Object)).
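To illustrate how a flagged modifier keeps false statements separable from true ones, here is a minimal sketch. It is not the SDD schema: apart from the attribute name IsTrueByMisinterpretation (rendered here as `is_true_by_misinterpretation`), all class and function names are hypothetical, and the data come from the Lotus corniculatus example above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Modifier:
    label: str
    # Mirrors the IsTrueByMisinterpretation attribute named in the proposal.
    is_true_by_misinterpretation: bool = False


@dataclass(frozen=True)
class Statement:
    taxon: str
    character: str
    state: str
    modifier: Optional[Modifier] = None

    @property
    def is_misinterpretation(self) -> bool:
        return (self.modifier is not None
                and self.modifier.is_true_by_misinterpretation)


def matches(statements: Iterable[Statement], taxon: str,
            character: str, observed_state: str) -> bool:
    """Identification: accept true and misinterpreted states alike."""
    return any(s.taxon == taxon and s.character == character
               and s.state == observed_state for s in statements)


def true_statements(statements: Iterable[Statement]) -> List[Statement]:
    """Other analyses (e.g. phylogenetic): drop misinterpretation statements."""
    return [s for s in statements if not s.is_misinterpretation]


def render(s: Statement, wording: Optional[str] = None) -> str:
    """Report a statement; the misinterpretation wording is customizable
    and falls back to the modifier's own label."""
    text = f"{s.character} {s.state}"
    if s.is_misinterpretation:
        text += f" ({wording if wording is not None else s.modifier.label})"
    return text


BY_MISINTERPRETATION = Modifier("by misinterpretation",
                                is_true_by_misinterpretation=True)

LOTUS = [
    Statement("Lotus corniculatus", "leaves", "palmate with 3 leaflets"),
    Statement("Lotus corniculatus", "leaves", "pinnate with 5 leaflets",
              BY_MISINTERPRETATION),
]
```

In this sketch a user who reports "pinnate with 5 leaflets" still reaches Lotus corniculatus through `matches`, while `true_statements(LOTUS)` keeps only the palmate statement for further analysis.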
Request for discussion
Please send your criticism or suggestions to the SDD mailing list or to any of the authors.
Gregor Hagedorn; Version 2; 11. November 2003
Earlier versions: Version 1
First published 2002-02-20, last update: 2003-11-11.
