SDD data challenge:
Collections of numerical values and mixed numeric and categorical statements
TDWG working group: Structure of Descriptive Data (SDD)
Introduction
Certain numeric characters are reported as collections of values rather than as summary statistics. Furthermore, both in such collection characters and in numerical characters that report statistical measures (mean, standard deviation, etc.; see the SDD proposal "Numeric data types"), the data source being recorded in structured digital format may report categorical states that have no defined numeric value, such as "many" or "few".
Data challenge
The following data challenge example illustrates these cases (based on an example by Stephen Seiberling from Flora North America):
1. Bundle Scars 5
2. Bundle Scars 5, 7, or 9
3. Bundle Scars 5, rarely 7
4. Bundle Scars 5-7 or 9
5. Bundle Scars many
Some possible solutions
- Collections of numerical data are stored in a specialized array or collection type. This solution is implemented, for example, in the CSIRO DELTA programs. It requires a separate kind of data storage, which in turn needs special consideration in every data analysis operation. It also cannot directly handle the "many" case.
- The data are all stored in a single ordinal categorical data type, with states none, 1, 2, 3, 4, 5, 6, 7, 8, 9, many. This option is available only if the domain of possible values is reasonably limited. The example "5-7 or 9" would be coded by scoring "5, 6, 7, 9"; current output processors like CSIRO DELTA or DeltaAccess automatically create ranges for ordinal characters, resulting in the natural-language wording "5-7 or 9".
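The ordinal coding described in the second option can be sketched as follows. This is only an illustration in Python, not the algorithm used by CSIRO DELTA or DeltaAccess: the names STATES, score and describe are hypothetical, and the rule for merging adjacent numeric states into a range is an assumption made for this example. The frequency modifier in "5, rarely 7" is deliberately left out, because neither option above says how it would be stored.

```python
# Hypothetical ordinal state list, copied from the second option above.
STATES = ["none", "1", "2", "3", "4", "5", "6", "7", "8", "9", "many"]


def score(*states):
    """Return the scored states as sorted, de-duplicated positions in STATES."""
    return sorted({STATES.index(s) for s in states})


def describe(scored):
    """Render scored positions as text, merging adjacent numeric states.

    Assumption for this sketch: only runs of neighbouring numeric states
    are merged into "a-b"; "none" and "many" are always written on their own.
    """
    parts = []
    i = 0
    while i < len(scored):
        j = i
        # Extend the run while the next scored state is the adjacent numeric one.
        while (
            j + 1 < len(scored)
            and scored[j + 1] == scored[j] + 1
            and STATES[scored[j]].isdigit()
            and STATES[scored[j + 1]].isdigit()
        ):
            j += 1
        first, last = STATES[scored[i]], STATES[scored[j]]
        parts.append(first if i == j else f"{first}-{last}")
        i = j + 1
    if len(parts) <= 2:
        return " or ".join(parts)
    return ", ".join(parts[:-1]) + ", or " + parts[-1]


# Cases 1, 2, 4 and 5 of the data challenge, coded as ordinal states.
for coded in (["5"], ["5", "7", "9"], ["5", "6", "7", "9"], ["many"]):
    print("Bundle Scars", describe(score(*coded)))
```

Because "many" is simply one more state in the list, it needs no special storage here, which is the advantage this option has over a purely numeric collection type.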
Gregor Hagedorn, Version 1, 8 October 2002
First published 2002-10-08, last update: 2005-01-08.
