SDD proposal: Frequency modifiers

TDWG working group: Structure of Descriptive Data (SDD)

Introduction

If a character has multiple states within the class of objects that is being described (population, species, genus, etc.), the frequency with which these states occur may be known (approximately or exactly). The SDD standard provides for approximate and exact frequencies through "frequency modifiers", which are applied to states within item descriptions.

Frequency statements apply primarily to categorical character states. They may also apply to numerical measurements or counts if the controversial numeric multistate proposal, which is not part of SDD 0.9, is approved.

Frequency modifiers are either verbal statements ("usually", "occasionally", "rarely", etc.) defined in the terminology, or direct expressions of frequency values (introduced at the SDD meeting in Brazil). The terminology of verbal statements must be defined, and each verbal frequency category must be given an estimated range of frequency values.
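The two kinds of frequency modifier can be illustrated with a minimal Python sketch. It is not part of the SDD schema: the class name `FrequencyModifier`, the helper `exact`, and in particular the numeric ranges attached to each verbal term are assumptions made for illustration only. A real terminology would define its own terms and estimates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FrequencyModifier:
    """A frequency statement expressed as a range of relative frequencies.

    label is the verbal term from the terminology, or None when the
    frequency was entered directly as a value.
    """
    label: Optional[str]
    lower: float  # lower bound of the estimated frequency, 0.0 to 1.0
    upper: float  # upper bound of the estimated frequency, 0.0 to 1.0

    def __post_init__(self) -> None:
        # Reject ranges that are reversed or outside 0..1.
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError(f"invalid frequency range: {self.lower}..{self.upper}")


def exact(value: float) -> FrequencyModifier:
    """Direct expression of a frequency value (a range of zero width)."""
    return FrequencyModifier(None, value, value)


# Hypothetical terminology: every verbal category carries an estimated range.
# The numbers are illustrative assumptions, not values defined by SDD.
TERMINOLOGY: Dict[str, FrequencyModifier] = {
    "usually": FrequencyModifier("usually", 0.60, 0.95),
    "occasionally": FrequencyModifier("occasionally", 0.10, 0.40),
    "rarely": FrequencyModifier("rarely", 0.01, 0.10),
}
```

Storing a verbal term and a direct value in the same range form lets an application compare the two kinds of statement without special cases.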

Giving exact frequency values for a state in an item description is preferable for data analysis and identification. In most cases, however, only approximations are possible.

Recommendation: Frequency statements should be restricted to variability among individual organisms (including the case where populations are generally homogeneous, but variation exists between populations). Variation in a feature that is consistent within the class but differs between times of observation (e.g. spring, autumn) or between parts of the organism (e.g. hairiness on different parts) should preferably be qualified by general modifiers. For example, "glabrous, hairy at the tip" is preferred over "glabrous, rarely hairy".

Proposal

Frequency modifiers are structurally similar to other modifiers. However, they carry a different type of semantics and have different attributes. In particular, they allow the user to enter frequency values directly during data entry. The SDD schema treats them as a separate type and element to prevent the following cases:

For a discussion of the differences and relationships between frequency statements ("usually", "rarely") and likelihood statements ("probably"), see the separate proposal on Certainty modifiers.

Aggregating frequency modifiers

If two specimens carry the information "a or very rarely c" and "b or very rarely c", the two statements can easily be aggregated into "a, b, or very rarely c". However, at higher levels of abstraction (e.g. an order description based on species) it may be desirable to suppress states present only at very low frequencies in the generalization/aggregation process.
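The simple aggregation case can be sketched in Python as follows. The representation (a dictionary mapping each state to its verbal frequency modifier, or to `None` when unmodified) and the function name `aggregate` are assumptions for illustration. The proposal does not say what should happen when the same state carries different modifiers in two descriptions, so this sketch simply refuses to guess.

```python
from typing import Dict, Optional

# Hypothetical representation: state name -> verbal frequency modifier,
# or None when the state is recorded without a frequency modifier.
Description = Dict[str, Optional[str]]


def aggregate(*descriptions: Description) -> Description:
    """Merge descriptions by taking the union of their states."""
    merged: Description = {}
    for description in descriptions:
        for state, modifier in description.items():
            if state in merged and merged[state] != modifier:
                # Conflicting modifiers are not covered by the proposal text,
                # so no resolution rule is invented here.
                raise ValueError(f"conflicting modifiers for state {state!r}")
            merged[state] = modifier
    return merged


specimen_1 = {"a": None, "c": "very rarely"}  # "a or very rarely c"
specimen_2 = {"b": None, "c": "very rarely"}  # "b or very rarely c"
combined = aggregate(specimen_1, specimen_2)  # "a, b, or very rarely c"
```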

In Paris we discussed whether a special attribute (e.g. "PropagateOnAggregation") should be added to frequency modifiers. This attribute could determine whether the information to which a frequency modifier is applied is aggregated or not.

Since the desirability and the level of suppression depend on the level of abstraction, it seems preferable not to introduce an additional attribute. Instead, the application should decide whether to include a state in a generalization on the basis of the estimated frequency range of the frequency modifier applied to it. The issue of "PropagateOnAggregation" should only be taken up again if it is considered unacceptable to define required frequency estimates on modifiers.
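One way an application might base this decision on estimated frequency ranges is sketched below in Python. Everything specific here is an assumption: the range values, the names `FREQUENCY_RANGES`, `keep_state` and `generalize`, and the rule itself (keep a state if the upper bound of its estimated range reaches a threshold chosen for the level of abstraction). The proposal only states that the decision should rest on the estimated range, not how.

```python
from typing import Dict, Optional, Tuple

# Hypothetical estimated ranges (lower, upper) for verbal modifiers.
FREQUENCY_RANGES: Dict[str, Tuple[float, float]] = {
    "very rarely": (0.0, 0.01),
    "rarely": (0.01, 0.10),
    "usually": (0.60, 0.95),
}


def keep_state(modifier: Optional[str], min_frequency: float) -> bool:
    """Decide whether a state survives generalization.

    Unmodified states are always kept. A modified state is kept when the
    upper bound of its estimated range reaches min_frequency.
    """
    if modifier is None:
        return True
    _lower, upper = FREQUENCY_RANGES[modifier]
    return upper >= min_frequency


def generalize(description: Dict[str, Optional[str]],
               min_frequency: float) -> Dict[str, Optional[str]]:
    """Drop states whose estimated frequency is below the threshold."""
    return {state: modifier
            for state, modifier in description.items()
            if keep_state(modifier, min_frequency)}


species_level = {"a": None, "b": None, "c": "very rarely"}
# A higher level of abstraction uses a stricter (assumed) threshold.
order_level = generalize(species_level, min_frequency=0.05)
```

With a threshold of 0.05 the state "c" is suppressed, while a threshold of 0.0 keeps every state, so the same data can serve several levels of abstraction without an extra attribute on the modifier.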

Status

The proposals detailed here were discussed and accepted at the SDD meetings in Australia (March 2002), Brazil (2002), and Paris (2003). They are included in the Brazil straw man schema versions, except for the mechanism to select single frequency modifiers or frequency modifier sets for each character.

Request for discussion

Please send your criticism or suggestions to the SDD mailing list or to the author.

Gregor Hagedorn; Vers. 2; 9. November 2003
Earlier versions: Version 1




First published 2002-06-06, last update: 2003-11-09.
