TDWG working group: Structure of Descriptive Data (SDD)
(Note: The following is partly based on discussions at the SDD meetings in Brazil (2002) and Paris (2003). See also the proposal "Intellectual Property documentation".)
In large projects it is essential that the individual contributions can be traced to a certain extent and that the scientist responsible for a given part of the work is recorded. Two major aspects exist:
To some extent these two aspects are interwoven and are therefore treated together. Occasionally it will be important, however, to distinguish them.
The dominant aspect is IPR and, in contrast to most commercial operations, not even its legal side. Rather, to be promoted or even allowed to continue, most scientists must be able to document their work in the form of publications (printed or digital). Large database projects must therefore be able to generate citable web pages that prove the individual contributions.
Work on a project that is collecting and revising descriptive data has many aspects and phases. After the initial phases of conceptually defining the scope of the project and the terminology to be used, the work on the descriptions consists of:
In practice most of these steps are difficult to distinguish. Data typing and interpretation are difficult to separate if the terminology of the information source and that of the data model differ; most proofreading occurs as a by-product of interpretation and revision; and so on. A data model that required contributors to classify their own work in this way would probably be hard and unrewarding for scientists to use.
Data collection is especially problematic. The information source itself is recorded in the form of a citation of a publication resource (see @@@). However, the effort of finding appropriate data sources in the first place is not directly visible in any object in a descriptive database. It is closely tied to the data input, which creates the first object.
It is possible to distinguish between a person who is responsible for the selection of data and another person who does the typing work. The DiversityWorkbench model (@@) distinguishes in general between a responsible person and an operator, and always uses a double login. That is, a non-scientist operator may work for different responsible scientists, and a responsible scientist working alone is logged in as both operator and responsible person.
Whether such a distinction should be included in the SDD model needs to be discussed [@ Open Question @]. Experience with the DiversityWorkbench projects has shown it to be very useful. It should be noted, however, that tracing the typists' names is exclusively a management problem and is not related to intellectual property rights. Furthermore, pure typing work is perhaps relatively rare in descriptive data.
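The responsible person / operator distinction described above could be recorded roughly as follows. This is only a minimal sketch in Python: the class names, field names, and the login helper are hypothetical and are not taken from the DiversityWorkbench or SDD schemas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    """A 'double login' (hypothetical structure): who typed, and who answers for the content."""
    operator: str     # person physically entering the data
    responsible: str  # scientist accountable for the content


def login(operator: str, responsible: Optional[str] = None) -> Session:
    # A scientist working alone is logged in as both operator and responsible person.
    return Session(operator=operator, responsible=responsible or operator)


@dataclass(frozen=True)
class StateScore:
    character: str
    state: str
    session: Session

    def ipr_credit(self) -> str:
        # Only the responsible person is relevant for IPR;
        # the operator is management information.
        return self.session.responsible


# One non-scientist operator may work for different responsible scientists.
typist_for_a = login("typist", responsible="scientist_a")
typist_for_b = login("typist", responsible="scientist_b")
alone = login("scientist_a")

score = StateScore("leaf shape", "ovate", typist_for_a)
```

Keeping both names on every record leaves the management question (who typed) answerable without mixing it into the attribution question (who is credited).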
Depending on the scale of the project, different solutions offer the best balance between efficiency of work (avoiding management overhead and time-consuming recording of metadata) and accuracy of IPR documentation:
(The terms "author team model", "shared work model", and "private workspace model" are introduced here for the purpose of discussion. I would be grateful if somebody could point out citable equivalent terms to me.)
The following example illustrates, for the states of a single character, the differences between the three models:
| Model: | "author team" | "shared work" | "private workspace" |
| Scientist 1: inserts state "x" | state x | state x | state x |
| Scientist 2: confirms state "x" | state x | state x | state x, state x |
| Scientist 2: adds state "y" | state x, state y | state x, state y | state x, state x, state y |
| Scientist 1: contradicts state "y" | state x | state x | state x, state x, state y, NOT state y |
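The rows of the table can be replayed in a few lines of Python. This is a hedged sketch, not part of any SDD proposal: all names are hypothetical, and because the table shows identical visible states for the "author team" and "shared work" columns, both are represented by a single shared-record function. Any difference between them in how credit is recorded is not modelled here.

```python
from typing import List, Tuple

# (scientist, action, state), in the order used in the table
EVENTS = [
    ("Scientist 1", "insert", "x"),
    ("Scientist 2", "confirm", "x"),
    ("Scientist 2", "add", "y"),
    ("Scientist 1", "contradict", "y"),
]


def shared_record(events) -> List[str]:
    """One common list of states: confirming changes nothing, contradicting deletes."""
    states: List[str] = []
    for _who, action, state in events:
        if action in ("insert", "add", "confirm") and state not in states:
            states.append(state)
        elif action == "contradict" and state in states:
            states.remove(state)
    return states


def private_workspace(events) -> List[Tuple[str, str]]:
    """Every statement is kept with its author; a contradiction is stored
    as an explicit negative statement instead of deleting anything."""
    statements = []
    for who, action, state in events:
        label = f"NOT state {state}" if action == "contradict" else f"state {state}"
        statements.append((who, label))
    return statements


final_shared = shared_record(EVENTS)
final_private = private_workspace(EVENTS)
```

In the shared record the contradicted state simply disappears, together with any trace of who scored it. In the private workspace both the positive and the negative statement survive with their authors, at the cost of a conflict that a reader or a later revision step has to resolve.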
--------------------------
I have a strong dislike for the solution of allowing metadata just everywhere. The reason is that it is elegant when the data are captured, and it makes programming easy, but it makes reorganizing data pure hell. For every element with metadata, the implications of a change have to be analyzed carefully before the change is made. I feel this would make it almost impossible to change the structure of SDD in a new version, even if we should find ourselves in error.

I do not see that elements themselves have any metadata. Rather, the metadata refer to the container. There is no such thing as the author of the character element itself, but there is such a thing as the author of everything that is contained within this element (including label, definition, etc.).

To me, the implication of having metadata on all kinds of hierarchies rather than on atomic elements is that you can never modify the hierarchy. If there is metadata on a character grouping, you cannot reorganize characters without throwing away all metadata on them.

I believe that for each metadata instance we need a clear definition of what it means and what it does not. For instance, I believe that metadata on the description (formerly "item") should be understood to refer to the definition (taxon name, publication, specimen reference and item annotation), but not to the actual description itself, although this is contained in the same element. If this is done, we keep a path open for future reorganizations.

We could say that metadata can exist only on simple types, but perhaps we should keep some items together. I do not think it is necessary to add metadata, for a state score, to both the upper and lower frequency range values associated with the state. In fact, I currently feel that allowing metadata ONLY on states in the description would be a good solution.

=================================
From email with G:

> > I do not see that elements themselves have any meta data in itself.
> > rather, the meta data refers to the container. There is no such
> > thing as author of the character element itself, but there is a
> > thing like author of everything that is contained within this
> > element (including label, definition, etc.)
> I don't see any difference between the author of a book, and the
> author of characters, plot and landscapes from the book.

True. My problem is: what if 100 authors each write some paragraph in the book, 3 editors work on different parts of the content, and the printer does work on the book? Now putting metadata on the book element probably needs explanation, and you cannot simply get away with saying "this is some kind of metadata on the book element". In my thinking the metadata are not meaningless, but you must differentiate between IPR metadata that mean "covers this and all included content" and IPR metadata that mean "only covers this". Whenever data are reorganized, you must analyze this distinction in detail.

=================================
(Gregor attempted to clarify the discussion on the next day in the following proposal. This material from Friday is shown here to keep it in context:)
The current project-based attribution schemes (as present in DELTA) make it possible to work in small teams where all members know each other well and where the relative share of the work is agreed upon in advance. However, even medium-sized projects need to observe some management practices.
With multiple users, a scheme for appending and adding to each other's information is relatively easy, since the mechanism is identical to summarizing information from several specimens for a taxon description. However, at the moment the question of how to contradict each other is unresolved!
General problem: if attribution/accreditation of data origin, editing, analysis, and intellectual property rights is to be perfect, the model easily becomes unmanageable, so we have to find a good compromise.
Please send your criticism or suggestions to the SDD mailing list or to any of the authors.
Gregor Hagedorn; Vers. 1; 14. March 2003