SDD proposal: GUID usage

TDWG working group: Structure of Descriptive Data (SDD)

(Currently a raw discussion document, not worked through!)

Introduction

This document discusses globally unique identifiers and the architectural question of how to separate terminology definitions from item descriptions while still supporting federated descriptive data projects. Should SDD use a web-URN model or a GUID model?

Current GBIF standpoint:

Donald Hobern prefers numeric GUIDs to avoid misinterpretation: "I note that you will be discussing globally unique identifiers and agree that this is an essential item. Like you I feel that the scheme for these need not be readable in the way that URIs are." (pers. comm. to G. Hagedorn)

Hannu Saarenmaa, in contrast, prefers URIs: "There are a few options for the form that TCIDs should take. These include a 64 bit long integer, a 128 bits long GUID (Globally Unique Identifier), or an URI string. While the integer form could easiest be used as database key, it would not be easy to distribute and read by human users. It might not always be recognized as a globally unique key. These factors could cause errors. GUIDs would be unique and not so easy to miss what they are, but are even less readable than numbers. URIs would therefore probably be the best solution. Examples:
www.lepidoptera.com/papilionidae/parnassius/apollo/fennica
fennica.apollo.parnassius.lepidoptera.net
mail.santaclaus.fi/reindeers/rudolf" (From "Taxonomic Object Service (RFC)", draft 0.4 by Hannu Saarenmaa, 2003)
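
To make the three candidate forms easier to compare, here is a minimal Python sketch. It is an illustration only and not part of the RFC draft: the function names are hypothetical, the integer and the GUID are simply generated at random, and the URI is assembled from an authority plus path elements as in the first example above.

```python
import secrets
import uuid


def integer_id() -> int:
    """Random 64-bit integer: compact as a database key, but not self-describing."""
    return secrets.randbits(64)


def guid_id() -> uuid.UUID:
    """Random 128-bit GUID (UUID version 4): recognizable as a key, hard to read."""
    return uuid.uuid4()


def uri_id(authority: str, *path: str) -> str:
    """URI-style string: readable, but the visible parts suggest a meaning."""
    return "/".join((authority,) + path)


if __name__ == "__main__":
    print(integer_id())
    print(guid_id())
    print(uri_id("www.lepidoptera.com", "papilionidae", "parnassius", "apollo", "fennica"))
```

The readable URI form is the only one of the three whose parts a human will try to interpret, which is exactly the property debated in the following paragraphs.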

Telephone numbers are not so bad in terms of readability. Their actual problem is that they carry semantics, i.e. you have to change your number when you move (a restriction that is currently being increasingly relaxed!).

Similarly: I am not opposed to character strings, but I am opposed to the appearance of semantics. There is no problem with http://www.lias.net/characters/AZDJJ32347. But http://www.uni-tuebingen.de/123352 is already problematic. I believe most biologists would change this to http://www.uni-frankfurt.de/123352 as soon as they get an appointment at a new university, thinking that the first part indicates an address (just as the email address and homepage always change when moving) and (correctly!) assuming that this visible part will cause the scientific work to be recognized or attributed as somehow related to the institution.

A possible solution: use a project code name (alphanumeric) plus a number. This project code should be a separate data item from the project title, which can change frequently. Where a DNS name is available, it could even be recommended as the project code.
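
The project code plus number idea can be sketched as follows. This is a hedged illustration under stated assumptions, not an SDD specification: the class names `Project` and `ProjectId`, the `:` separator, and the pattern accepted for project codes (alphanumeric labels, optionally joined by dots or hyphens so that a DNS name fits) are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass

# Assumed syntax: alphanumeric labels, optionally joined by "." or "-",
# so that both a plain code ("LIAS") and a DNS name ("lias.net") are accepted.
_CODE_PATTERN = re.compile(r"^[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*$")


@dataclass(frozen=True)
class ProjectId:
    """Identifier built from a stable project code and a number."""
    project_code: str
    number: int

    def __post_init__(self) -> None:
        if not _CODE_PATTERN.match(self.project_code):
            raise ValueError(f"invalid project code: {self.project_code!r}")
        if self.number < 0:
            raise ValueError("number must not be negative")

    def __str__(self) -> str:
        # The ":" separator is an assumption made for this sketch.
        return f"{self.project_code}:{self.number}"


@dataclass
class Project:
    code: str   # stable; the only project attribute used in identifiers
    title: str  # free to change; never part of an identifier

    def mint(self, number: int) -> ProjectId:
        return ProjectId(self.code, number)


if __name__ == "__main__":
    project = Project(code="lias.net", title="A working title")
    before = project.mint(123352)
    project.title = "A completely different title"
    after = project.mint(123352)
    print(before, before == after)
```

Because the title is kept out of the identifier, renaming the project leaves every identifier already issued unchanged.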

See also ### document for GBIF on taxonomic names and their identifiers!! MAKE LINK

Better URIs than GUIDs?
From: Web Architecture from 50,000 feet
"The most fundamental specification of Web architecture, while one of the simpler, is that of the Universal Resource Identifier, or URI. The principle that anything, absolutely anything, "on the Web" should identified distinctly by an otherwise opaque string of characters (A URI and possibly a fragment identifier) is core to the universality."

Retrieval is useful but identity is important too.

archived email answer by Tim Berners-Lee, May 20 2000 (http://lists.w3.org/Archives/Public/xml-uri/2000May/0287.html): "(sometimes I wish I had restricted the elements of a path to digits! ;-)"

archived email answer by Tim Berners-Lee, May 20 2000 (http://lists.w3.org/Archives/Public/xml-uri/2000May/0287.html): "No one I have heard is suggesting representing *all* the desired syntactical restrictions. A schema gives you one set and it may miss some. There may some restrictions which it is not powerful enough to express. As you say, the substructure of the attribute values is one."

Request for discussion

Please send your criticism or suggestions to the SDD mailing list or to any of the authors.

Gregor Hagedorn; Vers. 1; 14. March 2003




First published 2003-03-14, last update: 2003-03-14.
