The information on this page is superseded by document CLARIN-2008-2 (Persistent and unique Identifiers) at http://www.clarin.eu/specification-documents
Introduction
Based on the criteria presented, we compare the major systems that have been suggested so far:
- URI Standard - IETF/W3C - S
- URN Resolver - German National Library (DNB)[1] - R
- Handle System - Corporation for National Research Initiatives Virginia - S/R
- DOI System - International DOI Federation (based on Handle System) - S/R
- ARK System - University of California - S/R
Here we distinguish between schemas (S) and resolution systems (R). A schema is only a syntax specification; a resolution system additionally provides software that translates PIDs into actual addresses.
There are a number of other schemas and resolution systems, such as PURL, Info-URI and XRI. For more detail, see the overviews by the MPDL [1] and the Australian PILIN project [2].
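The difference between a schema and a resolution system can be illustrated with a minimal sketch. It assumes a simplified `urn:<namespace>:<string>` shape for the syntax check and an in-memory table for resolution; the names `is_valid_urn` and `ToyResolver`, the identifier and the URL are hypothetical and do not correspond to any of the systems compared here.

```python
import re
from typing import Dict, List, Optional

# Schema part: a syntax check only. The pattern is a deliberately simplified
# version of the "urn:<namespace>:<namespace-specific string>" shape.
URN_PATTERN = re.compile(r"^urn:([a-z0-9][a-z0-9-]{0,31}):(\S+)$", re.IGNORECASE)


def is_valid_urn(pid: str) -> bool:
    """Return True if the string has the simplified URN shape."""
    return URN_PATTERN.match(pid) is not None


class ToyResolver:
    """Resolution part: maps a PID to the locations registered for it."""

    def __init__(self) -> None:
        self._table: Dict[str, List[str]] = {}

    def register(self, pid: str, location: str) -> None:
        """Store one more location for a syntactically valid PID."""
        if not is_valid_urn(pid):
            raise ValueError(f"not a valid identifier: {pid!r}")
        self._table.setdefault(pid, []).append(location)

    def resolve(self, pid: str) -> Optional[str]:
        """Return the first registered location, or None if the PID is unknown."""
        locations = self._table.get(pid)
        return locations[0] if locations else None


# Hypothetical identifier and address, for illustration only.
resolver = ToyResolver()
resolver.register("urn:example:corpus-42", "http://archive.example.org/corpus/42")
location = resolver.resolve("urn:example:corpus-42")
```

A schema alone corresponds to `is_valid_urn`: it can say whether an identifier is well formed, but not where the resource is. Only the resolver table makes the identifier actionable, and it is this table that has to be maintained when a resource moves.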
Short Comparison Table
| System | Criterion | Comments |
|---|---|---|
| URI-URN [4, 5] | General | Defined by an IETF standard for the identification of web resources, but no general resolver has been specified or developed |
| | Copies | - |
| | Standards | IETF standard with W3C support; the list of accepted URI schemas can be found on the IANA website [3] |
| | Additional data | - |
| | Semantics | Left to the user/creator |
| | Fragments | - |
| | Performance/Robustness | - |
| | Security | - |
| | Independence | Everything is open and freely available |
| | Spreading | Large |
| | Costs | None |
| URN DNB [6] | General | The DNB runs a home-made resolver that transforms standard URIs into locations |
| | Copies | - |
| | Standards | IETF compatible |
| | Additional data | - |
| | Semantics | Left to the user/creator |
| | Fragments | - |
| | Performance/Robustness | The resolver was made for internal use only; it does not scale, is not intended for use outside the DNB, and is therefore not usable by CLARIN |
| | Security | Relatively unproblematic, since usage is limited to the DNB |
| | Independence | Dependent on the DNB |
| | Spreading | Resolver used only by the DNB |
| | Costs | - |
| Handle System [7] | General | The Handle System is an RFC-based schema including a resolver, which has been used and improved over the last 15 years |
| | Copies | Supported |
| | Standards | Schema and protocol are specified in RFCs, but the schema is not yet registered as an official URI schema; IETF acceptance as an official URI schema is intended for 2008 |
| | Additional data | Any associated information such as metadata, rights, etc. can be stored; the database mechanism remains fast |
| | Semantics | Left to the user/creator |
| | Fragments | Implementation intended for 2008 |
| | Performance/Robustness | The software architecture is tuned for high availability, scalability and performance; robustness has been proven by years of experience in large projects |
| | Security | In particular, management access has been made secure |
| | Independence | CNRI is open with respect to aspects of independence (mirrors, proper contractual clarifications, etc.) that would allow operation to continue even if CNRI stopped; contracts with other institutions have been signed; the exact implications of a patent still need to be studied |
| | Spreading | Not as well known as URLs, but used by a number of large institutions and projects such as the Library of Congress |
| | Costs | 50 $ per year per prefix (own resolving server) |
| DOI [8] | General | DOI adds a business model to the Handle System and also offers registration services |
| | Copies | See Handle System |
| | Standards | See Handle System |
| | Additional data | The INDECS schema is used for metadata; associating other information such as rights is not intended |
| | Semantics | See Handle System |
| | Fragments | See Handle System |
| | Performance/Robustness | See Handle System |
| | Security | See Handle System |
| | Independence | The DOI system belongs to a company |
| | Spreading | Well established in the publishing world |
| | Costs | For the 500,000 objects the MPI currently holds, about 30,000 per year would have to be paid; since a high granularity of references is required, costs of this size would not be acceptable |
| ARK [9] | General | ARK has an interesting schema design and a few nice features, and a resolver is offered; however, its spreading is very limited |
| | Copies | Supported |
| | Standards | IETF draft |
| | Additional data | ERC (Electronic Resource Citation) metadata |
| | Semantics | Excluded on purpose |
| | Fragments | Excluded on purpose in the syntax |
| | Performance/Robustness | No statement possible |
| | Security | No statement possible |
| | Independence | Possible |
| | Spreading | Little spreading as far as we know |
| | Costs | None |
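The cost entries in the table can be put on a per-object basis with a little arithmetic. The sketch below uses only the figures quoted in the table (500,000 objects, about 30,000 per year for DOI, 50 $ per year per Handle prefix). It assumes a single Handle prefix, ignores the cost of running one's own resolving server, and treats the DOI figure as a plain number because the table does not state its currency; the function name is hypothetical.

```python
def cost_per_object(annual_cost: float, objects: int) -> float:
    """Annual fee divided by the number of registered objects."""
    return annual_cost / objects


OBJECTS = 500_000        # objects the MPI currently holds (from the table)
DOI_ANNUAL = 30_000      # "about 30,000 per year"; currency not stated in the table
HANDLE_ANNUAL = 50       # "50 $ per year per prefix"; ASSUMPTION: one prefix suffices

doi_unit = cost_per_object(DOI_ANNUAL, OBJECTS)
handle_unit = cost_per_object(HANDLE_ANNUAL, OBJECTS)

print(f"DOI fee per object and year:    {doi_unit:.4f}")
print(f"Handle fee per object and year: {handle_unit:.4f}")
```

The Handle figure covers the prefix fee only, so the two per-object numbers are not directly comparable. They mainly show why a fee that scales with the number of registered objects becomes problematic when a high granularity of references is required.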
Experiences
As far as we know, only a few institutions in the humanities already have experience with PID systems[2]. Some of them are Max Planck Institutes:
- The MPI for Meteorology has been registering larger chunks of data within its international collaboration in the climate research exchange program. Registration is done at both the DNB[3] and the TIB[4] Hannover, which is a Registration Authority of the IDF[5]. The registered chunks are normally those referred to in publications about climate exchange. Finer-grained referencing is possible via the institute's own internal PIDs, which are stored in its internal database. Finer granularity for outside referencing would be ideal, but the current DOI model does not allow this because the costs would be too high. DOI-conformant metadata are associated with the registered PIDs.
- The MPI for Psycholinguistics, the University of Lund and INL Leiden introduced PIDs based on the Handle System within the DAM-LR project[6]. All metadata descriptions of the MPI's roughly 500,000 resources received a PID entry; all three institutes maintain a local Handle server to resolve the references, and the MPI mirrors the Lund PID database to test redundancy aspects. Rights are associated with the Handles, since they should go with the objects (PIDs) and not with their instances. So far, experience with the Handle System has been very satisfying.
- The Max Planck Digital Library (MPDL) needs to introduce PIDs as well, since maintaining and resolving PIDs is seen as a must for repository systems with a long-term strategy. The MPDL currently relies on URNs but still needs a resolution system. Together with other MPIs, it will start negotiations with CNRI about fulfilling all requirements.
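The idea from the DAM-LR experience, that rights belong to the object (the PID) while each copy is just one instance, can be sketched as a small data structure. This is an illustrative model only, not the Handle System's actual record format or API; the class `PidRecord`, the identifier, the rights text and the URLs are all made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PidRecord:
    """One PID with the information attached to the object, not to a copy."""

    pid: str
    rights: str                                          # applies to the object as a whole
    locations: List[str] = field(default_factory=list)   # one URL per copy (instance)

    def add_copy(self, url: str) -> None:
        """Register a further copy; the rights statement is left untouched."""
        if url not in self.locations:
            self.locations.append(url)

    def first_available(self, is_up: Callable[[str], bool]) -> Optional[str]:
        """Return the first copy that the caller-supplied check reports as reachable."""
        for url in self.locations:
            if is_up(url):
                return url
        return None


# Hypothetical identifier and locations, for illustration only.
record = PidRecord(pid="hdl:0000/example-1", rights="academic use only")
record.add_copy("http://site-a.example.org/res/1")
record.add_copy("http://site-b.example.org/res/1")

# Simulate site A being down: resolution falls back to the mirror at site B.
chosen = record.first_available(lambda url: "site-a" not in url)
```

Because the rights statement sits on the record rather than on a location, adding or losing a mirror does not change what a user is allowed to do with the resource.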
Across science, registering stable PIDs is clearly one of the most important issues to be solved in the coming years in order to support stable electronic references of all sorts. In the language resource domain, each individual resource (even an annotation) needs to be referenced, so a huge number of PIDs can be expected.
References
[1] MPDL/FIZ eScidoc: https://zim01.gwdg.de/repos/smc/tags/public/PubMan/Concepts/cpt_pubman_persistentidentifiers.doc and https://zim01.gwdg.de/repos/smc/tags/public/PubMan/Concepts/StoAR_PersistentIdentifiers_Version_1.0.pdf
[2] PILIN: https://www.pilin.net.au/Project_Documents/Community_Guidelines/Using_URLS_PI.htm
[3] IANA: http://www.iana.org/
[4] URI: RFC 3986, http://www.ietf.org/rfc/rfc3986.txt/
[5] URN: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
[6] DNB: http://www.d-nb.de/standardisierung/pi/pi.htm
[7] Handle: http://www.handle.net/
[8] DOI: http://www.doi.org/
[9] ARK: http://www.ietf.org/internet-drafts/draft-kunze-ark-14.txt
[1] We assume that there are more such home-made solutions with limited functionality. This one is cited as an example of a URN-based solution.
[2] We encourage anyone to inform us about institutes with such experience.
[3] This service is not available for researchers in general since the resolver was only made for internal use.
[4] Technische Informationsbibliothek Hannover (Technical Information Library Hannover)
[5] International DOI Federation – Registration of Digital Object Identifiers
[6] Distributed Access Management for Language Resources, http://www.mpi.nl/DAM-LR/