Comparison of PID Systems

 The information on this page is superseded by document CLARIN-2008-2 (Persistent and unique Identifiers) at http://www.clarin.eu/specification-documents

Introduction

Based on the criteria presented, we compare the major PID systems that have been suggested so far. These are:

  • URI-URN Standard - IETF/W3C - S 
  • URN Resolver - German National Library (DNB)[1]   - R
  • Handle System - Corporation for National Research Initiatives (CNRI), Virginia - S/R
  • DOI System - International DOI Federation (based on Handle System) - S/R
  • ARK System - University of California - S/R 

Here we distinguish between schemas (S) and resolution systems (R). A schema is only a syntax specification, whereas a resolution system also provides software that transforms PIDs into actual addresses (locations).
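To make the S/R distinction concrete, here is a minimal Python sketch. It assumes a toy identifier syntax `prefix/suffix` (loosely modelled on Handles) and a hypothetical in-memory table standing in for a resolver's database; the identifiers and URLs are invented for illustration. A schema alone can only say whether an identifier is well formed; only a resolution system can turn it into a location.

```python
import re
from typing import Optional

# Schema (S): a purely syntactic rule. The "prefix/suffix" pattern is an
# assumption for illustration, loosely modelled on Handle syntax.
PID_SYNTAX = re.compile(r"^[^/\s]+/\S+$")

def is_well_formed(pid: str) -> bool:
    """A schema can only answer: does this string follow the syntax?"""
    return PID_SYNTAX.match(pid) is not None

# Resolution system (R): software plus a database mapping PIDs to locations.
# These entries are hypothetical examples, not real registrations.
RESOLVER_TABLE = {
    "1234/abc": "https://archive.example.org/objects/abc",
}

def resolve(pid: str) -> Optional[str]:
    """Return the current location of a well-formed PID, or None if unknown."""
    if not is_well_formed(pid):
        raise ValueError(f"malformed PID: {pid!r}")
    return RESOLVER_TABLE.get(pid)
```

Here `resolve("1234/abc")` returns the registered URL, while `resolve("1234/unknown")` returns `None`: the identifier is syntactically valid, but no resolver knows where it points. When an object moves, only the table entry changes; the PID itself stays stable.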

There are a number of other schemas and resolution systems, such as PURL, Info-URI and XRI. For more detailed information, we refer to the overviews by the MPDL [1] and the Australian PILIN project [2].

Short Comparison Table 

System Criteria Comments
URI-URN [4, 5]
General: Defined by an IETF standard for the identification of web resources; however, no general resolver has been specified or developed.
Copies: -
Standards: IETF standard with W3C support; the list of accepted URI schemes can be found on the IANA website [3].
Additional data: -
Semantics: Left to the user/creator
Fragments: -
Performance/Robustness: -
Security: -
Independence: Everything is open and freely available.
Spreading: Large
Costs: None

URN DNB [6]
General: The DNB runs a home-made resolver that transforms standard URNs into locations.
Copies: -
Standards: IETF-compatible
Additional data: -
Semantics: Left to the user/creator
Fragments: -
Performance/Robustness: The resolver was made for internal use only; it does not scale, is not intended for use outside the DNB, and is therefore not usable by CLARIN.
Security: Relatively unproblematic, since usage is limited to the DNB.
Independence: Dependent on the DNB
Spreading: The resolver is used only by the DNB.
Costs: -

Handle System [7]
General: The Handle System is an RFC-based schema that includes a resolver; it has been used and improved over the last 15 years.
Copies: Supported
Standards: Schema and protocol are specified in RFCs, but the Handle System is not yet registered as an official URI scheme; IETF acceptance as an official URI scheme is intended for 2008.
Additional data: Any associated information, such as metadata or rights, can be stored; the database mechanism remains fast.
Semantics: Left to the user/creator
Fragments: An implementation is intended for 2008.
Performance/Robustness: The software architecture is clearly tuned for high availability, scalability and performance; its robustness has been proven by years of experience in large projects.
Security: In particular, management access has been secured.
Independence: CNRI is open to measures that ensure independence (mirrors, proper contractual clarifications, etc.) and would allow the system to continue even if CNRI ceased operating; contracts with other institutions have been signed. The exact implications of a patent still need to be studied.
Spreading: Not as well known as URLs, but used by a number of large institutions and projects, such as the Library of Congress.
Costs: $50 per year per prefix (with one's own resolving server)

DOI [8]
General: DOI adds a business model to the Handle System and also offers registration services.
Copies: Same as Handle System
Standards: Same as Handle System
Additional data: The INDECS schema is used for metadata; associating other information, such as rights, is not intended.
Semantics: Same as Handle System
Fragments: Same as Handle System
Performance/Robustness: Same as Handle System
Security: Same as Handle System
Independence: The DOI system belongs to a company.
Spreading: Well established in the publishing world
Costs: For the roughly 500,000 objects the MPI currently holds, it would have to pay about 30,000 per year. Since fine-grained referencing is required, costs of this magnitude would not be acceptable.

ARK [9]
General: ARK comes with an interesting schema design and a few useful features, and a resolver is offered as well; however, its adoption is very limited.
Copies: Supported
Standards: IETF draft
Additional data: ERC (Electronic Resource Citation) metadata
Semantics: Excluded on purpose
Fragments: Excluded on purpose in the syntax
Performance/Robustness: We cannot make any statements.
Security: We cannot make any statements.
Independence: Possible
Spreading: Little, as far as we know
Costs: None
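The schemas compared above differ mainly in their syntax. As an illustration, here is a minimal Python sketch that tells apart the identifier forms discussed: URNs (`urn:<NID>:<NSS>`, e.g. in the `nbn` namespace used by the DNB), Handles (`<prefix>/<suffix>`), DOIs (Handles whose prefix starts with `10.`) and ARKs (`ark:/<NAAN>/<Name>`). The regular expressions are simplified assumptions covering only the core form of each identifier, and the sample identifiers are hypothetical; real validation should follow the respective specifications.

```python
import re
from typing import Dict

# Simplified patterns (assumptions): they cover only the core form of each
# identifier, not every detail of the respective specification.
URN_RE = re.compile(r"^urn:(?P<nid>[a-z0-9][a-z0-9-]{0,31}):(?P<nss>\S+)$",
                    re.IGNORECASE)
ARK_RE = re.compile(r"^ark:/(?P<naan>\d+)/(?P<name>\S+)$", re.IGNORECASE)
HANDLE_RE = re.compile(r"^(?P<prefix>[^/\s]+)/(?P<suffix>\S+)$")

def classify(pid: str) -> Dict[str, str]:
    """Split an identifier into its scheme-specific parts."""
    m = URN_RE.match(pid)
    if m:
        # The namespace identifier (NID) is case-insensitive; normalise it.
        return {"scheme": "urn", "nid": m["nid"].lower(), "nss": m["nss"]}
    # ARK must be tested before Handle: "ark:/12345/x" also fits prefix/suffix.
    m = ARK_RE.match(pid)
    if m:
        return {"scheme": "ark", "naan": m["naan"], "name": m["name"]}
    m = HANDLE_RE.match(pid)
    if m:
        # DOIs are Handles whose prefix starts with "10."
        scheme = "doi" if m["prefix"].startswith("10.") else "handle"
        return {"scheme": scheme, "prefix": m["prefix"], "suffix": m["suffix"]}
    raise ValueError(f"unrecognised identifier: {pid!r}")
```

For example, `classify("urn:nbn:de:1234-5678")` yields the namespace `nbn` and the namespace-specific string `de:1234-5678`, while `classify("10.1000/xyz")` is recognised as a DOI because of its `10.` prefix.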

Experiences

As far as we know, only a few institutions in the humanities already have experience with PID systems[2]. Some of them are Max Planck Institutes:

  • The MPI for Meteorology has been registering larger chunks of data as part of its international collaboration in the climate research exchange program. Registration is done both at the DNB[3] and at TIB[4] Hannover, which is a Registration Authority of the IDF[5]. The registered chunks are normally those cited in publications about the climate exchange. Finer-grained referencing is possible via the institute's own PIDs stored in its internal database. Finer granularity for external referencing would be ideal, but under the current DOI model the costs would be too high. DOI-conformant metadata are associated with the registered PIDs.
  • The MPI for Psycholinguistics, the University of Lund and INL Leiden introduced PIDs based on the Handle System as part of the DAM-LR project[6]. The metadata descriptions of all of the MPI's roughly 500,000 resources received a PID entry. All three institutes maintain a local Handle server to resolve the references, and the MPI mirrors the Lund PID database to test redundancy. Rights are associated with the Handles, since rights should go with the objects (PIDs) and not with their instances. So far, experience with the Handle System has been very satisfactory.
  • The Max Planck Digital Library (MPDL) needs to introduce PIDs as well, since maintaining and resolving PIDs is considered essential for repository systems with a long-term strategy. So far, MPDL has relied on URNs but still needs a resolution system. Together with other MPIs, it will start negotiations with CNRI about fulfilling all requirements.
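The DAM-LR setup described above (local Handle servers, a mirrored PID database, and rights attached to the PID rather than to each instance) can be sketched as a small data model. This is an illustrative Python sketch only, not the Handle System's actual data model or API: `PidRecord`, `lookup`, the rights string and the URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence

@dataclass
class PidRecord:
    """Hypothetical PID record: the rights belong to the PID (the object),
    while the listed locations are interchangeable instances (copies)."""
    rights: str
    locations: List[str] = field(default_factory=list)

# A PID database maps identifiers to records; None models an unreachable server.
PidDatabase = Optional[Dict[str, PidRecord]]

def lookup(pid: str, databases: Sequence[PidDatabase]) -> Optional[PidRecord]:
    """Query the primary database first, then its mirrors, in order."""
    for db in databases:
        if db is None:  # server down: try the next mirror
            continue
        return db.get(pid)  # None if the PID is not registered
    raise RuntimeError("no PID database is reachable")
```

Because a mirror holds a copy of the same records, `lookup(pid, [primary, mirror])` still answers when the primary is unavailable (passed as `None`), and the rights returned are the same whichever instance a user finally retrieves.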

Clearly, across all of science, registering stable PIDs is one of the most important issues to be solved in the coming years in order to support stable electronic references of all kinds. In the language resource domain, each individual resource (even an annotation) needs to be referenced, so we can expect a huge number of PIDs.
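Why the number of PIDs matters for cost can be made explicit with a little arithmetic. The Python sketch below uses only the two figures quoted in the comparison table (about 30,000 per year for the MPI's roughly 500,000 objects under DOI, and $50 per year per Handle prefix). The linear scaling of DOI costs is an assumption, and the currency of the DOI figure is not stated, so the comparison is only approximate.

```python
# Figures from the comparison table. The currency of the DOI estimate is not
# stated in the source, so the two cost models are only roughly comparable.
DOI_ANNUAL_COST_ESTIMATE = 30_000  # per year, for about 500,000 objects
DOI_OBJECTS = 500_000
HANDLE_FEE_PER_PREFIX = 50  # $ per year per prefix

def doi_cost(n_objects: int) -> float:
    """Annual DOI cost, ASSUMING it scales linearly with the object count."""
    return n_objects * DOI_ANNUAL_COST_ESTIMATE / DOI_OBJECTS

def handle_cost(n_prefixes: int = 1) -> float:
    """Annual Handle fee: depends on the number of prefixes, not objects."""
    return n_prefixes * HANDLE_FEE_PER_PREFIX
```

With these figures, `doi_cost(500_000)` gives 30000.0, i.e. about 0.06 per object per year, while `handle_cost()` stays at 50 whether the prefix covers 500,000 or 5,000,000 PIDs. If every annotation receives its own PID and the number of objects grows tenfold, the linear assumption puts the DOI estimate at 300,000 per year.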

References

[1] MPDL/FIZ eScidoc: https://zim01.gwdg.de/repos/smc/tags/public/PubMan/Concepts/cpt_pubman_persistentidentifiers.doc and https://zim01.gwdg.de/repos/smc/tags/public/PubMan/Concepts/StoAR_PersistentIdentifiers_Version_1.0.pdf

[2] PILIN: https://www.pilin.net.au/Project_Documents/Community_Guidelines/Using_URLS_PI.htm

[3] IANA: http://www.iana.org/

[4] URI: RFC 3986, http://www.ietf.org/rfc/rfc3986.txt/

[5] URN: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50

[6] DNB: http://www.d-nb.de/standardisierung/pi/pi.htm

[7] Handle: http://www.handle.net/

[8] DOI: http://www.doi.org/

[9] ARK: http://www.ietf.org/internet-drafts/draft-kunze-ark-14.txt


[1] We assume that there are more home-made solutions of this kind with limited functionality. The DNB resolver is cited as one example of a URN-based solution.

[2] We would encourage anyone to inform us about institutes with such experience.

[3] This service is not available for researchers in general since the resolver was only made for internal use.

[4] Technische Informationsbibliothek Hannover (Technical Information Library Hannover)

[5] International DOI Federation – Registration of Digital Object Identifiers

[6] Distributed Access Management for Language Resources, http://www.mpi.nl/DAM-LR/