Standards and Formats

Basic principles

CLARIN adheres to the following principles:

  • Open standards are preferred over proprietary standards
  • Formats and protocols should be:
    • well-documented
    • verifiable
    • proven (already in use in practice)
  • Text-based formats are (where possible) preferred over binary formats
  • When digitising an analogue signal, either no compression or lossless compression is recommended
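
As a rough illustration of the last principle, the sketch below checks that a compression step is lossless, meaning that decompression returns the original samples bit for bit. It uses Python's standard zlib module purely as a stand-in (CLARIN does not prescribe it here, and audio archives would more typically use a dedicated lossless codec such as FLAC). The sample values and the helper names pack_samples and lossless_round_trip are invented for this example.

```python
import struct
import zlib


def pack_samples(samples):
    """Serialise integer samples as little-endian signed 16-bit PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def lossless_round_trip(raw):
    """Return True if compressing and then decompressing reproduces raw exactly."""
    compressed = zlib.compress(raw, 9)
    return zlib.decompress(compressed) == raw


# Hypothetical sample values standing in for a digitised analogue signal.
samples = [0, 1200, 2400, 1200, 0, -1200, -2400, -1200] * 100
raw = pack_samples(samples)

if lossless_round_trip(raw):
    print("round trip is bit-identical")
else:
    print("data changed: compression is not lossless")
```

A lossy codec would fail this byte-for-byte comparison by design, which is why the principle recommends against it for archival copies.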

Learning more

Relevant standards

The table below contains a (non-exhaustive) list of standards that are relevant for the CLARIN community.

Source: CLARIN standard guidance website (provided by the IDS)

Abbreviation/Name               Topic(s)                                                    Standard body   CLARIN centre(s)
CES                             Generic Corpus Annotation                                   EAGLES
CHAT                            File Formats; Transcription                                 Other
CMDI                            Metadata                                                    ISO             CLARIN-PL, IDS, MPI-PL, UC
Controlled Vocabulary           Controlled Vocabulary; Knowledge Representation; Thesaurus  NISO            CLARIN-PL, MPI-PL
CQLF                            Query                                                       ISO             IDS
DCAM                            Metadata                                                    DCMI
DCMES                           Metadata                                                    DCMI            MPI-PL
DCR                             Data Categorization                                         ISO             CLARIN-PL, IDS, MPI-PL
DiAML                           Markup Language; Semantic Annotation                        ISO
DictionaryEntry-RePresentation  Controlled Vocabulary; Terminology                          ISO
DITA                            Generic Corpus Annotation                                   OASIS
DOL                             Knowledge Representation; Ontology                          ISO
DSSSL                           Formatting; Transformation                                  ISO
Feature structures              Feature Structure                                           ISO             CLARIN-PL
GOLD                            Ontology; Terminology                                       GOLD Community
HTML                            Meta Language                                               ISO             CLARIN-PL, IDS, MPI-PL
HyTime                          Markup Language                                             ISO
IMDI                            Metadata                                                    Other           CLARIN-PL, MPI-PL, UC
ISBD                            Metadata                                                    IFLA
ISO-Thesauri                    Controlled Vocabulary; Thesaurus                            ISO
ITS                             Data Categorization                                         W3C
JATS                            Generic Corpus Annotation                                   NISO
LAF                             Generic Corpus Annotation                                   ISO             CLARIN-PL
LMF                             Lexical Knowledge                                           ISO             CLARIN-PL, MPI-PL, UC
MAF                             Morpho-syntactic Annotation                                 ISO
METS                            Metadata                                                    LoC
MLIF                            Multilingual data annotation                                ISO             CLARIN-PL
Multilingual Thesaurus          Controlled Vocabulary; Thesaurus                            IFLA
NISO MIX                        Metadata; Schema                                            LoC
NLM JAITS                       Generic Corpus Annotation                                   Other
OLAC Metadata                   Metadata                                                    OLAC            CLARIN-PL, MPI-PL, UC
OLiA                            Ontology; Terminology
OntoIOp                         Knowledge Representation; Ontology                          ISO
OWL                             Knowledge Representation; Ontology                          W3C             UC
PDF/A                           File Formats                                                ISO
PISA                            Terminology                                                 ISO             IDS, MPI-PL
RDF                             Knowledge Representation; Metadata                          W3C             CLARIN-PL, IDS, UC
RDF/XML                         Meta Language; Serialization                                W3C             CLARIN-PL
RDFS                            Constraint Language; Schema                                 W3C
RELAX NG                        Constraint Language                                         ISO             MPI-PL, UC
RTF                             File Formats                                                Other
SemAF                           Semantic Annotation                                         ISO
SemRoleML                       Markup Language; Semantic Annotation                        ISO
SGML                            Meta Language                                               ISO
SimpL-1                         Terminology                                                 ISO
SKOS                            Knowledge Representation; Thesaurus                         W3C
SPARQL                          Query                                                       W3C
SRX                             Segmentation                                                LISA            CLARIN-PL
Structured Vocabulary           Controlled Vocabulary; Knowledge Representation; Thesaurus  BSi             CLARIN-PL
SynAF                           Syntactic Annotation                                        ISO
TBX                             File Formats; Markup Language; Terminology                  ISO
TEI Guidelines                  Generic Corpus Annotation; Markup Language                  TEI             CLARIN-PL, MPI-PL, UC
TextMD                          Metadata                                                    LoC
TimeML                          Markup Language; Semantic Annotation                        ISO             CLARIN-PL
TMF                             Data Categorization                                         ISO
TMS                             Data Categorization; Terminology                            ISO
TMX                             File Formats; Markup Language                               LISA
Topic Maps                      Knowledge Representation                                    ISO
Turtle                          Serialization                                               W3C
WordSeg                         Generic Corpus Annotation; Segmentation                     ISO
XCES                            Generic Corpus Annotation                                   EAGLES          CLARIN-PL, IDS
XHTML                           Meta Language                                               ISO             CLARIN-PL, IDS
XML                             Meta Language                                               W3C             CLARIN-PL, MPI-PL, UC
XMLNS                           Meta Language                                               W3C             IDS, MPI-PL
XPath                           Markup Language; Transformation                             W3C             CLARIN-PL, IDS, MPI-PL, UC
XQuery                          Markup Language; Query                                      W3C             IDS, MPI-PL, UC
XSD                             Constraint Language                                         W3C             IDS, MPI-PL
XSL-FO                          Formatting                                                  W3C             MPI-PL
XSLT                            Transformation                                              W3C             CLARIN-PL, MPI-PL, UC