Basic principles
CLARIN adheres to the following principles:
- Open standards are preferred over proprietary standards
- Formats and protocols should be:
  - well-documented
  - verifiable
  - proven (being used in practice)
- Text-based formats are (where possible) preferred over binary formats
- When digitising an analogue signal, using either no compression or lossless compression is recommended
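The difference between lossless and lossy handling of a digitised signal can be illustrated with a minimal sketch. It assumes a synthetic 16-bit PCM sine tone and uses Python's general-purpose `zlib` (DEFLATE) as a stand-in for a lossless codec; a real audio workflow would more likely use a dedicated lossless format such as FLAC. The function names (`synth_pcm`, `lossless_roundtrip`, `lossy_roundtrip`) and the bit-dropping step are hypothetical and chosen only for illustration.

```python
import math
import struct
import zlib


def synth_pcm(n_samples: int = 8000, freq_hz: float = 440.0, rate_hz: int = 8000) -> bytes:
    """Return a 16-bit little-endian mono sine tone as raw PCM bytes."""
    samples = (
        int(32767 * math.sin(2 * math.pi * freq_hz * i / rate_hz))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def lossless_roundtrip(pcm: bytes) -> bytes:
    """Compress and decompress with DEFLATE; every byte is recovered."""
    return zlib.decompress(zlib.compress(pcm, 9))


def lossy_roundtrip(pcm: bytes, drop_bits: int = 8) -> bytes:
    """Illustrative lossy step: discard the low bits of every sample."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm)
    mask = ~((1 << drop_bits) - 1)
    return struct.pack(f"<{count}h", *(s & mask for s in samples))


original = synth_pcm()
restored = lossless_roundtrip(original)
degraded = lossy_roundtrip(original)

# The lossless round trip reproduces the signal bit for bit;
# the lossy one has the same length but different sample values.
print("lossless identical:", restored == original)
print("lossy identical:", degraded == original)
```

Only the lossless round trip returns the exact bytes that were digitised, which is why it is the safer choice for archival deposits: nothing has to be assumed about which details later research will need.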
Learning more
Relevant formats
Several CLARIN centres have published excellent guidance on formats suitable for depositing language research data:
Relevant standards
The table below contains a (non-exhaustive) list of standards that are relevant for the CLARIN community.
Source: CLARIN standard guidance website (provided by the IDS)
| Abbreviation/Name | Topic(s) | Standard body | CLARIN centre(s) |
| --- | --- | --- | --- |
| CES | Generic Corpus Annotation | EAGLES | |
| CHAT | File Formats, Transcription | Other | |
| CMDI | Metadata | ISO | CLARIN-PL, IDS, MPI-PL, UC |
| Controlled Vocabulary | Controlled Vocabulary, Knowledge Representation, Thesaurus | NISO | CLARIN-PL, MPI-PL |
| CQLF | Query | ISO | IDS |
| DCAM | Metadata | DCMI | |
| DCMES | Metadata | DCMI | MPI-PL |
| DCR | Data Categorization | ISO | CLARIN-PL, IDS, MPI-PL |
| DiAML | Markup Language, Semantic Annotation | ISO | |
| DictionaryEntry-RePresentation | Controlled Vocabulary, Terminology | ISO | |
| DITA | Generic Corpus Annotation | OASIS | |
| DOL | Knowledge Representation, Ontology | ISO | |
| DSSSL | Formatting, Transformation | ISO | |
| Feature structures | Feature Structure | ISO | CLARIN-PL |
| GOLD | Ontology, Terminology | GOLD Community | |
| HTML | Meta Language | ISO | CLARIN-PL, IDS, MPI-PL |
| HyTime | Markup Language | ISO | |
| IMDI | Metadata | Other | CLARIN-PL, MPI-PL, UC |
| ISBD | Metadata | IFLA | |
| ISO-Thesauri | Controlled Vocabulary, Thesaurus | ISO | |
| ITS | Data Categorization | W3C | |
| JATS | Generic Corpus Annotation | NISO | |
| LAF | Generic Corpus Annotation | ISO | CLARIN-PL |
| LMF | Lexical Knowledge | ISO | CLARIN-PL, MPI-PL, UC |
| MAF | Morpho-syntactic Annotation | ISO | |
| METS | Metadata | LoC | |
| MLIF | Multilingual data annotation | ISO | CLARIN-PL |
| Multilingual Thesaurus | Controlled Vocabulary, Thesaurus | IFLA | |
| NISO MIX | Metadata, Schema | LoC | |
| NLM JAITS | Generic Corpus Annotation | Other | |
| OLAC Metadata | Metadata | OLAC | CLARIN-PL, MPI-PL, UC |
| OLiA | Ontology, Terminology | | |
| OntoIOp | Knowledge Representation, Ontology | ISO | |
| OWL | Knowledge Representation, Ontology | W3C | UC |
| PDF/A | File Formats | ISO | |
| PISA | Terminology | ISO | IDS, MPI-PL |
| RDF | Knowledge Representation, Metadata | W3C | CLARIN-PL, IDS, UC |
| RDF/XML | Meta Language, Serialization | W3C | CLARIN-PL |
| RDFS | Constraint Language, Schema | W3C | |
| RELAX NG | Constraint Language | ISO | MPI-PL, UC |
| RTF | File Formats | Other | |
| SemAF | Semantic Annotation | ISO | |
| SemRoleML | Markup Language, Semantic Annotation | ISO | |
| SGML | Meta Language | ISO | |
| SimpL-1 | Terminology | ISO | |
| SKOS | Knowledge Representation, Thesaurus | W3C | |
| SPARQL | Query | W3C | |
| SRX | Segmentation | LISA | CLARIN-PL |
| Structured Vocabulary | Controlled Vocabulary, Knowledge Representation, Thesaurus | BSi | CLARIN-PL |
| SynAF | Syntactic Annotation | ISO | |
| TBX | File Formats, Markup Language, Terminology | ISO | |
| TEI Guidelines | Generic Corpus Annotation, Markup Language | TEI | CLARIN-PL, MPI-PL, UC |
| TextMD | Metadata | LoC | |
| TimeML | Markup Language, Semantic Annotation | ISO | CLARIN-PL |
| TMF | Data Categorization | ISO | |
| TMS | Data Categorization, Terminology | ISO | |
| TMX | File Formats, Markup Language | LISA | |
| Topic Maps | Knowledge Representation | ISO | |
| Turtle | Serialization | W3C | |
| WordSeg | Generic Corpus Annotation, Segmentation | ISO | |
| XCES | Generic Corpus Annotation | EAGLES | CLARIN-PL, IDS |
| XHTML | Meta Language | ISO | CLARIN-PL, IDS |
| XML | Meta Language | W3C | CLARIN-PL, MPI-PL, UC |
| XMLNS | Meta Language | W3C | IDS, MPI-PL |
| XPath | Markup Language, Transformation | W3C | CLARIN-PL, IDS, MPI-PL, UC |
| XQuery | Markup Language, Query | W3C | IDS, MPI-PL, UC |
| XSD | Constraint Language | W3C | IDS, MPI-PL |
| XSL-FO | Formatting | W3C | MPI-PL |
| XSLT | Transformation | W3C | CLARIN-PL, MPI-PL, UC |