What standards are recommended by CLARIN?

Submitted by Dieter Van Uytvanck on 22 February 2010

An open list follows:

  • character encoding: ISO/IEC 10646 (Unicode), UTF-8
  • country codes: ISO 3166
  • language codes: ISO 639-1 and 639-3
  • codes for the representation of names of scripts: ISO 15924
  • text format: XML
  • text format: CSV (comma-separated, quoted with double quotes ("), with a header line and preferably a line of ISOcat URIs for each column)
  • feature structure representation: ISO 24610-1:2006
  • representation of primary sources: TEI (Text Encoding Initiative)
  • knowledge engineering: RDF, RDF-S, SKOS, OWL
  • audio/speech: PCM (Pulse Code Modulation) for digitizing sound waves, the International Phonetic Alphabet (IPA) for phonetic transcriptions
  • video/multimodality: MJPEG2000 lossless as backend format, MPEG-2 or H.264 for handling and processing
  • annotation of temporal entities: TimeML (part of TC 37/SC 4)
  • morpho-syntactic annotation: MAF (Morpho-syntactic Annotation Framework), ISO/DIS 24611
  • syntactic annotation: SynAF (Syntactic Annotation Framework), ISO/CD 24615
  • lexical annotation: LMF (Lexical Markup Framework), ISO 24613:2008
  • linguistic annotation: LAF (Linguistic Annotation Framework), ISO/DIS 24612
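
As an illustration of the CSV recommendation in the list above, the following is a minimal Python sketch of one way to produce and read back such a file: every field in double quotes, a header line, then a line carrying one URI per column. The column names, the sample rows and the example.org URIs are placeholders invented for this sketch; a real file would carry the actual ISOcat data category URIs for its columns. The helper names write_table and read_table are likewise hypothetical.

```python
import csv
import io

# Hypothetical column names; the URIs are placeholders, NOT real ISOcat URIs.
COLUMNS = ["lemma", "pos", "language"]
DATCAT_URIS = [
    "http://example.org/datcat/lemma",
    "http://example.org/datcat/pos",
    "http://example.org/datcat/language",
]
# Sample rows invented for the sketch; "nld" is the ISO 639-3 code for Dutch.
ROWS = [
    ["huis", "noun", "nld"],
    ['zeg "ja"', "phrase", "nld"],
]


def write_table(stream, columns, uris, rows):
    """Write a header line, a line of per-column URIs, then the data rows."""
    # QUOTE_ALL wraps every field in double quotes; embedded quotes are doubled.
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(columns)
    writer.writerow(uris)
    writer.writerows(rows)


def read_table(stream):
    """Read back the header line, the URI line and the remaining data rows."""
    reader = csv.reader(stream)
    columns = next(reader)
    uris = next(reader)
    return columns, uris, list(reader)


# An in-memory buffer keeps the sketch self-contained. For a file on disk,
# open(path, "w", encoding="utf-8", newline="") would give UTF-8 output.
buffer = io.StringIO()
write_table(buffer, COLUMNS, DATCAT_URIS, ROWS)
csv_text = buffer.getvalue()

columns, uris, rows = read_table(io.StringIO(csv_text))
```

Quoting every field, rather than only those that need it, is a choice made in this sketch so that the quoting is uniform across the file; the recommendation itself only asks for "-quotes.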