What standards are recommended by CLARIN?

An open list follows:

    • character encoding: ISO/IEC 10646 (Unicode), UTF-8
    • country codes: ISO 3166
    • language codes: ISO 639-1/2/3
    • codes for the representation of names of scripts: ISO 15924
    • text format: XML
    • text format: CSV (comma-separated, quoted with double quotes, with a header line and preferably a line giving an ISOcat URI for each column)
    • feature structure representation: ISO 24610-1:2006
    • representation of primary sources: TEI (Text Encoding Initiative)
    • knowledge engineering: RDF, RDF-S, SKOS, OWL
    • audio/speech: PCM (Pulse Code Modulation) for digitizing sound waves; the International Phonetic Alphabet (IPA) for phonetic transcriptions
    • video/multimodality: lossless MJPEG2000 as backend format; MPEG-2 or H.264 for handling and processing
    • data categories: ISO DCR and ISOcat
    • annotation of temporal entities: TimeML (part of TC 37/SC 4)
    • morpho-syntactic annotation: MAF (Morpho-syntactic Annotation Framework), ISO/DIS 24611
    • syntactic annotation: SynAF (Syntactic Annotation Framework), ISO/CD 24615
    • lexical annotation: LMF (Lexical Markup Framework), ISO 24613:2008
    • linguistic annotation: LAF (Linguistic Annotation Framework), ISO/DIS 24612

For more information on each of these standards, please take a look at the CLARIN Standardization Action Plan.
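
As an illustration of the CSV recommendation in the list above, here is a minimal Python sketch that writes and reads such a file. It is not an official CLARIN tool. The column names, the sample rows, and the ISOcat URIs are hypothetical placeholders, and putting the URI line directly after the header line is an assumption, since the recommendation does not say where that line belongs.

```python
import csv
import io

# Hypothetical column names; not prescribed by CLARIN.
HEADER = ["token", "lemma", "pos"]

# Placeholder URIs for illustration only; they are NOT real ISOcat
# data category identifiers.
ISOCAT_URIS = [
    "http://www.isocat.org/datcat/DC-0001",
    "http://www.isocat.org/datcat/DC-0002",
    "http://www.isocat.org/datcat/DC-0003",
]

# Made-up sample data; the second row contains a comma and double quotes.
ROWS = [
    ["dogs", "dog", "noun"],
    ['say "hi", then', "say", "verb"],
]


def write_table(header, uris, rows):
    """Serialize to comma-separated text with every field in double quotes."""
    buf = io.StringIO(newline="")
    writer = csv.writer(
        buf, delimiter=",", quotechar='"',
        quoting=csv.QUOTE_ALL, lineterminator="\n",
    )
    writer.writerow(header)  # line 1: column names
    writer.writerow(uris)    # line 2 (assumed position): one URI per column
    writer.writerows(rows)   # remaining lines: data
    return buf.getvalue()


def read_table(text):
    """Return ({column: uri}, [record dicts]) from text made by write_table."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')
    header = next(reader)
    uris = next(reader)
    records = [dict(zip(header, row)) for row in reader]
    return dict(zip(header, uris)), records


text = write_table(HEADER, ISOCAT_URIS, ROWS)
categories, records = read_table(text)
```

Quoting every field keeps embedded commas and quotation marks unambiguous: the `csv` module doubles an embedded `"` when writing and restores it when reading.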