Lexica

Introduction

Lexica are primarily used in applications. They typically contain an extensive lexical inventory with specific linguistic information (e.g., morphosyntax, sentiment). There are 80 lexica in the CLARIN infrastructure. Most (65) of the lexica are monolingual, accounting for 17 languages (Arabic, Croatian, Czech, Danish, Dutch, English, Estonian, French, Icelandic, Italian, Greek, Maltese, Polish, Portuguese, Serbian, Slovenian, and Swedish). The rest (15) are multilingual and include a variety of language combinations. In the vast majority of cases, the lexica can be downloaded directly from the national repositories or queried through easy-to-use online search environments.

To comment, suggest changes to the existing content, or propose new resources for inclusion, send us an email.

This website was last updated on 3 June 2021.

Lexica in the CLARIN infrastructure

Monolingual resources

Resource

Language Description Availability

A machine-readable dictionary of Egyptian Arabic

Size: 2,418 entries

Linguistic information: basic morphological information, usage examples

Licence: CC-BY-NC-SA 3.0

Arabic (Egyptian)

 

This lexicon presents a more comprehensive version of A machine-readable glossary of Egyptian Arabic. The resource is available for download from ARCHE.

Download

A machine-readable glossary of Egyptian Arabic

Size: 2,204 entries

Linguistic information: basic morphological information, usage examples

Licence: CC-BY-NC-SA 3.0

Arabic (Egyptian)

 

This lexicon has been compiled for comparative as well as didactic purposes in the on-going VICAV project. The resource is available for download from ARCHE.

Download

Automatically constructed multiword lexicon hrMWELex v0.5

Size: 43,730 entries

Linguistic information: multi-word expressions

Licence: CC-BY 4.0

Croatian

This is a lexicon of multiword expressions available for download from CLARIN.SI.

For a related publication, see Ljubešić et al. (2015).

Download

Inflectional lexicon hrLex 1.3

Size: 6,427,709 items; 164,206 entries

Linguistic information: wordform, lemma, MSD, UPOS

Licence: CC-BY 4.0

Croatian

This is a large inflectional lexicon where each entry consists of a (wordform, lemma, MSD, MSD features, UPOS, morphological features, frequency, per-million frequency) 8-tuple. The (wordform, lemma, MSD) triple frequencies are calculated on the hrWaC v2.2 corpus. The MSD tagset follows the MULTEXT-East V6 tagset for the Serbo-Croatian macro-language. The UPOS and morphological features follow the UD v2 specifications.

The resource is available for download from CLARIN.SI.

For a relevant publication, see Ljubešić et al. (2016).

Download
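As an illustration of the 8-tuple entry structure described for hrLex, the following minimal Python sketch reads one entry into a named record. It assumes, without confirmation from the resource itself, that entries are stored one per line with tab-separated fields in the order listed above. The names `LexEntry`, `parse_entry`, and `per_million` are hypothetical, and the sample line and its numbers are invented for demonstration.

```python
from typing import NamedTuple


class LexEntry(NamedTuple):
    """One lexicon entry; field order follows the 8-tuple in the description."""
    wordform: str
    lemma: str
    msd: str
    msd_features: str
    upos: str
    morph_features: str
    frequency: int
    per_million: float


def parse_entry(line: str) -> LexEntry:
    """Parse one line, ASSUMING eight tab-separated fields (not confirmed)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    # The first six fields stay as strings; the last two are numeric.
    return LexEntry(*fields[:6], int(fields[6]), float(fields[7]))


def per_million(count: int, corpus_tokens: int) -> float:
    """Convert a raw corpus count into a frequency per million tokens."""
    return count * 1_000_000 / corpus_tokens


# Invented sample line; the values are NOT taken from hrLex.
SAMPLE_LINE = (
    "kuće\tkuća\tNcfsg\t"
    "Type=common|Gender=feminine|Number=singular|Case=genitive\t"
    "NOUN\tCase=Gen|Gender=Fem|Number=Sing\t1200\t600.0"
)

entry = parse_entry(SAMPLE_LINE)
print(entry.wordform, entry.lemma, entry.upos, entry.frequency)
```

The actual file layout should be checked against the documentation distributed with the resource before relying on this field order or separator.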

Word embeddings CLARIN.SI-embed.hr 1.0

Size: 3,147,352 entries

Linguistic information: PoS-tags, lemmas

Licence: CC-BY 4.0

Croatian

This lexicon contains word embeddings extracted from the Croatian web corpus hrWaC and a 400-million-token collection of newspaper texts. The resource is available for download from CLARIN.SI.

Download

DeriNet 1.6

Size: 1,027,832 entries

Licence: CC-BY-NC-SA 3.0

Czech

This is a lexicon of derivational relations (both compounding and inflections). The resource is available for download and online browsing through LINDAT.

Browse

Download

MorfFlex CZ

Size: 124,259,099 lexical types

Linguistic information: MSD-tags, derivational, semantic, NER information

Licence: CC-BY-NC-SA 3.0

Czech

This is a morphological lexicon available for download from LINDAT.

Download

ParaDi 2.0

Size: 1,621 entries

Linguistic information: MSD-tags, syntactic/semantic features

Licence: CC-BY 4.0

Czech

This is a lexicon of single-word paraphrases of Czech verbal multiword expressions. The resource is available for download from LINDAT.

Download

PDT-Vallex

Size: 7,121 entries, 11,933 frames

Linguistic information: verb, adjective and noun valency

Licence: CC-BY-NC-SA 4.0

Czech

This is a valency lexicon linked to several Czech corpora (PDT, PCEDT Cz side, PDTSC, Faust). The resource is available for download and online browsing through LINDAT.

For a related publication, see Urešová (2011).

Browse

Download

VALLEX 3.0

Size: 2,722 entries, 6,711 units, 6,711 frames, 4,586 words

Linguistic information: verb senses (characterized by glosses and examples)

Licence: CC-BY-NC-SA 4.0

Czech

This is a valency lexicon available for download and online browsing through LINDAT.

For a related publication, see Lopatková et al. (2017).

Browse

Download

STO morphology (v2) - LMF format

Size: 87,209 entries

Licence: CC-BY-SA 4.0

Danish

This morphological lexicon is available for download from the CLARIN-DK repository. It is also available in the .csv format.

Download

STO syntax (v2) - LMF format

Size: 84,159 entries

Licence: CC-BY-SA 4.0

Danish

This syntactic lexicon is available for download from the CLARIN-DK repository.

Download

Basilex Lexicon

 

Dutch

This is a lexicon that comprises all lemmas from the Basilex Corpus. The Basilex Corpus is an annotated collection of texts written for children in elementary school. The resource is available for download from the Dutch Language Institute (INT).

Download

Basiscript Lexicon

Licence: other

Dutch

This is a lexicon that comprises all lemmas from the Basiscript Corpus. The Basiscript Corpus is an annotated collection of texts written by children in elementary school. The resource is available for download from the Dutch Language Institute (INT).

Download

CombiLex

Size: 213,000 lemmas

Linguistic information: lemmas and word forms

Licence: other

Dutch

This is a lexicon of words and word forms available for download from the Dutch Language Institute (INT).

Download
 
Size: 220,000 entries; 600,000 word forms; 77,000 multi-word expressions; 26,000 multi-word lemmas

Linguistic information: MSD-tags, syntactic and phonological information; partially semantically annotated

Licence: other

Dutch

This is a lexical database that consists of a one-word lexicon and a multi-word lexicon. It is available for download from the Dutch Language Institute (INT).

Download

Dutch Electronic Lexicon of Multiword Expressions

Size: 5,000 expressions

Licence: other

Dutch

This is a lexicon of multiword expressions available for download from the Dutch Language Institute (INT).

Download

PAROLE Lexicon

Size: 20,000 entries

Linguistic information: MSD-tags and syntactic complementation patterns

Licence: other

Dutch

This morphosyntactic lexicon is available for download from the Dutch Language Institute (INT).

Download

Reference Lexicon for Belgian-Dutch (RBBN)

Size: 4,000 words and expressions

Licence: other

Dutch

This lexicon, which contains words and expressions typical of Dutch as spoken in Belgium, is available for download from the Dutch Language Institute (INT).

Download

Reference Lexicon for Dutch

Size: 50,000 lemmas

Linguistic information: dialectal information

Licence: other

Dutch

This is a corpus-based monolingual lexicon available for download from the Dutch Language Institute (INT).

Download

BioLexicon

Size: over 2.2 million entries (over 3.3 million semantic relations)

Licence: ELRA END USER

English

This is a large-scale, wide-coverage computational lexicon covering the biomedical domain. The resource is unavailable for download or online browsing, but can be accessed by contacting the resource manager.

 

EngVallex

Size: 4,337 entries, 7,148 frames

Linguistic information: verb valency

Licence: CC-BY-NC-SA 4.0

English

This is a valency lexicon linked to the English side of the PCEDT corpus (WSJ corpus). The resource is available for download from LINDAT and for online browsing.

Browse

Download

The Database of Estonian Multi-Word Expressions

Size: 12,500 words

Licence: proprietary

Estonian

This is a collection of lexica that contain multi-word expressions consisting of a verb and a particle or a verb and its complements. The resource is available for download (CELR distribution) and for online browsing through a dedicated website.

Browse

Download

Démonette

Size: 96,027 entries

Linguistic information: MSD-tags (grace format), semantic types

Licence: CC-BY 4.0

French

This is a morphological lexicon available for download from ORTOLANG.

Download

Dicovalence

Size: 8,000 entries

Linguistic information: c- and s-selectional restrictions

Licence: GNU Lesser General Public License (LGPL)

French

This is a verb-valency lexicon.

The lexicon specifies certain selectional restrictions, possible term manifestations (pronominal, phrasal), and whether the valency frames can be used in various passive constructions, as well as references to other valency frames for the same infinitive. The resource is available for download from ORTOLANG.

Download

MarsaLex

Size: 595,000,000 inflected forms

Licence: CC-BY 4.0

French

This is a morphological lexicon available for download from ORTOLANG.

Download

Morphalou

Size: 159,261 entries

Linguistic information: spelling, phonetics, mood, tense, MSD-tags, spelling variant, feminine variation, pronominal

Licence: GNU Lesser General Public License (LGPL)

 

French

This is a morphological lexicon available for download from ORTOLANG.

Download

VfrLPL

Size: 8,800 entries

Linguistic information: conjugation forms, phonetic forms, use frequencies

French

This is a morphosyntactic lexicon available for download from ORTOLANG.

Download

ILSP PsychoLinguistic Resource

Size: 217,664 entries

Linguistic information: phonetic transcription, frequency of usage

Licence: CC-BY-NC-SA

Greek

This is a lexicon for psycholinguistic research. The resource is available for download from clarin:el.

For a related publication, see Protopapas et al. (2010).

Download

Database of Modern Icelandic Inflections

Size: 278,994 entries

Linguistic information: MSD-tags

Licence: other

Icelandic

This is a morphological lexicon available for download and online browsing through CLARIN-IS.

For a related publication, see Bjarnadóttir (2012).

Browse

Download

Italian Content Words v3

Size: 2,342,120 items

Licence: CC-BY-NC-SA 4.0

Italian

This is a morphological lexicon. The resource is available for download from LINDAT.

Download

Italian Function Words v3

Size: 3,510 entries

Licence: CC-BY-NC-SA 4.0

Italian

This is a morphological lexicon. The resource is available for download from LINDAT.

Download

OpeNER Sentiment Lexicon Italian - LMF

Size: 24,293 entries

Linguistic information: positive/negative/neutral polarity

Licence: CC-BY 4.0

Italian

This is a sentiment lexicon available for download from ILC4CLARIN.

Download

PAROLE-SIMPLE-CLIPS

Size: 37,406 syntactic units

Licence: CC-BY-SA 4.0

Italian

This is a morphological lexicon available for download from ILC4CLARIN.

Download

Maltese Speech Engine Lexicon

Size: 39,242 entries

Linguistic information: PoS-tags, orthographic transcription, phonetic forms, syllables, stress position

Licence: MS-BY-NC-SA

Maltese

This is a speech lexicon that is useful for building speech-to-text systems. It is available for download from CLARIN PORTULAN.

Download

Emotional Annotations Dictionary

Size: 178,514 elements

Licence: CC-BY 4.0

Polish

This is a lexicon with emotional annotation extracted from Polish Wordnet. The resource is available for download from the CLARIN-PL repository.

Download

Extended dictionary of named entities NELexicon connected with Linked Open Data

Size: 103,585 entries

Licence: GNU LGPL 3.0

Polish

This lexicon contains Polish named entities connected with terminology from resources available as Linked Open Data (e.g. WordNet, DBpedia, Wikipedia). The resource is available for download from the CLARIN-PL repository.

Download

MWELexicon 1.1

Size: 56,500 lexical units

Linguistic information: “syntactic behaviour”

Licence: plWordNet

Polish

This is a lexicon of multiword expressions available for download from the CLARIN-PL repository.

Download

Walenty (2018-06-29)

Size: 18,236 entries

Licence: CC-BY-SA 4.0

Polish

This is a lexicon of verb valency that is available for download from the CLARIN-PL repository.

Download

LEX-MWE-PT: Word Combination in Portuguese Language

Size: 1,198 entries; 12,753 multiword units

Linguistic information: lemmas

Licence: MS NC-NoReD-ND

Portuguese

This is a lexicon of multiword expressions. The resource is unavailable for download or online browsing, but can be accessed by contacting the resource manager.

 

LX-Abbreviations

Size: 208 words

Linguistic information: MSD-tags

Licence: MS NC-NoReD-ND

Portuguese

This is a lexicon of abbreviations. It is unavailable for download or online browsing, but can be accessed by contacting the resource manager.

 

LX-DSemVectors

Size: 17,572 words

Linguistic information: word embeddings

Licence: MS NC-NoReD-ND

Portuguese

This lexicon provides distributional semantic representations of Portuguese words. The dataset is available for download from GitHub.

Download

LX-Rare Word Similarity Dataset

Size: 2,034 words

Linguistic information: synonyms

Licence: MS NC-NoReD-ND

Portuguese

This is a word-similarity lexicon that is unavailable for download.

 

LX-SimLex-999

Size: 1,998 words

Linguistic information: MSD-tags, linguistic standardness

Licence: MS NC-NoReD-ND

Portuguese

This is a word-similarity lexicon that is unavailable for download or online browsing, but can be accessed by contacting the resource manager.

 

LX-StopWords

Size: 2,631 words

Linguistic information: MSD-tags, MWEs

Licence: MS NC-NoReD-ND

Portuguese

This is a manually compiled exhaustive list of closed-class words in Portuguese. The resource is unavailable for download or online browsing, but can be accessed by contacting the resource manager.

 

LX-WordSim-353

Size: 706 words

Linguistic information: synonyms, antonyms, identical, hypernym-hyponym, sibling terms, meronym-holonym

Licence: MS NC-NoReD-ND

Portuguese

This is a word-similarity lexicon that is unavailable for download or online browsing, but can be accessed by contacting the resource manager.

 

Multifunctional Computational Lexicon of Contemporary Portuguese

Size: 26,443 entries

Linguistic information: lemmas, MWEs, PoS-tags

Licence: CC-BY-SA

Portuguese

This is a frequency lexicon suitable for NLP specific purposes (information extraction, lemmatization, PoS tagging). The resource is available for download from META-SHARE (CLARIN PORTULAN distribution).

Download

PAROLE Portuguese Lexicon

Size: 20,000 entries

Linguistic information: MSD tags, lemma

Licence: ELRA EVALUATION

Portuguese

This is a morphosyntactic lexicon available for download from CLARIN PORTULAN.

Download

Porlex

Size: 27,374 words

Linguistic information: orthographic and phonological/phonetic transcriptions, phonetic, MSD-tags, and frequency information

Licence: MS NC-NoReD-ND

Portuguese

This is a lexicon that provides psycholinguistic and cognitive information that is useful to select stimulus materials for experiments and/or training vocabularies. The resource is available for download from CLARIN PORTULAN.

Download

Simple Portuguese Lexicon

Size: 10,438 entries

Linguistic information: qualia structure, semantic relations (hyponymy, synonymy, etc.)

Licence: MS-BY-NC-SA

Portuguese

This semantic lexicon is available for download from CLARIN PORTULAN.

Download

Automatically constructed multiword lexicon srMWELex v0.5

Size: 22,290 entries

Linguistic information: MWEs

Licence: CC-BY 4.0

Serbian

This is a lexicon of multiword expressions available for download from CLARIN.SI.

Download

Inflectional lexicon srLex 1.3

Size: 6,905,941 items; 169,328 entries

Linguistic information: wordform, lemma, MSD

Licence: CC-BY 4.0

Serbian

This is a large inflectional lexicon where each entry consists of a (wordform, lemma, MSD, MSD features, UPOS, morphological features, frequency, per-million frequency) 8-tuple. The (wordform, lemma, MSD) triple frequencies are calculated on the hrWaC v2.2 corpus. The MSD tagset follows the MULTEXT-East V6 tagset for the Serbo-Croatian macro-language. The UPOS and morphological features follow the UD v2 specifications.

The resource is available for download from CLARIN.SI.

For a relevant publication, see Ljubešić et al. (2016).

Download

Word embeddings CLARIN.SI-embed.sr 1.0

Size: 1,480,566 entries

Linguistic information: PoS-tags, lemmas

Licence: CC-BY 4.0

Serbian

The lexicon contains word embeddings from the srWaC web corpus. The resource is available for download from CLARIN.SI.

Download

Automatically constructed multiword lexicon slMWELex v0.5

Size: 47,579 entries

Linguistic information: MWEs

Licence: CC-BY 4.0

Slovenian

This is a lexicon of multiword expressions available for download from CLARIN.SI.

Download

Automatically stress labelled morphological lexicon Sloleks 1.2, version 1.1

Size: 100,805 entries; 2,774,745 words

Linguistic information: wordforms, PoS-tags, lemmas, frequency, prosody

Licence: CC-BY-NC-SA 4.0

Slovenian

This is an extended version of the morphological lexicon Sloleks 1.2 with added information about the stress of each word form. The resource is available for download from CLARIN.SI.

For a related publication, see Krsnik and Robnik Šikonja (2017).

 

Download

Beseda Corpus Lemmatisation Lexicon

Size: 3,228,127 entries

Linguistic information: wordforms, PoS-tags, lemmas, frequency

Licence: CC-BY 4.0

Slovenian

This lexicon contains inflected open class words from the Dictionary of Standard Slovenian that are augmented by wordforms, their part of speech tags and their lemmas used during the PoS tagging and lemmatization of the Beseda corpus. The resource is available for download from CLARIN.SI and for online browsing.

Browse

Download

Collocation lexicon of Slovene academic discourse Aleks

Size: 463 entries

Linguistic information: collocations

Licence: CC-BY 4.0

Slovenian

This is a lexicon of entries typical of general Slovene academic discourse. The entries include typical context examples (collocations and examples of use) taken from KAS, a corpus of Slovene academic texts (see also the Academic corpora resource family). KAS is a morphosyntactically tagged, synchronic, monolingual corpus containing more than 1.5 billion words.

The resource is available for download from CLARIN.SI.

Download

Lexicon of historical Slovene imp25k 1.1

Size: 28,034 entries

Linguistic information: MSD-tags, lemmas, etymological glosses

Licence: CC-BY 4.0

Slovenian

This is a morphological lexicon available for download from CLARIN.SI and for online browsing through a dedicated environment.

For a related publication, see Erjavec (2015).

Browse

Download

Morphological lexicon Sloleks 2.0

Size: 100,805 entries

Linguistic information: wordforms, PoS-tags, lemmas, frequency, phonology

Licence: CC-BY-NC-SA 4.0

Slovenian

This is a reference morphological lexicon of the Slovenian language developed to be used in NLP applications and language manuals. The resource is available for download from CLARIN.SI and for online browsing.

For a related publication, see Dobrovoljc et al. (2017).

Browse

Download

Slovene sentiment lexicon JOB 1.0

Size: 25,524 entries

Linguistic information: sentiment tags

Licence: CC-BY-SA 4.0

Slovenian

This is a lexicon of sentiment labels available for download from the CLARIN.SI repository.

For a related publication, see Bučar et al. (2018).

Download

Slovene sentiment lexicon KSS 1.1

Size: 90,620 lexica

Linguistic information: lemmas, sentiment tags

Licence: CC-BY 4.0

Slovenian

This is a lexicon of sentiment labels available for download from the CLARIN.SI repository.

Download

Word embeddings CLARIN.SI-embed.sl 1.0

Size: 4,560,444 entries

Linguistic information: PoS-tags, lemmas

Licence: CC-BY 4.0

Slovenian

This is a lexicon of word embeddings that is available for download from CLARIN.SI.

Download

Old Swedish morphology (2017-10-16)

Size: 41,958 entries

Licence: CC-BY 4.0

Swedish

This is a glossary of Old Swedish that is available for download from the SWE-CLARIN repository and can be queried online through KARP.

Browse

Download

Parole+ (2017-10-16)

Size: 24,523 entries

Licence: CC-BY 4.0

Swedish

This is a lexicon for language technologies that offers access to syntactic information and is connected to SALDO senses. The resource can be downloaded from the SWE-CLARIN repository and queried online through KARP.

Browse

Download

SALDO's morphology (2017-10-16)

Size: 128,036 entries

Licence: CC-BY 4.0

Swedish

This is a semantic and morphological lexicon for language technologies. The resource can be downloaded from the SWE-CLARIN repository and queried online through KARP.

Browse

Download

Simple lexicon

Size: 11,624 entries

Licence: CC-BY 4.0

Swedish

This is a semantic lexicon that is available for download from the SWE-CLARIN repository and can be queried online through KARP.

Browse

Download

Multilingual resources

Resource

Language Description Availability

Concreteness and imageability lexicon MEGA.HR-Crossling

Size: 7,237,589 entries

Linguistic information: concreteness prediction, imageability prediction

Licence: CC-BY-SA 4.0

77 languages

These lexica contain concreteness and imageability predictions for 77 languages. They are available for download from CLARIN.SI.

For a related publication, see Ljubešić et al. (2018).

Download

Emoji Sentiment Ranking 1.0

Size: 751 entries (emojis)

Linguistic information: sentiment labels

Licence: CC-BY-SA 4.0

Albanian, Bulgarian, English, German, Hungarian, Polish, Portuguese, Russian, Serbo-Croatian, Slovak, Slovenian, Spanish, Swedish

This is a lexicon of emojis available for download from CLARIN.SI and for online browsing through a dedicated environment.

For a related publication, see Kralj Novak et al. (2015).

Browse

Download

OMBI Dutch-Arabic

Size: 37,000 entries

Licence: other

Arabic, Dutch

This is a bilingual lexicon that is suitable for language technology applications such as automatic translation, e-learning, multilingual information retrieval, etc. The resource is available for download from the Dutch Language Institute (INT).

Download

MULTEXT-East free lexicons 4.0

Size: 3,665,864 entries

Linguistic information: MSD-tags, lemmas

Licence: CC-BY-SA 4.0

Bulgarian, Czech, English, Estonian, French, Hungarian, Romanian, Slovak, Slovenian, Ukrainian

These are morphological lexica available for download from the CLARIN.SI repository.

For a related publication, see Erjavec (2011).

Download

CzEngClass 0.2

Size: 200 classes, 3,525 entries

Linguistic information: valency and synonymy

Licence: CC-BY-NC-SA 4.0

Czech, English

This is a valency lexicon linked to PDT-Vallex, EngVallex and external resources, such as FrameNet, VerbNet, WordNet, etc. The resource is available for download and online browsing through LINDAT.

Browse

Download

CzEngVallex

Size: 20,835 pairs (verb senses)

Linguistic information: verb valency

Licence: CC-BY-NC-SA 4.0

Czech, English

This is a valency lexicon linked to the parallel PCEDT corpus. The resource is available for download and online browsing through LINDAT.

For a related publication, see Fučíková et al. (2016).

Browse

Download

 

Size: 14,182 entries

Linguistic information: word sentiment

Licence: CC-BY-NC-SA 4.0

Croatian, Dutch, Slovenian

This lexicon contains manual translations of the NRC Emotion Lexicon into Croatian, Dutch and Slovene. The lexicon encodes, in a binary schema, the sentiment of a word (positive, negative) and its emotion association (anger, anticipation, disgust, fear, joy, sadness, surprise, trust). Manual translations were produced by inspecting and correcting the automatic translations from English provided with the original lexicon. While translations of all 14,182 entries are provided for Slovene and Croatian, only translations of the 6,468 entries that have any sentiment or emotion associated with the word are given for Dutch.

The resource is available for download from the CLARIN.SI repository.

Download
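The binary schema described for this emotion lexicon can be pictured with a short Python sketch: each word carries a 0/1 flag for the two sentiments and the eight emotions, and the Dutch subset corresponds to keeping only entries with at least one flag set. The label order, the names `EmotionEntry`, `make_entry`, and `with_association`, and the two sample words are assumptions made for illustration; they are not taken from the distributed files.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence

SENTIMENTS = ("positive", "negative")
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")
LABELS = SENTIMENTS + EMOTIONS  # ASSUMED column order, for illustration only


@dataclass(frozen=True)
class EmotionEntry:
    word: str
    flags: Dict[str, int]  # label -> 0 or 1

    def has_association(self) -> bool:
        """True if any sentiment or emotion flag is set."""
        return any(self.flags.values())


def make_entry(word: str, values: Sequence[int]) -> EmotionEntry:
    """Build an entry from ten binary values given in LABELS order."""
    if len(values) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} values, got {len(values)}")
    if any(v not in (0, 1) for v in values):
        raise ValueError("the schema is binary: values must be 0 or 1")
    return EmotionEntry(word, dict(zip(LABELS, values)))


def with_association(entries: Iterable[EmotionEntry]) -> List[EmotionEntry]:
    """Keep only entries with at least one sentiment or emotion flag set."""
    return [e for e in entries if e.has_association()]


# Invented sample entries; the annotations are NOT taken from the lexicon.
joyful = make_entry("veselje", [1, 0, 0, 0, 0, 0, 1, 0, 0, 0])
neutral = make_entry("miza", [0] * 10)

print([e.word for e in with_association([joyful, neutral])])
```

A filter of this kind mirrors the selection criterion stated for the Dutch translations, where entries without any sentiment or emotion association are left out.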

OMBI Arabic-Dutch

Size: 37,000 entries

Licence: other

Dutch, Arabic

This is a bilingual lexicon for language technology applications such as automatic translation, e-learning, multilingual information retrieval, etc. The resource is available for download from the Dutch Language Institute (INT).

Download

OMBI Dutch-Danish

Size: 46,000 entries

Licence: other

Dutch, Danish

This is a bilingual lexicon for language technology applications such as automatic translation, e-learning, multilingual information retrieval, etc. The resource is available for download from the Dutch Language Institute (INT).

Download

OMBI Dutch-Indonesian

Size: 50,000 entries

Licence: other

Dutch, Indonesian

This is a bilingual lexicon for language technology applications such as automatic translation, e-learning, multilingual information retrieval, etc. The resource is available for download from the Dutch Language Institute (INT).

Download

QTLeap specialized lexicons

Size: 231,516 entries

Licence: CC-BY

English, Spanish, Bulgarian, Basque, Dutch, Czech, Portuguese

This lexicon is used for the automatic translation of specific IT domain expressions and is available for download from CLARIN PORTULAN.

Download

MULTEXT-East non-commercial lexicons 4.0

Size: 2,288,228 entries

Linguistic information: MSD-tags, lemmas

Licence: CC-BY-NC 4.0

Macedonian, Persian, Polish, Russian, Serbian

These are morphological lexica available for download from the CLARIN.SI repository.

Download

A machine-readable Persian-English dictionary

Size: 1,892 entries

Linguistic information: morphological information, usage examples

Licence: CC-BY-NC-SA 3.0

Persian-English

This bilingual lexicon has been compiled for comparative as well as didactic purposes in the on-going VICAV project. The resource is available for download from ARCHE.

Download

A machine-readable Persian-English glossary of verbs

Size: 429 entries

Linguistic information: basic morphological information

Licence: CC-BY-NC-SA 3.0

Persian-English

This lexicon of single-word verbs in Modern Persian is available for download from ARCHE.

Download

Publications

[Bjarnadóttir 2012] Kristín Bjarnadóttir. 2012. The Database of Modern Icelandic Inflection (Beygingarlýsing íslensks nútímamáls).

[Bučar et al. 2018] Jože Bučar, Martin Žnidaršič, and Janez Povh. 2018. Annotated news corpora and a lexicon for sentiment analysis in Slovene.

[Erjavec 2011] Tomaž Erjavec. 2011. MULTEXT-East: morphosyntactic resources for Central and Eastern European languages.

[Erjavec 2015] Tomaž Erjavec. 2015. The IMP historical Slovene language resources.

[Fučíková et al. 2016] Eva Fučíková, Jan Hajič, and Zdeňka Urešová. 2016. Joint search in a bilingual valency lexicon and an annotated corpus.

[Dobrovoljc et al. 2017] Kaja Dobrovoljc, Simon Krek, and Tomaž Erjavec. 2017. The Sloleks Morphological Lexicon and its Future Development.

[Kralj Novak et al. 2015] Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. 2015. Sentiment of Emojis.

[Krsnik and Robnik Šikonja 2017] Luka Krsnik and Marko Robnik Šikonja. 2017. Napovedovanje naglasa slovenskih besed z metodami strojnega učenja.

[Ljubešić et al. 2015] Nikola Ljubešić, Kaja Dobrovoljc, and Darja Fišer. 2015. MWELex – MWE Lexica of Croatian, Slovene and Serbian Extracted from Parsed Corpora.

[Ljubešić et al. 2016] Nikola Ljubešić, Filip Klubička, Željko Agić, and Ivo-Pavao Jazbec. 2016. New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian.

[Ljubešić et al. 2018] Nikola Ljubešić, Darja Fišer, and Anita Peti-Stantić. 2018. Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings.

[Lopatková et al. 2017] Markéta Lopatková et al. 2017. Valenční slovník českých sloves VALLEX.

[Protopapas et al. 2010] Athanassios Protopapas, Marina Tzakosta, Aimilios Chalamandaris, and Pirros Tsiakoulis. 2010. IPLR: an online resource for Greek word-level and sublexical information.

[Urešová 2011] Zdeňka Urešová. 2011. Valenční slovník Pražského závislostního korpusu (PDT-Vallex). 

[Úlfarsdóttir 2014] Thórdís Úlfarsdóttir. 2014. ISLEX – a Multilingual Web Dictionary.