Lexica

Introduction

Lexica are primarily used in applications. They typically contain an extensive lexical inventory with specific linguistic information (e.g., morphosyntax, sentiment). There are 80 lexica in the CLARIN infrastructure. Most (65) of the lexica are monolingual, covering 17 languages (Arabic, Croatian, Czech, Danish, Dutch, English, Estonian, French, Icelandic, Italian, Greek, Maltese, Polish, Portuguese, Serbian, Slovenian, and Swedish). The rest (15) are multilingual and include a variety of language combinations. In the vast majority of cases, the lexica can be downloaded directly from the national repositories or queried through easy-to-use online search environments.

For comments, changes to the existing content, or the inclusion of new resources, send us an email.

This website was last updated on 3 October 2022.

Lexica in the CLARIN infrastructure

Monolingual resources

Lexicon Language Description Availability

A machine-readable dictionary of Egyptian Arabic

Size: 2,418 entries

Annotation: basic morphological information, usage examples

Licence: CC-BY-NC-SA 3.0

Arabic (Egyptian)

This lexicon presents a more comprehensive version of A machine-readable glossary of Egyptian Arabic. The resource is available for download from ARCHE.

Download

A machine-readable glossary of Egyptian Arabic

Size: 2,204 entries

Annotation: basic morphological information, usage examples

Licence: CC-BY-NC-SA 3.0

Arabic (Egyptian)

This lexicon has been compiled for comparative as well as didactic purposes in the ongoing VICAV project. The resource is available for download from ARCHE.

Download

Automatically constructed multiword lexicon hrMWELex v0.5

Size: 43,730 entries

Annotation: multi-word expressions

Licence: CC-BY 4.0

Croatian

This is a lexicon of multiword expressions available for download from CLARIN.SI.

For the relevant publication, see Ljubešić et al. (2015)

Download

Inflectional lexicon hrLex 1.3

Size: 6,427,709 items, 164,206 entries

Annotation: wordform, lemma, MSD, UPOS

Licence: CC-BY 4.0

Croatian

This is a large inflectional lexicon where each entry consists of a (wordform, lemma, MSD, MSD features, UPOS, morphological features, frequency, per-million frequency) 8-tuple. The (wordform, lemma, MSD) triple frequencies are calculated on the hrWaC v2.2 corpus. The MSD tagset follows the MULTEXT-East V6 tagset for the Serbo-Croatian macro-language. The UPOS and morphological features follow the UD v2 specifications.
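As a rough illustration of the 8-tuple described above, the following sketch parses one entry into named fields. It assumes that each entry is a single tab-separated line with the fields in the order listed and that the morphological features use the UD `Name=Value|Name=Value` notation. The file layout, the `LexEntry` and `parse_entry` names, and the sample line (including its frequency values) are hypothetical and should be checked against the distributed files.

```python
from typing import Dict, NamedTuple


class LexEntry(NamedTuple):
    """One lexicon entry: the 8-tuple named in the description."""
    wordform: str
    lemma: str
    msd: str
    msd_features: str
    upos: str
    morph_features: Dict[str, str]
    frequency: int
    per_million: float


def parse_entry(line: str) -> LexEntry:
    """Parse one line, assuming eight tab-separated fields (an assumption)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    wordform, lemma, msd, msd_feats, upos, feats, freq, per_million = fields
    # UD-style features are split into a dict; "_" or "" means no features.
    morph = {}
    if feats not in ("", "_"):
        morph = dict(pair.split("=", 1) for pair in feats.split("|"))
    return LexEntry(wordform, lemma, msd, msd_feats, upos, morph,
                    int(freq), float(per_million))


# Made-up sample line for illustration only; not taken from the lexicon.
sample = ("kuće\tkuća\tNcfsg\tNoun Type=common Gender=feminine\tNOUN\t"
          "Case=Gen|Gender=Fem|Number=Sing\t1200\t0.85")
entry = parse_entry(sample)
print(entry.lemma, entry.upos, entry.morph_features["Case"])
```

Keeping the UD features as a dictionary makes it straightforward to filter entries by a single feature, such as case or number.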

The resource is available for download from CLARIN.SI.

For the relevant publication, see Ljubešić et al. (2016)

Download

Word embeddings CLARIN.SI-embed.hr 1.0

Size: 3,147,352 entries

Annotation: PoS-tags, lemmas

Licence: CC-BY 4.0

Croatian

This lexicon contains word embeddings extracted from the Croatian web corpus hrWaC and a 400-million-token collection of newspaper texts. The resource is available for download from CLARIN.SI.

Download

DeriNet 1.6

Size: 1,027,832 entries

Licence: CC-BY-NC-SA 3.0

Czech

This is a lexicon of derivational relations (both compounding and inflections). The resource is available for download and online browsing through LINDAT.

Browse

Download

MorfFlex CZ

Size: 124,259,099 lexical types

Annotation: MSD-tags, derivational, semantic, NER information

Licence: CC-BY-NC-SA 3.0

Czech

This is a morphological lexicon available for download from LINDAT.

Download

ParaDi 2.0

Size: 1,621 entries

Annotation: MSD-tags, syntactic/semantic features

Licence: CC-BY 4.0

Czech

This is a lexicon of single-word paraphrases of Czech verbal multiword expressions. The resource is available for download from LINDAT.

Download

PDT-Vallex

Size: 7,121 entries, 11,933 frames

Annotation: verb, adjective and noun valency

Licence: CC-BY-NC-SA 4.0

Czech

This is a valency lexicon linked to several Czech corpora (PDT, PCEDT Cz side, PDTSC, Faust). The resource is available for download and online browsing through LINDAT.

For the relevant publication, see Urešová (2011)

Browse

Download

VALLEX 3.0

Size: 2,722 entries, 6,711 units, 6,711 frames, 4,586 words

Annotation: verb senses (characterized by glosses and examples)

Licence: CC-BY-NC-SA 4.0

Czech

This is a valency lexicon available for download and online browsing through LINDAT.

For the relevant publication, see Lopatková et al. (2017)

Browse

Download

STO morphology (v2) - LMF format

Size: 87,209 entries

Licence: CC BY-SA 4.0

Danish

This morphological lexicon is available for download from the CLARIN-DK repository. It is also available in the .csv format.

Download

STO syntax (v2) - LMF format

Size: 84,159 entries

Licence: CC BY-SA 4.0

Danish

This syntactic lexicon is available for download from the CLARIN-DK repository.

Download

Basilex Lexicon

 

Dutch

This is a lexicon that comprises all lemmas from the Basilex Corpus. The Basilex Corpus is an annotated collection of texts written for children in elementary school. The resource is available for download from the Dutch Language Institute (INT).

Download

Basiscript Lexicon

Licence: other

Dutch

This is a lexicon that comprises all lemmas from the Basiscript Corpus. The Basiscript Corpus is an annotated collection of texts written by children in elementary school. The resource is available for download from the Dutch Language Institute (INT).

Download

CombiLex

Size: 213,000 lemmas

Annotation: lemmas and word forms

Licence: other

Dutch

This is a lexicon of words and word forms available for download from the Dutch Language Institute (INT).

Download

e-Lex

Size: 220,000 entries, 600,000 word forms, 77,000 multi-word expressions, 26,000 multi-word lemmas

Annotation: MSD-tags, syntactical and phonological information, partially semantically annotated

Licence: other

Dutch

This is a lexical database that consists of a one-word lexicon and a multi-word lexicon.

This lexicon is available for download from the Dutch Language Institute (INT).

Download

Dutch Electronic Lexicon of Multiword Expressions

Size: 5,000 expressions

Licence: other

Dutch

This is a lexicon of multiword expressions available for download from the Dutch Language Institute (INT).

Download

PAROLE Lexicon

Size: 20,000 entries

Annotation: MSD-tags and syntactic complementation patterns

Licence: other

Dutch

This morphosyntactic lexicon is available for download from the Dutch Language Institute (INT).

Download

Reference Lexicon for Belgian-Dutch (RBBN)

Size: 4,000 words and expressions

Licence: other

Dutch

This lexicon, which contains words and expressions typical of the Dutch spoken in Belgium, is available for download from the Dutch Language Institute (INT).

Download

Reference Lexicon for Dutch

Size: 50,000 lemmas

Annotation: dialectal information

Licence: other

Dutch

This is a corpus-based monolingual lexicon available for download from the Dutch Language Institute (INT).

Download

BioLexicon

Size: over 2.2 million entries (over 3.3 million semantic relations)

Licence: ELRA END USER

English

This is a large-scale, wide-coverage computational lexicon covering the biomedical domain. The resource is unavailable for download or online browsing, but can be accessed by contacting the resource manager.

 

EngVallex

Size: 4,337 entries, 7,148 frames

Annotation: verb valency

Licence: CC-BY-NC-SA 4.0

English

This is a valency lexicon linked to the English side of the PCEDT corpus (WSJ corpus). The resource is available for download from LINDAT and for online browsing.

Browse

Download

The Database of Estonian Multi-Word Expressions

Size: 12,500 words

Licence: proprietary

Estonian

This is a collection of lexica that contain multi-word expressions consisting of a verb and a particle or a verb and its complements. The resource is available for download from META-SHARE (CELR distribution) and for online browsing through a dedicated website.

Browse

Download

Démonette

Size: 96,027 entries

Annotation: MSD-tags (GRACE format), semantic types

Licence: CC-BY 4.0

French

This is a morphological lexicon available for download from ORTOLANG.

Download

Dicovalence

Size: 8,000 entries

Annotation: c- and s-selectional restrictions

Licence: GNU Lesser General Public Licence (Licence Publique Générale Amoindrie GNU)

French

This is a verb-valency lexicon.

The lexicon specifies certain selectional restrictions, possible term manifestations (pronominal, phrasal), and whether the valency frames can be used in various passive constructions, as well as references to other valency frames for the same infinitive. The resource is available for download from ORTOLANG.

Download

MarsaLex

Size: 595,000,000 inflected forms

Licence: CC-BY 4.0

French

This is a morphological lexicon available for download from ORTOLANG.

Download

Morphalou

Size: 159,261 entries

Annotation: spelling, phonetics, mood, tense, MSD-tags, spelling variant, feminine variation, pronominal

Licence: GNU Lesser General Public Licence (Licence Publique Générale Amoindrie GNU)

French

This is a morphological lexicon available for download from ORTOLANG.

Download

VfrLPL

Size: 8,800 entries

Annotation: conjugation forms, phonetic forms, use frequencies

Licence: restricted

French

This is a morphosyntactic lexicon available for download from ORTOLANG.

Download

ILSP PsychoLinguistic Resource

Size: 217,664 entries

Annotation: phonetic transcription, frequency of usage

Licence: CC-BY-NC-SA

Greek

This is a lexicon for psycholinguistic research. The resource is available for download from clarin:el.

For the relevant publication, see Protopapas et al. (2010)

Download

Database of Modern Icelandic Inflections

Size: 305,000 lemmas; 6.5 million inflectional forms; 48,000 non-standard word forms

Annotation: MSD-tags

Licence: CC BY-SA 4.0

Icelandic

This is a morphological lexicon created for use in language technology (LT), as a reference for the general public in Iceland, and for use in research on the Icelandic language. The term Modern Icelandic here refers to contemporary Icelandic, i.e. late 20th and 21st century usage.

The lexicon is available for download and online browsing through CLARIN-IS.

For the relevant publication, see Bjarnadóttir (2012)

Browse

Download

Italian Content Words v3

Size: 2,342,120 items

Licence: CC-BY-NC-SA 4.0

Italian

This is a morphological lexicon. The resource is available for download from LINDAT.

Download

Italian Function Words v3

Size: 3,510 entries

Licence: CC-BY-NC-SA 4.0

Italian

This is a morphological lexicon. The resource is available for download from LINDAT.

Download

OpeNER Sentiment Lexicon Italian - LMF

Size: 24,293 entries

Annotation: positive/negative/neutral polarity

Licence: CC-BY 4.0

Italian

This is a sentiment lexicon available for download from ILC4CLARIN.

Download

PAROLE-SIMPLE-CLIPS

Size: 37,406 syntactic units

Licence: CC-BY-SA 4.0

Italian

This is a morphological lexicon available for download from ILC4CLARIN.

Download

Maltese Speech Engine Lexicon

Size: 39,242 entries

Annotation: PoS-tags, orthographic transcription, phonetic forms, syllables, stress position

Licence: MS-BY-NC-SA

Maltese

This is a speech lexicon that is useful for building speech-to-text systems. It is available for download from CLARIN PORTULAN.

Download

Emotional Annotations Dictionary

Size: 178,514 elements

Licence: CC-BY 4.0

Polish

This is a lexicon with emotional annotation extracted from Polish Wordnet. The resource is available for download from the CLARIN-PL repository.

Download

Extended dictionary of named entities NELexicon connected with Linked Open Data

Size: 103,585 entries

Licence: GNU LGPL 3.0

Polish

This lexicon contains Polish named entities connected with terminology from available resources within Linked Open Data (e.g. WordNet, DBPedia, Wikipedia, etc.). The resource is available for download from the CLARIN-PL repository.

Download

MWELexicon 1.1

Size: 56,500 lexical units

Annotation: syntactic behaviour

Licence: plWordNet

Polish

This is a lexicon of multiword expressions available for download from the CLARIN-PL repository.

Download

Walenty (2018-06-29)

Size: 18,236 entries

Licence: CC BY-SA 4.0

Polish

This is a lexicon of verb valency that is available for download from the CLARIN-PL repository.

Download

LEX-MWE-PT: Word Combination in Portuguese Language

Size: 1,198 entries, 12,753 multiword units

Annotation: lemmas

Licence: MS NC-NoReD-ND

Portuguese

This is a lexicon of multiword expressions. The resource is available for download from CLARIN PORTULAN.

Download

LX-Abbreviations

Size: 208 words

Annotation: MSD-tags

Licence: MS NC-NoReD-ND

Portuguese

This is a lexicon of abbreviations. The resource is available for download from CLARIN PORTULAN.

Download

LX-DSemVectors

Size: 17,572 words

Annotation: word embeddings

Licence: MS NC-NoReD-ND

Portuguese

This lexicon provides distributional semantic representations of Portuguese words. The dataset is available for download from GitHub.

Download

LX-Rare Word Similarity Dataset

Size: 2,034 words

Annotation: synonyms

Licence: MS NC-NoReD-ND

Portuguese

This is a word-similarity lexicon available for download from CLARIN PORTULAN.

Download

LX-SimLex-999

Size: 1,998 words

Annotation: MSD-tags, linguistic standardness

Licence: MS NC-NoReD-ND

Portuguese

This is a word-similarity lexicon. The resource is available for download from CLARIN PORTULAN.

Download

LX-StopWords

Size: 2,631 words

Annotation: MSD-tags, MWEs

Licence: MS NC-NoReD-ND

Portuguese

This is a manually compiled exhaustive list of closed-class words in Portuguese. The resource is available for download from CLARIN PORTULAN.

Download

LX-WordSim-353

Size: 706 words

Annotation: synonyms, antonyms, identical, hypernym-hyponym, sibling terms, meronym-holonym

Licence: MS NC-NoReD-ND

Portuguese

This is a word-similarity lexicon. The resource is available for download from CLARIN PORTULAN.

Download

Multifunctional Computational Lexicon of Contemporary Portuguese

Size: 26,443 entries

Annotation: lemmas, MWEs, PoS-tags

Licence: CC-BY-SA

Portuguese

This is a frequency lexicon suitable for NLP-specific purposes (information extraction, lemmatization, PoS tagging). The resource is available for download from CLARIN PORTULAN.

Download

PAROLE Portuguese Lexicon

Size: 20,000 entries

Annotation: MSD tags, lemma

Licence: ELRA EVALUATION

Portuguese

This is a morphosyntactic lexicon available for download from CLARIN PORTULAN.

Download

Porlex

Size: 27,374 words

Annotation: orthographic and phonological/phonetic transcriptions, MSD-tags, frequency information

Licence: MS NC-NoReD-ND

Portuguese

This is a lexicon that provides psycholinguistic and cognitive information that is useful to select stimulus materials for experiments and/or training vocabularies. The resource is available for download from CLARIN PORTULAN.

Download

Simple Portuguese Lexicon

Size: 10,438 entries

Annotation: qualia structure, semantic relations (hyponymy, synonymy, etc.)

Licence: MS-BY-NC-SA

Portuguese

This semantic lexicon is available for download from CLARIN PORTULAN.

Download

Automatically constructed multiword lexicon srMWELex v0.5

Size: 22,290 entries

Annotation: MWEs

Licence: CC-BY 4.0

Serbian

This is a lexicon of multiword expressions available for download from CLARIN.SI.

Download

Inflectional lexicon srLex 1.3

Size: 6,905,941 items, 169,328 entries

Annotation: wordform, lemma, MSD

Licence: CC-BY 4.0

Serbian

This is a large inflectional lexicon where each entry consists of a (wordform, lemma, MSD, MSD features, UPOS, morphological features, frequency, per-million frequency) 8-tuple. The (wordform, lemma, MSD) triple frequencies are calculated on the hrWaC v2.2 corpus. The MSD tagset follows the MULTEXT-East V6 tagset for the Serbo-Croatian macro-language. The UPOS and morphological features follow the UD v2 specifications.

The resource is available for download from CLARIN.SI.

For the relevant publication, see Ljubešić et al. (2016)

Download

Word embeddings CLARIN.SI-embed.sr 1.0

Size: 1,480,566 entries

Annotation: PoS-tags, lemmas

Licence: CC-BY 4.0

Serbian

This lexicon contains word embeddings from the srWaC web corpus. The resource is available for download from CLARIN.SI.

Download

Automatically constructed multiword lexicon slMWELex v0.5

Size: 47,579 entries

Annotation: MWEs

Licence: CC-BY 4.0

Slovenian

This is a lexicon of multiword expressions available for download from CLARIN.SI.

Download

Automatically stress labelled morphological lexicon Sloleks 1.2, version 1.1

Size: 100,805 entries, 2,774,745 words

Annotation: wordforms, PoS-tags, lemmas, frequency, prosody

Licence: CC-BY-NC-SA 4.0

Slovenian

This is an extended version of the morphological lexicon Sloleks 1.2 with added information about the stress of each word form. The resource is available for download from CLARIN.SI.

For the relevant publication, see Krsnik and Robnik Šikonja (2017)

Download

Beseda Corpus Lemmatisation Lexicon

Size: 3,228,127 entries

Annotation: wordforms, PoS-tags, lemmas, frequency

Licence: CC-BY 4.0

Slovenian

This lexicon contains inflected open class words from the Dictionary of Standard Slovenian that are augmented by wordforms, their part of speech tags and their lemmas used during the PoS tagging and lemmatization of the Beseda corpus. The resource is available for download from CLARIN.SI and for online browsing.

Browse

Download

Collocation lexicon of Slovene academic discourse Aleks

Size: 463 entries

Annotation: collocations

Licence: CC-BY 4.0

Slovenian

This is a lexicon of entries typical of general Slovene academic discourse. The entries include typical context examples (collocations and examples of use) taken from KAS, a morphosyntactically tagged, synchronic, monolingual corpus of Slovene academic texts that contains more than 1.5 billion words (see also the Academic corpora resource family).

The resource is available for download from CLARIN.SI.

Download

Lexicon of historical Slovene imp25k 1.1

Size: 28,034 entries

Annotation: MSD-tags, lemmas, etymological glosses

Licence: CC-BY 4.0

Slovenian

This is a morphological lexicon available for download from CLARIN.SI and for online browsing through a dedicated environment.

For the relevant publication, see Erjavec (2015)

Browse

Download

Morphological lexicon Sloleks 2.0

Size: 100,805 entries

Annotation: wordforms, PoS-tags, lemmas, frequency, phonology

Licence: CC-BY-NC-SA 4.0

Slovenian

This is a reference morphological lexicon of the Slovenian language developed to be used in NLP applications and language manuals. The resource is available for download from CLARIN.SI and for online browsing.

For the relevant publication, see Dobrovoljc et al. (2017)

Browse

Download

Slovene sentiment lexicon JOB 1.0

Size: 25,524 entries

Annotation: sentiment tags

Licence: CC-BY-SA 4.0

Slovenian

This is a lexicon of sentiment labels available for download from the CLARIN.SI repository.

For the relevant publication, see Bučar et al. (2018)

Download

Slovene sentiment lexicon KSS 1.1

Size: 90,620 entries

Annotation: lemmas, sentiment tags

Licence: CC-BY 4.0

Slovenian

This is a lexicon of sentiment labels available for download from the CLARIN.SI repository.

Download

Word embeddings CLARIN.SI-embed.sl 1.0

Size: 4,560,444 entries

Annotation: PoS-tags, lemmas

Licence: CC-BY 4.0

Slovenian

This is a lexicon of word embeddings that is available for download from CLARIN.SI.

Download

Old Swedish morphology (2017-10-16)

Size: 41,958 entries

Licence: CC-BY 4.0

Swedish

This is a glossary of Old Swedish that is available for download from the SWE-CLARIN repository and can be queried online through KARP.

Browse

Download

Parole+ (2017-10-16)

Size: 24,523 entries

Licence: CC-BY 4.0

Swedish

This is a lexicon for language technologies that offers access to syntactic information and is connected to SALDO senses. The resource can be downloaded from the SWE-CLARIN repository and queried online through KARP.

Browse

Download

SALDO's morphology (2017-10-16)

Size: 128,036 entries

Licence: CC-BY 4.0

Swedish

This is a semantic and morphological lexicon for language technologies. The resource can be downloaded from the SWE-CLARIN repository and queried online through KARP.

Browse

Download

Simple lexicon

Size: 11,624 entries

Licence: CC-BY 4.0

Swedish

This is a semantic lexicon that is available for download from the SWE-CLARIN repository and can be queried online through KARP.

Browse

Download

Multilingual resources

Lexicon Language Description Availability

Concreteness and imageability lexicon MEGA.HR-Crossling

Size: 7,237,589 entries

Annotation: concreteness prediction, imageability prediction

Licence: CC-BY-SA 4.0

77 languages

These lexica contain concreteness and imageability predictions for 77 languages. They are available for download from CLARIN.SI.

For the relevant publication, see Ljubešić et al. (2018)

Download

Emoji Sentiment Ranking 1.0

Size: 751 entries (emojis)

Annotation: sentiment labels

Licence: CC-BY-SA 4.0

Albanian, Bulgarian, English, German, Hungarian, Polish, Portuguese, Russian, Serbo-Croatian, Slovak, Slovenian, Spanish, Swedish

This is a lexicon of emojis available for download from CLARIN.SI and for online browsing through a dedicated environment.

For the relevant publication, see Kralj Novak et al. (2015)

Browse

Download

OMBI Dutch-Arabic

Size: 37,000 entries

Licence: other

Arabic, Dutch

This is a bilingual lexicon that is suitable for language technology applications such as automatic translation, e-learning, multilingual information retrieval, etc. The resource is available for download from the Dutch Language Institute (INT).

Download

MULTEXT-East free lexicons 4.0

Size: 3,665,864 entries

Annotation: MSD-tags, lemmas

Licence: CC-BY-SA 4.0

Bulgarian, Czech, English, Estonian, French, Hungarian, Romanian, Slovak, Slovenian, Ukrainian

These are morphological lexica available for download from the CLARIN.SI repository.

For the relevant publication, see Erjavec (2011)

Download

CzEngClass 0.2

Size: 200 classes, 3,525 entries

Annotation: valency and synonymy

Licence: CC-BY-NC-SA 4.0

Czech, English

This is a valency lexicon linked to PDT-Vallex, EngVallex and external resources, such as FrameNet, VerbNet, WordNet, etc. The resource is available for download and online browsing through LINDAT.

Browse

Download

CzEngVallex

Size: 20,835 pairs (verb senses)

Annotation: verb valency

Licence: CC-BY-NC-SA 4.0

Czech, English

This is a valency lexicon linked to the parallel PCEDT corpus. The resource is available for download and online browsing through LINDAT.

For the relevant publication, see Fučíková et al. (2016)

Browse

Download

The LiLaH Emotion Lexicon of Croatian, Dutch and Slovene

Size: 14,182 entries

Annotation: word sentiment

Licence: CC-BY-NC-SA 4.0

Croatian, Dutch, Slovenian

This lexicon contains manual translations of the NRC Emotion Lexicon for Croatian, Dutch and Slovene. The NRC Emotion Lexicon encodes, with a binary schema, the sentiment of a word (positive, negative) and its emotion association (anger, anticipation, disgust, fear, joy, sadness, surprise, trust). Manual translations were produced by inspecting and correcting the automatic translations from English provided with the original lexicon. While translations of all 14,182 entries are provided for Slovene and Croatian, only translations of the 6,468 entries that have any sentiment or emotion associated with the word are given for Dutch.
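The binary schema can be pictured as ten 0/1 flags per word, one for each sentiment and emotion category named above. The sketch below is illustrative only: the dictionary layout, the `make_entry` and `has_association` helpers, and the two toy words with their labels are assumptions, not the format or content of the distributed files. The filter keeps only the words with at least one flag set, which is the criterion the description gives for the Dutch subset.

```python
# The ten binary categories named in the description:
# two sentiments followed by eight emotions.
CATEGORIES = (
    "positive", "negative",
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
)


def make_entry(word, *active):
    """Build a hypothetical entry: the word plus one 0/1 flag per category."""
    unknown = set(active) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    entry = {"word": word}
    entry.update({category: int(category in active) for category in CATEGORIES})
    return entry


def has_association(entry):
    """True if the word carries at least one sentiment or emotion flag."""
    return any(entry[category] for category in CATEGORIES)


# Toy entries with made-up labels, for illustration only.
toy_lexicon = [
    make_entry("veselje", "positive", "joy"),
    make_entry("miza"),  # no sentiment or emotion association
]

with_association = [e["word"] for e in toy_lexicon if has_association(e)]
print(with_association)
```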

The resource is available for download from the CLARIN.SI repository.

Download

OMBI Arabic-Dutch

Size: 37,000 entries

Licence: other

Dutch, Arabic

This is a bilingual lexicon for language technology applications such as automatic translation, e-learning, multilingual information retrieval, etc. The resource is available for download from the Dutch Language Institute (INT).

Download

OMBI Dutch-Danish

Size: 46,000 entries

Licence: other

Dutch, Danish

This is a bilingual lexicon for language technology applications such as automatic translation, e-learning, multilingual information retrieval, etc. The resource is available for download from the Dutch Language Institute (INT).

Download

OMBI Dutch-Indonesian

Size: 50,000 entries

Licence: other

Dutch, Indonesian

This is a bilingual lexicon for language technology applications such as automatic translation, e-learning, multilingual information retrieval, etc. The resource is available for download from the Dutch Language Institute (INT).

Download

QTLeap specialized lexicons

Size: 231,516 entries

Licence: CC-BY

English, Spanish, Castilian, Bulgarian, Basque, Dutch, Flemish, Czech, Portuguese

This lexicon is used for the automatic translation of specific IT domain expressions and is available for download from CLARIN PORTULAN.

Download

MULTEXT-East non-commercial lexicons 4.0

Size: 2,288,228 entries

Annotation: MSD-tags, lemmas

Licence: CC-BY-NC 4.0

Macedonian, Persian, Polish, Russian, Serbian

These are morphological lexica available for download from the CLARIN.SI repository.

Download

A machine-readable Persian-English dictionary

Size: 1,892 entries

Annotation: morphological information, usage examples

Licence: CC-BY-NC-SA 3.0

Persian-English

This bilingual lexicon has been compiled for comparative as well as didactic purposes in the ongoing VICAV project. The resource is available for download from ARCHE.

Download

A machine-readable Persian-English glossary of verbs

Size: 429 entries

Annotation: basic morphological information

Licence: CC-BY-NC-SA 3.0

Persian-English

This lexicon of single-word verbs in Modern Persian is available for download from ARCHE.

Download

Publications

[Bjarnadóttir 2012] Kristín Bjarnadóttir. 2012. The Database of Modern Icelandic Inflection (Beygingarlýsing íslensks nútímamáls).

[Bučar et al. 2018] Jože Bučar, Martin Žnidaršič, and Janez Povh. 2018. Annotated news corpora and a lexicon for sentiment analysis in Slovene.

[Erjavec 2011] Tomaž Erjavec. 2011. MULTEXT-East: morphosyntactic resources for Central and Eastern European languages.

[Erjavec 2015] Tomaž Erjavec. 2015. The IMP historical Slovene language resources.

[Fučíková et al. 2016] Eva Fučíková, Jan Hajič, and Zdeňka Urešová. 2016. Joint search in a bilingual valency lexicon and an annotated corpus.

[Dobrovoljc et al. 2017] Kaja Dobrovoljc, Simon Krek, and Tomaž Erjavec. 2017. The Sloleks Morphological Lexicon and its Future Development.

[Kralj Novak et al. 2015] Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. 2015. Sentiment of Emojis.

[Krsnik and Robnik Šikonja 2017] Luka Krsnik and Marko Robnik Šikonja. 2017. Napovedovanje naglasa slovenskih besed z metodami strojnega učenja.

[Ljubešić et al. 2015] Nikola Ljubešić, Kaja Dobrovoljc, and Darja Fišer. 2015. MWELex – MWE Lexica of Croatian, Slovene and Serbian Extracted from Parsed Corpora.

[Ljubešić et al. 2016] Nikola Ljubešić, Filip Klubička, Željko Agić, and Ivo-Pavao Jazbec. 2016. New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian.

[Ljubešić et al. 2018] Nikola Ljubešić, Darja Fišer, and Anita Peti-Stantić. 2018. Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings.

[Lopatková et al. 2017] Markéta Lopatková et al. 2017. Valenční slovník českých sloves VALLEX.

[Protopapas et al. 2010] Athanassios Protopapas, Marina Tzakosta, Aimilios Chalamandaris, and Pirros Tsiakoulis. 2010. IPLR: an online resource for Greek word-level and sublexical information.

[Urešová 2011] Zdeňka Urešová. 2011. Valenční slovník Pražského závislostního korpusu (PDT-Vallex).

[Úlfarsdóttir 2014] Thórdís Úlfarsdóttir. 2014. ISLEX – a Multilingual Web Dictionary.