Tools inventory
Number of tools that match the current filter criteria: 231
| Name | Type | Country | Description | Organisation (not a CLARIN member) | Distribution Type |
|---|---|---|---|---|---|
| ABC - Language Identifier | toolbox | Romania | The application, developed in C#, automatically identifies the language of a text written in one of the 21 European Union languages. Using training texts in each language (approx. 1.5 MB of text per language), a training module counts the prefixes (the first 3 characters) and the suffixes (the last 4 characters) of all the words in the texts. For every language two models are constructed, containing the weights (percentages) of the prefixes and suffixes in the texts representing that language. In the prediction phase, two models are built on the fly in the same manner for the new text. These models are then compared with the stored models of each language for which the application was trained, and comparison functions select the best-matching model. More detailed descriptions are available in [the following papers](http://www.racai.ro/~tufis/papers): (1) Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco, May 2008. ELRA - European Language Resources Association. ISBN 2-9517408-4-0. (2) Dan Tufiş and Alexandru Ceauşu (2007). Diacritics Restoration in Romanian Texts. In Elena Paskaleva and Milena Slavcheva (eds.), A Common Natural Language Processing Paradigm for Balkan Languages - RANLP 2007 Workshop Proceedings, pp. 49-56, Borovets, Bulgaria, September 2007. INCOMA Ltd., Shoumen, Bulgaria. ISBN 978-954-91743-8-0. (3) Dan Tufiş and Adrian Chiţu (1999). Automatic Insertion of Diacritics in Romanian Texts. In Ferenc Kiefer, Gábor Kiss, and Júlia Pajzs (eds.), Proceedings of the 5th International Workshop on Computational Lexicography (COMPLEX 1999), pp. 185-194, Pecs, Hungary, May 1999. Linguistics Institute, Hungarian Academy of Sciences. | RACAI - Research Institute for Artificial Intelligence, Romanian Academy, Bucharest, Romania | |
| Access rights Management System | other | Netherlands (the) | A tool to grant and deny access to (parts of) an IMDI-based corpus. Supports advanced settings such as ACLs. | ||
| ADDIT | annotation tool | Netherlands (the) | ADDIT is a tool to 'plug' notes to elements of an IMDI archive and draw relations between those elements. It is an application on top of the web applications which give you access to the archive, like the imdi-browser or Annex. | ||
| Ajka | annotation tool, written language, single tool | Czech Republic | Morphological analyser and tagger for Czech | Centre for NLP, Faculty of Informatics, Masaryk University | |
| Aligner 2.0.6.7 | | Lithuania | A language-independent, tag-oriented, semi-automatic paragraph and sentence aligner. Works on MS Windows. Produces valid XML documents. Allows recording detailed bibliographical information. It has been used to create an English-Lithuanian parallel corpus. | Center of Computational Linguistics, Vytautas Magnus University | |
| Alinea | other | Spain | A tool for parallelizing translated texts, which has been specially designed for specialized corpora and also as a translation validator. | ||
| Annex - Annotation Exploration tool | multimedia tool | Netherlands (the) | Tool in the MPI web-based framework for archive exploration (and enrichment) | ||
| ANNIS | evaluation tool, multimodal tool, spoken language, written language, single task tool, web application | Germany | ANNIS2 is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, has been designed to provide access to the data of the SFB 632 - "Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts". Since information structure interacts with linguistic phenomena on many levels, ANNIS2 addresses the SFB's need to concurrently annotate, query and visualize data from such varied areas as syntax, semantics, morphology, prosody, referentiality, lexis and more. For projects working with spoken language, support for audio / video annotations is also required. | | public |
| Anotatornia | annotation tool, written language, multiple task tool, web application | Poland | Tool for manual online annotation of corpora at various linguistic levels. The levels currently implemented are word-level and sentence-level segmentation, morphosyntax, and word sense disambiguation. Anotatornia implements sophisticated mechanisms for managing texts, annotators and conflicts. | | public |
| Apertium Old Catalan morphological analyzer | annotation tool, web service | Spain | A RESTful morphological analyzer for Old Catalan. | ||
| Araucaria | | | Araucaria is a software tool for analysing arguments. It aids a user in reconstructing and diagramming an argument using a simple point-and-click interface. The software also supports argumentation schemes, and provides a user-customisable set of schemes with which to analyse arguments. Written in Java, released under the GNU General Public License. | ||
| Assigning lemmas and part-of-speech to wordform lists | | Croatia | online service | ||
| AUTONOMATA-g2p-toolkit | spoken language, toolbox | Belgium, Netherlands (the) | Dedicated name g2p converters (for Dutch and Flemish) designed to produce high-quality canonical transcriptions of person names and address items. The machine learning tools used to design these converters are available to third parties as well, so they can be applied to develop dedicated g2p converters for name categories that are not handled in this project. | ELIS - UGent | |
| Birnam Parser | annotation tool, written language, single task tool | Poland | A bottom-up memorizing interpreter of DCG grammars. It includes some extensions necessary for Świdziński's grammar. | | public |
| BitPar | annotation tool | Germany | Statistical parser | ||
| BNF Converter | NLP development aid | Sweden | BNF Converter is a tool for creating compiler frontends for domain-specific languages. | Språkbanken | |
| Bracmat | NLP development aid | Denmark, Netherlands (the) | Interpreted programming language with pattern matching as its core business. Allows pattern matching in text strings as well as in data structures, using the same syntax. The program is excellent for quickly trying out algorithms, for conversion tasks (e.g. cleaning up buggy HTML), and for parsing (e.g. NL dialogue management in a conversation between a software agent and an avatar). The program is written with a focus on avoiding any numerical limits and on compact storage of data. | Center for Sprogteknologi, Københavns Universitet | |
| BulTreeBank Morphological Analyzer | annotation tool, written language, single tool | Bulgaria | It uses a morphological lexicon of Bulgarian (100 000 lemmas) compiled as a finite-state automaton in the CLaRK System. It requires the text to be tokenized first and is applied to each token. It also includes guessers for unknown words and Named Entity gazetteers. If the corresponding resources are available for a different language, it can be tuned to that language. | Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences | |
| BulTreeBank Morphosyntactic Disambiguator | annotation tool, written language, single tool | Bulgaria | This is a hybrid system: rules, neural network, rules. First, rules for the sure cases are applied, then a neural network disambiguator, then rules that repair the most frequent errors of the neural network. The rules are implemented as constraints in the CLaRK System. The neural network is an additional module implemented in Java. It is called CLaRK. It requires morphologically annotated input. | Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences | |
| BulTreeBank Tokenizer | annotation tool, written language, single tool | Bulgaria | The tokenizer covers all languages that use the Latin1, Latin2, Latin3 and Cyrillic tables of Unicode. It can be extended to cover other Unicode tables if necessary. It is implemented as a cascaded regular grammar in CLaRK. It recognizes over 60 token categories and is easy to adapt to new ones. | Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences | |
| BUSCANEO | other | Spain | Tool for neologism extraction. | ||
| Bústia Neològica Escolar | written language | Spain | Terminology management | ||
| Bwananet | written language | Spain | Tool for querying the Technical Corpus of the Institut Universitari de Lingüística Aplicada. | ||
| calcular_p_cue_class | written language, web service | Spain | Statistical analysis service that calculates P(cue\|class), the probability of seeing a linguistic cue given a lexical class. The probability is computed from the occurrences of cues in a corpus (encoded in the signatures file) and from information on whether each word belongs to each class (encoded in the indicators file). It is computed for every cue in the signatures file and every class in the indicators file. | ||
| Catalan Annotated Corpora CQP | web service | Spain | This RESTful service allows the user to define a sub-corpus from different annotated corpora. The service includes a POS tag harmonisation process in which the original tags are converted to EAGLES/Parole format. The resulting sub-corpus is indexed using the IMS CWB tool. The user receives an ID which can be used by the CQP service to exploit the sub-corpus. | ||
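
The prefix and suffix weighting described in the ABC - Language Identifier entry can be illustrated with a small Python sketch. This is not the tool's C# implementation. The function names, the toy training sentences and the use of cosine similarity as the comparison function are all assumptions made for illustration; the inventory entry does not say which comparison functions the tool actually uses.

```python
import math
import re
from collections import Counter


def normalise(counts):
    """Turn raw counts into weights (relative frequencies)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def affix_models(text, prefix_len=3, suffix_len=4):
    """Build the two models for a text: prefix weights and suffix weights."""
    words = re.findall(r"\w+", text.lower())
    prefixes = Counter(word[:prefix_len] for word in words)
    suffixes = Counter(word[-suffix_len:] for word in words)
    return normalise(prefixes), normalise(suffixes)


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (assumed comparison function)."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Store one (prefix model, suffix model) pair per language."""
    return {lang: affix_models(text) for lang, text in samples.items()}


def identify(text, models):
    """Build models for the new text on the fly and return the closest stored language."""
    prefixes, suffixes = affix_models(text)
    scores = {
        lang: cosine(prefixes, stored_prefixes) + cosine(suffixes, stored_suffixes)
        for lang, (stored_prefixes, stored_suffixes) in models.items()
    }
    return max(scores, key=scores.get)


# Toy training data (hypothetical; the real tool uses about 1.5 MB per language).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "ro": "vulpea cea maro sare peste câinele leneș și pisica doarme în casă",
}

models = train(SAMPLES)
print(identify("the dog sleeps in the house", models))  # en
print(identify("pisica doarme în casă", models))        # ro
```

With realistic amounts of training text the two scores could also be weighted or combined differently; summing them is just the simplest choice for this sketch.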

