Poly-GrETEL Search Engine for Querying Syntactic Constructions in Parallel Treebanks

Poly-GrETEL is an online tool that enables syntactic querying in parallel treebanks. It is based on the monolingual GrETEL environment (http://gretel.ccl.kuleuven.be/gretel-2.0). It provides online access to the Europarl parallel treebank for Dutch and English, allowing users to query the treebank with either an XPath expression or an example sentence in order to find similar constructions. By combining the example-based query functionality with node alignments, Poly-GrETEL removes the need for users to be familiar with the query language and with the structure of the trees in the source and target languages, thus facilitating the use of parallel corpora for comparative linguistics and translation studies.
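To illustrate how node alignments can carry a monolingual match over to the other language, here is a minimal sketch in Python. It does not reflect Poly-GrETEL's actual data format or code: the node identifiers (such as "nl:3"), the list of alignment pairs, and the function name aligned_targets are all hypothetical and chosen for illustration only.

```python
def aligned_targets(source_matches, alignments):
    """Map each matched source node id to the target node ids aligned with it.

    source_matches: ids of source-language nodes returned by a query (hypothetical ids).
    alignments: (source_id, target_id) pairs (a hypothetical alignment format).
    """
    by_source = {}
    for src_id, tgt_id in alignments:
        by_source.setdefault(src_id, []).append(tgt_id)
    # Source nodes without any alignment map to an empty list.
    return {src: by_source.get(src, []) for src in source_matches}


# Hypothetical alignments between Dutch and English tree nodes.
alignments = [("nl:3", "en:2"), ("nl:3", "en:4"), ("nl:7", "en:9")]
result = aligned_targets(["nl:3", "nl:5"], alignments)
print(result)
```

In this sketch a source node may align with several target nodes or with none, so the user only has to describe the construction on one side of the parallel treebank.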

Languages covered

Dutch, English

Keywords

Linguistics, syntax, morpho-syntax, translation studies, comparative linguistics

Project leader

Vincent Vandeghinste

Acknowledgements

Poly-GrETEL is developed in the context of the SCATE project (Smart Computer Aided Translation Environment), funded by the Flemish Agency for Innovation through Science and Technology (IWT).

GrETEL Search Engine for Querying Syntactic Constructions in Treebanks

GrETEL is a query engine that lets linguists search a treebank starting from a natural language example rather than a formal search instruction. This gives novice and non-technical users a convenient way to use treebanks with only limited knowledge of the underlying tree representations and formal query languages. By allowing linguists to search for constructions similar to the example they provide, it aims to bridge the gap between descriptive-theoretical and computational linguistics.

The example-based query procedure consists of six steps. In the first step the user enters an example of the construction they are interested in. In the second step the example sentence is automatically parsed with the Alpino parser, resulting in a parse tree. In the third step the example is returned in the form of a matrix, in which the user specifies which aspects of the example are essential for the construction under investigation. In the fourth step the user selects the treebank to be queried. The fifth step provides an overview of the search instruction, i.e. the subpart of the parse tree that contains the elements relevant for the construction under investigation. This query tree is automatically converted into an XPath query, which is used for the actual treebank search. In the advanced search mode, the XPath query is made visible and can be edited if desired. In the sixth step the query is executed on the selected corpus. The matching constructions are presented to the user as a list of sentences, which can be downloaded. The user can also click on a sentence to visualize the result as a syntax tree.
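The conversion of a query tree into an XPath query (step five) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not GrETEL's own implementation: the QueryNode class and the functions node_step and to_xpath are hypothetical names, and the example assumes Alpino-style trees in which every constituent and word is a node element carrying attributes such as cat (category), rel (dependency relation) and pt (part-of-speech tag).

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    """One node of the query tree: the attributes the user kept, plus child nodes."""
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def node_step(node):
    """Return the XPath step for one node, recursing into its children."""
    # Attribute tests, sorted by name so the output is deterministic.
    conditions = [f'@{name}="{value}"' for name, value in sorted(node.attrs.items())]
    # Each child becomes a nested step that must match a daughter node.
    conditions += [node_step(child) for child in node.children]
    return f"node[{' and '.join(conditions)}]" if conditions else "node"


def to_xpath(root):
    """Search for the query tree at any depth of the parse tree."""
    return "//" + node_step(root)


# A noun phrase consisting of a determiner (article) and a nominal head.
query = QueryNode(
    {"cat": "np"},
    [QueryNode({"rel": "det", "pt": "lid"}), QueryNode({"rel": "hd", "pt": "n"})],
)
xpath = to_xpath(query)
print(xpath)
```

Attributes that the user leaves unselected in the matrix are simply absent from the query tree, so they impose no condition in the resulting XPath expression.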

In addition to the example-based search functionality, users can also query the treebanks using an XPath expression. This query is then processed in the same way as the automatically generated query in the example-based approach.
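As a rough illustration of what executing an XPath expression on a treebank involves, the sketch below runs a query against a single hand-written tree using Python's standard library. The sample XML is a simplified, assumed imitation of an Alpino-style parse (real trees carry more attributes), the function name matching_phrases is hypothetical, and xml.etree.ElementTree supports only a limited XPath subset, so the more complex queries GrETEL accepts would need a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written tree for "de kat slaapt" (assumed Alpino-like layout).
SAMPLE = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node cat="np" rel="su">
        <node rel="det" pt="lid" word="de"/>
        <node rel="hd" pt="n" word="kat"/>
      </node>
      <node rel="hd" pt="ww" word="slaapt"/>
    </node>
  </node>
</alpino_ds>
"""


def matching_phrases(xml_text, xpath):
    """Return the words covered by each node that matches the XPath expression."""
    root = ET.fromstring(xml_text)
    results = []
    for match in root.findall(xpath):
        # Collect the words under the match in document order, which equals
        # surface order in this small sample only.
        words = [n.get("word") for n in match.iter("node") if n.get("word")]
        results.append(" ".join(words))
    return results


hits = matching_phrases(SAMPLE, ".//node[@cat='np']")
print(hits)
```

The query here selects every noun phrase; a full search would apply the same expression to every tree in the selected treebank and collect the matching sentences.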

GrETEL enables search in several Dutch treebanks: the 1-million-word LASSY-Small (written Dutch) and CGN (spoken Dutch) treebanks, and the 500-million-word SoNaR-500 treebank (written Dutch).

GrETEL for Afrikaans (http://gretel.ccl.kuleuven.be/afribooms) includes the Afrikaans NCHLT treebank, containing ca. 45,000 words.

Poly-GrETEL (http://gretel.ccl.kuleuven.be/poly-gretel) is a tool for querying parallel treebanks. It contains the English-Dutch Europarl parallel treebank. In addition to a bilingual search functionality, it offers a monolingual search option that is similar to example-based querying in the monolingual GrETEL environment.

Languages covered

Dutch, Afrikaans

Keywords

linguistics, syntax, morpho-syntax

Tool task

corpus exploration, treebank search

Country

Belgium

Project leader

Liesbeth Augustinus

Acknowledgements

The first version of GrETEL is a result of the Nederbooms project (funded by CLARIN-VL). The second version was developed in the context of the GrETEL 2.0 project (funded by the Dutch Language Union). GrETEL for Afrikaans was created in the context of the AfriBooms project (funded by the Dutch Language Union and the Department of Arts and Culture of the Government of South Africa).