
CLARIN Estonia presents the Place Names Database (KNAB)

Submitted by Jakob Lenardič on

Blog post written by Peeter Päll and Kairi Tamuri, edited by Kadri Vider, Olga Gerassimenko, Darja Fišer and Jakob Lenardič


The Place Names Database of the Institute of the Estonian Language (KNAB) is a multilingual and multiscriptual systematic database of geographical names covering Estonia and other countries. Its purpose is to facilitate the study and standardization of geographical names by providing information on their history and modern use. It was designed as a linguistically oriented database.

KNAB currently contains approximately 46,000 entries related to Estonia and 108,000 entries related to other countries. Estonian geographical names include the following:

  • street names;
  • names of populated places;
  • names of former manor houses;
  • farm names (partially);
  • names of administrative units (both modern and historic);
  • names of natural features (rivers, lakes, islands, bogs, capes etc.).

The geographical names of other countries cover at least the first-level administrative divisions of each country, some autonomous administrative units of Russia (notably in the North Caucasus) and some minority names from other parts of the world (e.g. Basque, Tibetan, Welsh). KNAB also collects exonyms, or conventional foreign names, from many languages of the world; these are also published separately.

Please note that the database is not an authoritative source of official names in Estonia. While some feature types (e.g. street names of Tallinn, names of populated places in Estonia) are fully covered, others might not be. The official register of Estonian place names is maintained by the Land Board of Estonia.

The database is continuously updated. By giving access to both modern and historic records, it allows researchers to identify name forms across different languages and study their diachronic development. Uniquely, the database also provides geographical names in different scripts; besides Latin, there are names in Burmese, Chinese, Cyrillic, Devanagari, Greek, Japanese, Mongolian, Tibetan and many other scripts, strictly encoded according to Unicode. In the case of foreign names, it should be borne in mind that the data often reflect the de facto situation in a given country, so the names do not always correspond to the de jure status of certain regions. By contrast, country names follow international naming conventions.
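Because the names are stored as Unicode text, name variants can be sorted by script using only a standard library. The following is a minimal sketch in Python and is not part of KNAB itself: the function guess_script and the sample names are hypothetical illustrations, and using the first word of each character's Unicode name as a stand-in for its script is a rough heuristic rather than a proper script lookup.

```python
import unicodedata
from collections import Counter


def guess_script(name: str) -> str:
    """Guess the dominant script of a name (hypothetical helper, heuristic only)."""
    counts = Counter()
    for ch in name:
        if not ch.isalpha():
            continue  # skip spaces, hyphens, digits and combining marks
        # The first word of the Unicode character name (e.g. "CYRILLIC",
        # "GREEK", "CJK") serves as a rough proxy for the script.
        counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


# Illustrative names only; these are not records taken from KNAB.
samples = ["Tallinn", "Таллин", "Αθήνα", "東京"]
for sample in samples:
    print(sample, guess_script(sample))
```

A more careful approach would rely on the Unicode Script property, which the Python standard library does not expose directly; the heuristic above is only meant to show that script information can be recovered from the encoded text.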

The users of the database include editors, translators, researchers, geographers and other specialists. The Estonian edition of the database (where the Estonian variants of the place names are listed as keywords) is used, for example, by the Estonian Wikipedia and the media when there is a need for more comprehensive listings than those given by dictionaries. In the English edition of the database, preference is given to local official names. The English data have been used in international research projects that required multilingual name variants. For instance, in the Named Entity Recognition and Classification project of the Joint Research Centre of the European Commission, Pouliquen et al. (2006) used KNAB to develop a tool that recognizes geographical information in texts, which can then be visualised with tools such as Google Earth.


Figure 1: Visualising geographical information provided by KNAB in Google Earth with a tool developed by Pouliquen et al. (2006)

 

