DiaCollo: collocation analysis in diachronic perspective

In the words of the language philosopher Ludwig Wittgenstein: "the meaning of a word is its usage in the language" (Philosophical Investigations, Part I, section 43). In other words, the meaning of a word can be revealed by the context in which it appears. An ambiguous word such as ‘bank’ can be disambiguated given its context: the ‘bank’ bounding a body of water will tend to occur together with terms like “river”, “lake”, or “slope”, while the ‘bank’ which is a financial institution will tend to occur together with expressions like “money”, “cheque”, or “go to”.

Changes in a word's meaning will therefore often be directly associated with changes in its characteristic combinations (the set of words with which it typically occurs together, its collocates). Even political, cultural, or social changes relating to a central term can be revealed and traced through its typical combinations (see the example for ‘revolution’ below).
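The idea of a word's collocates can be illustrated with a minimal Python sketch. It simply counts which words fall within a fixed window around each occurrence of a target word. The function name collocate_counts, the window size, and the two toy sentences are assumptions made for illustration only; this is not how DiaCollo itself is implemented.

```python
from collections import Counter

def collocate_counts(tokens, target, window=3):
    """Count words occurring within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts

# Two invented toy texts, one for each sense of 'bank'.
river_text = "the river bank was steep and the bank of the river was muddy".split()
money_text = "she went to the bank to deposit money at the bank counter".split()

river_profile = collocate_counts(river_text, "bank")
money_profile = collocate_counts(money_text, "bank")

print(river_profile.most_common(3))
print(money_profile.most_common(3))
```

Even on such tiny inputs, "river" appears only in the first profile and "money" only in the second, which is the kind of contrast a collocation profile is meant to expose.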

DiaCollo is a software tool for the discovery, comparison, and interactive visualization of the typical word combinations for a user-specified target term. Characteristic word combination profiles based on various underlying text corpora can be requested for a particular time period, as well as direct comparisons between different time periods. In addition to traditional static tabular display formats, a number of intuitive interactive online visualizations for query result data are also available.

A short guide on how to use DiaCollo

  1. Visit the DiaCollo query form in a browser to query the data from the German Text Archive text corpus.
  2. Type the word Revolution in the QUERY field.
  3. Select Cloud from the FORMAT menu. Leave the rest of the fields unchanged.
  4. Click on the submit button (next to the QUERY field).
  5. In the box beneath the query section, the words that typically appear with Revolution will be displayed. The window initially shows the situation in 1610. The presentation format is a word-cloud: the displayed words will differ in size and colour based on their association strengths with respect to the target word, Revolution.
  6. Directly above the display area is a time-line beginning at 1610 and ending at 1910, divided into intervals of 10 years each. To the left of the display area is a scale of the (relative) association strengths for the displayed items for easier interpretation of the results.
  7. Clicking on a date in the time-line (e.g. 1790) displays the typical combinations for Revolution in the corresponding decade; clicking on a word in the display area opens a window with detailed information on that word, including a direct link to the underlying corpus hits. Alternatively, you can click on the play button to the left of the time-line to start an animation of the changes in typical word combinations over time. Playback speed can be adjusted with the vertical slider next to the play button.
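What the steps above display, one collocate profile per time-line interval, can be approximated offline with a short Python sketch. It groups dated texts into fixed-size slices and keeps the best-scoring collocates of a target word in each slice. Everything here is an illustrative assumption: the function sliced_profiles, its parameters slice_size and kbest (loosely modelled on the SLICE and KBEST fields), the choice of log-Dice as the association score, and the invented, pre-filtered toy data. It does not reproduce DiaCollo's actual computation.

```python
import math
from collections import Counter, defaultdict

def sliced_profiles(docs, target, slice_size=10, kbest=2, window=2):
    """Return {slice_start: [(collocate, log_dice), ...]} for `target`.

    `docs` is an iterable of (year, tokens) pairs.
    """
    pair_freq = defaultdict(Counter)  # slice -> co-occurrence counts with target
    word_freq = defaultdict(Counter)  # slice -> plain token counts
    for year, tokens in docs:
        start = year - year % slice_size  # e.g. 1793 -> 1790 for slice_size=10
        word_freq[start].update(tokens)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            pair_freq[start].update(
                t for j, t in enumerate(tokens[lo:hi], lo) if j != i
            )
    profiles = {}
    for start in sorted(pair_freq):
        f_target = word_freq[start][target]
        scored = [
            (w, 14 + math.log2(2 * f_xy / (f_target + word_freq[start][w])))
            for w, f_xy in pair_freq[start].items()
        ]
        # Highest score first; ties broken alphabetically for a stable result.
        scored.sort(key=lambda ws: (-ws[1], ws[0]))
        profiles[start] = scored[:kbest]
    return profiles

# Invented toy data: (year, content words only).
docs = [
    (1793, ["revolution", "france", "king", "terror"]),
    (1798, ["war", "revolution", "france", "republic"]),
    (1848, ["revolution", "berlin", "constitution", "parliament"]),
    (1849, ["parliament", "revolution", "berlin", "barricade"]),
]

profiles = sliced_profiles(docs, "revolution")
for start, collocates in profiles.items():
    print(start, collocates)
```

Each key of the result corresponds to one position on the time-line, and the list under it plays the role of the word cloud shown for that interval.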

You can modify the basic recipe above in various ways, for example by changing the queried time period (DATE) and/or the size of the intervals on the time-line (SLICE). You can also change the maximal number of displayed collocates (KBEST) or the mode of visual presentation (FORMAT). Additional corpora and further modes of application are also available. For instance, you can use DiaCollo to display the differences or the similarities between two different words on the basis of their typical collocates over a given time period, or to directly compare the typical collocates of a single word in two different time periods. Further details and examples can be found in the full CLARIN-D DiaCollo use-case (in German), as well as in DiaCollo's online help pages.
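The comparison of one word in two different time periods can be pictured as a difference between two collocate profiles. The Python sketch below ranks collocates by how strongly their score changes from one profile to the other. The function profile_diff, the treatment of a missing collocate as score 0.0, and the scores themselves are made-up assumptions for illustration; DiaCollo offers its own comparison operations, which are described in its online help.

```python
def profile_diff(profile_a, profile_b, kbest=5):
    """Rank collocates by the absolute difference of their scores in two profiles.

    Each profile maps collocate -> association score; a collocate missing
    from one profile is treated as having score 0.0 there (an assumption).
    """
    words = set(profile_a) | set(profile_b)
    diffs = {w: profile_a.get(w, 0.0) - profile_b.get(w, 0.0) for w in words}
    # Largest change first; ties broken alphabetically for a stable result.
    ranked = sorted(diffs.items(), key=lambda wd: (-abs(wd[1]), wd[0]))
    return ranked[:kbest]

# Made-up scores for an earlier and a later period.
early = {"france": 9.0, "king": 7.5, "war": 6.0}
late = {"berlin": 8.5, "war": 6.5, "constitution": 7.0}

changes = profile_diff(early, late)
for word, delta in changes:
    print(f"{word:14s} {delta:+.1f}")
```

Positive values mark collocates that are more typical of the first period, negative values those more typical of the second, and values near zero mark collocates shared by both.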

Additional versions of this guide

A more detailed guide with examples in German is available in PDF format.

CLARIN Centre: Berlin-Brandenburg Academy of Sciences and Humanities (BBAW)
Project leader: Bryan Jurish

Acknowledgements

DiaCollo is a use case of the CLARIN-D centre in the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW).

Related CLARIN-D tools and services

  • WebLicht web-based analysis tool
  • DTA::CAB historical German text analysis service

GraphColl

Country: United Kingdom
Description

GraphColl is a tool for building and exploring networks of linguistic collocations. It was developed at the ESRC Centre for Corpus Approaches to Social Sciences (CASS) at Lancaster University in 2014-15. CASS is a member of the CLARIN-UK consortium.

Text in a particular field of discourse is organized into lexical patterns, which can be visualized as networks of words that collocate with each other. GraphColl is a tool that builds collocation networks from corpora, allowing the user to gain important insights into semantic relationships.

GraphColl 1.0 is a free tool, developed with both novice and advanced users in mind, providing full control over the statistics and methods used to build collocation networks, whilst also offering sensible defaults for casual users. The system runs locally on a desktop computer, with a graphical user interface. The interface is structured around a series of tabs, which may be followed in a wizard-like manner to construct, explore and export a collocation graph. Graphs are presented as detachable tabs, allowing multiple graphs to be generated and examined at once.

The user can define properties of the collocation graph to be produced, such as the span of the left and right collocation windows, the association measure, the minimum collocate frequency and minimum collocation frequency, and “advanced thresholds”, which are Boolean expressions written in the Groovy scripting language.
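How such settings shape a collocation network can be sketched in a few lines of Python. The sketch below finds the collocates of a node word using separate left and right spans, filters them by minimum collocate frequency, minimum collocation frequency and a minimum score, and then repeats the search for each collocate found. All names (collocation_edges, build_graph and their parameters) are hypothetical, the score is a plain pointwise mutual information without span correction, and the toy text is invented. This is not GraphColl's code, and the exact formulas GraphColl uses may differ.

```python
import math
from collections import Counter

def collocation_edges(tokens, node, left=2, right=2,
                      min_collocate_freq=1, min_collocation_freq=2,
                      min_score=0.0):
    """Return {collocate: MI score} for `node`, after applying the thresholds."""
    n = len(tokens)
    freq = Counter(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - left), min(n, i + right + 1)
            co.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    edges = {}
    for w, f_xy in co.items():
        if freq[w] < min_collocate_freq or f_xy < min_collocation_freq:
            continue
        # Simple pointwise MI: log2(observed / expected), no span correction.
        mi = math.log2(f_xy * n / (freq[node] * freq[w]))
        if mi >= min_score:
            edges[w] = mi
    return edges

def build_graph(tokens, start, depth=2, **settings):
    """Expand the network from `start`, following collocates for `depth` rounds."""
    graph = {}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            if node in graph:
                continue  # already expanded
            graph[node] = collocation_edges(tokens, node, **settings)
            next_frontier.extend(graph[node])
        frontier = next_frontier
    return graph

tokens = "strong tea and strong coffee but weak tea and strong tea".split()
graph = build_graph(tokens, "tea", depth=2, left=1, right=1,
                    min_collocation_freq=2)
for node, edges in graph.items():
    print(node, {w: round(s, 2) for w, s in edges.items()})
```

With these settings "weak" co-occurs with "tea" only once and is dropped by the collocation frequency threshold, while "strong" and "and" become nodes that are expanded in the second round.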

GraphColl has been used in a number of research projects, including investigations into historical moral panics about swearing and into current discourse about migration in Europe. The use of GraphColl is also taught in a corpus linguistics MOOC (see https://www.futurelearn.com/courses/corpus-linguistics/0/steps/9886).