Users of CLARIN - who are they?

Submitted by martin.wynne@b… on

CLARIN is an infrastructure to support the sharing, use and sustainability of language data and tools for research in the human and social sciences, and the CLARIN vision is for an interconnected online environment where anyone can get involved with using language resources and tools for professional or amateur research. However, such a wide vision implies a huge number of potential users: not just those creating resources, but anyone wanting to find out more about language, or wanting to use language as a means to a different end. Such a vast scope requires clear prioritization. Here are some of my recent thoughts about that issue.

CLARIN has very diverse sets of current and potential users. These can be categorized according to the following dimensions, at least:

Type of entity

country, research infrastructure, University, department, academy, centre, project team, individual, associations of individual academics

User involvement activities should be understood as aimed primarily at people, not organizations or countries. CLARIN's members are countries, and we deal with all of these entities, but when it comes specifically to user engagement, the highest priority should be to deal with individuals, and with the groups and associations with a high level of involvement of active researchers. Forming strategic alliances with communities in which researchers are already well-organized and represented by effective organizations has the advantage of converting “chiefs” rather than “Indians”.

Role in the research process

funder, decision maker, senior administrator, data/information scientist, researcher, teacher, student

The highest priority here is for engagement with the active researchers involved in creating, using and actively curating resources and tools. There should also be opportunities for engagement with those in specialist digital research support roles, who offer front-line support to staff in their institutions. The category of 'researchers' would normally include students studying for higher degrees, particularly at doctoral level. This would imply a lower priority for non-research-active academic staff (e.g. those focussed more on teaching), undergraduate students, as well as administrators, IT officers, etc., although specific initiatives directed at these groups could be useful.

Role in the research life-cycle

creator, annotator, corrector, user, disseminator, archivist, aggregator

The highest priority is to engage with users of research data and tools. The creators and curators of resources are of course not excluded from user involvement activities, but users should usually be involved as well. Other CLARIN activities centred on building and maintaining the infrastructure are more focussed on the creators and curators.

Level of familiarity and expertise

novice, competent, expert

All are important: we should aim to (i) engage in advanced discussions and joint activities with expert users, (ii) train and encourage competent users, and (iii) raise awareness among novices and the digitally non-engaged.

Sector

higher education, government, industry, general public

CLARIN is founded as a non-commercial entity with a central mission to promote the effective use of language resources in advanced research in the higher education sector. Language technology is important to industry and other sectors, and CLARIN partners are also involved in industrial engagement and other forms of knowledge exchange, and there are two-way benefits to these involvements, but they are not the priority for CLARIN. If outreach to the general public is a priority for researchers, CLARIN is able to help, but we don't aim to set the agenda in this respect.

Language

all languages, as the user's native or working language, and as the object of study

CLARIN deals with all human languages, in present-day and historical forms. The priority is on those which are studied most widely by our highest priority users. Our key users are in Europe, but all of the languages in which the human cultural record is encoded are studied on this continent. Reflecting the priorities of the disciplines of the humanities and social sciences which CLARIN aims to serve, the major European languages of scholarship and literature of various eras are of particular importance: Latin, Ancient Greek, French, Italian, Spanish, German, English, and there are of course also many more. National CLARIN initiatives are likely to have a high priority for national languages, and the resulting aggregation of centres of expertise in most of Europe's major and minority languages is one of CLARIN's huge strengths.

CLARIN centres are also home to some of the world's most important repositories of resources and documentation for endangered and lesser-studied languages, and the curation of these resources remains a high priority, particularly when this work is not being done elsewhere. All languages of the world are recorded and studied in European research institutions, and CLARIN is also keen to deal with all non-European languages, including major world languages such as Arabic, Chinese, Russian, Japanese, etc.

User groups

all academic disciplines (either those with language as the object of study in itself, or as the means of communication or encoding, and too many to list here)

CLARIN has three top priority sets of communities. These are listed 1-3 below, and although difficult to rank in terms of priority, they are useful to separate from one another since different engagement strategies are usually called for in the three cases. Listed 4-6 are lower priority areas where engagements are also possible.

1=. Core HSS (Humanities and Social Sciences) disciplines (not including linguistics and language studies - these are treated separately below) for which widespread use of language data is central to many of their key methods, including:

  • History
  • Literary studies
  • Political Science
  • Media Studies
  • Sociology
  • Theology
  • Classics and Ancient History
  • Area Studies (e.g. Middle Eastern / South Asian / Latin American Studies)
  • Philosophy
  • Social anthropology and ethnography

1=. Linguistics, language studies, and related disciplines which focus on text and speech description and analysis.

Working with these communities has the advantage of consolidating strong roots in our home domain, and allows us to concentrate on the integration of the core tools and datasets. We also need to remember that linguistics is much wider than corpus linguistics, and different to computational linguistics, and that engaging with it will also involve a large degree of outreach to the numerous sub-disciplines, some of which are not so much on our home turf. One effective engagement strategy is to support and encourage collaboration between linguistics and other disciplines (e.g. intellectual history, political science, etc.).

Due to the historical circumstances of the origins of CLARIN, there is a risk of an imbalance towards textual linguistics. While this is important (especially in terms of the connections with other disciplines), extra efforts are often necessary to ensure that CLARIN has sufficient engagement with researchers working with spoken data and tools.

1=. Digital Humanities

Although for many this is not a discipline, it is nevertheless an important area for CLARIN. Those already identifying themselves as 'digital' are an obvious target for joint collaborative activities, including sharing expertise and resources. Activities to promote digital tools and methods are less appropriate for this group, as they are likely to develop and use their own datasets and tools, and be fairly self-reliant when doing so.

There are also risks associated with prioritizing DH too highly: there is the danger of only preaching to the converted, the risk of ignoring the majority of researchers in the mainstream of disciplines, and also of getting too caught up in the current hype around DH (and inevitable backlash).

4. Other HSS - the following disciplines are generally less likely to make use of language resources and tools, although there will be opportunities, since language permeates all areas of human activity and academic study.

  • Law
  • Library and Information Studies
  • Education (not including language education, covered under language studies above)
  • Archaeology
  • Performing Arts
  • Fine Arts
  • Art and design, Architecture
  • Music
  • Demography and human geography
  • Economics
  • Management and business studies
  • Social Policy and Social Work

5. Interdisciplinary areas: these are numerous. Here are just a few examples of areas which are currently particularly active and likely to present opportunities:

  • History of Science
  • Medical Humanities
  • Thematic historical studies (e.g. Food Studies, War and Peace Studies, the Reformation, etc.)

6. Non-HSS disciplines: CLARIN's mission does not include supporting these disciplines, but there will be opportunities to collaborate. In general, additional funding should be sought, and such activities should not detract or draw resources away from engagements with the HSS disciplines.

  • Computer Science
  • Psychology
  • Biological Science and medical research
  • Engineering & Physical Sciences
  • Environmental Studies

And finally...

While it is possible to try to generalize about priorities, a number of factors at national and local level are likely to operate and adjust the equations. We need to be flexible, and be able to take advantage of opportunities as they arise. But I hope that these thoughts will at least be useful as we think about who our users are and who they could be.