Legal Information Platform

Welcome to the Legal Information Platform!

Here you will find useful information on legal issues related to research, especially in the Social Sciences and Humanities domain.

The platform aims to introduce researchers to basic notions of the legislative and licensing framework in Europe on Copyright and Data Protection:

  • Introduction to Copyright and Related Rights
  • Licensing Practice
  • Overview of Data Protection
  • Changing laws in the EU related to copyright and data protection

It also includes proposals for:

  • Further reading/Bibliography
  • Useful links


CLARIN Legal Information Platform

by Paweł Kamocki, Erik Ketzan

Copyright and Related Rights

Copyright Law Overview

What is copyright law?

Copyright is a branch of Intellectual Property Law (IP Law). IP Law protects a wide range of rights in various results of human creativity. Other major branches of IP Law include: Patents, Industrial Designs, Trademarks and Trade Secrets.

Copyright is therefore a form of property; indeed, just like “classic” (corporeal) property it grants the owner (rightholder) some exclusive rights, i.e. rights to exclude others from his property. Like other forms of property, Intellectual Property is recognized as a fundamental right and as such it benefits from special legal protection (see art. 17.2 of the Charter of Fundamental Rights of the European Union).

What are the sources of copyright law?

There are international, European and national sources of copyright law:

1. International sources:

  • Berne Convention for the Protection of Literary and Artistic Works 1886
  • Universal Copyright Convention 1952
  • Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS) 1994
  • WIPO (World Intellectual Property Organisation) Copyright Treaty 1996

2. European sources:

  • Directive 2001/29/EC of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society (known as Copyright Directive or InfoSoc Directive)
  • Directive 2012/28/EU of 25 October 2012 on certain permitted uses of orphan works (Orphan Works Directive)
  • Directive 2006/116/EC of 12 December 2006 on the term of protection of copyright and certain related rights (Term Directive)
  • Directive 2009/24/EC of 23 April 2009 on the legal protection of computer programs (Software Directive)
  • Directive 96/9/EC of 11 March 1996 on the legal protection of databases (Database Directive)

3. National laws in every EU Member State, for example:

  • In Germany: Urheberrechtsgesetz (UrhG)
  • In France: Code de la Propriété Intellectuelle (CPI)
  • In the UK: Copyright, Designs and Patents Act (CDPA)

What is protected by copyright? What is the scope of copyright?

1. General principle: copyright protects expression of original works.

Copyright protects original works. A work must be created (and not just recorded) by a human being; it must be a conscious modification of reality. In order to be original, a work must be its author’s own intellectual creation [cf. Software Directive, Database Directive, Term Directive, CJEU C-5/08 Infopaq]. This means that the author must exercise at least a small degree of free choice while creating the work, unmotivated by any objective standards -- this choice can express his creative personality. For example, if two painters are asked to independently paint the same landscape, they will obtain two different results -- both of them will be original. It is not necessary for a work to be ‘artistic’ in any sense -- utilitarian, scientific or informative works can also be protected by copyright. The originality threshold that is required for copyright protection may vary from jurisdiction to jurisdiction, but it is relatively low. For example, a design of a can opener or a salad bowl can be protected by copyright.

2. Fixation: special requirement in UK law.

UK law (and other legal systems stemming from the same legal tradition, e.g. US law) requires also that in order to be protected by copyright, a work must be fixed in a tangible medium of expression (written on paper, carved in stone, recorded on a hard drive). All digital devices are tangible media of expression. In this approach, e.g. a speech or a sermon cannot be protected unless it is somehow recorded (in writing, on tape, on DVD).

By contrast, continental European copyright laws do not contain such a requirement, which allows for ephemeral works (like floral compositions or chocolate sculptures) to be protected by copyright.

The consequences of this difference are often rather theoretical -- indeed, if a work is not fixed in a tangible medium of expression, it is difficult to prove copyright infringement.

3. Examples of protected works

Copyright may protect for example:

  • writings (literary works): books and articles, poems, short stories, letters, but also manuals, contracts, descriptions or explanatory notes, providing that they are original. While a book is very likely to be original, a short explanatory note will not necessarily be so. The same applies to titles or slogans;
  • interviews, speeches and sermons;
  • graphic works, such as drawings, paintings, engravings, graphics in computer games but potentially also diagrams, designs, schemes or maps, providing that they are original (e.g. through an original, non-standard choice of colours);
  • audiovisual works (movies, short videos…);
  • artistic works: such as sculptures, models (including e.g. models of utilitarian objects), works of architecture…;
  • musical works;
  • photographs, unless there is no creativity involved in their making (such is e.g. the case of most X-ray photos);
  • software;
  • and other creations, such as choreographies, artistic performances, costumes etc.

4. Short works, parts of works, unfinished works

Not only whole works, but also their parts can be protected by copyright, as long as they are original. A paragraph from a book will most likely be original. It has been ruled that excerpts as short as 11 words can be protected by copyright (it does not mean that they automatically are, but they can potentially be original) [CJEU C-5/08 Infopaq]. The same applies to parts of images or songs.

By analogy, unfinished or incomplete works may also be protected by copyright if they fulfil the originality requirement.

5. Copyright protection of compilations, databases, corpora

Copyright protects also compilations -- regardless of whether their content is protected by copyright (e.g. compilations of purely factual information or public domain works may also be protected). In order to be protected by copyright, compilations must be original in the selection or arrangement of their contents. Copyright would then only protect this particular selection and/or arrangement, and not the constitutive elements.

In general, compilations of research data are, arguably, rarely original, as they are built according to objective standards (such as completeness, relevance, arrangement in alphabetical order) which leave no room for creative, arbitrary choices and, therefore, originality. The case of language corpora, however, is specific, as the process of compilation often leaves a lot of room for choice.

For example, the works of William Shakespeare have long been in the public domain. However, compilations of his works may be protected by copyright if their selection or arrangement is original. A compilation ‘My favorite works of William Shakespeare’ is likely to involve original selection (especially if it includes some of the less popular works of Shakespeare); a compilation ‘Complete works of William Shakespeare’ is not original in its selection (the compilation is complete), but may still be original in its arrangement. Arrangement by alphabetical or chronological order will not be original; however, one can imagine an original arrangement (by subject: love, hate, war; or just purely arbitrary, based on the compiler’s creative choices).

In no case would this mean that one cannot re-use ‘Hamlet’ or ‘Romeo and Juliet’ extracted from such an original compilation, as the fact that the compilation is copyright-protected has no influence on copyright in its constitutive parts. This would only mean that in order to re-use (e.g. re-publish) the whole original compilation (with its original selection and arrangement) one would need an authorisation from its author (compiler).

6. Translations and other adaptations

Copyright also protects translations, adaptations (i.e. transpositions of the same work from one means of artistic expression to another, e.g. from a book to a film, from a painting to a sculpture) and other derivative works. These works are not merely copies of original works -- some degree of arbitrary choice is involved in their making (which is obvious to everybody who has ever tried to translate a paragraph of text: two equally skilled translators are extremely unlikely to come up with exactly the same translation of a longer text). Arguably, adding another layer of annotation or even linking two datasets may be regarded as creating a derivative work and therefore give rise to a new copyright.

In order to make derivative works, one needs an authorisation from the rightholder of the original work. This means that anyone who wants to translate a new novel by Stephen King has to ask King (or more likely his agent or publisher) for permission. On the other hand, works of William Shakespeare (in which copyright has expired, or - more accurately - never actually existed, since Shakespeare died long before the first copyright statute) can be freely translated or adapted for the stage.

Derivative works are protected without prejudice to the copyright in the original work. In practice it means that in order to use a derivative work (e.g. a translation), one needs to obtain permission from both the author of the original work, and the author of the derivative work (although quite often the latter would have a limited right to grant permission in the name of the original author). This, of course, does not apply if the original work is no longer protected by copyright -- in such case, only the permission of the author of the derivative work is necessary.
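The permission rule for derivative works described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a statement of the law of any jurisdiction: the Work record, its field names and the permissions_needed function are hypothetical, the rightholder names are placeholders, and the sketch ignores copyright exceptions, licenses and the possibility that one rightholder may grant permission on behalf of another.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Work:
    """Hypothetical record of a work; all field names are illustrative."""
    title: str
    rightholder: str
    in_copyright: bool
    based_on: Optional["Work"] = None  # the original work, if this one is derivative


def permissions_needed(work: Work) -> List[str]:
    """Collect the rightholders whose permission is needed to use `work`.

    Walks from the derivative work back to the original and keeps every
    rightholder whose work is still protected by copyright.
    """
    needed: List[str] = []
    current: Optional[Work] = work
    while current is not None:
        if current.in_copyright:
            needed.append(current.rightholder)
        current = current.based_on
    return needed


# Original no longer protected: only the translator has to be asked.
hamlet = Work("Hamlet", "William Shakespeare", in_copyright=False)
new_translation = Work("Hamlet (new translation)", "translator", True, based_on=hamlet)

# Original still protected: both the translator and the novelist have to be asked.
recent_novel = Work("recent novel", "novelist", True)
novel_translation = Work("recent novel (translation)", "translator", True, based_on=recent_novel)

print(permissions_needed(new_translation))    # ['translator']
print(permissions_needed(novel_translation))  # ['translator', 'novelist']
```

The chain is followed through based_on so that an adaptation of a translation would be handled in the same way as a direct translation.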

What is NOT protected by copyright?

Copyright does not protect:

  • unoriginal works;
  • works whose copyright term has expired;
  • ideas, themes, motifs (e.g. an idea for a movie about a war between vampires and werewolves; an idea for a research project);
  • statements of facts (e.g. the fact that the Battle of Hastings took place in 1066, or that Germany won 7:1 against Brazil in the Football World Cup of 2014);
  • ‘works’ created by nature (e.g. oddly but naturally shaped rocks or trees, songs of birds);
  • individual words [CJEU C-5/08 Infopaq];
  • mathematical problems and formulas;
  • discoveries (as they are not created, but they existed objectively before they were discovered).

Are texts of laws, court decisions and other official works protected by copyright?

by Pawel Kamocki

Works created by public bodies, such as texts of laws, judicial and administrative decisions etc. would normally fall within the scope of copyright protection. This, however, would create an access barrier for the general public, which goes against the spirit of democracy and the public’s right to information. This is why various mechanisms have been created to make access to official works easier.

In most EU countries, there are explicit rules stating that official works are not protected by copyright (see e.g. s. 6 UrhG). These official works can roughly be defined as works created by public bodies within the scope of their public functions. As a general rule, this applies to court decisions, texts of laws as well as other sources of national or local law adopted by competent bodies. This may not apply, however, to drafts and unofficial translations of those texts.

In other countries (e.g. in the UK) a different approach has been adopted, in which copyright to official works belongs to the government (in the UK this mechanism is referred to as ‘crown copyright’). In such systems, governments then make official works available under special open licenses, such as Open Government Licenses.

Also in those countries in which official works are not protected by copyright, some content (e.g. databases of public data, statistics etc.) is sometimes released under similar open licenses (DatenLizenz Deutschland, Licence Ouverte) or with appropriate copyright notices (see: Europa Legal notice).

All the documents held by public sector bodies, whether they are copyrightable works or not, whether they are effectively protected by copyright or not, are jointly referred to as Public Sector Information (PSI). According to the general principle established by the PSI Directive (Directive 2003/98/EC of 17 November 2003), PSI shall be made re-usable. There are, however, many exceptions to this principle, mostly related to third-party copyright and data protection laws. Still, there is a growing tendency in EU Member States to make PSI openly available and re-usable via the so-called Open Data Portals (e.g. Open Data Paris, Offene Daten Berlin, EU Open Data Portal, Dutch Open Data Portal); licensing plays an important role in this movement.

For a comparison of PSI access regimes, see: .

For more detailed information about the status of official works, contact the legal helpdesk in your consortium.

What are the exclusive rights of a copyright holder?

General remark

Copyright grants a certain number of exclusive rights. This means that the author has a right to exclude others from accomplishing certain acts (just like the owner can exclude others from his property). If you want to accomplish one of these restricted acts, you’ll have to ask the author (or a subsequent rightholder, if the author had transferred his copyright or if he is dead) for permission.

These exclusive rights can be divided into two categories: economic and moral.

a) Economic rights

The exact scope of economic rights may differ between jurisdictions, but they can all be reduced to three main rights harmonised by the InfoSoc Directive: reproduction, communication to the public and distribution.

1. Reproduction

The reproduction right covers the right to authorise or prohibit direct or indirect, temporary or permanent reproductions of works by any means and in any form, in whole or in part. The right is therefore very broad.

Please note that most (if not all) digital uses of works involve reproduction, be it only temporary, in the memory of a device. However, note also that a certain number of these reproductions (e.g. those made while browsing the Internet) would be exempted under the ‘temporary reproduction’ exception (see Copyright Exceptions).

Reproduction is also necessary if one wants to create a derivative work or include a work in a compilation. This is why in some Member States the adaptation right (the right to make derivative works) may not be a separate right; in any case, adaptation is a part of the reproduction right.

2. Communication to the public

The exclusive right of communication to the public covers any act of communication of works to the public by wire or wireless means, including in such a way that members of the public may access them from a place and at a time individually chosen by them (i.e. uploading on a server).

Please note that different jurisdictions may define the notion of ‘public’ in slightly different ways, but there seems to be a consensus that a close circle of family members and perhaps also close friends does not constitute a public (the contrary solution would be an unnecessary intervention in the private life of users). It means that one can e.g. recite a poem during a family dinner without asking for the author’s permission. Accordingly, copyright-protected music can be played during such a dinner; this does not apply e.g. in hotel receptions, which can be entered by anybody. Playing music in hotels is regarded as communication to the public and therefore requires authorisation.

Applied to the digital context, this means that making works available online in a password-protected environment (when the password is only given e.g. to family members) does not constitute communication to the public. Arguably, the same can be said about the use of private profiles on social media (that can only be seen by ‘friends’). One should not forget, however, that most digital uses require also reproduction, and only some of these reproductions are exempted from authorisation under a temporary reproduction or private copy exception (see Copyright Exceptions).

The Court of Justice of the European Union ruled in 2014 that Internet users constitute one public. Therefore, linking (to a content that is, logically, already available on the Internet) does not constitute communication to a new public and, as such, does not require authorisation [CJEU C-466/12 Svensson].

3. Distribution

Authors have an exclusive right to authorise or prohibit distribution of their works (and their copies) by sale or otherwise (e.g. by lending or even by distributing free copies). Within the EU this right is exhausted after the first authorised distribution of the work on the EU market. It means that if a work (e.g. in the form of a book or a CD) has been lawfully sold in e.g. Germany, the author cannot prohibit its sale in France or any other Member State.

b) Moral rights

Moral rights protect the link between the author and his work. They are not harmonised at the EU level, and therefore their scope and nature may vary considerably across the Member States. As a general rule these rights include the right of attribution (i.e. the right to be mentioned as the author of the work, but also to use a pseudonym or remain anonymous), the right to decide whether a work should be disclosed (published for the first time), and the right to protection against distortion of the message of the original work (e.g. by shuffling sentences in a book, or writing an alternative ending). The latter right is sometimes in conflict with freedom of expression; it is widely admitted that parody and pastiche (within certain limits) cannot be prohibited.

From the point of view of academic users, attribution is by far the most important moral right.

In some jurisdictions (such as France) these rights (unlike the economic rights) cannot be transferred, licensed or waived and they always remain with the author. They may also be perpetual; in such a case, after the death of the author they can be exercised by his heirs. In other jurisdictions (such as Germany), moral rights share the fate of economic rights -- they can be licensed and expire with economic rights, i.e. 70 years after the death of the author.

How long does copyright protect a work (term)?

As a general rule, copyright expires 70 years after the death of the author. The fact that copyright has been transferred to a third party has no influence on this.

This term is in fact very long; it is not rare for it to extend to 120 years or more after the work was created (e.g. when the author creates the work at the age of 30 and dies at the age of 80: 50 years of remaining lifetime plus 70 years after death).
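The arithmetic behind this example can be sketched as follows. This is a minimal illustration of the general rule only: the Work record and the function names are hypothetical, the sketch assumes that the term is counted in whole calendar years from the year of the author's death, and it deliberately ignores earlier term extensions, wartime rules, works owned by legal persons, and anonymous or pseudonymous works.

```python
from dataclasses import dataclass

TERM_YEARS = 70  # general term after the author's death, as stated above


@dataclass
class Work:
    """Hypothetical record of a work; field names are illustrative only."""
    created_year: int
    author_death_year: int


def last_protected_year(work: Work) -> int:
    """Last calendar year of protection under the simplified general rule."""
    return work.author_death_year + TERM_YEARS


def years_protected_since_creation(work: Work) -> int:
    """Number of years between creation and the end of protection."""
    return last_protected_year(work) - work.created_year


# An author born in 1920 creates the work at 30 (1950) and dies at 80 (2000).
example = Work(created_year=1950, author_death_year=2000)
print(last_protected_year(example))             # 2070
print(years_protected_since_creation(example))  # 120
```

For older works, or for any of the special cases listed above, such a calculation is not reliable and the applicable national rules have to be checked.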

Copyright term has been gradually extended in the past century. Therefore, in case of works published before ca. 1945, determining whether they are still in copyright may be a complicated task (especially given that the two World Wars may have an impact on how the term is calculated). Therefore, if you are dealing with such works, you should consult a lawyer specialised in local law.

Nowadays, copyright term is harmonised across the EU by the Term Directive. Within the EU, works from a different Member State should be protected in each Member State for at least as long as ‘local’ works (ECJ, case C-360/00 Puccini).

If copyright belongs originally to a legal person (e.g. the employer, in jurisdictions where copyright in works created in the course of employment belongs originally to the employer, such as the UK; see Who owns copyright?), its term is calculated not from the death of the author (because legal persons are ‘immortal’), but from the creation of the work. In such cases, specific rules apply and the term of protection may be longer than 70 years (e.g. 90 years from the creation of the work).

Please note that in some jurisdictions moral rights can be perpetual.

For more on copyright terms, see:

Who owns copyright?

As a general rule, copyright belongs to the author of the work, i.e. the person who made the necessary creative choices involved in the creation of the work, and by doing so stamped the work with his ‘personal touch’ [see: CJEU C-145/10 Painer]. The participation of other people in the process (e.g. field workers, technical operators), as long as their contribution is not original (i.e. they have to strictly follow the directions of the creator), has no impact on copyright ownership.

Copyright and employment

The solution may vary between jurisdictions!

In many EU jurisdictions, copyright in works created by an employee in the course of employment belongs to the employee, and not the employer, unless the employer and the employee have agreed otherwise in writing (e.g. in the employment contract). Such a clause, however, may be deemed implied.

Some countries may adopt a different approach -- most notably, in the UK copyright to works created in the course of employment belongs ab initio to the employer.

The solution may also vary between the public and the private sector. Finally, special rules may apply to university teachers and researchers (there is a tendency in many jurisdictions to grant academics copyright in their works, even when they are employees or public agents).

Bottom line: the contract with your employer is likely to contain an intellectual property clause, regulating the questions of copyright ownership. Make sure you know what it says. In case of doubt, contact the legal department in your institution.

This only applies to the works created in the course of employment. Of course, if you work as a researcher and you also compose music or write fiction, your employer has no right to claim copyright in those works that are unrelated to your employment.

Special case: software

In most jurisdictions, regardless of what approach to copyright in works created in the course of employment is adopted by default, copyright in software created by an employee automatically belongs to the employer.

Joint authorship: general rule

The solution may vary between jurisdictions!

One work can be created by several authors. Detailed scenarios may vary and different jurisdictions may adopt different solutions in particular cases, but as a general rule, copyright to a work of joint authorship belongs in equal abstract parts to all the co-authors. The rules on ‘traditional’ (corporeal) joint ownership (i.e. when e.g. a house is jointly owned by several co-owners: spouses, siblings etc.) apply by analogy. It means that in order to use a work of joint authorship you would need permission from all the co-authors (unless one of them has a right to represent the others); one author cannot authorise the re-use of a work of joint authorship.

Detailed questions concerning how co-authors split profits, who can represent them etc. are normally regulated in a contract between them. In some sectors (most notably in cinematography) such contracts are of paramount importance.

Anonymous and pseudonymous works

Authors are free to remain anonymous, or to use a pseudonym, without prejudice to their copyright. Some special rules may apply to anonymous and pseudonymous works at the national level (e.g. shorter period of protection, as the public doesn’t know if the author is alive or dead); normally such works are dealt with via a representative.

Posthumous works

Some jurisdictions may grant quasi-copyright in posthumous works, i.e. works discovered after the death of the author. This quasi-copyright would then belong to the one who discovered the work. Therefore, if in 2015 someone discovers a posthumous work of Molière, he may claim an exclusive right in it (as French law recognizes this special copyright in posthumous works). The right is granted to reward the discoverer, but is normally much shorter than ‘real’ copyright.

Subsequent ownership

In most jurisdictions, copyright can be transferred (or at least an exclusive license can be granted for the whole remaining copyright term). All the rights will then belong to the transferee, and the original author will be left with nothing (apart from the money the transferee paid for the transfer). It is fairly common that authors transfer their copyright to a publisher in order to get published (it is not required by law, but many publishers would refuse to publish otherwise). However, even in such cases a certain link between the work and its author is preserved: the work always has to be attributed to the original author (it’s a moral right), and copyright expires 70 years after the death of the original author, regardless of who holds it.

After the death of the author -- general rule

If the author has not transferred copyright during his lifetime, after his death it becomes a part of his estate (just like any other property he owned), which is passed on to his heirs.

Copyright Exceptions


A copyright exception is a statutory rule that allows users to perform certain acts that would normally be restricted, without the authorisation of the rightholder. The acts covered by exceptions may be described as ‘islands of freedom’ in the ocean of monopoly.

Art. 5 of the InfoSoc Directive contains a list of exceptions that can be adopted in national laws of the EU Member States. This article does not apply directly in national laws (i.e. it does not grant any rights to users), but is rather intended to limit the freedom of national legislators by prohibiting them from adopting exceptions that would go beyond that list (see Recital 32 of the InfoSoc Directive). In other words: exceptions transposed in national laws can be narrower (or not transposed at all), but they cannot be broader than in the Directive. This does not apply to the temporary reproduction exception, which is the only mandatory exception in the Directive and as such has to be adopted by every Member State ‘as-is’. In practice, the other exceptions are generally transposed more narrowly than the Directive allows.

Exceptions and contracts

In most jurisdictions, copyright exceptions are overridable by contracts, i.e. a contractual clause (e.g. in a license) can restrict the uses allowed by a copyright exception. Therefore, if you access content on a contractual basis (which is often the case on the Internet, where in order to access the content of a service, the user usually has to accept the Terms of Service), make sure that the contract does not preclude the applicability of copyright exceptions. Many public licenses (like Creative Commons) contain a clause that expressly does NOT limit the benefits of copyright exceptions.

This may not apply in every jurisdiction. In jurisdictions like Belgium or Portugal, copyright exceptions may not be overridable by contracts (i.e. a contractual clause that limits the uses allowed under an exception is void).

The following paragraphs will discuss those exceptions that are important from the point of view of language resources. They will focus mostly on the limits laid down by the InfoSoc Directive, but some national implementations will also be briefly mentioned. For detailed information about the exact scope of each exception in your jurisdiction (which is indispensable if you want to rely on an exception), contact a lawyer specialised in your local law.

The exceptions described below do not apply to software. The Software Directive contains a different list of exceptions which are described in a relevant section below [see Copyright and Software].

Temporary acts of reproduction

According to art. 5.1 of the InfoSoc Directive, temporary acts of reproduction which are an essential part of a technological process and whose sole purpose is to enable transmission of a work in a network or other lawful use of the work shall be exempted from the reproduction right.

This is the only ‘mandatory’ exception in the Directive and therefore it can be found in national law of every EU Member State.

This exception allows such activities as browsing and caching, which necessarily include reproductions. In order to enter within the scope of the exception, a reproduction has to be necessary for the completion of and inseparable from the technical process of which it is part. It has to be deleted automatically after a certain period of time. Cache copies made while browsing the Internet meet these requirements (CJEU C-360/13, Meltwater).


Quotation

Quotation is probably the most important copyright exception. Art. 5.3 (d) allows Member States to adopt exceptions for ‘quotations for purposes such as criticism or review, provided that they relate to a work or other subject-matter which has already been lawfully made available to the public, that, unless this turns out to be impossible, the source, including the author's name, is indicated, and that their use is in accordance with fair practice, and to the extent required by the specific purpose’. Quotation has therefore to be justified by its purpose (which may include research and teaching). In some jurisdictions, only excerpts of works can be quoted (e.g. in France), while others allow whole works to be quoted as long as it’s justified by the purpose (e.g. in Germany). Some jurisdictions may require that the quotation is included in an independent work (both France and Germany, but e.g. not Slovakia), which means that a mere compilation of quotations, without any original contribution, is not allowed. The CJEU ruled (C-145/10, Painer), however, that the Directive does not require the quoting work to meet the criteria for copyright protection. Member States are thus allowed to abandon this requirement (which is what Slovakia did recently), but are not obliged to do so.

Teaching and research exception

According to art. 5.3 (a) of the InfoSoc Directive, Member States are allowed to provide for an exception from both the reproduction right and communication to the public right for ‘use for the sole purpose of illustration for teaching or scientific research, as long as the source, including the author's name, is indicated, unless this turns out to be impossible and to the extent justified by the non-commercial purpose to be achieved’. Minimum requirements include therefore: attribution and use for a non-commercial purpose. It is indeed difficult to define where to draw the line between a commercial and a non-commercial activity, especially in the case of applied research. ‘Commercial’ should be interpreted as ‘intended towards direct or indirect economic advantage’.

Unfortunately, the national implementations of this exception vary greatly between Member States. While the UK provides for a relatively broad research exception (s. 29 CDPA), its German implementation is narrower (s. 52a and 53(2) UrhG), and the French one is extremely narrow. Some countries may not even have it at all (e.g. Spain).

Member States may only allow excerpts of works to be used within the scope of this exception, or require that an equitable remuneration be paid for those uses to a collecting society.

Private copy

Art. 5.2 (b) allows Member States to adopt exceptions from the reproduction right ‘in respect of reproductions on any medium made (…) for private use and for ends that are neither directly nor indirectly commercial’. This exception allows for ‘private copies’ of works to be made for personal purposes, but does not allow these copies to be shared with the public. The CJEU requires that the copy be made from a lawful source (C-435/12, ACI Adam) -- i.e. a copy of an illegally downloaded movie (that someone uploaded without authorisation of the right holder) cannot be exempted under this exception. Some jurisdictions may allow private copies to be made for work-related purposes (e.g. when a researcher makes a copy of a newspaper article that is interesting from the point of view of his research) as long as no communication to the public is involved.



Copyright and Software

You said that software is protected by copyright? How does this work?

Copyright may protect original software -- both the executable and the source code (CJEU C-406/10 SAS Programming). The graphic interface may also be a copyright-protected work. Various functionalities, however -- like ideas -- are not protected by copyright (CJEU C-406/10 SAS Programming).

The user will usually be asked to accept a copyright license before he can install the program; anecdotally, some may even argue that opening the box containing the medium on which the program is recorded equals acceptance of the end-user license (hence the term “shrink-wrap licenses”). Please keep in mind that in general the provisions of copyright licenses override statutory exceptions (see Copyright Exceptions above)!

You mentioned that “regular” exceptions don’t apply to software - so, are there any “special” exceptions concerning software?

Yes, there are three of them [art. 5 and 6 of the Software Directive]:

  • the making of a back-up copy by a lawful user of the computer program; this may not be prevented by a license;
  • a lawful user of the program may observe, study or test the functioning of the program in order to determine the ideas and principles which underlie any element of the program; this, according to some commentators, allows the user “to see what’s visible”;
  • a lawful user of a program may decompile it (i.e. to reconstruct the source code from the object code), but only if it’s indispensable to achieve interoperability of the program with other programs, and only if the decompilation process is limited to the parts that are necessary to achieve this interoperability. This is not authorised, however, if the information on how to achieve this interoperability is readily available. It is forbidden to share the source code so obtained, to use it for other purposes than achieving interoperability and in particular to develop a similar computer program.

What about software patents?

There is a lot of controversy concerning the patentability of software. In the EU, “programs for computers” are not patentable as such (art. 52 of the European Patent Convention), but a non-obvious use of software to solve a technical problem may be protected by a patent.

For special rules concerning software licensing and Free/Open Source Software see Data and Software Licenses below.

Orphan Works

by Pawel Kamocki


Mass digitization projects (such as Google Books, Europeana and Gallica) drew the attention of lawmakers to the issue of orphan works, i.e. works whose rightholders cannot be found. Such a situation makes the reproduction and distribution of the work impossible, since the rightholder’s authorization is required under copyright law; unauthorized use of a work constitutes copyright infringement. A report estimates that this problem affects about 40% of books in the British Library, and up to 90% of the photographs in the London Metropolitan Archive; therefore, it represents a major challenge which also affects academic research.

Orphan Works Directive

The first major report on orphan works was published in 2006 in the United States, but so far no statutory solution has been adopted in the US. The European Parliament acted more promptly: on 25 October 2012 (just seventeen months after the publication of the first draft!) it adopted the Directive 2012/28/EU on certain permitted uses of orphan works. The text has been implemented into German law by the Law of 1 October 2013. The provisions on orphan works are found in sections 61 to 61c UrhG.

New copyright exception

According to recital 20 of Directive 2012/28/EU, the legal framework regarding orphan works is indeed an exception to copyright which completes the list of exceptions in art. 5 of Directive 2001/29/EC. Therefore, its application is subject to the three-step test, an international rule according to which copyright exceptions can be adopted in certain special cases, provided that they do not conflict with a normal exploitation of the work and do not unreasonably prejudice the legitimate interests of the author. Accordingly, the German legislator has implemented Directive 2012/28/EU in Section VI of the German Copyright Act, containing the list of copyright limitations.

Notion of orphan work (what is an orphan work?)

According to this new framework, a work is considered an orphan work if none of the rightholders in that work is identified or, even if one or more of them is identified, none is located despite a diligent search. A closer study of the Directive and its national transpositions reveals that the work in question must be published, and the first publication must take place on the territory of an EU Member State. This excludes from the scope of the notion of orphan works unpublished manuscripts, but also works originally published outside the European Union.

Case of multiple rightholders (what if there are multiple rightholders?)

If a work has several rightholders (e.g. several authors) and only some of them cannot be identified or located, the work can still be considered partially orphan. The use of such a work is subject to the approval of those rightholders who have been identified and located; in relation to the remaining rightholders, the work is considered orphan.

Subject-matter (which works are concerned?)

Directive 2012/28/EU does not apply to all orphan works, but only to works "published in the form of books, journals, newspapers, magazines or other writings" as well as cinematographic, audiovisual works and phonograms contained in the collections of publicly accessible libraries, educational establishments or museums as well as in the collections of archives or of film or audio heritage institutions. Therefore, it can be noted that the scope of the orphan works regime is defined more by the medium in which the work is fixed (e.g. a book or a newspaper) than by the category of the work itself (e.g. literary or graphic). As a result, graphic works (images, photographs, drawings, engravings...) included in books, magazines and newspapers are also concerned by this framework. This results from an express provision of Art. 1 (4) of Directive 2012/28/EU. Art. 10 of the Directive contains a review clause according to which the European Commission shall issue an annual report (starting from 2015) concerning the possible inclusion of other categories of works ("in particular photographs and other images that exist as independent works") in the scope of the Directive.

Beneficiary institutions (which categories of users are concerned?)

Orphan works may only be used by certain institutions (publicly accessible libraries, educational establishments and museums, archives, film or audio heritage institutions and public-service broadcasting organisations, established in an EU Member State - jointly referred to as “beneficiary institutions”) in the collections of which they are contained. As a result, for example, a university cannot invoke the provisions on orphan works to use a book merely because it is contained in the collection of some library (because virtually every book can be found in some library!). It seems, however, that the Directive authorizes the use of orphan works within the framework of a public-private partnership (e.g. a partnership between a university and a private research institute). It should be noted that research institutions (outside of universities) are not concerned by the Directive.

Permitted uses (what uses can be made of concerned works?)

Orphan works can be reproduced (in the sense of Art. 2 of Directive 2001/29/EC) without permission of copyright holders, but only for the purposes of digitization, making available, indexing, cataloging, preservation or restoration. They can also be made available to the public (in the sense of Art. 3 of Directive 2001/29/EC), but cannot be distributed by sale or otherwise (in the sense of Art. 4 of Directive 2001/29/EC). The moral rights of the authors of orphan works (like paternity and integrity) have to be respected. In addition, an orphan work may be used by a beneficiary institution solely in order to achieve aims related to its public-interest missions. Therefore, universities can use orphan works in their educational mission, but also in their research mission. Any use for profit is forbidden, but beneficiary institutions are allowed to generate revenues in relation to their use of orphan works in order to cover their expenses for reproducing the works and communicating them to the public. It is not clear whether the cost of the diligent search for rightholders can also be covered by these revenues. Unfortunately, modifications (making of derivative works) of the orphan works are not allowed (see s. 62 UrhG). As a result, the orphan works framework is of very little use for language research.

Important condition: diligent search (what is diligent search?)

A diligent search for copyright holders is necessary for a work to be considered orphan. Such a diligent search must be conducted in relation to each individual work before it is used. Logically, it should be carried out first of all in the Member State where the first publication of the work took place (or in the Member State in which the producer of an audiovisual work or a phonogram is located), but also in other countries if there is evidence to suggest that relevant information on rightholders is to be found in those countries. The diligent search must be conducted in good faith (i.e. with a genuine intention to find the rightholder); during the diligent search, at least the appropriate sources (determined by each Member State; for Germany, see the annex below) shall be consulted. The beneficiary institutions are required to keep records of their searches and inform the competent authority in their country (in Germany - Deutsches Patent- und Markenamt), indicating the intended uses of works. This competent authority shall then forward the information to the Office for Harmonization in the Internal Market, so that it can be stored in a special database. The fact that a work is listed in the database seems to be sufficient for the work to be considered orphan.

What if rightholders reappear?

The reappearance of the right holder puts an end to the status of an orphan work. The beneficiary institution is obliged to cease all use of the work as soon as the holder proves his copyright ownership. In such a case, the right holder should receive fair compensation for the use that has been made of the work.


Orphan works represent a large portion of university libraries’ collections. In particular, PhD and Master theses are considered "gray literature", often concerned with the orphan works problem. The phenomenon can also affect magazines, dictionaries, encyclopedias and other scientific writings, if their publisher goes bankrupt or is bought by another company, without IPR issues being properly addressed in the acquisition process. From a researcher’s perspective, the framework is important because it allows, in certain circumstances, for digitisation and making available of certain works that would otherwise be doomed to oblivion. However, in our view, the adopted solution is very disappointing, mainly because it concerns only certain categories of works. What about research datasets stored in electronic form, which can also become orphaned? What about software, which becomes obsolete so fast, not least because of its storage medium (it is not easy, and it will be more and more difficult, to recover data from 8” disks used in the 1970s)? What about the orphan books published outside the European Union? At present, these categories of works, if their rightholders cannot be identified or located, cannot lawfully be used, unless the use falls into the scope of another copyright exception.

Related Rights and Databases

What are related rights?

Related rights (or neighboring rights) are specific rights that protect the effort of various actors of the creative industry (the so-called “auxiliaries of creation”) other than the authors themselves, such as performers (musicians, actors) and producers (record companies, broadcasting organisations, film producers). There is much less harmonisation concerning related rights than concerning copyright; therefore, substantial differences between related rights in various EU Member States may exist. In some jurisdictions, these rights may have considerable impact on research activities, protecting for example scientific and critical editions of public domain works or unoriginal photographs (such as those taken by satellites). But the related right that is the most important for data-intensive research is the sui generis database right, created by Directive 96/9/EC on the legal protection of databases (the Database Directive). This right, unlike most other related rights, is heavily harmonized across the EU.

What is the sui generis database right?

The sui generis database right is a right created by the Database Directive. It protects databases regardless of their originality and is therefore different from copyright. The only criterion for this special protection is substantial investment in the obtaining, verification or presentation of the contents.

A database is defined as a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means (art. 1(2) of the Database Directive). Software is excluded from the definition, even though the distinction between software and data is sometimes difficult in practice.

Databases, if the selection or arrangement of their contents meets the originality standard, may be protected by copyright (see Copyright Law Overview above). The sui generis right is independent from copyright, and it may protect both original and unoriginal databases.

In order for a database to be protected by the sui generis right there must be a substantial investment in either the obtaining, verification or presentation of its contents. Please note that the investment in the creation of the contents is not relevant.

The required threshold of investment remains rather unclear and may vary depending on the jurisdiction and circumstances. Some decisions recognise an investment of several thousand euros as sufficient, while others place the bar much higher. In general, if the creation of the database required several months of work of a qualified team of researchers, the database is likely to attract legal protection.

Please note that the protection by the sui generis database right is independent from the copyright status of data in the database (the definition of a database mentions ‘independent works, data or other materials’). Therefore, elements that are not protected by copyright (such as purely factual statements, measurements, quantitative data) may fall within the scope of a monopoly as parts of a database.

Who does the sui generis database right belong to?

The sui generis database right belongs to the maker of the database, defined as the person (natural or legal) who bears the risk of the investment. Therefore, if a database is created in the course of employment, the right will belong to the employer, and not the employees.

In practice, the investment can be made jointly by several entities (e.g. research institutions participating in one research project). In such a case, it shall be assumed that the right belongs jointly to all the investors, and that the benefits should be shared in proportion to their investment, unless a contract between the investors specifies otherwise.

What are the restricted acts (exclusive rights of the maker of a database)?

The maker of a database can prevent extraction or re-utilization of the whole database or of its substantial part.

Extraction is a permanent or temporary transfer of the contents of the database to another medium, by any means or in any form. It corresponds to the reproduction right (see Copyright Law Overview above).

Re-utilization is any form of making available to the public, including by on-line transmission. It corresponds to the distribution right (see Copyright Law Overview above).

Please note that extraction and re-utilization of an insubstantial part of the database cannot be restricted. However, repeated and systematic extraction and/or re-utilization of insubstantial parts of the database can also be prohibited if it unreasonably prejudices the legitimate interest of the maker of the database (i.e. if it amounts to an extraction and/or re-utilization of a substantial part of the database).

In evaluating whether a part of a database is substantial, both quantitative and qualitative aspects should be taken into account: a part is substantial if it constitutes e.g. 40% of the whole database, but also if it is quantitatively small (e.g. 1%) yet contains qualitatively important data (e.g. data that only exist in this one database).

What is the term of protection?

The exclusive rights of the maker of a database expire 15 years after the completion of the database. However, if a new substantial investment is made, e.g. in updating the database, the result shall be considered a new database, and its term of protection is renewed as well. This means that in theory a database can be protected forever, as long as its maker makes a substantial investment in it at least every 15 years.
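
The renewal mechanism can be illustrated with a short Python sketch. It follows the simplified rule as stated above (15 years counted from completion, restarted by each new substantial investment); the function name and the year-level granularity are assumptions made for this example, and whether a given investment counts as ‘substantial’ is a legal question that the code does not answer.

```python
TERM_YEARS = 15  # term of protection as described in the text above


def protection_expiry(completion_year, substantial_investment_years=()):
    """Return the year in which protection ends under the simplified rule.

    Each new substantial investment is treated as creating a new database,
    so the term runs from the most recent of those events.
    """
    latest_event = max([completion_year, *substantial_investment_years])
    return latest_event + TERM_YEARS


# A database completed in 2000 and substantially updated in 2010 and 2020.
print(protection_expiry(2000))                # 2015
print(protection_expiry(2000, [2010, 2020]))  # 2035
```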

Are there any exceptions to the sui generis database right?

A lawful user of a database (i.e. a user that has lawful access to the database) can freely extract and re-use insubstantial parts of a database. However, repeated and systematic extraction and/or re-utilization of insubstantial parts which amounts to an extraction and/or re-utilization of a substantial part is not permitted.

Furthermore, extraction (but not re-utilization!) by a lawful user, even of substantial parts of a database is permitted in two cases:

  • if it’s for private purposes - unfortunately, this applies only to non-electronic databases;
  • if it is for non-commercial scientific research, as long as the extraction is justified by its purpose and the source is indicated.

Please note that these exceptions apply only to extraction (copying) and not to re-utilization (any form of communication and sharing, even within one research team!) of the database; their scope is therefore extremely limited.
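
The rules described in this section can be summarised as a small decision function. This is only a rough sketch of the text above, not a legal test: the class and field names are hypothetical, and each flag (for instance whether a part is ‘substantial’ or whether an extraction is ‘justified by its purpose’) stands for an assessment that has to be made case by case.

```python
from dataclasses import dataclass


@dataclass
class DatabaseUse:
    lawful_user: bool
    act: str                                     # "extraction" or "re-utilization"
    substantial_part: bool
    repeated_and_systematic: bool = False        # amounts to a substantial part
    private_purpose: bool = False
    electronic_database: bool = True
    noncommercial_research: bool = False
    justified_by_purpose: bool = False
    source_indicated: bool = False


def covered_by_exception(use: DatabaseUse) -> bool:
    """Apply the exceptions as summarised in the section above."""
    if not use.lawful_user:
        return False
    if not use.substantial_part:
        # Insubstantial parts may be extracted and re-used, unless this is
        # done repeatedly and systematically.
        return not use.repeated_and_systematic
    if use.act != "extraction":
        # No exception covers re-utilization of a substantial part.
        return False
    if use.private_purpose and not use.electronic_database:
        return True
    return (use.noncommercial_research
            and use.justified_by_purpose
            and use.source_indicated)


research_copy = DatabaseUse(True, "extraction", True,
                            noncommercial_research=True,
                            justified_by_purpose=True,
                            source_indicated=True)
print(covered_by_exception(research_copy))  # True
```

Changing `act` to `"re-utilization"` in the example makes the function return `False`, which mirrors the point that sharing a substantial part, even within one research team, is not covered.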

How to license the sui generis database right?

The sui generis database right can be licensed or transferred in the same way as copyright (see Licensing Practice below). Most public licenses, however, do not take this right into account. Nowadays, version 4.0 of the Creative Commons licenses concerns both copyright and the sui generis database right (see below).

Licensing Practice

By Pawel Kamocki

Overview and Public Licenses

What is a copyright license?

A copyright license is a permission to perform certain acts that are otherwise restricted; the name comes from the Latin word ‘licentia’, meaning (roughly) ‘permission’. A copyright license can be defined as a legally enforceable ‘promise not to sue’ for performing the allowed restricted acts. Unlike copyright transfer, a license does not transfer any exclusive rights to the licensee; the rights stay with the licensor. A copyright license can therefore be compared to a lease agreement (where e.g. the ownership of the apartment stays with the landlord, and the tenant is only allowed to live there during a certain period of time), while copyright transfer can be compared to a sale (the seller loses all the rights which are transferred to the buyer). In practice, a license should be concluded in writing (although many jurisdictions may also recognise unwritten, i.e. implied licenses). Please keep in mind that it is impossible to transfer (or to license) more rights than one actually has; therefore, logically, before you license a work, you have to make sure that all the relevant rights belong to you.

What should a license include?

A license should include several elements in order to be valid (or just functional). These are typically:

  • the parties: who is the licensor (the ‘giver’), and who is the licensee (the ‘receiver’); in ‘public’ licenses, the permission is granted to the general public, and so the licensee is defined by a deictic expression, such as e.g. User or Customer; in certain contexts (such as Terms of Use of social media services), the licensor can be defined in this way.
  • the subject matter, i.e. the definition of the licensed content; it can be specified in an Annex, if it’s very long, e.g. if it includes several hundreds of press articles;
  • the scope, i.e. the licensed rights; does the license only concern copyright (reproduction / communication to the public / distribution) or is the sui generis database right (extraction / re-utilisation) also included?;
  • the purpose: is the permission limited, e.g. in order for the content to be used for scientific research, for publication in a specific newspaper etc.;
  • the duration: is the license limited in time (e.g. for ten years), or is it for the duration of the copyright or other licensed right (which in practice means that the uses can continue for an unlimited period of time);
  • the territorial scope: in some jurisdictions, this may be a condition for validity of a license; it is still very important in agreements concerning e.g. broadcasting rights; as a general rule in the Digital Age licenses should be worldwide;
  • information about exclusivity and transferability: an exclusive license means that the relevant permission can only be granted to the licensee (the licensee has the exclusivity); researchers are typically dealing with non-exclusive licenses; a transferable license means that the licensee can grant permissions (within the scope of the license) to further sub-licensees; in the context of academic research, transferability is desired, but often difficult to negotiate with rightholders;
  • applicable law and competent jurisdiction (especially if it’s an international contract): consult a lawyer to help you with this clause;
  • remuneration (if applicable).
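
The elements listed above can be kept as a simple checklist when drafting or reviewing a license. The following Python sketch is one possible way to do this; the class name, field names and example values are hypothetical, and a non-empty field only records that the element has been addressed, not that the clause is legally adequate.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LicenseTerms:
    licensor: Optional[str] = None
    licensee: Optional[str] = None
    subject_matter: Optional[str] = None
    scope: Optional[str] = None            # licensed rights
    purpose: Optional[str] = None
    duration: Optional[str] = None
    territory: Optional[str] = None
    exclusivity: Optional[str] = None
    transferability: Optional[str] = None
    applicable_law: Optional[str] = None
    remuneration: Optional[str] = None     # only if applicable


# Elements that may legitimately be left empty.
OPTIONAL_ELEMENTS = {"remuneration"}


def missing_elements(terms: LicenseTerms) -> list:
    """Return the names of checklist elements that are still empty."""
    return [f.name for f in fields(terms)
            if f.name not in OPTIONAL_ELEMENTS and not getattr(terms, f.name)]


draft = LicenseTerms(
    licensor="Example University",
    licensee="User",
    subject_matter="press articles listed in Annex 1",
    scope="copyright and sui generis database right",
    duration="duration of the licensed rights",
    territory="worldwide",
)
print(missing_elements(draft))
# ['purpose', 'exclusivity', 'transferability', 'applicable_law']
```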

How is a CC license different from my contract with the editor? Public vs. bespoke licenses.

A license can be an agreement (a contract) between the licensor and the licensee including any freely negotiated terms. Such a license is sometimes referred to as a ‘bespoke’ license, or a custom-made license. In order to simplify the licensing process, avoid interoperability problems and achieve some common goals, standard ‘public’ licenses started to appear in the 1980s. A public license allows the licensor to authorise the general public (i.e. everybody) to perform certain uses of his work; if such a license is used, it is no longer necessary to grant individual permissions (and e.g. to answer dozens of requests per day). From the legal point of view, a public license is an offer to conclude a contract; this offer is then accepted by conduct when a user starts using the licensed work. Therefore, public licenses are still binding contracts which should be respected. When it comes to licensing of research data, public licenses should be used whenever possible.

Public Licenses for Data and Software

There is a relatively wide range of readily-available public licenses for software (b) as well as for other works (a).

a) Non-software licenses (Data licenses)

Creative Commons

Creative Commons is a non-profit organisation founded in 2001 by - among others - Lawrence ‘Larry’ Lessig, a law professor at Harvard Law School. The organisation proposes a suite of public licenses known under the same name. Their most current version (4.0) was released in November 2013 and is a major improvement, as it covers not only copyright, but also related rights, such as the sui generis database right, which makes it a perfect tool for licensing of research data.

Previous versions of CC licenses may exist in national (ported) versions, i.e. versions that are not only translated, but also adapted to national jurisdictions. In our view, the use of ported versions of CC licenses should be avoided, due to compatibility problems.

Creative Commons licenses are built from four building blocks, each corresponding to a different requirement.

  • BY (attribution): this is a mandatory element of every CC license. Contrary to what is commonly believed, the attribution obligation in CC licenses extends beyond a simple indication of the name of the author; in fact, the user is obliged to retain a copyright notice (e.g. (c) 2016 Paweł Kamocki), a license notice (e.g. This work is licensed under a Creative Commons Attribution 4.0 International License), a disclaimer of warranties (if supplied) and a link to the licensed material.
  • SA (share-alike): according to this requirement, if derivative works are made, they have to be licensed under the same or compatible license, i.e. a license containing the same (or compatible) requirements. There is only one license approved for compatibility with CC BY-SA 4.0 license: the Free Art License 1.3. In every other case, in order to comply with the SA requirement, you will have to re-license the derivative work under the same CC license, or its more recent version.
  • NC (non-commercial) means that no commercial use can be made. Commercial use is defined as use primarily intended for commercial advantage or monetary compensation. Please note that this category is extremely unclear and can discourage potential users. In our view, when it comes to licensing of research data, this requirement should be avoided.
  • ND (no derivatives) means that no derivative works can be made, i.e. the material cannot be modified, adapted or translated, but can only be used ‘as is’. Arguably, adding another layer of annotation or meta-data, or even incorporating the material into a larger dataset would violate this requirement. Therefore, licenses containing the ND requirement should not be used for research data.

These four building blocks can be combined into six different licenses:

  • CC BY
  • CC BY-SA
  • CC BY-NC
  • CC BY-ND
  • CC BY-NC-SA
  • CC BY-NC-ND

Please note that the BY requirement is a mandatory element of every CC license, and that SA and ND requirements are mutually exclusive. Only CC BY and CC BY-SA meet the standards of the Open Definition, and only CC BY is compatible with Open Access standards (see What is an ‘open’ license? below).
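
The two combination rules just stated (BY is always present; SA and ND exclude each other) are enough to derive the six licenses mechanically. The short Python sketch below does this; the function names are made up for the example, and the ‘open’ filter simply encodes the statement above that only licenses without NC and ND meet the Open Definition.

```python
from itertools import product


def cc_licenses():
    """Enumerate every valid combination of the four building blocks."""
    names = []
    for nc, sa, nd in product([False, True], repeat=3):
        if sa and nd:
            continue  # SA and ND are mutually exclusive
        parts = ["BY"]  # BY is mandatory in every CC license
        if nc:
            parts.append("NC")
        if sa:
            parts.append("SA")
        if nd:
            parts.append("ND")
        names.append("CC " + "-".join(parts))
    return names


def open_definition_licenses():
    """Licenses without the NC and ND requirements."""
    return [n for n in cc_licenses() if "NC" not in n and "ND" not in n]


print(len(cc_licenses()))           # 6
print(open_definition_licenses())   # ['CC BY', 'CC BY-SA']
```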

Apart from the licenses, Creative Commons offers also other tools. These include:

  • CC0 -- a copyright waiver, i.e. a tool by which the rightholder assumes an obligation not to exercise his exclusive rights. In theory, CC0 material can be used without any restrictions at all; however, in our view the tool raises some doubts as to its enforceability in many EU jurisdictions. Before you decide to use it, consult the legal department in your institution.
  • CC+ (CC Plus) -- a rather obscure tool that allows the right holder to add a requirement to one of the CC licenses to ‘open it up’. For example, by adding a ‘plus’ to CC BY-NC, the licensor can allow certain commercial uses of his work under specific conditions (e.g. for a fee, or free of charge for small companies). The ‘plus’ clause should be drafted by a professional lawyer.
  • Public Domain Mark -- can be applied to a work that has been identified to be free of copyright. This is only an information sign and does not waive any rights or create any obligations. You can use it e.g. if you are sure that copyright in the work expired, e.g. because the author died more than 70 years ago -- that can save others some time.

Please note that CC licenses are not appropriate for software licensing. For software licenses, see below.

b) Software Licenses

There is a plethora of available software licenses which can be divided into three categories:

  • Permissive licenses -- contain only minimal conditions concerning the use, modification and re-distribution of software. This means that there is no guarantee that modified versions of the software will remain ‘open source’, or free. The most popular of those licenses are BSD, MIT and Apache licenses.
  • Strong copyleft licenses -- contain an obligation to license modified versions of software under a compatible license (cf.: the SA requirement in CC licenses), therefore preserving its ‘openness’. The most common license of this kind is the GNU GPL (General Public License) in its various versions.
  • Weak copyleft licenses -- are used to license software libraries; the copyleft requirement in these licenses applies only to the modifications of the code, and not all the code that is linked to it. These licenses allow linking between software licensed under copyleft and permissive licenses. The most common examples of such a license are GNU LGPL (Lesser GPL) and Mozilla Public License.

Multi-licensing and re-licensing

Public licenses are in principle irrevocable. However, it is possible to re-license material under a different license, or even to license it under several licenses from the start (this practice is called multi-licensing). If material is licensed under several licenses, these licenses do not apply cumulatively, but alternatively, i.e. the user can choose under which license he wants to use the material (logically, he would choose the least restrictive license, or the license which is compatible with his project). Multi-licensing can not only ‘open up’ material that is already licensed (e.g. material licensed under CC BY-NC can be re-licensed under CC BY), but also solve some specific interoperability problems (e.g. software can be licensed under two incompatible licenses, like GPL and Artistic License, in order to de facto increase its reusability).


What is an ‘open’ license?

Open Science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society. Open Science revolutionizes the knowledge discovery process by amplifying collective intelligence and bringing people with various microexpertise together. For more information about Open Science in general, see: M. Nielsen, Reinventing Discovery: The New Era of Networked Science (Princeton University Press 2011).

The Open Science movement is based on several principles, most of which were developed before the term Open Science was introduced. Some of these principles refer to organisational aspects of scientific work (Open Methodology, Open Peer Review); others, however, are directly connected to IP licensing. The latter include: Open Access (a), Open Data (b) and Open Source (c). Please note that each of these standards adopts a slightly different definition of openness.

a) Open Access

The Open Access (OA) movement concerns access to scientific literature, e.g. academic journal articles, conference papers, theses, book chapters, and monographs (but not research data). It was initiated by three declarations (often jointly referred to as 3B): the Budapest Open Access Initiative (2002), the Bethesda Statement on Open Access Publishing (2003), and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003).

Two approaches to OA can be distinguished: gratis OA and libre OA (see: P. Suber, Gratis and Libre Open Access). Gratis OA simply means free online accessibility, but says nothing about re-use rights: a scientific paper is gratis OA if it can be accessed for free (without subscription fees) on the Internet, but not further re-used (e.g. analysed with a piece of software or modified in any way). Gratis OA does not require any specific licensing solutions (an ‘all rights reserved’ paper can still be freely accessible on the Internet). By contrast, libre OA consists in granting the user some re-use rights. Unfortunately, there seems to be no common definition of libre OA, as each of the 3B declarations adopts a slightly different approach. It seems, however, that according to all of them a very broad range of re-use rights should be given to the user (the texts mention the rights to: read, download, copy, distribute, print, search, link, crawl for indexing, pass as data to software, transmit, display publicly, make and distribute any derivative works), which can be exercised for any (‘lawful’ or ‘responsible’) purpose, including commercial uses. The only condition is proper attribution of authorship. In terms of licensing, it seems that libre OA is therefore best achieved via the use of a CC BY license (preferably 4.0).

There are essentially two ways to publish an OA paper, referred to as green and gold OA. Green OA refers to self-archiving (e.g. in an institutional repository or even on a personal website); gold OA refers to publishing in an open-access journal (a situation in which the author has no publication fee to pay has recently been referred to as diamond OA).

Under article 29.2 of the Horizon 2020 Model Grant Agreement, each beneficiary must ensure open access to all peer-reviewed scientific publications relating to its results.

b) Open (Research/Science) Data

The concept of open data for science predates the Internet (a landmark date is 1958, the year in which the World Data Centre, now transformed into the World Data System, was created), but it is in the Digital Age that the movement gained momentum. The most commonly used definition of Open Data is the so-called “Open Definition” proposed by Open Knowledge International. According to its short version, a dataset is open if “anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).”

In this approach, openness of datasets is best achieved through the use of CC BY 4.0 or CC BY-SA 4.0. Other, less popular licenses, namely ODC-BY and ODbL (roughly equivalent to CC BY 4.0 and CC BY-SA 4.0 respectively), as well as waivers (CC0 and its equivalent PDDL), can also be used. The use of the GNU FDL (Free Documentation License) is not encouraged, as this license is not compatible with any other license. For more information regarding licenses that comply with the Open Definition, see the list of conformant licenses published alongside the Open Definition.

Please note that datasets may often contain personal data (see Personal Data Protection below), in which case it is difficult to make them publicly available.

OpenAIRE’s Open Data Pilot addressed the issue of ‘open access to research data’ [sic] generated by selected Horizon 2020 projects.

Apart from researchers, governments and public institutions are an important source of open data. Typically, public bodies release their data under a country-specific license, e.g. Datenlizenz Deutschland in Germany, the Open Government Licence in the UK, or the Licence Ouverte in France. It is our view that the use of such licenses in research should be avoided where possible.

c) Open Source, or FOSS (Free Open Source Software)

Chronologically, Free Software/Open Source is the first ‘open’ movement and a source of inspiration for other such initiatives. It started in the mid-1980s, when Richard Stallman, an MIT-educated hacker, announced the GNU project (aiming at creating an open-source operating system) and when the Free Software Foundation (FSF) was founded. The GNU project eventually led to the creation of Linux in the early 1990s. Subsequently, the movement drew more attention from the software industry (e.g. the source code of the Netscape Internet suite was released, which enabled the creation of Mozilla Firefox). In order to make the open-source model more appealing to business, the Open Source Initiative (OSI) was founded, promoting a somewhat more lenient and business-friendly conception of Open Source Software.

According to the Free Software Definition (originally published by Stallman in 1986), software is free when four freedoms are given to every user:

  • Freedom 0: The freedom to run the program as he wishes, for any purpose.
  • Freedom 1: The freedom to study how the program works, and to change it. This freedom requires access to the source code.
  • Freedom 2:  The freedom to redistribute copies.
  • Freedom 3: The freedom to distribute copies of modified versions, i.e. to give the whole community a chance to benefit from the changes.

The Free Software movement emphasises the user’s freedoms and often refers to such ideological concepts as freedom and fairness. It is primarily associated with GNU GPL, a copyleft license, and its various derivatives (LGPL, Affero GPL), but it also approves many other licenses.

The Open Source Initiative (OSI) maintains a list of approved licenses which comply with the Open Source Definition. Among these, the licenses that are widely used and that have strong communities are particularly recommended. The latter include:

  • Apache License, 2.0 (Apache-2.0)
  • BSD 3-Clause "New" or "Revised" license (BSD-3-Clause)
  • BSD 2-Clause "Simplified" or "FreeBSD" license (BSD-2-Clause)
  • GNU General Public License (GPL)
  • GNU Library or "Lesser" General Public License (LGPL)
  • MIT license (MIT)
  • Mozilla Public License 2.0 (MPL-2.0)
  • Common Development and Distribution License (CDDL-1.0)
  • Eclipse Public License (EPL-1.0)

As a matter of fact, although the Free Software movement and the Open Source movement are based on different premises, most licenses approved by the FSF are also approved by the OSI (the exceptions are few and rather insignificant). Licenses that comply with both standards (including all the licenses mentioned above) are sometimes referred to as FOSS (Free Open Source Software) licenses.

Please note that ‘open source’ refers to the freedom of use, modification and redistribution, rather than absence of fees. It is therefore (theoretically) possible that open source software is not ‘free’, and that free software (freeware) is not open source, even though in practice these two categories will often overlap.

License Chooser Tools

Are there any applications that can help me choose a license?

The task of choosing an appropriate license may seem difficult for an average researcher with limited access to legal advice. As a response to that problem, attempts have been made to build tools (referred to as License Choosers, License Selectors or even License Wizards) that guide users through the jungle of available public licenses and allow them to choose the one that is most suitable for their needs.

The Licentia tool, developed in 2014 by Cardellino for INRIA (French Institute for Research in Computer Science and Automation), is in fact a conglomerate of three tools: a License Search Engine (which identifies licenses that meet a set of requirements defined by the user), a License Compatibility Checker (which assesses whether two licenses are compatible, i.e. whether material licensed under those two licenses can be ‘mixed’) and a License Visualiser (an interesting extra feature which produces graph-based visualisations of licenses expressed in ODRL, the Open Digital Rights Language).

The ELRA (European Language Resources Association) License Wizard, released in April 2015, allows users to define a set of features and browse corresponding licenses. For now, the tool only includes CC, META-SHARE and ELRA licenses, so it is particularly useful for language resources.

Finally, the Public License Selector, developed by Kamocki, Stranak and Sedlak in 2014 as a cooperation between two CLARIN centres (IDS Mannheim and Charles University in Prague), uses an algorithm (a series of yes/no questions) to assist the user in the licensing process. It allows users to choose licenses for both data and software, and features a built-in License Interoperability Tool. Licenses that meet the ‘open’ requirement are clearly marked. Moreover, unlike the two other tools, it is made available under Open Software/Open Data conditions.
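The idea of reducing license choice to a series of yes/no questions can be sketched as follows. This is a toy example restricted to the six Creative Commons licenses; it is not the algorithm used by the Public License Selector or by any of the other tools, and the function names choose_cc_license and is_open are assumptions made for illustration.

```python
def choose_cc_license(allow_commercial, allow_derivatives, share_alike):
    """Map three yes/no answers to the name of a Creative Commons 4.0 license."""
    parts = ["BY"]  # attribution is required by all six CC licenses
    if not allow_commercial:
        parts.append("NC")
    if not allow_derivatives:
        parts.append("ND")  # no derivatives allowed, so share-alike is moot
    elif share_alike:
        parts.append("SA")
    return "CC " + "-".join(parts) + " 4.0"


def is_open(license_name):
    """Treat a CC license as 'open' only if it has no NC and no ND element."""
    return "NC" not in license_name and "ND" not in license_name


for answers in [(True, True, False), (True, True, True), (False, True, False)]:
    name = choose_cc_license(*answers)
    print(name, "(open)" if is_open(name) else "(not open)")
```

Even in this reduced form, the sketch shows why such tools necessarily generalise: three questions cannot capture every condition that matters in a concrete licensing situation.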

All of these tools have both advantages and disadvantages; their biggest disadvantage is that they use (to varying degrees) a very specific language, which in fact requires basic knowledge of Intellectual Property Law from the user. They also necessarily involve a certain degree of over- or undergeneralization, especially when it comes to assessing license interoperability. Nevertheless, they remain very useful for the research community and may indeed help facilitate re-use and sharing of tools and data.

Personal Data Protection

by Paweł Kamocki

History and Sources

Germany (in 1977) and France (in 1978) were among the first countries to adopt national laws concerning the processing of personal data. Initially, the rationale behind these rules was to protect the citizens’ privacy vis-à-vis public administration. The role of these rules changed in the 1990s, when the use of the Internet and information processing technologies became widespread among both businesses and individuals, and so did the threat to data privacy.

These early national laws inspired the OECD’s Recommendations Concerning Guidelines Governing the Protection of Privacy and Trans-Border Flows of Personal Data, adopted in 1980, and the Council of Europe’s Convention for the protection of individuals with regard to automatic processing of personal data, adopted in 1981 (known as Convention 108). These documents, and in particular the second one, helped shape the EU data protection framework.

Data protection rules were harmonised at the EU level by Directive 95/46/EC of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data (‘Personal Data Directive’). This framework is complemented by Directive 2002/58/EC of 12 July 2002 concerning the processing of personal data and the protection of privacy in the electronic communications sector (‘E-Privacy Directive’). The Personal Data Directive will soon be replaced by the General Data Protection Regulation (see below).

Directives do not apply directly in the legal systems of the Member States; in order to be effective, they need to be implemented (transposed) into national laws. Each Member State has therefore adopted its own national law governing the processing of personal data.

While the rules are harmonised, some differences between national laws (even important ones from the point of view of scientific research) may still exist.

The Personal Data Directive will soon be replaced by EU Regulation 2016/679 (General Data Protection Regulation) of 27 April 2016, which will apply from 25 May 2018. Regulations, unlike directives, apply directly in the Member States and do not require implementation: once the Regulation becomes applicable, all the Member States will apply a unified set of rules. For more information about the upcoming changes, see General Data Protection Regulation below.

Art. 29 of the Personal Data Directive created the Data Protection Working Party (referred to as art. 29 Working Party, or WP29), a European body made up of representatives of the national data protection authorities of each EU Member State. The opinions of WP29 are not formally binding, but they have persuasive authority.

Every EU Member State has its own National Data Protection Authority (NDPA).

In many Member States it is mandatory for bigger institutions to appoint a Data Protection Officer (Datenschutzbeauftragter, correspondant informatique et libertés), who serves as a liaison between the institution, its employees and the national data protection authority. If you work for an institution that employs a Data Protection Officer, do not hesitate to contact him or her with any questions concerning personal data processing.

Overview of the Data Protection framework

What is personal data?

Personal data is any information relating to an identified or identifiable natural person (art. 2 a) of the Personal Data Directive). WP29, in its Opinion on the concept of personal data, broke this definition down into four elements:

a) any information, regardless of its nature (facts, opinions, even untrue or unproven information) and of its form (textual data, sound, image, digital or analogue);

b) relating to: information relates to a person if it says something about that person, i.e. his or her identity, characteristics or behaviour, or if it can be used to evaluate the situation of an individual. Information can relate to an individual directly (‘Peter is six feet tall’) or indirectly, e.g. via an object (‘This extravagant limousine belongs to Peter’ says something about Peter’s economic situation, i.e. that he is well-off);

c) identified or identifiable: a person is identified if he or she is singled out directly (via a name, unless it is very common, e.g. Smith) or indirectly (e.g. via a phone number). A person is identifiable if he or she can be identified by any means likely reasonably to be used (see recital 26 of the Personal Data Directive). In assessing whether means are likely reasonably to be used, one should take into account the costs, the relevant interests of the data subject (i.e. the person that the information relates to), the potential benefits for the data controller (i.e. the person who is processing data) and the risk of dysfunctions. For example, while it is rather unlikely that someone would employ a costly high-end technology in order to learn that Mr. X is a plumber, or that he drives a Honda, when it comes to more sensitive information (Mr. X’s genetic predisposition to lung cancer or his social security number) the probability is higher. In short, the more sensitive the information, the higher the standards for identifiability that should apply;

d) natural person: the Personal Data Directive protects only living natural persons. However, information about dead persons or legal persons may indirectly relate to identified or identifiable natural persons (e.g. ‘The man who died of a rare genetic disease at age 42 was Peter’s father’).

The definition of personal data is therefore extremely broad and covers all sorts of information that relate to a person, including not only the person’s name, phone number and address, but also various facts about the person’s past, opinions about the person, his or her social security number, IP address, voice, biometric information (way of walking or speaking), DNA sequences etc. This has to be kept in mind while processing all sorts of language resources, especially those containing interviews, images or voice recordings.

For further information, see the WP29 opinion on the concept of personal data.

Are there any special categories of personal data? Are all the data equally sensitive?

Certain categories of personal data are particularly sensitive and as such benefit from a stronger protection (art. 8 of the Personal Data Directive). These include data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, health or sex life. Processing of these categories of data is in principle prohibited, unless the data subject has given his explicit consent (see below).

What is anonymisation / anonymised data?

Anonymised data, i.e. data that no longer contain any information that can be related to an identified or identifiable natural person, are no longer regarded as personal data and therefore can be freely processed. In assessing whether data are properly anonymized, account should be taken of all means likely reasonably to be used to identify the data subject (including cross-reference with another dataset).

Some anonymization techniques include randomization (noise addition, permutation, differential privacy) and generalization (aggregation, k-anonymity, l-diversity, t-closeness).
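As an illustration of one of the generalization techniques named above, the following Python sketch coarsens two quasi-identifiers and then measures k-anonymity, i.e. the size of the smallest group of records that share the same quasi-identifier values. The records, the choice of age and postcode as quasi-identifiers and the function names are invented for the example. A higher k is not, by itself, proof that data are anonymised in the legal sense.

```python
from collections import Counter


def generalize(record):
    """Coarsen quasi-identifiers: exact age -> 10-year band, postcode -> first 2 digits."""
    age, postcode = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", postcode[:2] + "***")


def k_anonymity(records):
    """Size of the smallest group of records with identical quasi-identifier values."""
    return min(Counter(records).values())


# Invented (age, postcode) pairs; each raw record is unique.
raw = [(34, "68161"), (37, "68159"), (31, "68165"), (52, "10115"), (58, "10117")]
generalized = [generalize(r) for r in raw]

print("k before generalization:", k_anonymity(raw))
print("k after generalization: ", k_anonymity(generalized))
```

In this toy dataset every raw record is unique, whereas after generalization each record shares its values with at least one other record. Cross-referencing with other datasets can still defeat such measures, which is why the other attributes in the dataset also need to be considered.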

The WP29’s opinion on anonymisation sets a very high standard for anonymisation, especially by pointing out the possibility of identification of data subjects via cross-reference with other available datasets (e.g. social media). If you want to anonymise your dataset, contacting the Data Protection Officer at your institution may be a good first step.

Please note that anonymisation is NOT equivalent to pseudonymisation, which consists of separating the identifying elements from the rest of the dataset and keeping them separately. Pseudonymised data are still personal data. Pseudonymisation is, however, expressly recognized by the new Regulation as one of the safeguards for the data subject’s interests.
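The separation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the field name "name", the "P-" prefix and the function pseudonymise are hypothetical, and a real project would also have to deal with indirect identifiers left in the data.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Split records into (pseudonymised data, key table).

    The key table maps each random pseudonym back to the identifier and
    has to be stored separately from the data, under restricted access.
    """
    key_table = {}      # pseudonym -> identifier (kept apart from the data)
    by_identifier = {}  # identifier -> pseudonym (used only while converting)
    data = []
    for rec in records:
        identifier = rec[id_field]
        if identifier not in by_identifier:
            pseudonym = "P-" + secrets.token_hex(4)
            while pseudonym in key_table:  # avoid the (unlikely) collision
                pseudonym = "P-" + secrets.token_hex(4)
            by_identifier[identifier] = pseudonym
            key_table[pseudonym] = identifier
        row = {k: v for k, v in rec.items() if k != id_field}
        row["pseudonym"] = by_identifier[identifier]
        data.append(row)
    return data, key_table


interviews = [
    {"name": "Peter", "utterance": "a"},
    {"name": "Anna", "utterance": "b"},
    {"name": "Peter", "utterance": "c"},
]
data, keys = pseudonymise(interviews)
print(data)
```

As long as the key table exists, the pseudonyms can be linked back to the individuals, which is why the resulting dataset still counts as personal data.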

What qualifies as processing of personal data?

Processing is a legal notion that covers all sorts of operations that can be performed upon data regardless of its purpose, including (but not limited to): collection, storage, consultation, use, dissemination (or making available otherwise), erasure or destruction (see art. 2 b) of the Personal Data Directive).

What conditions need to be met in order for personal data processing to be lawful?

In order to be lawful, processing has to meet a series of requirements: it has to be legitimate (1.), data have to comply with certain quality standards (2.), and -- to the extent required by applicable national law -- some formalities may need to be accomplished (3.).

  1. Making processing legitimate

Data processing is legitimate if the data subject (i.e. the person that the data relate to) has unambiguously given his consent (a). As a general rule, one should always obtain the data subject’s consent before undertaking any processing of personal data; nevertheless, some alternatives to consent remain available (b).

  a) The data subject’s consent

The Personal Data Directive defines consent as “any freely given specific and informed indication of [the data subject’s] wishes by which the data subject signifies his agreement to personal data relating to him being processed” (art. 2 h)). Even though the Directive does not require that consent be in writing (but some national laws may do so), it is always preferable to obtain written consent, if only for the purpose of proof. On the other hand, the definition requires that consent be specific; therefore, it cannot be too general (arguably, consent for processing ‘for the purposes of scientific research’ is too general and therefore invalid; instead, the consent should probably specify the domain of the research, the name of the organisation that carries it out or the name of the research project). Consent also has to be informed, which means that certain information about the processing has to be provided to the data subject before he or she can validly consent to the processing. This includes: the identity of the data controller (the person who processes the data) and of the data recipient (the person that the data are to be transferred to, if applicable), the purpose of the processing (which itself has to be explicit, legitimate and specific) and the data subject’s right of access (see below).

For more information about consent, see the WP29 opinion on consent.

The drafting of a consent form is a delicate task that should be performed by a trained lawyer and take into account the specificity of local law. If you want to have a consent form drafted for your project, contact a lawyer or a Data Protection Officer in your institution.

  b) Alternatives to consent

In practice, it is not always possible to obtain the data subject’s consent. This is why the Directive enumerates a number of situations in which processing may still be legitimate without consent (so-called alternative grounds for legitimacy). These include: when processing is necessary for the performance of a contract to which the data subject is party (e.g. buying or selling a house, or even a car, would require some processing of personal data; in such a case, the contract ‘replaces’ consent and a separate consent for processing is no longer necessary), when processing is necessary in order to protect vital interests of the data subject (e.g. if the data subject is unconscious and needs a quick blood transfusion there is, obviously, no need for a written consent), but also when “processing is necessary for the purposes of the legitimate interests pursued by the controller or by the third party or parties to whom the data are disclosed”. Arguably, research (at least in domains like medicine or pharmacology) can be such a legitimate interest. This is why certain national legislators could adopt a research exception in their national laws (see below) based on this ground. Some degree of harmonisation in the interpretation of this ground has recently been provided by WP29 in its opinion on legitimate interests. Nevertheless, this ground should remain a ‘safety valve’ which is only used when it is practically impossible to obtain the data subject’s consent.

  2. Data quality standards

According to the Directive, personal data should be collected for specific, explicit and legitimate purposes (which the data subject should be informed of, e.g. in a consent form). The data should be adequate, relevant and not excessive in relation to these purposes; they should be accurate and, where necessary, kept up to date; and finally, they should be kept for no longer than necessary to achieve the specific purpose. Some exceptions concerning scientific research may exist in the Member States (see the section on special rules related to research below).

Moreover, the person who processes the data must implement appropriate technical and organizational measures to protect personal data against accidental or unlawful destruction or accidental loss, alteration, unauthorized disclosure or access, in particular where the processing involves the transmission of data over a network, and against all other unlawful forms of processing.

  3. Formalities

As a general rule, before processing is carried out, the data controller should notify the national supervisory authority about the processing and its purposes. The details of this obligation, as well as possible exemptions, are specified in national laws. If you want to know about them, please contact the Data Protection Officer in your institution, or your national supervisory authority.

What are the rights of the data subject?

Even if the data subject (i.e. the person that the data relate to) consented to processing, he or she retains certain rights with regard to his or her personal data. The most important of these rights is to be informed about the purpose of processing, the identity of the data controller etc. (cf. above about consent); this information should be provided in a consent form. But the data subject also has a right to access the data in order to rectify them, erase inaccurate data or block their further unlawful processing. Following a request, the data subject should be granted access to his or her data without excessive delay or expense. A particular instance of this right, concerning erasure of personal data from an Internet search engine, is commonly referred to as ‘the right to be forgotten’ (see: CJEU case C-131/12 Google Spain).

Finally, the data subject has a right to object to the processing of his or her data for direct marketing purposes (this is why, if you are receiving a commercial newsletter, you should be given the possibility to unsubscribe at any time).

Are there any special rules related to research?

According to the Personal Data Directive, personal data have to be processed for a specific purpose and not further processed in a way incompatible with this purpose. This means, a contrario, that data can be further processed for purposes compatible with the original purpose (the so-called ‘purpose extension’). The text of the Personal Data Directive, as well as the relevant WP29 opinion, suggest that scientific research (especially historical and statistical) may often be regarded as a compatible purpose, so that data originally collected for a different purpose may be processed for scientific purposes. However, before you rely on this principle, consult your national law or contact the Data Protection Officer in your institution.

The Personal Data Directive contains no explicit exceptions for research; however, some of the principles of the Directive (processing for pursuit of legitimate interests of the data controller, further processing for compatible purposes) allow a degree of flexibility here. Therefore, research exceptions can be found in national laws (most importantly: s. 33 of the UK Data Protection Act; LDSGs in German states), but they will often be formulated in a narrow and imprecise way. For the sake of clarity and legal security it is always better to obtain the data subject’s written consent. However, if you want to know more about research exceptions in your jurisdiction, contact the Data Protection Officer in your institution.

Can personal data be transferred abroad?

Within the European Union (and the EEA) personal data can be transferred freely. This is also the case when data are transferred towards a third country that ensures an adequate level of data protection. The adequacy of the level of protection is officially assessed by the European Commission in its adequacy decisions. Most importantly, according to the EC, the United States does not ensure an adequate level of protection (although transatlantic data transfers are facilitated by the Privacy Shield agreement, which has replaced the recently invalidated Safe Harbour agreement).

Data can be transferred towards the countries that do not ensure an adequate level of data protection in a limited number of cases, e.g. if the data subject has given his unambiguous consent for the transfer. Therefore, if you want to share a dataset containing personal data with a research team from the US, you should modify your consent form accordingly.

Changing Laws in the EU

Text and Data Mining Exception in the UK

On June 1, 2014, new amendments to copyright law regarding research and education came into effect in the UK. The UK Intellectual Property Office has published some easy-to-read explanations of the reform.

Text and data analysis for non-commercial research

Among those new amendments, the one that is most relevant to scientific research is the addition of s. 29A to the Copyright, Designs and Patents Act (CDPA), concerning ‘copies for text and data analysis for non-commercial research’. The text reads:

“(1) The making of a copy of a work by a person who has lawful access to the work does not infringe copyright in the work provided that—

(a) the copy is made in order that a person who has lawful access to the work may carry out a computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose, and

(b) the copy is accompanied by a sufficient acknowledgement (unless this would be impossible for reasons of practicality or otherwise).

(2) Where a copy of a work has been made under this section, copyright in the work is infringed if

(a) the copy is transferred to any other person, except where the transfer is authorised by the copyright owner, or

(b) the copy is used for any purpose other than that mentioned in subsection

(5) To the extent that a term of a contract purports to prevent or restrict the making of a copy which, by virtue of this section, would not infringe copyright, that term is unenforceable.” (emphasis added)

The scope of this new exception was restricted by Art. 5.3(a) of the InfoSoc Directive (see Copyright Exceptions above). This is why it had to be limited to non-commercial research, and require acknowledgement of the source (unless it would be impossible). Two elements, however, make this exception more than a simple transposition of the research exception in the Directive (please note that the UK already has a relatively broad research exception in s. 29 CDPA):

  • On one hand, lawful access to the work is a pre-condition of this exception. It is, however, fairly unclear what lawful access means exactly. The definition of ‘lawful use’ in Recital 33 of the InfoSoc Directive (‘A use should be considered lawful where it is authorised by the rightholder or not restricted by law’; reproductions of copyright-protected works are, in principle, restricted by law) may provide some guidance here, but the solution would then seem absurd (an authorisation of the rightholder would be necessary to rely on the exception, since a contract between the rightholder and the user could determine what is and what is not ‘lawful use’!). The guidance published by the UK Intellectual Property Office suggests, however, that one has lawful access to the work if he or she has a right to read it. This condition remains open to some interpretation in the future.
  • On the other hand, the new exception is expressly non-overridable by contractual clauses (see Copyright Exceptions for general rules concerning the relation between copyright exceptions and contracts). This means that a contractual clause (e.g. in a license or in terms of service) that prevents or restricts TDM is unenforceable. This is very important for researchers.

Furthermore, it should be noted that this new exception is limited to the reproduction right, even though art. 5.3(a) of the InfoSoc Directive allows for exceptions to both reproduction and communication to the public rights. In practice, this means that copies made in the process of TDM cannot be shared or transferred -- this would amount to copyright infringement.

It should also be noted that publishers may still restrict access to their databases with Digital Rights Management (DRM), also referred to as Technological Protection Measures (TPMs). Indeed, circumvention of such measures is prohibited (see Art. 6 of the InfoSoc Directive) and constitutes copyright infringement. UK law does provide a process for requesting access to works protected by DRM, which is to write to the UK Secretary of State under the terms of section 296ZEA of the Copyright, Designs and Patents Act (also see Jisc, EFF, and Martin Eve). The Libraries and Archives Copyright Alliance (LACA) went through this process in 2015, on behalf of a researcher at a UK university who wanted access to TPM-protected material on a website. The Intellectual Property Office turned down the request for a remedy because, in the words of section 296ZEA, the remedy "does not apply to copyright works made available to the public on agreed contractual terms in such a way that members of the public may access them from a place and at a time individually chosen by them." See CILIP press release.

Finally, this new exception concerns only copyright and not the sui generis database right. It seems that in many cases the owner of the sui generis database right may prevent TDM operations (as it has the exclusive rights of extraction and re-utilization of a database’s substantial part).

Text and Data Mining exception in France

On October 7, 2016, a Law for a digital Republic (Loi pour une république numérique) was adopted in France. As expected, thanks to the intervention of the Parliament in the text of the bill, the document contains TDM exceptions which have been introduced in the Intellectual Property Code (CPI), effective immediately.

According to art. L. 122-5, 10° of the CPI the rightholders cannot forbid “Copies and digital reproductions made from a lawful source for the purposes of mining text and data included in or associated with scientific publications, for public research purposes, excluding all commercial purposes” [translation - PK]. The details related to the application of this new rule are to be regulated by a decree which is expected to appear within the next six months. Until then, the provision remains a dead letter. It is interesting, however, that the text does not speak about “lawful access” or “lawful user”, but instead focuses on the lawfulness of the source itself. The exception, however, is limited to “public research” (i.e., probably, research carried out by public organisations) and to data “included in or associated with scientific publications”.

Interestingly, a similar (yet different) provision has also been adopted regarding the sui generis database right. According to art. L. 342-3, 5° of the CPI, the rightholder cannot forbid “Copies and digital reproductions of a database by a lawful user, made for the purposes of mining text and data included in or associated with scientific publications, for research purposes, excluding all commercial purposes. Archiving and communication of technical copies made during the process, after the completion of the research project for which they were made, is guaranteed by an organisation designated by decree. Other copies and reproductions are deleted”. This text is rather enigmatic (especially in the context of art. 9 of the Database Directive) and its interpretation will not be an easy task.


German Exceptions: Copyright in the Knowledge Economy (2017)

A new law on copyright in the knowledge economy, the Urheberrechts-Wissensgesellschafts-Gesetz, was adopted by the Bundestag in mid-2017 to replace ss. 52a, 52b and 53b with new sections 60a-60h. It enters into force in March 2018, and the new sections will remain valid for a limited period of five years. After this time, the legislator must expressly prolong their validity or replace them with different rules, following the likely adoption of a new Directive on the Digital Single Market (see below). See the full text of the law (German).

Most relevant to language resources are the elimination of §52a, which governs “making works available to the public for teaching and research”, and the replacing of §60 with an extensive set of proposed rules on “permitted uses for teaching, research, and institutions”, a new §60a through §60h. These set specific percentages for what portion of works may be used for different purposes, such as teaching and research, and remove key areas of legal uncertainty. Of particular interest to language resources are:

  • § 60a on Teaching:

    • In order to illustrate teaching at educational institutions, up to 15 per cent of a published work may be reproduced, disseminated, made publicly available, and otherwise publicly reproduced for non-commercial purposes.

  • § 60c on Scientific Research:

    • For the purpose of non-commercial scientific research, up to 15 per cent of a work may be reproduced, disseminated and made publicly accessible for a defined circle of persons for their own scientific research and for individual third parties, to the extent that this serves to verify the quality of scientific research.

    • For personal scientific research (presumably, research by a single researcher), up to 75 per cent of a work may be reproduced.

    • These provisions do not permit recording public lectures, performances or screenings of a work and later making the recordings publicly available.

  • § 60d on Text and Data Mining:

    • For text and data mining on vast numbers of works (as source material) for scientific research, it is permitted:

      • to reproduce the source material in an automated and systematic manner in order to create a corpus, especially through normalization, structuring and categorization,

      • to make the corpus accessible to a defined circle of persons for joint scientific research as well as to individual third parties to check the quality of scientific research,

      • For non-commercial purposes only

      • The corpus and the copies of the source material shall be deleted after completion of the research and access to the public is to be terminated. However, it is permissible to transmit the corpus and reproductions of the original material to libraries and archives for permanent storage.

      • As far as the sui generis database right is concerned, such data mining as specified above is deemed lawful and therefore cannot be prohibited by the rightholder.

Overall, the proposals give much clearer guidance for researchers while remaining within the constraints of EU law (especially art. 5 InfoSoc Directive).


EC’s Proposal for the Directive on Copyright in the Digital Single Market

Copyright reform in the EU has been a long time coming. The latest development is that on September 14, 2016, a Proposal for a Directive on Copyright in the Digital Single Market was released.

Article 3 of this proposal covers text and data mining by research organizations. It reads:

1. Member States shall provide for an exception to the rights provided for in Article 2 [reproduction right] of Directive 2001/29/EC [the InfoSoc Directive], Articles 5(a) [reproduction of copyright-protected databases] and 7(1) [extraction and re-utilisation of a database] of Directive 96/9/EC [the Database Directive] and Article 11(1) of this Directive for reproductions and extractions made by research organisations in order to carry out text and data mining of works or other subject-matter to which they have lawful access for the purposes of scientific research.

2. Any contractual provision contrary to the exception provided for in paragraph 1 shall be unenforceable.

3. Rightholders shall be allowed to apply measures to ensure the security and integrity of the networks and databases where the works or other subject-matter are hosted. Such measures shall not go beyond what is necessary to achieve that objective.

4. Member States shall encourage rightholders and research organisations to define commonly-agreed best practices concerning the application of the measures referred to in paragraph 3.

First, it should be noted that the text is only a proposal; if the new Directive is adopted (which is likely to take about two years), its final shape may be completely different from the proposal (in fact, it is not rare that the Commission makes bold proposals which are then significantly watered down during the negotiations). If the proposal is adopted, it may constitute a major step forward for TDM in Europe. The following elements of the proposal deserve to be mentioned:

  • The proposed TDM exception would be mandatory, and not optional (like the exceptions of Art. 5.2 and 5.3 of the InfoSoc Directive), and as such it would have to be adopted by all the EU Member States;
  • The exception would concern both copyright and the sui generis database right. It is a major improvement on the scarceness of exceptions to the sui generis database right provided for in Art. 9 of the Database Directive.
  • The exception would apply only to research organisations (such as universities and research institutes). According to recital 4 of the proposal, “Despite different legal forms and structures, research organisations across Member States generally have in common that they act either on a not for profit basis or in the context of a public-interest mission recognised by the State. Such a public-interest mission may, for example, be reflected through public funding or through provisions in national laws or public contracts. At the same time, organisations upon which commercial undertakings have a decisive influence allowing them to exercise control because of structural situations such as their quality of shareholders or members, which may result in preferential access to the results of the research, should not be considered research organisations for the purposes of this Directive”. It seems therefore that even though commercial purposes are not expressly excluded from the scope of the exception, research organisations “upon which commercial undertakings have a decisive influence” would not be able to benefit from the exception, which is rather disappointing;
  • Lawful access would be a pre-condition of this exception (see the remarks concerning the UK exception above). Recital 3 of the proposal mentions “lawful access, for example through subscriptions to publications or open access licences”; this seems to mean that access is lawful if it is authorised by the rightholder. It therefore remains unclear whether data available in “gratis Open Access” (i.e. freely accessible on the Internet, but without any re-use rights -- see What is an ‘open’ license? above) would be covered by the exception;
  • The exception would not be overridable by contractual clauses (see the remarks concerning the UK exception above);
  • The exception would *not* override Digital Rights Management/Technological Protection Measures, which is regrettable. It does seem, however, to limit the measures that publishers may apply to those necessary to ensure the security and integrity of their networks and databases (par. 3). Par. 4 requires Member States to encourage rightholders and research organisations to define best practices regarding the use of such measures; however, these best practices would likely be country-specific, which would go against the purpose of a directive, i.e. harmonization of law throughout the EU.

The draft has been criticized by research organizations and nonprofits such as Creative Commons (“The Directive fails to deliver on the promise for a modern copyright law in Europe”) and Wikimedia (“The European Commission’s leaked plans for EU copyright reform show that their primary concern is rightsholder revenue, with the public’s interest in accessing and sharing knowledge taking a back seat.”). Criticisms include that the text and data mining exception is available only to nonprofit research institutions, that sites that host “large amounts of works” will be compelled to enter agreements with rightsholders to monitor their platforms for copyright infringement, and that the proposal creates a new 20-year copyright for press publishers that will require news aggregators (such as Google News) to pay fees in order to aggregate articles.

General Data Protection Regulation

What is the General Data Protection Regulation (GDPR)?

The GDPR is an EU regulation adopted on 14 April 2016, after an adoption process of over four years (the European Commission released the proposal for the GDPR on 25 January 2012). Its aim is to unify data protection within the EU, and to strengthen the legal protection of individuals against unlawful processing of their personal data. The GDPR is going to replace the Personal Data Directive. It will apply from 25 May 2018.

What is a regulation?

A regulation is a legal act of the European Union. Unlike directives (which require transposition into Member States’ internal laws), regulations are of direct effect and they apply in a uniform manner in all EU Member States. GDPR will therefore, at least to a large extent, replace not only the Personal Data Directive, but also the national laws on data protection. Full unification, however, will not be achieved, as the GDPR leaves some aspects to the discretion of the Member States (for example concerning derogations from certain rights of the data subjects whose data are processed for research purposes -- see below).

So, what is going to change?

In a nutshell: not that much. Indeed, the GDPR is not intended to be a revolution, but an evolutionary step forward. The general framework will remain largely unchanged, and the additions will mostly come from well-established opinions of the WP29. The following changes should be mentioned here:

  • the territorial scope of the EU data protection rules will be widened. Unlike the Directive, the Regulation will apply to all the companies who are offering services to data subjects in the EU, or monitoring their behaviour (art. 3 of the Regulation), even if they are established on foreign territory. This mechanism has been designed to ‘capture’ such US companies as Facebook or Google, who often managed to dodge EU data protection rules by claiming that they do not carry out any personal data processing on the territory of the EU.
  • on the same note, the fines for breach of data protection rules will now be revenue-based (up to 4% of annual worldwide turnover). This is intended to have a dissuasive effect, particularly on large US-based companies (such as those mentioned above).
  • genetic data and biometric data have been included in the list of special categories of data (sensitive data - see above about special categories of data in the Personal Data Directive).
  • pseudonymisation has been officially included as one of the safeguards for the rights and freedoms of data subjects. Article 4(5) defines pseudonymisation as ‘the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to a natural person’. Pseudonymisation is *not* equivalent to anonymisation, but may in some special cases (including processing for research purposes) be a sufficient safeguard to allow for processing of data without the data subject’s consent.
  • data processors (i.e. entities who process data *on behalf* of the data controller) will have to comply with a number of obligations (art. 28, 30, 31, 32, 37…), which is not the case under the Directive (which is focused on data controllers). The obligations of data processors will therefore cumulate with those of data controllers, creating an additional layer of protection for data subjects’ interests.
  • a new principle of accountability has been introduced (Art. 5(2)); according to this principle, data controllers shall be responsible for, and able to demonstrate, compliance with the rules governing processing of personal data. In order to do so, data controllers shall conduct a risk assessment (art. 35), implement data protection measures (both organisational and technical) “by design and by default” (art. 25) and keep detailed records of processing operations (art. 30). This will place a significant burden (including burden of proof) on data controllers.
  • data subject’s consent will be governed by stricter rules. Apart from being (as per the Directive) freely given, specific and informed, consent will now also have to be “unambiguous” (art. 4(11)); furthermore, according to Recital 32, if consent is requested by electronic means, the request must be clear, concise and not unnecessarily disruptive to the use of the service for which it is provided; if consent is given in the context of a written declaration which also concerns other matters (e.g. in terms of service), the request for consent shall be clearly distinguishable from these other matters (art. 7(2));
  • the Regulation contains a detailed list of elements to be taken into account while assessing compatibility of purposes if the controller wants to process data for a new, compatible purpose (as processing data for a new purpose does not require new consent if the new purpose is compatible with the original one -- see above about purpose extension); pseudonymisation may be taken into account here;
  • data controllers will be required to notify any personal data breach to the supervisory authority ‘without undue delay, and where feasible, not later than 72 hours after having become aware of it’ (art. 33);
  • the existing rights of data subjects (see above) will generally be reinforced (defined in more detail and accompanied by provisions which make it easier to claim damages); a new right to data portability will be added, according to which in most cases the data subject shall have the right to receive the personal data concerning him or her in a structured, commonly used and machine-readable format and to transmit those data to another controller (art. 20);
  • more organisations will have to appoint Data Protection Officers (see art. 37-39).
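
To make the notion of pseudonymisation in Article 4(5) more concrete, the following is a minimal sketch in Python, not a compliance recipe. It replaces direct identifiers in a record with keyed-hash pseudonyms; the secret key plays the role of the “additional information” that must be kept separately from the data. The function names, the field names and the sample record are hypothetical and serve only as illustration; whether such a measure is a sufficient safeguard in a given project remains a legal question.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Generate a secret key (the 'additional information' to be stored separately)."""
    return secrets.token_bytes(32)


def pseudonym(value: str, key: bytes) -> str:
    """Return a stable keyed-hash pseudonym for one identifier value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; an illustrative choice


def pseudonymise(record: dict, key: bytes, identifier_fields: tuple) -> dict:
    """Copy a record, replacing the named identifier fields with pseudonyms."""
    return {
        name: pseudonym(str(value), key) if name in identifier_fields else value
        for name, value in record.items()
    }


# Hypothetical record from a speech corpus.
key = make_key()
speaker = {"name": "Jane Doe", "email": "jane@example.org", "utterance": "Hello"}
safe = pseudonymise(speaker, key, ("name", "email"))
```

Because the same key always yields the same pseudonym, records belonging to one person can still be linked within the dataset. This is also why the result is pseudonymised rather than anonymised: whoever holds the key can recompute the pseudonym for a known identifier and so find that person's records.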

Processing of data for research purposes will be governed by art. 89 of the GDPR. It says that:

  • when such processing is carried out, organisational and technical measures should be in place in order to ensure that appropriate safeguards have been implemented to protect the interests of data subjects, and in particular to ensure that the data minimization principle (according to which processing shall be limited to necessary data) is respected;
  • pseudonymisation and anonymisation should be implemented whenever the purposes can be fulfilled in that manner;
  • EU or Member State laws may -- if necessary -- allow some rights of the data subject to be limited. No such specific laws exist at the moment, and the future of existing national provisions on data processing for research purposes is uncertain.

Moreover, recital 33 concerns consent for data processing for research purposes. It says that ‘It is often not possible to fully identify the purpose of personal data processing for scientific research purposes at the time of data collection. Therefore, data subjects should be allowed to give their consent to certain areas of scientific research when in keeping with recognised ethical standards for scientific research. Data subjects should have the opportunity to give their consent only to certain areas of research or parts of research projects to the extent allowed by the intended purpose’.

Arguably, the purpose extension principle has gained some flexibility under the GDPR, which allows research purposes to enter more easily in that category. That would mean that it will become easier to process data originally collected for a different purpose, for research purposes. How this will work in practice, however -- just like many other changes under the GDPR -- remains to be seen.

Further Reading

Official policy documents (in chronological order):

  • OECD Principles and Guidelines for Access to Research Data from Public Funding (2007). Link.
  • Green Paper on Copyright in the Knowledge Economy (2008). Link.
  • Communication from the European Commission of 19.10.2009: Copyright in the Knowledge Economy. Link.
  • European Commission Recommendation of 17.07.2012 on access to and preservation of scientific information. Link.
  • Communication from the European Commission of 9.12.2015: Towards a modern, more European copyright framework. Link.
  • Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020 (2016). Link.

Reports and Studies (in chronological order):

  • First evaluation of Directive 96/9/EC on the legal protection of databases (European Commission, 2005). Link.
  • GOWERS, Andrew. Gowers Review of Intellectual Property (UK, 2006). Link.
  • Report on the application of Directive 2001/29/EC on the harmonisation of certain aspects of copyright and related rights in the information society (European Commission, 2007). Link.
  • La mise à disposition ouverte des oeuvres de l’esprit (France, 2007). Link.
  • HÄDER, Michael. Der Datenschutz in den Sozialwissenschaften. Anmerkungen zur Praxis sozialwissenschaftlicher Erhebungen und Datenverarbeitung in Deutschland (Germany, 2009). Link.
  • DE COCK BUNING, Madeleine, Barbara van Dinther, Christina H. Jepperson-de Boer, Allard Ringnalda. The legal status of research data in the Knowledge Exchange partner countries (Knowledge Exchange, 2011). Link.
  • HARGREAVES, Ian. Digital Opportunity: a Review of Intellectual Property and Growth (UK, 2011). Link.
  • GUIBAULT, Lucie and Andreas Wiebe, eds. Safe to be open: Study on the protection of research data and recommendations for access and usage (OpenAIRE+, 2013). Link.
  • DASISH Data Service Infrastructure for the Social Sciences and Humanities (DASISH, 2013). Link.
  • BEER, Nikolaos et al. Datenlizenzen für geisteswissenschaftliche Forschungsdaten - Rechtliche Bedingungen und Handlungsbedarf (DARIAH, 2014). Link.
  • Standardisation in the area of innovation and technological development, notably in the field of Text and Data Mining Report from the Expert Group (European Commission 2014). Link.
  • TRIAILLE, Jean-Paul et al. Study on the Legal Framework of Text and Data Mining (European Commission 2014). Link.
  • Review of the EU copyright framework (European Parliament, 2015). Link.
  • KLIMPEL, Paul and John H. Weitzmann. Handreichung: Rechtliche Rahmenbedingungen für Digitalisierungsprojekte von Gedächtnisinstitutionen (irights, 2014). Link.
  • Enquiries into Intellectual Property’s Economic Impact. Chapter 7: Legal Aspects of Open Access to Publicly Funded Research (OECD, 2015). Link.
  • Making Open Science a Reality (OECD, 2015). Link.
  • KLIMPEL, Paul and John H. Weitzmann. Forschen in der digitalen Welt. Juristische Handreichung für die Geisteswissenschaften (DARIAH, 2015). Link.
  • CASPERS, Marco and Lucie Guibault. Baseline report of policies and barriers of TDM in Europe (Future TDM, 2016). Link.
  • RESEARCH CONSULTING, “Text and data mining in higher education and public research” (2016). Link.

Books and Articles:

  • KELLI, Aleksei, Arvi Tavast, Heiki Pisuke (2012). Copyright and Constitutional Aspects of Digital Language Resources: The Estonian Approach. Link.
  • Legal Framework of textual data processing for Machine Translation and Language Technology research and development activities [wikibook]. Link.
  • UHLIR, Paul F., Peter Schroeder (2007). Open Data for Global Science. Link.
  • STODDEN, Victoria (2009). The Legal Framework for Reproducible Scientific Research. Link.
  • SUBER, Peter (2012). Open Access. Link.
  • NIELSEN, Michael (2011). Reinventing Discovery: The New Era of Networked Science. Princeton University Press.
  • LESSIG, Lawrence (2001). The Future of Ideas. The Fate of the Commons in a Connected World. Link.
  • STALLMAN, Richard M. (2nd ed. 2010). Free Software, Free Society. Selected Essays of Richard M. Stallman. Link.
  • BERNAULT, Carine (2016). Open Access et droit d’auteur. Larcier.
  • CLÉMENT-FONTAINE, Mélanie (2014). L’oeuvre libre. Larcier.
  • GUIBAULT, Lucie & Christina Angelopoulos eds. (2011). Open Content Licensing. From Theory to Practice. Link.
  • BELDIMAN, Dana (2013). Access to Information and Knowledge: 21st Century Challenges in Intellectual Property and Knowledge Governance. Edward Elgar Publishing.

CLARIN Resources:

  • CLARIN-D Language Resources Legal Issues Bibliography. Link.
  • LIP CLARIN-D Legal Information Platform (English) (German)
  • Creative Commons and Language Resources: General Issues and What's New in CC 4.0 (May 2014, Version 1.1. August 2014) - Pawel Kamocki & Erik Ketzan. Link.

CLARIN Legal Issues Committee (CLIC)

Committee / People

News and Meetings

The CLARIN Legal Issues Committee (CLIC) will hold its next meeting at the CLARIN Annual Conference in Aix-en-Provence, Oct. 26-28, 2016.

CLIC White Papers

  • The CLARIN Legal Issues Committee (CLIC) White Paper Series is a venue for the open access publication of commentary and scholarship on legal issues and language science under the editorial direction of the CLIC.
  • Creative Commons and Language Resources: General Issues and What's New in CC 4.0 (May 2014, Version 1.1. August 2014) - Pawel Kamocki & Erik Ketzan (PDF)
  • Guidelines for Building Language Corpora Under German Law: Guidelines by the DFG Review Board on Linguistics (May 2017) (PDF)