To enable researchers to search for specific patterns across collections of data, CLARIN offers a search engine that connects to the local data collections available in the centres. The data itself stays at the centre where it is hosted, which is why the underlying technique is called federated content search. The search engine summarizes and displays what is available. From there, an easy next step is to go to the centre's specialised search interface to perform a more sophisticated query.
The technology behind this federated content search is SRU/CQL and a CLARIN-specific extension to this protocol.
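As a rough illustration of what an SRU/CQL request looks like, the sketch below builds a `searchRetrieve` URL using only the standard SRU parameters (`operation`, `version`, `query`, `maximumRecords`). The endpoint address `https://example.org/sru` and the helper name `build_sru_search_url` are assumptions made for this example, and the CLARIN-specific extension parameters are deliberately left out.

```python
from urllib.parse import urlencode


def build_sru_search_url(base_url, cql_query, maximum_records=10):
    """Return an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",  # standard SRU operation name
        "version": "1.2",               # SRU protocol version
        "query": cql_query,             # the CQL query string
        "maximumRecords": str(maximum_records),
    }
    # urlencode percent-encodes the values so the CQL query is URL-safe.
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; a real centre would publish its own address.
url = build_sru_search_url("https://example.org/sru", '"language resource"')
print(url)
```

A centre's endpoint would answer such a request with an XML response listing the matching records; parsing that response is outside the scope of this sketch.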
Difference between federated content search and metadata search
The federated content search approach differs from the metadata search performed, for example, in the Virtual Language Observatory, where all metadata is first harvested (copied to a single server) and then centrally indexed. Content search is federated rather than centralised for several reasons:
- Legal issues make it impossible for some resources to be copied to another location.
- The size of many data sets makes decentralized indexing the most viable option.
- Most language resources are annotated in a collection-specific manner, which makes it hard to use or develop one single search engine that can cope with all of them.
Although more scalable, federated content search is less powerful than a local search, and certain features, such as ranking, are absent. This is why federated content search will often be most useful as a first step: to discover where interesting language resources are hosted and at which centre(s) a more specialised search could be tried.
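The federated idea can be sketched in a few lines: the query is sent to every centre, each centre searches its own data, and the aggregator only collects a per-centre summary without merging or ranking the hits. Everything in this sketch is a stand-in chosen for illustration: the centre names, the in-memory `CENTRES` data and the functions `search_centre` and `federated_search` are hypothetical, and a real aggregator would query the centres over the network instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for centres: each holds its own data locally.
CENTRES = {
    "centre-a": ["the cat sat", "a dog barked"],
    "centre-b": ["cat and mouse", "the cat slept", "birds sing"],
    "centre-c": ["nothing relevant here"],
}


def search_centre(name, term):
    """Simulate a centre searching its own collection for a word."""
    hits = [text for text in CENTRES[name] if term in text.split()]
    return name, hits


def federated_search(term):
    """Send the query to all centres in parallel and group hits by centre."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda name: search_centre(name, term), CENTRES)
        # No cross-centre ranking: results are simply grouped per centre.
        return {name: hits for name, hits in results}


summary = federated_search("cat")
for name, hits in summary.items():
    print(f"{name}: {len(hits)} hit(s)")
```

The summary shows which centres hold matching data, which is the point at which a user would move on to a centre's own specialised search interface.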
- Web page with the specification of the Federated Content Search protocol
- Alpha version of the content search engine (aggregator)
- CE-2014-0400 – (revised) Federated Content Search Aggregator Workplan
- documents from a Federated Content Search workshop