Over the last decade, search engines have steadily grown in importance. Given the increasing amount of knowledge exposed and interlinked within the Linked Open Data (LOD) Cloud, users expect to be
able to search the LOD cloud for any information. However, diverse data types require specific search functionality, such as semantic search, geo-spatial search, and full-text search. Hence,
a single data management system cannot provide the needed functionality at the expected level.
In this paper, we describe search services that provide specific search functionality via a generalized, RDF-inspired interface. In addition, we describe an application layer built on top of
these services that connects them and allows the implementation of a distributed search, taking advantage of the strengths of each search service while combining powerful tools such as OpenLink
Virtuoso, Elasticsearch, and PostGIS within one framework. Finally, we isolate the resulting performance challenges.
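The abstract's architecture of specialized backends behind one generalized interface might be sketched as follows. All class names, the `Hit` result type, and the toy in-memory index are illustrative assumptions, not the authors' actual API; real wrappers would delegate to SPARQL, full-text, and geo-spatial engines respectively.

```python
# Hypothetical sketch of a federated search layer: each backend
# (e.g. Virtuoso for SPARQL, Elasticsearch for full-text, PostGIS for
# geo-spatial search) is wrapped behind one generalized interface, and
# an application layer on top fans queries out and merges the results.
from dataclasses import dataclass

@dataclass
class Hit:
    resource: str   # IRI of the matched resource (RDF-inspired)
    score: float

class SearchService:
    """Generalized interface every backend wrapper implements."""
    def search(self, query: str) -> list[Hit]:
        raise NotImplementedError

class FullTextService(SearchService):
    """Stand-in for a full-text backend; a real one would call the engine."""
    def __init__(self, index):
        self.index = index
    def search(self, query):
        return [Hit(r, s) for r, s in self.index.get(query, [])]

class DistributedSearch:
    """On-top application layer: dispatch to all services, merge by score."""
    def __init__(self, services: list[SearchService]):
        self.services = services
    def search(self, query: str) -> list[Hit]:
        hits = [h for svc in self.services for h in svc.search(query)]
        return sorted(hits, key=lambda h: h.score, reverse=True)

# Toy in-memory "index" standing in for a real backend:
index = {"museum": [("http://example.org/res/1", 0.9),
                    ("http://example.org/res/2", 0.4)]}
engine = DistributedSearch([FullTextService(index)])
top = engine.search("museum")
```

Adding another backend then means implementing one more `SearchService` wrapper, which is where the per-service performance differences mentioned above would surface.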
In a recent paper, we proposed a new kind of citation, called the expanded citation, which links scientific papers to the concepts they contain. Expanded citations are represented in RDF and can be processed by machines. In this paper, we use expanded citations to introduce projections of concepts, which can be useful when searching for publications. Analyzing the projections and their evolution over time yields knowledge about the role and significance of a concept in a given domain.
University of Lodz, Poland
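A machine-processable, RDF-style expanded citation as described above might look like the following minimal sketch. The vocabulary (`ex:cites`, `ex:refersToConcept`) and all IRIs are hypothetical placeholders, not the authors' actual schema.

```python
# Illustrative only: an expanded citation links a citing paper, a cited
# paper, and the concepts involved as plain RDF-style triples, here
# emitted in N-Triples syntax without any external library.
EX = "http://example.org/vocab#"

def triple(s, p, o):
    return f"<{s}> <{p}> <{o}> ."

paper   = "http://example.org/paper/123"
cited   = "http://example.org/paper/456"
concept = "http://example.org/concept/ontology-matching"

expanded_citation = [
    triple(paper, EX + "cites", cited),
    triple(paper, EX + "refersToConcept", concept),
    triple(cited, EX + "refersToConcept", concept),
]

# Because the links are machine-processable, a "projection" of a concept
# can be computed, e.g. the set of papers that refer to it:
papers_for_concept = {t.split()[0] for t in expanded_citation
                      if "refersToConcept" in t}
```

Tracking how such a projection grows over time is one way the significance of a concept in a domain could be observed.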
The generation of semantic metadata from unstructured content is an integral part of the editorial workflow in the age of hashtags and likes. Metadata allow the interlinking of content and foster discoverability. In our definition, semantic metadata are derived from content rather than from structures and have tangible contexts and meanings, making them particularly well suited as interfaces to information units. To do justice to this privileged role, the quality requirements on semantic metadata are extremely high. Humans have limited resources for acquiring information and high expectations of the systems they interact with. Unfortunately, manual annotation is not an option: it is inefficient, if viable at all, far too subjective, and practised only very reluctantly by content creators. Current solutions often rely on automatic extraction methods followed by enrichment steps based on resources such as thesauri and ontologies. A continuously growing mismatch between these resources and the evolving body of knowledge turns their maintenance into a Sisyphean task. We show how different metadata post-processing steps can be bootstrapped from a representative and growing repository of information to yield the desired quality. The ideas presented here were implemented as part of the editorial workflow at ZEIT Online, the online edition of a highly regarded German newspaper.
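One such bootstrapped post-processing step might be sketched as follows: instead of validating extracted tag candidates against a hand-maintained thesaurus, the growing article repository itself supplies the evidence. The sample data, threshold, and helper name are illustrative assumptions, not the system deployed at ZEIT Online.

```python
# Hypothetical sketch: filter automatically extracted tag candidates
# against document-frequency statistics from the repository of already
# published articles, so that quality control scales with the corpus.
from collections import Counter

repository = [
    ["merkel", "politik", "wahl"],
    ["merkel", "europa"],
    ["fussball", "wahl"],
]  # tag sets of previously published articles (toy data)

doc_freq = Counter(tag for doc_tags in repository for tag in doc_tags)

def keep_tag(tag, min_df=2):
    # Keep a candidate only if the repository itself confirms it is a
    # recurring concept rather than a one-off extraction artifact.
    return doc_freq[tag] >= min_df

candidates = ["merkel", "tippfehler", "wahl"]
accepted = [t for t in candidates if keep_tag(t)]
```

Because the statistics are recomputed as the repository grows, the filter adapts to new topics without the Sisyphean resource maintenance described above.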
Breno started working with text analytics in 2005 at Fast Search, while still at university in his hometown of Rio de Janeiro. In 2011 he earned a Master's degree in Computer Science from the Technical University of Munich, where he worked mostly on statistical NLP. After working for two startups, Breno joined IntraFind in 2012 as a text analytics software architect, focusing mainly on text classification, tagging systems, and information retrieval.