Over the last decade, search engines have steadily grown in importance. Given the increasing amount of knowledge exposed and interlinked within the Linked Open Data (LOD) Cloud, users expect to be
able to search the LOD cloud for any information. However, diverse data types require specific search functionality, such as semantic search, geo-spatial search, and full-text search. Hence,
a single data management system cannot provide the needed functionality at the expected level.
In this paper, we describe search services that provide specific search functionality via a generalized, RDF-inspired interface. In addition, we describe an application layer built on top of
these services that connects them and allows the implementation of a distributed search, taking advantage of the strengths of each search service while combining powerful tools such as OpenLink
Virtuoso, Elasticsearch, and PostGIS within one framework. Finally, we isolate the resulting performance challenges.
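The abstract's architecture of specialized backends behind one generalized interface might be sketched as follows. All class names, the `Hit` result type, and the toy in-memory index are illustrative assumptions, not the authors' actual API; real wrappers would delegate to SPARQL, full-text, and geo-spatial engines respectively.

```python
# Hypothetical sketch of a federated search layer: each backend
# (e.g. Virtuoso for SPARQL, Elasticsearch for full-text, PostGIS for
# geo-spatial search) is wrapped behind one generalized interface, and
# an application layer on top fans queries out and merges the results.
from dataclasses import dataclass

@dataclass
class Hit:
    resource: str   # IRI of the matched resource (RDF-inspired)
    score: float

class SearchService:
    """Generalized interface every backend wrapper implements."""
    def search(self, query: str) -> list[Hit]:
        raise NotImplementedError

class FullTextService(SearchService):
    """Stand-in for a full-text backend; a real one would call the engine."""
    def __init__(self, index):
        self.index = index
    def search(self, query):
        return [Hit(r, s) for r, s in self.index.get(query, [])]

class DistributedSearch:
    """On-top application layer: dispatch to all services, merge by score."""
    def __init__(self, services: list[SearchService]):
        self.services = services
    def search(self, query: str) -> list[Hit]:
        hits = [h for svc in self.services for h in svc.search(query)]
        return sorted(hits, key=lambda h: h.score, reverse=True)

# Toy in-memory "index" standing in for a real backend:
index = {"museum": [("http://example.org/res/1", 0.9),
                    ("http://example.org/res/2", 0.4)]}
engine = DistributedSearch([FullTextService(index)])
top = engine.search("museum")
```

Adding another backend then means implementing one more `SearchService` wrapper, which is where the per-service performance differences mentioned above would surface.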
In a recent paper, we proposed a new kind of citation, called the expanded citation, which links scientific papers to the concepts they contain. Expanded citations are represented in RDF and can be processed by machines. In this paper, we use expanded citations to introduce projections of concepts, which can be useful when searching for publications. Analyzing the projections and their evolution over time yields knowledge about the role and significance of a concept in a given domain.
University of Lodz, Poland
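A machine-processable, RDF-style expanded citation as described above might look like the following minimal sketch. The vocabulary (`ex:cites`, `ex:refersToConcept`) and all IRIs are hypothetical placeholders, not the authors' actual schema.

```python
# Illustrative only: an expanded citation links a citing paper, a cited
# paper, and the concepts involved as plain RDF-style triples, here
# emitted in N-Triples syntax without any external library.
EX = "http://example.org/vocab#"

def triple(s, p, o):
    return f"<{s}> <{p}> <{o}> ."

paper   = "http://example.org/paper/123"
cited   = "http://example.org/paper/456"
concept = "http://example.org/concept/ontology-matching"

expanded_citation = [
    triple(paper, EX + "cites", cited),
    triple(paper, EX + "refersToConcept", concept),
    triple(cited, EX + "refersToConcept", concept),
]

# Because the links are machine-processable, a "projection" of a concept
# can be computed, e.g. the set of papers that refer to it:
papers_for_concept = {t.split()[0] for t in expanded_citation
                      if "refersToConcept" in t}
```

Tracking how such a projection grows over time is one way the significance of a concept in a domain could be observed.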
The generation of semantic metadata from unstructured content is an integral part of the editorial workflow in the age of hashtags and likes. Metadata allow the interlinking of content and foster discoverability. In our definition, semantic metadata are derived from content rather than from structures and have tangible contexts and meanings, making them particularly well suited as interfaces to information units. To do justice to this privileged role, the quality requirements on semantic metadata are extremely high. Humans have limited resources for acquiring information and high expectations of the systems they interact with. Unfortunately, manual annotation is not an option: it is inefficient, if viable at all, far too subjective, and practised only very reluctantly by content creators. Current solutions often rely on automatic extraction methods followed by enrichment steps based on resources such as thesauri and ontologies. A continuously growing mismatch between these resources and the evolving body of knowledge turns their maintenance into a Sisyphean task. We show how different metadata post-processing steps can be bootstrapped from a representative and growing repository of information to yield the desired quality. The ideas presented here were implemented as part of the editorial workflow at ZEIT Online, the online edition of a highly regarded German newspaper.
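One such bootstrapped post-processing step might be sketched as follows: instead of validating extracted tag candidates against a hand-maintained thesaurus, the growing article repository itself supplies the evidence. The sample data, threshold, and helper name are illustrative assumptions, not the system deployed at ZEIT Online.

```python
# Hypothetical sketch: filter automatically extracted tag candidates
# against document-frequency statistics from the repository of already
# published articles, so that quality control scales with the corpus.
from collections import Counter

repository = [
    ["merkel", "politik", "wahl"],
    ["merkel", "europa"],
    ["fussball", "wahl"],
]  # tag sets of previously published articles (toy data)

doc_freq = Counter(tag for doc_tags in repository for tag in doc_tags)

def keep_tag(tag, min_df=2):
    # Keep a candidate only if the repository itself confirms it is a
    # recurring concept rather than a one-off extraction artifact.
    return doc_freq[tag] >= min_df

candidates = ["merkel", "tippfehler", "wahl"]
accepted = [t for t in candidates if keep_tag(t)]
```

Because the statistics are recomputed as the repository grows, the filter adapts to new topics without the Sisyphean resource maintenance described above.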
Breno started working with text analytics in 2005 at Fast Search, while still at university in his hometown of Rio de Janeiro. In 2011 he earned a Master's degree in Computer Science from the Technical University of Munich, where he worked mostly on statistical NLP. After working for two startups, Breno joined IntraFind in 2012 as a text analytics software architect, focusing mainly on text classification, tagging systems, and information retrieval.