Session, THU 13:30 - 15:00

Publishing and Libraries I

A Rule based Metadata Extraction Framework

The project has been in use since 2013 at DIN (Deutsches Institut für Normung e. V.), the German standardization organization. We mainly extract bibliographic and content metadata on standards, such as references, similarities, keywords and classification data. We also integrated the DIN Terminology Base for named entity recognition (NER), which allows content to be annotated with terms and definitions.

Solved use cases include:

1. Standard Recommendation for Beuth Webshop (Publishing House) based on document similarities (http://www.beuth.de/en/)

2. Reference Extraction for DIN standards

3. Document auto classification for ICS

4. Term association Explorer

5. Terminology extraction

6. Keyword extraction, …

The framework is entirely browser-based and allows information extraction rules to be defined graphically, without Java coding skills. Our information specialists can thus extract meaning from unstructured standards content. The extracted metadata can be exported to external files or used to enhance the semantic index on standards. The framework therefore also serves as an intelligent, extensible search platform with linguistic support and information visualization capabilities.
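To make the idea of rule-based reference extraction concrete, here is a minimal Python sketch. It is not the SNIF@DIN implementation, which defines rules graphically in the browser; the `Rule` class, the `apply_rules` helper, the regular expression and the sample sentence are all hypothetical and only illustrate how a declarative rule might pull standard designations out of unstructured text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A named extraction rule (hypothetical structure, for illustration only)."""
    name: str
    pattern: re.Pattern


# Assumed pattern: designations such as "DIN 1234-5", "DIN EN ISO 9001" or "ISO 8601".
# Real designations are more varied; this covers only the simple cases above.
REFERENCE_RULE = Rule(
    name="standard_reference",
    pattern=re.compile(r"\b(?:DIN(?: EN)?(?: ISO)?|ISO|EN) \d+(?:-\d+)*\b"),
)


def apply_rules(text, rules):
    """Return (rule name, matched text) pairs in order of appearance per rule."""
    hits = []
    for rule in rules:
        for match in rule.pattern.finditer(text):
            hits.append((rule.name, match.group(0)))
    return hits


# Made-up sample sentence, not taken from any actual standard.
sample = "This document replaces DIN 1234-5 and refers to DIN EN ISO 9001 and ISO 8601."
references = apply_rules(sample, [REFERENCE_RULE])
for rule_name, reference in references:
    print(rule_name, reference)
```

Keeping each rule as data (a name plus a pattern) rather than as code is what would let non-programmers add or change rules through a user interface.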

"SNIF@DIN – Information Extraction and Information Access Framework for Standards"

Holger Koch and Jakob Pyttlik

www.dinsoftware.de

Entity Facts - A new service by German National Library

The brief of the German National Library is to collect Germany's published cultural and scientific heritage since 1913, to preserve it for posterity and to make it accessible for use. It is based in two locations: Leipzig (founded in 1912) and Frankfurt am Main (founded in 1946). Today the two institutions form a single unit, uniquely charged with a special task for Germany as formulated in the "Law regarding the German National Library". The German National Library therefore comprehensively documents the intellectual, literary and musical output of the German-speaking countries. The collection brief has been expanded to include electronic, non-carrier-based publications in response to the shift in publication practices by publishers and authors.

The German National Library is an institution which provides a wide range of services: not only as a public reference library in Leipzig and Frankfurt am Main, but also as a producer and provider of a broad spectrum of services for libraries, the book trade, scientific institutions and, not least, for individual users.

About

The German National Library has published a new web service called Entity Facts. The main goal of Entity Facts is to provide aggregated information about entities from various sources in a way that makes it easy to present this data to users, who mainly come from the cultural heritage and scientific domains.

The information served by Entity Facts is based on the Integrated Authority File (Gemeinsame Normdatei, GND) - the main authority file used in the German-speaking world - and merged with other sources such as Wikipedia, VIAF or IMDb. The information is provided as machine- and human-readable data in a straightforward and lightweight way over an Application Programming Interface (API).

Our intention is to enable developers without domain-specific knowledge to reuse authority data. This is realized through an easy-to-understand JSON-LD data model that provides ready-to-use data. Linking to and merging data from different sources offers data enrichments, which are improved continuously. The infrastructure of the service is designed so that the dataset can be extended and updated easily.
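As a rough illustration of what "ready-to-use" JSON-LD can mean for a developer, the following Python sketch parses an entity record and collects its links to other sources. The record is a made-up placeholder, and the field names (`preferredName`, `sameAs`, `collection`, `abbr`) are assumptions about the data model rather than a verified description of the API; the `summarize` helper is hypothetical.

```python
import json

# Made-up placeholder record; field names are assumed, values are not real data.
SAMPLE_RECORD = """
{
  "@id": "https://example.org/gnd/000000000",
  "preferredName": "Example Person",
  "sameAs": [
    {"@id": "https://example.org/wikipedia/Example_Person",
     "collection": {"abbr": "WIKIPEDIA", "name": "Wikipedia"}},
    {"@id": "https://example.org/viaf/0000",
     "collection": {"abbr": "VIAF", "name": "Virtual International Authority File"}}
  ]
}
"""


def summarize(entity):
    """Reduce an entity record to its identifier, display name and external links."""
    links = {}
    for item in entity.get("sameAs", []):
        collection = item.get("collection", {})
        # Prefer the short label of the source, fall back to its full name.
        source = collection.get("abbr") or collection.get("name")
        if source and "@id" in item:
            links[source] = item["@id"]
    return {
        "id": entity.get("@id"),
        "name": entity.get("preferredName"),
        "links": links,
    }


summary = summarize(json.loads(SAMPLE_RECORD))
print(summary["name"])
for source, url in summary["links"].items():
    print(source, url)
```

Because JSON-LD is plain JSON with a few reserved keys such as `@id`, a consumer can treat it as an ordinary dictionary, as here, and ignore the linked-data semantics until they are needed.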

www.dnb.de

Sarah Hartmann and Michael Büchner

Sarah Hartmann is a librarian at the German National Library (DNB), where she works at the Office for Library Standards. Since May 2013 she has been part of the team responsible for the authority file Gemeinsame Normdatei (GND), which is used in libraries in the German-speaking countries and in numerous other institutions, mainly from the cultural heritage domain. Sarah led the project to implement the Entity Facts service at the DNB.

Michael Büchner is a researcher in the IT department of the German National Library and works for the Deutsche Digitale Bibliothek. He studied Computer Science and Library and Information Science at the universities of Jena, Erfurt and Leipzig, Germany. He is a member of the coordination team at the Deutsche Digitale Bibliothek, which is responsible for coordinating the further development of the portal. His focus is on technical development and evaluation, persistent identifiers and authority files connected with the Deutsche Digitale Bibliothek. In addition, he worked as a developer on the Entity Facts project at the German National Library.

Semantics for the Music Industry: the Development of the Music Business Ontology (MBO)

In this paper we describe the development of the Music Business Ontology (MBO). The MBO was developed in response to problems with data and communication in the music industry. Based on a qualitative pre-study, we analyzed the music industry, its players, and the data and software in use. First, we identified typical services and data formats. Subsequently, we extracted concepts and properties from the music business.

The design of the ontology was followed by the development of software tools that serve well-defined tasks in the music business. As a result, the MBO increases the transparency of the music business and gives its actors a better understanding of the business itself. The introduction of the Music Business Ontology changes the way actors and systems in the music business interact with each other. It decreases the need for different interfaces and formats and thus considerably reduces complexity.

Frank Schumacher, Ronny Gey and Stephan Klingner

Institute for Applied Informatics, Germany

POSTER

Three birds (in the LLOD cloud) with one stone: BabelNet, Babelfy and the Wikipedia Bitaxonomy

In this paper we present the current status of the linguistic resources and services that our research group has published as linked data in the LLOD cloud, namely BabelNet, Babelfy and the Wikipedia Bitaxonomy. We describe them in terms of their salient aspects and objectives and discuss the benefits that each of these potentially brings to the world of LLOD NLP-aware services. We also present public Web-based services which enable querying, exploring and exporting data into RDF format.