Session, FRI 10:30 - 12:00

Bridging the Language Utterance Gap

Semantic Applications bridging the Language Utterance Gap

Although full-text search functions and search engines have existed for decades, intelligent search is still rarely seen. Users still need to reformulate their queries and find related words in order to express the concept they have in mind. Neither the IR community nor search engine builders have yet recognized that every search involves a translation task: the language of the information authors needs to be mapped to the language used by the information seekers, even if both speak the same native language. Moreover, the terms used at retrieval time may be completely different from the terms used when the information was produced. Semantic technologies offer a solution to this translation problem.

In this introductory talk we show a number of motivating real-world examples where more intelligent retrieval can bridge the language gap between authors and users. Such intelligent retrieval needs to account for the language utterances of both sides and can use an ontology to translate these utterances into a controlled vocabulary. Since the terms used differ from person to person and change over time, these utterances need to be acquired and maintained in a continuous process. We argue that companies that have established terminology management or use controlled vocabularies are in a prime position to build up the background knowledge needed to adopt semantic search.
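The translation idea can be illustrated with a minimal sketch. It assumes a tiny hand-made thesaurus in which different surface terms point to the same concept of a controlled vocabulary; the terms, the concept identifiers and the function names `to_concepts` and `search` are hypothetical and chosen only for illustration. Both the query and the documents are mapped to concepts, so a document is found even if its author used a different word than the searcher.

```python
import re
from typing import Dict, List, Set

# Hypothetical thesaurus: surface term -> concept of a controlled vocabulary.
TERM_TO_CONCEPT: Dict[str, str] = {
    "notebook": "portable_computer",
    "laptop": "portable_computer",
    "cell phone": "mobile_phone",
    "mobile": "mobile_phone",
}


def to_concepts(text: str) -> Set[str]:
    """Return the concepts whose terms occur as whole words in the text."""
    lowered = text.lower()
    return {
        concept
        for term, concept in TERM_TO_CONCEPT.items()
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    }


def search(query: str, documents: List[str]) -> List[str]:
    """Return the documents that share at least one concept with the query."""
    wanted = to_concepts(query)
    return [doc for doc in documents if wanted & to_concepts(doc)]


docs = [
    "Lightweight notebook with long battery life",
    "Garden furniture on sale",
]
print(search("laptop", docs))
```

A real application would replace the dictionary with a maintained ontology or thesaurus and would also need to handle inflection, compounds and ambiguous terms, which this sketch ignores.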

Have you ever noticed that names of entities are not fixed? Your family name may change with marriage or divorce. Streets, places, cities, communities, counties and even countries are occasionally renamed. Products get renamed, as do departments, companies and organizations. What does not change, however, is the use of the old names in existing information. Names of entities are hence only valid for a certain period of time, and an intelligent search needs to account for this as well. We argue that the time information connected with the validity of a name can be used to extend retrieval functionality, for example by collecting all relevant information about a department or company whose name has changed, identifying a company under different names in multiple address records, or translating historical addresses for the purpose of geocoding.
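One possible way to represent names with a validity period is sketched below. The record layout, the department `dept-42` with its two names, and the functions `resolve`, `all_names` and `expand_query` are assumptions made for illustration, not a description of an existing system. A name is resolved to an entity for a given date, and a query for one name is expanded to every name the entity has ever carried.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class NamePeriod:
    entity_id: str
    name: str
    valid_from: date
    valid_to: Optional[date]  # None means the name is still valid


# Hypothetical data: one department that was renamed at the start of 2010.
NAMES: List[NamePeriod] = [
    NamePeriod("dept-42", "Customer Care", date(2001, 1, 1), date(2009, 12, 31)),
    NamePeriod("dept-42", "Service Desk", date(2010, 1, 1), None),
]


def resolve(name: str, on: date) -> Optional[str]:
    """Return the entity that carried this name on the given date, if any."""
    for period in NAMES:
        still_valid = period.valid_to is None or on <= period.valid_to
        if period.name.lower() == name.lower() and period.valid_from <= on and still_valid:
            return period.entity_id
    return None


def all_names(entity_id: str) -> List[str]:
    """Return every name the entity has carried, in stored order."""
    return [period.name for period in NAMES if period.entity_id == entity_id]


def expand_query(name: str, on: date) -> List[str]:
    """Expand a name into all historical and current names of its entity."""
    entity_id = resolve(name, on)
    return all_names(entity_id) if entity_id else [name]


print(expand_query("Service Desk", date(2015, 1, 1)))
```

The same lookup, applied with the date of an old address record, could translate a historical name into the current one before geocoding.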

About Thomas Hoppe

Thomas Hoppe

Datenlabor Berlin develops customer-specific data products and algorithms, from initial conception through to prototype solutions, especially but not exclusively for SMEs.

It applies appropriate methods and concepts ranging from data gathering and integration, through data cleansing, text and data mining, data and (social) network analysis, and knowledge engineering and modeling, to the validation and evaluation of the designed algorithms.

Although still young, it has already worked for a number of well-known customers, including T-Systems, Ontoprise, Vivaki, Ontonym, Europublic and locadeo.

Dr. rer. nat. Thomas Hoppe studied computer science at the Technical University Berlin with a focus on artificial intelligence, knowledge representation and machine learning. In 1995 he received his doctorate from the University of Dortmund in the area of logic programming for his thesis “Incremental Partial Deduction”.
After a short stint as a research assistant at the Technical University Berlin, where he was responsible for the extraction and modeling of medical knowledge for an early internet information system with a semantic knowledge representation of leukemia and bone marrow transplantation, he worked from 1995 as a project manager for Deutsche Telekom Berkom (later T-Systems Nova), where he established the work area “search engines and hierarchical navigation systems”. During that phase he developed an intranet search engine for Deutsche Telekom together with Neofonie (in parallel with the rise of Google and based on a related technology) and was the inventor of two related patents.
In 2004 he moved to Deutsche Telekom Network and Projects (later T-Systems Business Services), where he worked in the knowledge management department and, together with Ontoprise, developed a document retrieval system based on a semantic representation of products, technologies and documents. He was responsible for the development of the system and the modeling of the domain knowledge.
In 2008 he founded the company Ontonym with three partners from the Free University Berlin. Ontonym focuses on the modeling of background knowledge for semantic search and developed a state-of-the-art semantic search engine on top of Solr. In addition to his position as CEO, he was responsible for the modeling of ontologies for Ontoprise, Vivaki, T-Systems, Condat and Art&Com, and for Ontonym’s partly multilingual thesaurus on recruitment and continuing education, which comprises about 11,400 concepts with about 17,360 terms. This thesaurus is in use in the semantic search functions of ingenieurkarriere.de, its candidate database, wdb-suchportal.de (the search function of the Weiterbildungsdatenbank Berlin-Brandenburg) and myworkbook.de.
In 2014 he founded Datenlabor Berlin, where he analyzes data as a data scientist and develops specialized, customer-specific data analysis algorithms. As a knowledge engineer he supports companies in the acquisition, modeling and maintenance of domain knowledge.
He is co-author of the book “Corporate Semantic Web – Wie semantische Anwendungen in Unternehmen Nutzen stiften”, which appears in I/2015, published by Springer.