Session, THU 10:30 - 12:00

e-Commerce

A Keyword Spotter for Natural Language Queries

We describe a software tool named "Rootvole" which extracts semantic entities and numerical values from natural language queries (textual or spoken). It has been used successfully in a variety of projects, such as an Android app for searching a second-hand car database and a search application for TV program data.

Based on an analysis of our voice search projects, we were looking for a more lightweight solution compared to those found in the literature. For general processing of voice queries we developed a text parsing library that can be used to match text with semantic concepts.

The Rootvole framework is implemented as a Java library. The extraction algorithm is based on a form of a parsing expression grammar, where we generate the expressions to be detected beforehand by regular expressions and store them in a vocabulary.

The following items get distinguished:

predefined entities: an information entity that does not change over time, e.g. "saturday"
dynamic entities: an information entity that changes over time depending on the database, e.g. "blue mercedes"
values: a numerical value with unit and constraints, e.g. "maximum 200 kilometres"
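
The three item types above can be illustrated with a minimal Python sketch. It is not the Rootvole implementation (which is a Java library based on a form of parsing expression grammar); it only assumes a vocabulary lookup plus one regular expression for values. All names, vocabulary entries and concept labels here are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    kind: str     # "predefined", "dynamic" or "value"
    concept: str  # semantic concept the matched text maps to
    text: str     # matched surface text


# Hypothetical vocabularies. Predefined entities are fixed; dynamic
# entities would be regenerated whenever the database changes.
PREDEFINED = {"saturday": "day:saturday", "sunday": "day:sunday"}
DYNAMIC = {"blue": "color:blue", "mercedes": "brand:mercedes"}

# A value is a number with a unit and an optional constraint word.
VALUE_PATTERN = re.compile(
    r"\b(?:(?P<constraint>maximum|minimum)\s+)?"
    r"(?P<number>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>kilometres|kilometers|km)\b"
)


def spot(query: str) -> List[Match]:
    """Return values first, then predefined and dynamic entities."""
    lowered = query.lower()
    matches = []
    for m in VALUE_PATTERN.finditer(lowered):
        constraint = m.group("constraint") or "exact"
        concept = f"{constraint}:{m.group('number')}:{m.group('unit')}"
        matches.append(Match("value", concept, m.group(0)))
    for kind, vocabulary in (("predefined", PREDEFINED), ("dynamic", DYNAMIC)):
        for term, concept in vocabulary.items():
            # Whole-word match so "blue" does not fire inside "bluetooth".
            if re.search(rf"\b{re.escape(term)}\b", lowered):
                matches.append(Match(kind, concept, term))
    return matches
```

Calling `spot("Blue Mercedes with maximum 200 kilometres on Saturday")` yields one value match, one predefined entity and two dynamic entities. Keeping the dynamic vocabulary separate means it can be rebuilt from the database without touching the fixed entries.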

The talk motivates the field of voice search, introduces the Rootvole keyword spotter and illustrates its use in different domains and applications.

About

Telekom Innovation Laboratories (T-Labs) are the central research and development unit of Deutsche Telekom. Organizationally, T-Labs belongs to the area of responsibility of the Chief Product & Innovation Officer.

Their mandate is to work closely with operative units at Deutsche Telekom, offering new ideas and support in the development and implementation of innovative products, services and infrastructures for Telekom's growth areas.

With locations in Berlin, Darmstadt and Bonn (Germany), Beer Sheva and Tel Aviv (Israel) and Mountain View (U.S.), T-Labs concentrates on medium-term themes and on technologies for setting Telekom apart from its competition and founding new businesses. Some 400 experts and scientists from a wide variety of disciplines, as well as young entrepreneurs, from more than 25 nations all work together at T-Labs.

Felix Burkhardt

Felix Burkhardt does tutoring, consulting, research and development in the fields of human-machine dialog systems, text-to-speech synthesis, speaker classification, ontology-based natural language modeling, voice search and emotional human-machine interfaces.

Originally an expert in speech synthesis at the Technical University of Berlin, he wrote his Ph.D. thesis on the simulation of emotional speech by machines, recorded the Berlin acted emotions database, "EmoDB", and maintains several open source projects, including the emotional speech synthesizer "Emofilt" and the speech labeling and annotation tool "Speechalyzer". He has been working for Deutsche Telekom AG since 2000, currently at the Telekom Innovation Laboratories in Berlin.

JSON driven semantics and query in Rakuten's Ad services

Rakuten Corporation is Japan's top e-commerce company, and its advertisement (Ad) services process around a billion transactions per day. High responsiveness, throughput and availability are among the KPIs these Ad services have to maintain. Therefore, the semantics of their data are modelled in JavaScript Object Notation (JSON). Since JSON is designed for easy and fast data interchange, it is a good fit for high-performance Ad services. JSON is a widely accepted data modelling standard, and it is supported by numerous query languages and technologies such as JSONiq, JaQL and JSONPath. Moreover, JSON messages are easy to process for client-side, JavaScript-based content generation. JSON is therefore our main means of addressing responsiveness and performance in one of our Ad services, RASTA (RAkuten Super Tail Ad) Display. I will share the story of how we reached milestones such as an average response time of approx. 65 milliseconds and 24,500 queries per second (QPS). Much of the credit goes to the approach of JSON-based semantics and query. I will also touch upon future plans to use Linked Data with JSON-LD in order to connect Ad-related data with other services.
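
To make the idea of JSON-based semantics and query concrete, here is a minimal Python sketch. It does not reflect the actual RASTA data model or query layer; the record fields (`id`, `category`, `bid`, `active`) and the helper `select_ads` are assumptions made purely for illustration, and the filter stands in for what a query language such as JSONPath would express.

```python
import json

# Hypothetical ad records; the field names are illustrative only.
ADS_JSON = """
[
  {"id": "ad-1", "category": "books", "bid": 0.12, "active": true},
  {"id": "ad-2", "category": "shoes", "bid": 0.30, "active": true},
  {"id": "ad-3", "category": "books", "bid": 0.25, "active": false}
]
"""


def select_ads(ads, **criteria):
    """Return the ads whose fields equal every given criterion."""
    return [
        ad for ad in ads
        if all(ad.get(key) == value for key, value in criteria.items())
    ]


ads = json.loads(ADS_JSON)
active_books = select_ads(ads, category="books", active=True)

# Compact separators keep the payload small for the client-side renderer.
response = json.dumps(active_books, separators=(",", ":"))
```

Because the same JSON structure is used for storage, query and the response to the browser, no format conversion is needed between the stages.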

About the author

Sidhant is currently working as a senior engineer at Rakuten Inc., Japan's top e-commerce and Internet company. Sidhant is a regular speaker at international IEEE and ACM conferences. He has authored and published more than six research papers at major IEEE and ACM international conferences in the fields of service-oriented architecture, web services, e-learning, semantic web and web ontologies. Some of his research papers are indexed in DBLP (http://www.informatik.uni-trier.de/~ley/pers/hd/r/Rajam:Sidhant.html).

Sidhant has more than 8 years of IT industry experience, mainly in designing, developing and operating web-based solutions such as on-line banking, CRM, ERP and advertising. He has gained this experience working globally for international companies such as Oracle Corp., Panasonic Corp. and Rakuten Inc. In 2006, Oracle awarded Sidhant the I-Appreciate Best Performance Award. In 2013, Rakuten Inc. awarded Sidhant the Most Valuable Person Award (MVP) in appreciation of his efforts in improving the performance of the advertisement systems. At Rakuten, Sidhant works in the Big Data Department, mainly on the design and development of the advertisement products. These products are deployed in many services in Rakuten's eco-system, including the well-known Ichiba e-commerce solution. He focuses mainly on the continuous improvement of the advertisement products in terms of performance, availability and architecture. Sidhant's academic career has concentrated on IT and computer science at international universities including the University of Aizu (Japan) and the University of Pune (India). At the University of Aizu, he was awarded a Monbukagakusho scholarship by the Government of Japan for his Master of Science studies. Sidhant can be followed on LinkedIn (http://jp.linkedin.com/pub/sidhant-rajam/4/4a7/527).

Sidhant Rajam

Rakuten, Inc. (4755:Tokyo), is one of the world's leading Internet service companies. We provide a variety of products and services for consumers and businesses, with a focus on e-commerce, finance, and digital content. In both 2012 and 2013, Rakuten was ranked among the world’s ‘Top 10 Most Innovative Companies’ in Forbes magazine’s annual list. Rakuten is expanding worldwide and currently operates throughout Asia, Europe, the Americas and Oceania. Founded in 1997, Rakuten is headquartered in Tokyo, with over 11,000 employees and partner staff worldwide. For more information: http://global.rakuten.com/corp/ .

Digital marketing 360 metadata

Digital marketing can benefit greatly from a well-designed, adaptive metadata architecture that supports the whole life-cycle of the marketing information.

If you consider using metadata for digital marketing and want to benefit from the full potential, then this presentation is essential for you.

It gives you a model of how to capture, manage and use the four fundamental types of metadata for intelligent content targeting.

The four fundamental types of metadata are derived from the content life-cycle phases:

Creation of content, when the author also has the best opportunity to create descriptive metadata.
Managing and publishing content, when the business can add the contextual metadata.
Engagement with content, when consumers can contribute their feedback.
Use of analytics: what has happened in similar situations with other consumers (web analytics), and what behavioral metadata can be observed on the interacting consumer (real-time targeting).

I call the model "Digital marketing 360 metadata". It ties together the four fundamental metadata types of the content life-cycle. Leaving out one type, or viewpoint as one may call it, gives a distorted or incomplete picture of what content should be shown to each persona or registered consumer.

The concept tells you what your customers care about, and you should too:

the consistent and easy to understand categories and definitions – intuitive faceted search, terms and definition in common language etc.
which products the business wants to promote and bundle with other products etc. – managing the contextual metadata
engaging consumers to tell how they value the content, products and services actually delivered – learning from feedback
observing how other people acted (with the content) in the recent past and how a new customer interacts in real time – learning from the behavior of consumers.

Placing the good metadata on web pages

So, all this valuable metadata should be collected, contextualized, structured and then published on the web pages in order to boost SEO and to provide more meaningful search results to hasty consumers.

I bet most of the needed semantic data about products and their "consumer context" is available somewhere in your company’s product and marketing processes, but it is not retained and piped through the complex (disconnected?) systems to web page publishing as high-quality metadata.

That has been the case in most of the companies I have consulted for. As for structured metadata, there are published schemas and clear guidelines to follow, by Google for example, for injecting metadata into the HTML markup in your publishing process.
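
As an illustration of injecting structured metadata into HTML markup, the following Python sketch emits a JSON-LD block using the public schema.org `Product` and `Offer` types. It is one possible approach under stated assumptions, not a description of any particular publishing system; the function name `product_jsonld` and the sample product values are hypothetical.

```python
import json


def product_jsonld(name, description, sku, price, currency):
    """Build a <script> element carrying schema.org Product metadata."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }
    # Escape "</" so that a value such as "</script>" cannot end the
    # element early; "<\/" is a valid JSON escape for the same text.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical product, for illustration only.
snippet = product_jsonld(
    "Trail Shoe", "Lightweight running shoe", "TS-1", "89.00", "EUR"
)
```

The resulting `snippet` can be placed in the page by the publishing step, so the metadata travels with the content rather than being maintained separately.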

But why are Google and the other big boys giving all this away? I guess this is part of their Knowledge Graph initiative to semantically model the entire world. But it also makes you wonder whether what they say is biased, since it is in their interest to grab as much meaningful (codified) data as possible from your pages to enlarge their knowledge graph and also – most importantly – to make money from your data.

Shaping up the data quality, harmonization of identifiers for business objects, integrating data flows and establishing a common terminology can take half a decade or longer. It is about educating people and revamping the corporate culture, and not about buying new technology.

However, by choosing the right approach and using semantic-technology-savvy tools, data integration may be much easier than you think. SKOS-compliant tools with Linked Data integration give you a kick start.

About

We create competitive advantage for our clients by planning fit-for-purpose IT solutions. The success and value of an IT project are determined in the planning phase. That is where our key competence and focus lie. Our work is always based on the company's business needs, thus resulting in increased sales, better efficiency or improved risk management. Our expertise consists of superior know-how in the selected service areas, proven methodologies and best practices.

Talent Base's service areas:

Heimo Hänninen

Experienced, business-savvy data and information architect.

Digital marketing requires high-quality metadata about your consumers, your products, product data and marketing content, your partners, your sales activities and pricing, to mention a few. Linked Open Data (LOD) and semantic technologies are a robust yet flexible way of merging and managing metadata for marketing from different sources. With LOD you can also realize Enterprise Linked Data in a wider scope.

My specialties:

- business analysis

- data architectures

- business ontology, taxonomy and vocabulary design (SKOS, RDFS, Topic Maps)

- master data management and data standards

- a creative mind for breaking down complex tasks into more digestible chunks

- personable and professional, enjoying dialog with both business and IT people

A holistic approach is mandatory when designing data models:

* business drivers and scenarios (focus on data abstraction and objectives: what is needed and why? where do you want to go?)

* user stories (focus on user research: how to use the data? who needs the data?)

* information flow (focus on process: identify systems, how does data flow? what data and where?)

* data/information sources (focus on integration, inventory and feedback: where and how to get it? what is it in detail? how to improve it?)