Doctoral Thesis in Computer Science by John SAMUEL

John SAMUEL will defend his doctoral thesis in Computer Science on October 6 at 2:00 p.m. at ISIMA. The thesis is titled:

 « Feeding a Data Warehouse with Data coming from Web Services. A Mediation Approach for the DaWeS prototype. »

The jury will be composed of:

Pr. Farouk TOUMANI, Université Blaise Pascal
Dr. Christophe REY, Université Blaise Pascal
Pr. Jérôme DARMONT, Université Lumière Lyon 2
Pr. Omar BOUCELMA, Aix-Marseille Université
Dr. Emmanuel COQUERY, Université Claude Bernard Lyon 1

Abstract:

The role of the data warehouse in business analytics cannot be underestimated for any enterprise, irrespective of its size. But the growing dependence on web services has resulted in a situation where enterprise data is managed by multiple autonomous and heterogeneous service providers. We present our approach and its associated prototype DaWeS, a DAta warehouse fed with data coming from WEb Services, which extracts, transforms and stores enterprise data from web services and builds performance indicators from the stored enterprise data, hiding from end users the heterogeneity of the numerous underlying web services. Its ETL process is grounded on a mediation approach usually used in data integration. This enables DaWeS (i) to be fully configurable in a declarative manner only (XML, XSLT, SQL, datalog) and (ii) to make part of the warehouse schema dynamic so it can be easily updated. Together, (i) and (ii) allow DaWeS managers to shift from development to administration when they want to connect to new web services or to update the APIs (application programming interfaces) of already connected ones. The aim is to make DaWeS scalable and adaptable so that it can smoothly face the ever-changing and growing web services offering. We point out that this also enables DaWeS to be used with the vast majority of current web service interfaces, which are defined with basic technologies only (HTTP, REST, XML and JSON) and not with more advanced standards (WSDL, WADL, hRESTS or SAWSDL), since these more advanced standards are not yet widely used to describe real web services. In terms of applications, the aim is to allow a DaWeS administrator to provide small and medium companies with a service to store and query their business data coming from their usage of third-party services, without having to manage their own warehouse. In particular, DaWeS enables the easy design (as SQL queries) of personalized performance indicators. We present in detail this mediation approach for ETL and the architecture of DaWeS.

Besides its industrial purpose, building DaWeS brought forth further scientific challenges, such as the need to optimize the number of web service API operation calls and to handle incomplete information. We propose a bound on the number of calls to web services. This bound is a tool to compare future optimization techniques. We also present a heuristic to handle incomplete information.