Deliverable 7.2 - GES3 Data Integration

The EUBra‐BIGSEA project aims to develop a set of cloud services that empower Big Data analytics and ease the development of massive data processing applications. EUBra‐BIGSEA will develop models, predictive and reactive cloud infrastructure QoS techniques, efficient and scalable Big Data operators, and a privacy and quality analysis framework, all exposed to several programming environments.

The Acquisition and Engineering of Georeferenced Environmental, Stationary, Streaming and Social (GES3) data (Task 7.2) is related to Use Case 1 (UC1), Data Acquisition (D7.1). In particular, these data come from sources related to urban traffic and cover four main data types: stationary data, dynamic spatial data, environmental data, and social network data. Although the EUBra‐BIGSEA pilot was initially planned around data from the city of Curitiba, where the pilot case is being built, the EUBra‐BIGSEA framework will be applicable, at least partially, to other scenarios. Therefore, the data integration work addresses the general problem of collecting, cleaning, transforming and integrating all the listed data sources, in order to understand the dynamics of traffic and public transportation services in Brazilian cities.

After describing all data sources, we carried out the integration in the following steps:

  1. We integrated data sources within the same theme (e.g., official sources);
  2. We integrated data across different data types (e.g., stationary and dynamic spatial data);
  3. We classified the issues found as data quality, entity matching, or data mining problems;
  4. We identified mechanisms to improve integration and data quality for the end user.
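The four steps above can be pictured as a small pipeline. The Python sketch below is only a minimal illustration under stated assumptions, not the project's implementation: the `Record` type, its fields (`source`, `theme`, `data_type`, `key`), the assumption that sources can be linked on an exact shared key, and all function names are hypothetical. It covers steps 1 and 2, tags the problems it meets with two of the categories from step 3, and produces a per-category tally that could serve as one input to step 4.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Issue(Enum):
    # The three issue categories named in step 3.
    DATA_QUALITY = "data quality"
    ENTITY_MATCHING = "entity matching"
    DATA_MINING = "data mining"  # not produced by this sketch


@dataclass
class Record:
    # Hypothetical record layout; not the project's actual schema.
    source: str                 # name of the originating source
    theme: str                  # e.g. "official"
    data_type: str              # e.g. "stationary", "dynamic", "environmental", "social"
    key: Optional[str]          # hypothetical shared identifier used for linking
    payload: Dict[str, object] = field(default_factory=dict)


IssueLog = List[Tuple[Issue, str]]


def integrate_within_theme(records: List[Record], issues: IssueLog):
    """Step 1: merge records with the same theme, data type and key."""
    merged: Dict[Tuple[str, str], Dict[str, dict]] = defaultdict(dict)
    for r in records:
        if r.key is None:
            # A record that cannot be linked is logged as a data quality issue.
            issues.append((Issue.DATA_QUALITY, f"{r.source}: missing key"))
            continue
        merged[(r.theme, r.data_type)].setdefault(r.key, {}).update(r.payload)
    return merged


def integrate_across_types(merged, left: str, right: str, issues: IssueLog):
    """Step 2: join records of type `right` to records of type `left` on their key."""
    left_index: Dict[str, dict] = {}
    for (_, dtype), by_key in merged.items():
        if dtype == left:
            for k, p in by_key.items():
                left_index.setdefault(k, {}).update(p)
    joined = []
    for (_, dtype), by_key in merged.items():
        if dtype != right:
            continue
        for k, p in by_key.items():
            if k in left_index:
                joined.append({"key": k, **left_index[k], **p})
            else:
                # No counterpart found: logged as an entity matching issue.
                issues.append((Issue.ENTITY_MATCHING, f"{right} key {k} has no {left} match"))
    return joined


def summarize(issues: IssueLog) -> Counter:
    """Count issues per category, e.g. to prioritize improvements in step 4."""
    return Counter(category for category, _ in issues)


# Toy usage with made-up values.
records = [
    Record("feed_a", "official", "stationary", "stop-1", {"lat": 0.0, "lon": 0.0}),
    Record("feed_b", "official", "stationary", "stop-1", {"name": "Stop 1"}),
    Record("gps", "official", "dynamic", "stop-1", {"bus": "B10"}),
    Record("gps", "official", "dynamic", "stop-9", {"bus": "B11"}),
    Record("feed_b", "official", "stationary", None, {"name": "unnamed"}),
]
issues: IssueLog = []
merged = integrate_within_theme(records, issues)
joined = integrate_across_types(merged, "stationary", "dynamic", issues)
print(joined)
print(summarize(issues))
```

Joining on an exact shared key keeps the sketch short; real sources (for instance, vehicle positions against fixed stops) would often need spatial or fuzzy entity matching instead, which is itself one of the issue categories listed in step 3.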