Deliverable 7.4 Toolbox for GES³ Data Second Release

The EUBra-BIGSEA project aims to develop a set of cloud services that empower big data analytics and ease the development of massive data processing applications. EUBra-BIGSEA will develop models, predictive and reactive cloud infrastructure QoS techniques, efficient and scalable big data operators, and a privacy and quality analysis framework, all exposed to several programming environments. EUBra-BIGSEA aims to cover the general requirements of multiple application areas, although it will be showcased on the processing of massive connected-society data, and particularly on traffic recommendation.

The project starts with the analysis of the use case scenarios that will be used for demonstration, while considering their requirements more broadly. EUBra-BIGSEA is an API-centric project whose main objective is to create a sustainable international (European and Brazilian) cooperation activity in the area of cloud services for big data analytics. In particular, T7.2 aims to improve the efficiency and throughput of data scientists and data curators.

The Acquisition and Engineering of Georeferenced Environmental, Stationary, Streaming and Social (GES³) data (Task 7.2) is related to Use Case 1 (UC1), Data Acquisition (D7.1). These data come from sources related to urban traffic and cover four main data types: stationary data, dynamic spatial data, environmental data, and social network data. Although the EUBra-BIGSEA pilot was initially planned for data from the city of Curitiba, where the pilot case is being constructed, the EUBra-BIGSEA framework will be applicable to some extent to other scenarios.

Task 7.4 deals with the creation of predictive models from the aforementioned GES³ data sources, in order to understand and anticipate traffic and public transportation service scenarios in Brazilian and European cities. These models are based on a specific set of supervised data mining and machine learning techniques, such as linear and logistic regression, support vector machines and Gaussian processes. As with the descriptive model algorithms implemented in the first release of the Toolbox for GES³ Data (described in D7.3), the code implemented in this second release will be used as a proof of concept in different WPs, for example to evaluate the infrastructure (WP3), the data services (WP4) and the expressiveness of the programming abstractions (WP5), and to identify security and privacy concerns (WP6). It will also be used to implement the use cases (WP7). Next steps include indirect integration with resource allocation and workload evaluation (WP3) and improvements to the implementation (converting pending prototypes to WP4 and WP5 technologies).

Together with Task 7.3, Task 7.4 will provide the toolbox needed to implement the complex analytics scenarios of the Routes for People use case (Task 7.5).