EUROPE - BRAZIL COLLABORATION OF BIG DATA SCIENTIFIC RESEARCH THROUGH CLOUD-CENTRIC APPLICATIONS

Deliverables

Deliverable 5.1 - EUBra-BIGSEA Software Architecture

The purpose of this report is to present the design of the software architecture of the EUBra-BIGSEA platform. The document describes the overall functioning of the platform components and the interactions between them, and serves as a development roadmap for the developers of the project.

This document summarizes the architectural design to be implemented in the course of the EUBra-BIGSEA project.

It highlights the interrelations among the different work packages involved in the decisions adopted, and also outlines the reasoning behind the choices made.

Deliverable 2.2 - User communities engagement and dissemination strategy

The scope of this report is to provide an overview of the user communities targeted by EUBra-BIGSEA as relevant stakeholders, both to facilitate the sustainability of the project and to lay the ground for the technology transfer of the EUBra-BIGSEA assets. It leverages the analysis made in D2.1 to identify the stakeholders, classify them, and determine the right messaging to engage them.

The first outcome of this document is the definition of a clear Engagement and Dissemination strategy targeting the different EUBra-BIGSEA User Communities. The second outcome is a set of individual engagement plans, including activities, tools and expected impact, for each of the identified communities. The Engagement strategy and plans targeting the user communities are an important step to maximise the impact of the project. The outcomes will be the basis for the work that will be carried out in WP8 to define the exploitation and sustainability plans.

Deliverable 8.1 - Assets, Market Analysis & Stakeholder Synergy Plan

Big Data analytics promises both the private and public sectors valuable insight that can create competitive advantages, foster innovation and scientific discovery, and drive efficiency and progress across multiple domains and industries. EUBra-BIGSEA demonstrates the value of using cloud services on applications with high social and business impact, addressing scenarios of high interest for both Europe and Brazil: the processing of massive data coming from highly connected societies. It also demonstrates the value of developing Big Data services for capturing, federating and annotating data on the order of petabytes (PB) on top of efficient programming models. These Big Data services pose multiple challenges in resource provisioning, performance, Quality of Service and privacy for the cloud infrastructure and services that support Big Data applications.

This report is the first one of WP8 with the purpose of providing an initial analysis of the exploitable assets, and defining a stakeholder engagement plan targeting the customer segments for both the individual assets and new tools and services, including messages and formats best suited for them. With the focus on the general uptake of technologies, this document analyses the market landscape of the relevant sectors for EUBra-BIGSEA to position the preliminary identified assets.

Deliverable 7.3 - Toolbox for GES3 Data Initial Release

This document deals with the creation of descriptive models from the aforementioned GES3 data sources, in order to understand the dynamics of traffic and public transportation services in Brazilian cities. Furthermore, it describes the elaboration of a toolbox containing descriptive models, and their implementation, deployment and application in the smart cities context.

The models apply a specific set of unsupervised data mining and machine learning techniques: clustering, association rules, feature extraction, and commonly used summarization and aggregation of data. Results from Task 7.3 are the first ones obtained using the GES3 data and computation-intensive algorithms. Thus, the implemented code has been used as a proof of concept in different WPs to, for example, evaluate the infrastructure and the expressiveness of the programming abstractions, identify security and privacy concerns, and realize the use cases. Next steps include (indirect) integration with resource allocation, evaluation of the workload, and improvements in the implementation. Together with Task 7.4, Task 7.3 will provide the toolbox needed to implement the complex analytics scenarios of the Routes for People Use Case (Task 7.5).

Deliverable 7.2 - GES3 Data Integration

The EUBra‐BIGSEA project aims at developing a set of cloud services empowering Big Data analytics to ease the development of massive data processing applications. EUBra‐BIGSEA will develop models, predictive and reactive cloud infrastructure QoS techniques, efficient and scalable Big Data operators, and a privacy and quality analysis framework, exposed to several programming environments.

The Acquisition and Engineering of Georeferenced Environmental, Stationary, Streaming and Social (GES3) data (Task 7.2) is related to Use Case 1 (UC1), Data Acquisition (D7.1). In particular, these data come from sources that are related to urban traffic and cover four main data types: stationary data, dynamic spatial data, environmental data, and social network data. Although the EUBra‐BIGSEA pilot was initially planned around the data of the city of Curitiba, where the pilot case is being constructed, the EUBra‐BIGSEA framework will be applicable (at least partially) to other scenarios. Therefore, the data integration covers the general problem of mechanisms for collecting, cleaning, transforming and integrating all the listed data sources, in order to understand the dynamics of traffic and public transportation services in Brazilian cities.

After a description of all data sources, the integration process has gone through the following steps:

  1. Data sources within the same theme (such as official sources) were integrated;
  2. Data were integrated across different data types (such as stationary and dynamic spatial data);
  3. Integration issues were classified as data quality, entity matching, or data mining problems;
  4. Mechanisms were identified to improve integration and data quality for the final user.

Deliverable 5.2 - Programming abstractions design

This document describes the implementation of the programming model prototypes developed as a part of the EUBra-BIGSEA platform. The programming models offer the tools to abstract the data services to the user scenarios and execute them on the QoS infrastructure. COMPSs and Apache Spark are the two available frameworks for the porting of the scenarios. This document, together with the description of the software components available in the project’s repository, realizes the milestone MS13 First release of the programming layer.

  • COMPSs applications can be written in sequential Java, Python or C/C++, and make use of other higher-level software components, such as OPHIDIA workflows. Sequential code is instrumented with data flow information that COMPSs uses to infer parallelism. COMPSs is platform agnostic and deals both with the execution and the negotiation with the computing infrastructure to request the necessary resources for the execution of the workflows. In this project, COMPSs has been extended to create a Mesos framework and to support NoSQL storage. Additional dependencies are easily coded inside COMPSs jobs through the use of Docker containers.

 

  • Lemonade (Live Exploration and Mining Of Non-trivial Amount of Data from Everywhere) is a visual platform for distributed computing, aimed at enabling the implementation, experimentation, testing and deployment of data processing and machine learning applications. It provides developers with high-level abstractions, called operations, to build processing workflows using a graphical web interface. Lemonade currently generates Spark code, and it will be extended to support COMPSs workflows during the second year. Lemonade provides (or will provide) many operations typically used for Extraction, Transformation and Loading (ETL), including data transformation, machine learning, statistical analysis, text processing and data visualization. Lemonade is formed by a set of components which together provide the whole functionality.
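
The data-flow idea mentioned for COMPSs above, where sequential-looking code is annotated so that a runtime can infer which calls may run in parallel, can be illustrated with a small sketch. This is not the COMPSs API: the `task` decorator, the `square` and `add` functions and the thread pool are hypothetical stand-ins built only on the Python standard library.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical worker pool standing in for the computing infrastructure.
_pool = ThreadPoolExecutor(max_workers=4)


def task(fn):
    """Hypothetical decorator: a call returns a Future immediately.

    Arguments that are Futures (results of earlier tasks) are treated as
    data dependencies and resolved before the wrapped function runs.
    """
    def submit(*args):
        def run():
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return fn(*resolved)
        return _pool.submit(run)
    return submit


@task
def square(x):
    return x * x


@task
def add(a, b):
    return a + b


# Written sequentially; the two square() calls share no data, so they may
# run concurrently, while add() waits for both of its inputs.
left = square(2)
right = square(3)
total = add(left, right)
```

Calling `total.result()` blocks until the whole dependency chain has finished. The point of the sketch is only that the dependency graph is derived from how data flows between calls, not declared by hand.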

Deliverable 3.3 - BIGSEA QoS infrastructure services initial version

This document describes the cloud services to be used by the other components in the EUBra-BIGSEA project to execute the processing workload. The document covers the implementation details of the architecture depicted in D3.1 QoS Monitoring System Architecture.

The document describes how to deploy a fully functional cluster, how to access it and how the components are integrated. Throughout the document you will find instructions for tuning the BIGSEA cloud services recipes so that they can be deployed on different IaaS platforms, as well as the security mechanisms for deploying frameworks.

Finally, it covers the QoS Monitoring system, based on Monasca, describing how to implement probes and deploy agents.

Deliverable 3.2 - Big Data Application Performance models and run-time Optimization Policies

This document describes the performance methods and tools to predict the execution time of big data applications that have been developed during the first year of EUBra-BIGSEA. Moreover, this report introduces the optimization models which will be the core of horizontal elasticity policies. Optimization-based policies will trigger the configuration of the cloud infrastructure providing QoS guarantees for big data applications execution while minimising resource usage costs.

The ultimate goal of the optimization-based policies that will be developed within EUBra-BIGSEA is to determine the system configuration leading to the minimum cost, by embedding performance prediction methods in an optimization framework. Performance models will be kept alive at run-time and integrated within an optimization algorithm, giving the proactive approach the insights it needs to adjust the system configuration dynamically.
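
As a rough illustration of embedding a performance model in a cost minimisation, the sketch below picks the cheapest configuration whose predicted execution time meets a deadline. Everything in it is an assumption made for illustration: the `Config` type, the toy `predicted_hours` model (a serial part plus a perfectly parallel part), the prices and the exhaustive search are not taken from the deliverable.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    nodes: int
    hourly_price: float  # hypothetical cost per node-hour


def predicted_hours(nodes: int, serial_h: float = 0.5, parallel_h: float = 8.0) -> float:
    """Toy performance model (assumption): serial part + parallel part / nodes."""
    return serial_h + parallel_h / nodes


def cheapest_config(candidates: Iterable[Config], deadline_h: float) -> Optional[Config]:
    """Return the minimum-cost candidate whose predicted time meets the deadline."""
    best, best_cost = None, float("inf")
    for cfg in candidates:
        hours = predicted_hours(cfg.nodes)
        if hours > deadline_h:
            continue  # violates the QoS constraint
        cost = cfg.nodes * cfg.hourly_price * hours
        if cost < best_cost:
            best, best_cost = cfg, cost
    return best


candidates = [Config(nodes=n, hourly_price=1.0) for n in (1, 2, 4, 8, 16)]
choice = cheapest_config(candidates, deadline_h=3.0)
```

With these made-up numbers, the 1-node and 2-node configurations miss the 3-hour deadline, and among the remaining ones the 4-node configuration has the lowest predicted cost. A run-time policy would re-evaluate such a choice as the performance model is updated.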

Deliverable 3.1 - QoS Monitoring System Architecture

The Quality of Service (QoS) architecture is the computational core of the EUBra-BIGSEA platform. The performance of data analytics applications running on the EUBra-BIGSEA platform is profiled in advance, so that a QoS guarantee can be defined based on the performance requirements.

This document describes the QoS Monitoring System Architecture as well as the software architecture of other cloud-service related components and their interactions. The purpose of this report on the QoS Monitoring System Architecture is to define the software components that will collect the execution data from the cloud architecture, as well as the main components that intervene in the full process of deployment, configuration, contextualization and execution. Each component is described in terms of its external interfaces and dependencies on other components.

The project identifies three types of workloads: (i) persistent, (ii) periodic batch and (iii) interactive jobs, which will be served by different schedulers. Persistent jobs will be served by the Marathon scheduler, periodic jobs by the Chronos scheduler, and interactive jobs through interactive shells (e.g. Spark shells). Those schedulers will deploy frameworks that will embed the executable services and negotiate the resources with Mesos.
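
The mapping from workload type to scheduler described above can be written down as a small lookup. The sketch below is only a restatement of that mapping; the `Workload` enum and the `route` function are hypothetical names, and no Marathon, Chronos or Mesos API is called.

```python
from enum import Enum


class Workload(Enum):
    PERSISTENT = "persistent"
    PERIODIC_BATCH = "periodic batch"
    INTERACTIVE = "interactive"


# Which component serves each workload type, as listed in the text.
SCHEDULER_FOR = {
    Workload.PERSISTENT: "Marathon",
    Workload.PERIODIC_BATCH: "Chronos",
    Workload.INTERACTIVE: "interactive shell",
}


def route(workload: Workload) -> str:
    """Return the name of the component that serves the given workload type."""
    return SCHEDULER_FOR[workload]
```

For example, `route(Workload.PERIODIC_BATCH)` returns `"Chronos"`.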

Deliverable 2.3 - Preliminary Action Plan Report

Cloud computing, Big Data technologies, the Internet of Things (IoT), 5G communications and cyber security are the building blocks of the digital economy. The uptake of cloud computing and virtualised infrastructures plays an essential role in enabling the transition towards a distributed global community, enhancing collaborative work and tackling the challenges of big data.
The cooperation between Europe and Brazil seeks to sustain and enhance social and economic conditions, increase competitiveness, create jobs, and address common global challenges in areas like energy, international cyber policy, sustainable development, climate change, and the environment.
Policy collaboration to date has included work on identifying barriers that may preclude the adoption of cloud-based services in Europe and in Brazil, and on identifying concrete joint initiatives to minimise such barriers. Current EU-Brazil collaboration is expected to advance cloud-centric applications for big data and to facilitate policy coordination between the EU and Brazil.

EUBra-BIGSEA is committed to making a significant contribution to the cooperation between Europe and Brazil in the area of advanced cloud services for Big Data applications. EUBra-BIGSEA facilitates the integration of European and Brazilian technologies and experiences to bring forward scientific innovation through a use case scenario approach that is important for both Europe and Brazil. Furthermore, it looks into the current challenges and the research and innovation opportunities, and highlights the relevant EU-Brazil joint initiatives in the areas addressed.

Download the EUBra-BIGSEA Preliminary Action Plan Report to find out more