EUROPE - BRAZIL COLLABORATION OF BIG DATA SCIENTIFIC RESEARCH THROUGH CLOUD-CENTRIC APPLICATIONS

Deliverables

Deliverable 3.4 EUBra-BIGSEA QoS infrastructure services intermediate version

This document describes the first release of the infrastructure services of the EUBra-BIGSEA platform. This deliverable brings together results from previous deliverables, such as the monitoring system, performance models, and the contextualization service, to provide the first complete version of the EUBra-BIGSEA QoS-aware infrastructure. In the current version, the platform enables users to provide data processing applications that will be profiled and modeled in a pre-production phase and then deployed for either asynchronous execution (e.g., driven by external triggers such as the availability of new data) or periodic execution. In both cases, the platform will be able to estimate the initial amount of resources needed to run the application within the specified deadlines, and to trigger adaptation of the running infrastructure so that the allocated resources keep matching what is needed to satisfy those deadlines.
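As an illustration only, the deadline-driven estimate and adaptation described above could be sketched as follows. The function names and the ideal linear speed-up assumption are hypothetical; they are not taken from the deliverable, whose performance models are more elaborate.

```python
import math


def workers_needed(remaining_work_s: float, seconds_to_deadline: float) -> int:
    """Smallest worker count that finishes the remaining work before the deadline.

    Assumption (not from the deliverable): work divides evenly across workers,
    so n workers process remaining_work_s seconds of work in remaining_work_s / n.
    """
    if seconds_to_deadline <= 0:
        raise ValueError("deadline has already passed")
    return max(1, math.ceil(remaining_work_s / seconds_to_deadline))


def scaling_delta(current_workers: int, remaining_work_s: float,
                  seconds_to_deadline: float) -> int:
    """Positive result: workers to add. Negative result: workers that can be released."""
    return workers_needed(remaining_work_s, seconds_to_deadline) - current_workers
```

A monitoring loop would call `scaling_delta` periodically with fresh progress figures and pass the result to whatever component resizes the cluster.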

Deliverable 5.1 - EUBra-BIGSEA Software Architecture

The purpose of this report is to design the software architecture of the EUBra-BIGSEA platform. The document describes the overall functioning of the platform components and the interactions between them, and serves as a development roadmap for the developers of the project.

This document summarizes the architectural design to be implemented in the course of the EUBra-BIGSEA project.

It highlights the interrelations among the different work packages involved in the decisions adopted, and also outlines the reasoning behind the choices made.

Deliverable 2.2 - User communities engagement and dissemination strategy

The scope of this report is to provide an overview of the user communities targeted by EUBra-BIGSEA as relevant stakeholders, both to facilitate the sustainability of the project and to lay the ground for the technology transfer of the EUBra-BIGSEA assets. It leverages the analysis made in D2.1 to identify the stakeholders, classify them, and determine the right messages to engage them.

The first outcome of this document is the definition of a clear Engagement and Dissemination strategy targeting the different EUBra-BIGSEA User Communities. The second outcome is a set of individual engagement plans including activities, tools and expected impact for each of the identified communities. The Engagement strategy and plans targeting the user communities are an important step to maximise the impact of the project. The outcomes will be the basis for the work that will be carried out in WP8 to define the exploitation and sustainability plans.

Deliverable 8.1 - Assets, Market Analysis & Stakeholder Synergy Plan

Big Data analytics offers both the private and the public sector the promise of valuable insight that can create competitive advantages, foster innovation and scientific discovery, and drive efficiency and progress across multiple domains and industries. EUBra-BIGSEA demonstrates the value of using cloud services in applications with high social and business impact, addressing scenarios of high interest for both Europe and Brazil: the processing of massive data coming from highly connected societies. It also demonstrates the value of developing Big Data services for capturing, federating and annotating data on the order of petabytes (PB) on top of efficient programming models. These Big Data services pose multiple challenges in resource provisioning, performance, Quality of Service and privacy for the cloud infrastructure and services that support Big Data applications.

This report is the first one of WP8. Its purpose is to provide an initial analysis of the exploitable assets and to define a stakeholder engagement plan targeting the customer segments for both the individual assets and new tools and services, including the messages and formats best suited for them. With a focus on the general uptake of technologies, this document analyses the market landscape of the sectors relevant to EUBra-BIGSEA in order to position the preliminarily identified assets.

Deliverable 7.3 - Toolbox for GES3 Data Initial Release

This document deals with the creation of descriptive models from the GES3 data sources, in order to understand the dynamics of traffic and public transportation services in Brazilian cities. Furthermore, it describes the elaboration of a toolbox containing descriptive models, together with their implementation, deployment and application in the smart cities context.

The models apply a specific set of unsupervised data mining and machine learning techniques: clustering, association rules, feature extraction, and commonly used summarization and aggregation of data. The results from Task 7.3 are the first ones to use the GES3 data and computation-intensive algorithms. Thus, the implemented code has been used as a proof of concept in different WPs to, for example, evaluate the infrastructure and the expressiveness of the programming abstractions, identify security and privacy concerns, and realize the use cases. Next steps include (indirect) integration with resource allocation, evaluation of the workload, and improvements to the implementation. Together with Task 7.4, Task 7.3 will provide the toolbox needed to implement the complex analytics scenarios of the Routes for People Use Case (Task 7.5).

Deliverable 7.2 - GES3 Data Integration

The EUBra-BIGSEA project aims at developing a set of cloud services empowering Big Data analytics to ease the development of massive data processing applications. EUBra-BIGSEA will develop models, predictive and reactive cloud infrastructure QoS techniques, efficient and scalable Big Data operators, and a privacy and quality analysis framework, exposed to several programming environments.

The Acquisition and Engineering of Georeferenced Environmental, Stationary, Streaming and Social (GES3) data (Task 7.2) is related to Use Case 1 (UC1), Data Acquisition (D7.1). In particular, these data come from sources that are related to urban traffic and cover four main data types: stationary data, dynamic spatial data, environmental data, and social network data. Although the EUBra-BIGSEA pilot was initially planned around the data of the city of Curitiba, where the pilot case is being constructed, the EUBra-BIGSEA framework will be applicable (at least partially) to other scenarios. Therefore, the data integration covers the general problem of mechanisms for collecting, cleaning, transforming and integrating all the listed data sources, in order to understand the dynamics of traffic and public transportation services in Brazilian cities.

After a description of all data sources, the integration process has gone through the following steps:

  1. Data sources within the same theme were integrated (such as official sources);
  2. Data were integrated across different data types (such as stationary and dynamic spatial data);
  3. The issues found were classified as data quality, entity matching, or data mining problems;
  4. Mechanisms were identified to improve the integration and quality of the data for the final user.

Deliverable 5.2 - Programming abstractions design

This document describes the implementation of the programming model prototypes developed as a part of the EUBra-BIGSEA platform. The programming models offer the tools to abstract the data services to the user scenarios and execute them on the QoS infrastructure. COMPSs and Apache Spark are the two available frameworks for the porting of the scenarios. This document, together with the description of the software components available in the project’s repository, realizes the milestone MS13 First release of the programming layer.

  • COMPSs applications can be written in sequential Java, Python or C/C++, and make use of other higher-level software components, such as OPHIDIA workflows. Sequential code is instrumented with data flow information that COMPSs uses to infer parallelism. COMPSs is platform agnostic and deals both with the execution and the negotiation with the computing infrastructure to request the necessary resources for the execution of the workflows. In this project, COMPSs has been extended to create a Mesos framework and to support NoSQL storage. Additional dependencies are easily coded inside COMPSs jobs through the use of Docker containers.

  • Lemonade (Live Exploration and Mining Of Non-trivial Amount of Data from Everywhere) is a visual platform for distributed computing, aimed at enabling the implementation, experimentation, testing and deployment of data processing and machine learning applications. It provides developers with high-level abstractions, called operations, to build processing workflows using a graphical web interface. Lemonade currently generates Spark code, and it will be extended to support COMPSs workflows during the second year. Lemonade provides (or will provide) many operations typically used for Extraction, Transformation and Loading (ETL), including data transformation, machine learning, statistical analysis, text processing and data visualization. Lemonade is formed by a set of components which provide the whole functionality.
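To make the COMPSs idea of inferring parallelism from data flow information more concrete, the toy scheduler below runs tasks as soon as their declared inputs are available. It is a standard-library sketch and does not use the COMPSs API; the `run_dataflow` function and the task dictionary format are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(tasks, max_workers=4):
    """Run tasks given as {name: (function, [names of input tasks])}.

    Tasks whose inputs are all available are submitted together, so
    independent tasks run concurrently without the caller saying so.
    """
    results = {}
    pending = dict(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # A task is ready when every declared input has a result.
            ready = [name for name, (_, deps) in pending.items()
                     if all(dep in results for dep in deps)]
            if not ready:
                raise ValueError("cyclic or unsatisfied dependency")
            futures = {
                name: pool.submit(pending[name][0],
                                  *[results[dep] for dep in pending[name][1]])
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


# "a" and "b" have no inputs and can run in parallel; "total" waits for both.
example = {
    "a": (lambda: 2, []),
    "b": (lambda: 3, []),
    "total": (lambda x, y: x + y, ["a", "b"]),
    "squared": (lambda s: s * s, ["total"]),
}
```

Calling `run_dataflow(example)` evaluates the four tasks in three waves; the caller only writes the dependencies, not the scheduling.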

Deliverable 3.3 - BIGSEA QoS infrastructure services initial version

This document describes the cloud services to be used by the other components of the EUBra-BIGSEA project to execute the processing workload. The document covers the implementation details of the architecture depicted in D3.1 QoS Monitoring System Architecture.

The document describes how to deploy a fully functional cluster, how to access it and how the components are integrated. It also provides instructions to tune the BIGSEA cloud services recipes for deployment on different IaaS platforms, and describes the security mechanisms for deploying frameworks.

Finally, it covers the QoS Monitoring system, based on Monasca, describing how to implement probes and deploy agents.

Deliverable 3.2 - Big Data Application Performance models and run-time Optimization Policies

This document describes the performance methods and tools to predict the execution time of big data applications that have been developed during the first year of EUBra-BIGSEA. Moreover, this report introduces the optimization models which will be the core of horizontal elasticity policies. Optimization-based policies will trigger the configuration of the cloud infrastructure providing QoS guarantees for big data applications execution while minimising resource usage costs.

The ultimate goal of the optimization-based policies that will be developed within EUBra-BIGSEA is to determine the system configuration leading to the minimum cost, by embedding performance prediction methods in an optimization framework. Performance models will be kept up to date at run-time and will be integrated within an optimization algorithm, giving the proactive approach the insights it needs to adjust the system configuration dynamically.
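A minimal sketch of this idea, under stated assumptions: a simple analytical performance model predicts execution time, and an exhaustive search returns the cheapest configuration that meets the deadline. The `VmType` fields, the time model and the prices are hypothetical and are not the models developed in the deliverable.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class VmType:
    name: str
    price_per_hour: float  # hypothetical cost of one node for one hour
    speed: float           # relative speed factor (1.0 = baseline)


def predicted_time_s(serial_s: float, parallel_s: float,
                     nodes: int, speed: float) -> float:
    """Assumed model: the serial part is fixed, the parallel part scales with nodes."""
    return (serial_s + parallel_s / nodes) / speed


def cheapest_config(vm_types: Iterable[VmType], max_nodes: int,
                    serial_s: float, parallel_s: float,
                    deadline_s: float) -> Optional[Tuple[float, str, int]]:
    """Return (cost, vm name, nodes) of the cheapest feasible configuration, or None."""
    best = None
    for vm in vm_types:
        for nodes in range(1, max_nodes + 1):
            t = predicted_time_s(serial_s, parallel_s, nodes, vm.speed)
            if t > deadline_s:
                continue  # violates the QoS deadline
            cost = nodes * vm.price_per_hour * t / 3600.0
            if best is None or cost < best[0]:
                best = (cost, vm.name, nodes)
    return best
```

Exhaustive search is used here only because the example space is tiny; a real policy would plug the performance predictor into a proper optimization algorithm.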

Deliverable 3.1 - QoS Monitoring System Architecture

The Quality of Service (QoS) architecture is the computational core of the EUBra-BIGSEA platform. The performance of data analytics applications running on the EUBra-BIGSEA platform is profiled in advance, so that a QoS guarantee can be defined based on the performance requirements.

This document describes the QoS Monitoring System Architecture as well as the software architecture of other cloud-service related components and their interactions. The purpose of this report on the QoS Monitoring System Architecture is to define the software components that will collect the execution data from the cloud architecture, as well as the main components that intervene in the full process of deployment, configuration, contextualization and execution. Each component is described in terms of its external interfaces and dependencies on other components.

The project identifies three types of workloads: (i) persistent, (ii) periodic batch and (iii) interactive jobs, which will be served by different schedulers. Persistent jobs will be served by the Marathon scheduler, periodic jobs by the Chronos scheduler, and interactive jobs through interactive shells (e.g. Spark shells). Those schedulers will deploy frameworks that will embed the executable services and negotiate the resources with Mesos.
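The mapping from workload type to scheduler can be pictured with the short sketch below. The helper names are hypothetical. The job fields shown (`id`, `cmd`, `cpus`, `mem`, `instances` for Marathon; `name`, `command`, `schedule` for Chronos) are believed to match the public job formats of those schedulers, but they are an assumption here and should be checked against the deployed versions.

```python
SCHEDULERS = {
    "persistent": "Marathon",
    "periodic": "Chronos",
    "interactive": "interactive shell",
}


def scheduler_for(workload_type: str) -> str:
    """Return the scheduler that serves the given workload type."""
    try:
        return SCHEDULERS[workload_type]
    except KeyError:
        raise ValueError(f"unknown workload type: {workload_type!r}")


def marathon_app(app_id: str, cmd: str, cpus: float, mem_mb: int,
                 instances: int = 1) -> dict:
    """Build a long-running service description (assumed Marathon fields)."""
    return {"id": app_id, "cmd": cmd, "cpus": cpus,
            "mem": mem_mb, "instances": instances}


def chronos_job(name: str, command: str, start_iso: str, period_iso: str) -> dict:
    """Build a periodic job description with an ISO 8601 repeating interval."""
    return {"name": name, "command": command,
            "schedule": f"R/{start_iso}/{period_iso}"}
```

For example, `chronos_job("hourly-etl", "run_etl.sh", "2017-01-01T00:00:00Z", "PT1H")` describes a job that repeats every hour; the job name and command are placeholders.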

Deliverable 2.3 - Preliminary Action Plan Report

Cloud computing, Big Data technologies, the Internet of Things (IoT), 5G communications and cyber security are the building blocks of the digital economy. The uptake of cloud computing and virtualised infrastructures plays an essential role in enabling the transition towards a distributed global community, enhancing collaborative work and tackling the challenges of big data.
The cooperation between Europe and Brazil seeks to sustain and enhance social and economic conditions, increase competitiveness, create jobs, and address common global challenges in areas like energy, international cyber policy, sustainable development, climate change, and the environment.
Policy collaboration to date has included work on identifying barriers that may preclude the adoption of cloud-based services in Europe and in Brazil and on identifying concrete joint initiatives to minimise such barriers. Current EU-Brazil collaboration is expected to advance cloud-centric applications for big data, and to move towards facilitating policy coordination between the EU and Brazil.

EUBra-BIGSEA is committed to making a significant contribution to the cooperation between Europe and Brazil in the area of advanced cloud services for Big Data applications. EUBra-BIGSEA facilitates the integration of European and Brazilian technologies and experiences to bring forward scientific innovation through a use case scenario approach that is important for both Europe and Brazil. Furthermore, it examines the current challenges and the research and innovation opportunities, and highlights the relevant EU-Brazil joint initiatives in the areas addressed.

Download the EUBra-BIGSEA Preliminary Action Plan Report to find out more 

Deliverable 7.1 - End-User Requirements Elicitation

This document presents the requirement analysis of the massively connected society use case that will be used for demonstration. The use case deals with traffic data analysis and the requirements cover both the perspective of the data processors and the final end-users.
The requirements analysis process has been implemented in four major steps:

  1. Preparation of a questionnaire distributed among data analysis developers to identify the requirements and data sources.
  2. Analysis of the general project scenario, which is split into three use cases: Data Acquisition, Descriptive Models and Predictive Models.
  3. Identification of User Stories: individual descriptions of whole interactions of the users with the system, per use case.
  4. Identification of 25 functional and non-functional requirements from the User Stories per Use Case, used to identify 18 technical requirements.

The requirements from the Use Cases serve to guide the implementation of the applications and services that will consume the Big Data services of the EUBra-BIGSEA platform. The 18 technical requirements are directly relevant to the EUBra-BIGSEA platform developers, and address functionalities such as: integration of external data sources, Bag of Tasks, QoS-bounded submission, self-adaptive elasticity, privacy annotation, authentication, and data privacy protection.

Deliverable 6.1 - Requirements and Coordinated Security Strategy

This document identifies the security requirements that will drive the implementation of the EUBra-BIGSEA global security solution addressing: (i) the provisioning of Authentication, Authorization and Accounting (AAA), (ii) the assurance of the security properties of the cloud and Big Data services, and (iii) the protection of the data privacy.

The 30 high-level requirements will ensure the design and implementation of a secure environment for the infrastructure, for the application developers and even for the end users of the applications running inside the framework. The defined solution includes two distinct AAA blocks:

  • An EUBra-BIGSEA Infrastructure AAA Service, to provide the AAA functionalities to infrastructure managers and application developers/providers;
  • An EUBra-BIGSEA Applications AAAaaS, to serve the end users of applications hosted in the EUBra-BIGSEA platform.

The document also reviews the state of the art. It includes the security assessment of key infrastructure components and the development of solutions for the issues uncovered, the benchmarking and improvement of intrusion detection systems, and the proposal of metrics to characterize the trustworthiness of the system. It further defines two distinct privacy control barriers, responsible for protecting the anonymity of both the raw data to be used and the data resulting from the predictive and descriptive models built.

Deliverable 4.1 - Design of the integrated big and fast data eco-system

Starting from the end-user requirements highlighted in deliverable D7.1, this document defines the architecture of the integrated fast and Big Data ecosystem, which represents the central data management component of the EUBra-BIGSEA platform.

The proposed architecture integrates multiple classes of big data systems. Two different aspects of the proposed architecture are:

  • A comprehensive evaluation and assessment of the big data tools available in the general landscape from the data storage, access, analytics and mining standpoints.
  • A deep analysis of the data sources in terms of data model, formats, volume, metadata, and functional needs.

Key features highlighted in this document are also the integration of different classes of big/fast data tools to address multifaceted use case requirements, and the dynamicity and elasticity of the environment together with a secure-by-design ecosystem.
The proposed architecture joins all these elements in a cloud environment, aiming to provide, to some extent, a general approach to use cases and scenarios with high social impact, like the one proposed in the project.

Deliverable 2.1 - Communication Strategy And Web Platform Development

This document defines the whole communication strategy of the project, outlining in particular the channels and tools that have been and will be selected for communicating to and engaging with each specific target stakeholder. The objective of the communication and dissemination strategy is to support the project goals through an effective communication and engagement approach, which ensures wide promotion and high visibility of the innovation and benefits that EUBra-BIGSEA provides to its stakeholders.