EUROPE - BRAZIL COLLABORATION OF BIG DATA SCIENTIFIC RESEARCH THROUGH CLOUD-CENTRIC APPLICATIONS

Deliverables

Deliverable 2.5 Final Action Plan Report

EUBra-BIGSEA is committed to generating significant impact on the cooperation between Europe and Brazil in the area of advanced cloud services for big data applications. EUBra-BIGSEA facilitates the integration of European and Brazilian technologies and experiences to bring forward scientific innovation through a use case scenario approach that is important for both Europe and Brazil.

The Preliminary Action Plan report (D2.3) provided an initial overview of current challenges, research and innovation opportunities, including an analysis of the priorities identified, highlighting as well the relevant EU-Brazil joint effort initiatives in the areas addressed.

The Plan is the first of two reports produced under EUBra-BIGSEA Task 2.3 “Joint EU Brazil Cloud Computing Research & Innovation Action Plan” focusing on outlining the joint EU Brazil cloud computing research and innovation action plans in the area of Quality of Service for cloud computing infrastructures; big data analytics; security and privacy; standards; portability and interoperability; smart cities and urban mobility planning. As a Preliminary Joint Action Plan, the D2.3 report delivered in M12 of the project covered the main challenges that EUBra-BIGSEA was addressing during the first year, providing an overview of recent developments and opportunities, while offering an initial plan aimed at boosting international cooperation and sharing of new expertise with other relevant initiatives in Europe and Brazil.

This Final Research & Innovation Action Plan (D2.5) further develops this analysis and discusses the potential socio-economic impact of the research roadmap proposed in EUBra-BIGSEA. It provides a detailed update of the EUBra-BIGSEA outputs in each of the discussed priority areas as well as an update on the potential and implemented collaboration activities. The EUBra-BIGSEA outputs should be seen as practical solutions developed by the project in response to the challenges outlined. Together with the synergies created with similar projects and initiatives, these provide a complete picture of the joint actions performed to enhance international cooperation and the exchange of knowledge and technology between Europe and Brazil in the context of the EUBra-BIGSEA project.

Deliverable 7.6 Validation of the Requirements

The EUBra-BIGSEA project aims at developing a set of cloud services empowering Big Data analytics to ease the development of massive data processing applications. EUBra-BIGSEA will develop models, predictive and reactive cloud infrastructure QoS techniques, efficient and scalable Big Data operators and a privacy and quality analysis framework, exposed to several programming environments. EUBra-BIGSEA aims at covering general requirements of multiple application areas, although it will be showcased in the treatment of massive connected society information, and particularly in traffic recommendation.

The validation of the requirements is the last document of the Use Case work package, which aims at demonstrating the capabilities of the platform. This document serves as a comprehensive compendium of the components developed and their applicability, and constitutes a good source of information for readers who want to exploit EUBra-BIGSEA components.

At the beginning of the project, 25 functional and non-functional requirements were identified from 11 user stories derived from the three use cases. After the analysis of those requirements, 18 technical requirements were identified. These requirements have been addressed by the components of EUBra-BIGSEA.

EUBra-BIGSEA has developed a set of 19 components that address five layers (infrastructure management, programming models, security and privacy, high-level data analytic services and final user applications). These components provide platform-agnostic automatic deployment and configuration of virtual infrastructures, horizontal elasticity, vertical elasticity at hypervisor and framework level, QoS prediction and scheduling, privacy annotation, quality assurance, security mechanisms, vulnerability assessment, data analytic functions, parallel programming models, high-level design tools for data analytic workflows, a large set of high-level services for traffic data processing and modelling, entity matching services and three final user applications.

The whole EUBra-BIGSEA ecosystem answers the requirements of the three use cases defined during the early stage of the project (data integration, descriptive models and predictive models). These requirements deal with the way jobs are processed, the way data is ingested and stored, and the management of the results. The system demonstrates the execution of short jobs, parallel jobs, high-throughput jobs, QoS boundaries, remote datacube analysis, and the creation of parallel applications from a graphical user interface. Several demos have been generated by integrating several components.

Finally, EUBra-BIGSEA has developed several applications integrating multiple components of the platform that have been used to validate and demonstrate the components and to exemplify how solutions can be built. The main examples are: tools for convenient management of the deployment of a full elastic virtual cluster; horizontal and vertical elasticity on applications; data analytics with privacy annotations; and the composition of complex workflows with a graphical interface, which generates parallel COMPSs and Spark code.

Deliverable 4.5 Data Quality-aware Service (DQaS) Architecture

The combination of data and technology is having a high impact on the way we live. The world is getting smarter thanks to the quantity of collected and analysed data, i.e., to the Big Data sources.

In such a scenario, the EUBra-BIGSEA project aims to develop a cloud platform for Big Data management and exploitation. In particular, in the project, cloud services able to empower Big Data analytics and thus able to support the development of data processing applications have been designed. Such services have been developed by considering Big Data issues (i.e., data volume, variety, velocity and veracity), QoS, privacy, and security constraints.

This deliverable focuses on Data Quality (DQ), which is a fundamental ingredient for effectively exploiting Big Data. In fact, data quantity can create real value only if combined with data quality: good decisions and actions are the results of correct, reliable and complete data. In such a scenario, methods and techniques able to evaluate the quality of the available data are needed. However, most contributions in the literature in this field address structured data, so new algorithms have to be designed to deal with novel requirements related to variety, volume and velocity. Such methods are provided by the Data Quality-aware Service (DQaS), which is composed of two main modules:

● DQaaS (Data Quality as a Service): it is in charge of providing a descriptive view of the quality of the sources, with the aim of helping analytics applications understand which data are relevant and useful to consider in more advanced analyses;

● EMaaS (Entity Matching as a Service): it supports data integration by providing approaches for managing entity matching.

This deliverable describes these two modules illustrating their functionalities and performance.
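To make the two modules more concrete, the following Python sketch illustrates one simple data quality dimension (completeness) and a naive form of entity matching based on string similarity. It is only an illustration under stated assumptions: the function names `completeness` and `match_entities`, the similarity threshold and the sample records are hypothetical and are not taken from the DQaaS or EMaaS implementations.

```python
from difflib import SequenceMatcher


def completeness(records, fields):
    """Fraction of non-missing values per field (None or '' counts as missing)."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / total
        for f in fields
    }


def match_entities(left, right, threshold=0.85):
    """Pair names from two sources whose similarity reaches the threshold.

    The threshold is an illustrative assumption, not a project setting.
    """
    pairs = []
    for a in left:
        for b in right:
            score = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


# Made-up sample records standing in for two bus stop data sources.
stops = [
    {"name": "Praça Rui Barbosa", "lat": 1.0, "lon": 2.0},
    {"name": "Terminal Cabral", "lat": None, "lon": 3.0},
]
quality = completeness(stops, ["name", "lat", "lon"])
matches = match_entities(["Praca Rui Barbosa"], ["Praça Rui Barbosa", "Terminal Cabral"])
print(quality)
print(matches)
```

A descriptive quality view of this kind lets an analytics application decide, for instance, to ignore a field whose completeness is too low, while the matching step links records that refer to the same real-world entity across sources.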

 

Deliverable 4.4 QoS extensions for the big and fast data eco-system

This deliverable describes the design and the implementation details regarding the big and fast data ecosystem extensions and adaptations for Quality of Service (QoS) guarantees. It focuses on the main technologies involved in the eco-system (i.e., Ophidia and Spark) and the extensions and implementations to support automated deployment, job monitoring, elasticity, as well as the integration activities of these technologies with the QoS infrastructure developed at the level of WP3. The extensions support data processing applications with QoS deadlines by dynamically adjusting the resources available for the job execution.
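As a rough illustration of how resources might be adjusted dynamically so that a job meets its QoS deadline, the Python sketch below computes the number of workers needed from the progress observed so far. It is not the project's controller: the function name `workers_needed`, the `max_workers` cap and, above all, the assumption that throughput scales linearly with the number of workers are hypothetical simplifications.

```python
import math


def workers_needed(progress, elapsed_s, deadline_s, current_workers, max_workers=16):
    """Estimate the worker count expected to finish the job by the deadline.

    progress is the completed fraction of the job in (0, 1]. Throughput is
    assumed (hypothetically) to scale linearly with the number of workers.
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    if elapsed_s <= 0 or current_workers < 1:
        raise ValueError("elapsed_s must be positive and current_workers >= 1")
    if progress == 1:
        return current_workers  # nothing left to do
    remaining_s = deadline_s - elapsed_s
    if remaining_s <= 0:
        return max_workers  # deadline already passed: use everything available
    # Work done per worker-second is progress / (elapsed_s * current_workers);
    # the remaining work must fit into remaining_s seconds.
    required = (1 - progress) * elapsed_s * current_workers / (progress * remaining_s)
    return max(1, min(max_workers, math.ceil(required - 1e-9)))


# A job that is 25% done after 300 s on 2 workers, with a 600 s deadline.
print(workers_needed(0.25, 300, 600, 2))
```

A controller built on this idea would call the function periodically with fresh monitoring data and scale the cluster out or in accordingly, which also keeps resource usage low when the job is ahead of schedule.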

The solutions presented in the document capitalize on the plugin-based architecture implemented by WP3 to provide a reactive and dynamic PaaS environment over cloud-based resources, capable of handling Big Data applications with QoS constraints, minimizing resource usage at the same time.

Moreover, the document provides some usage examples and relevant code snippets related to these extensions, along with a brief evaluation of two WP4 implementations built on top of the adapted big data technologies to highlight how resource availability can affect the execution time and, in turn, the QoS constraints.

 

Deliverable 4.2 Report about the final implementation of the integrated big and fast data eco-system

The integrated big and fast data eco-system represents a central component of the EUBra-BIGSEA platform and is devoted to the management and processing of big data. It integrates a wide set of technologies, libraries and services in a cloud environment, aiming to provide a general architecture to address the data challenges related to the massive connected societies use cases by tackling big data issues such as volume, variety and velocity. The eco-system interacts with the other layers of the platform to support data processing in cloud-based scenarios with Quality of Service (QoS) constraints, while also taking into account data privacy and security. Additionally, the strong integration with the abstraction layer (e.g. Lemonade and COMPSs) allows application developers to take advantage of the eco-system technologies and libraries directly from the programming frameworks.

This document describes the final implementation of the data eco-system, starting from the requirements and the initial design defined in deliverable D4.1 “Design of the integrated big and fast data eco-system”. Technology selection, deployment, testing and validation have played a key role during the implementation of the eco-system. Some applications have been developed at the level of the data platform to evaluate and validate the features provided by the eco-system, which supports, among others: fast data analysis over continuous streams from external data sources, general-purpose data mining and machine learning, and OLAP-based analysis on multidimensional data. To test the applications, some testbed infrastructures have been deployed in both Europe and Brazil. Additionally, a toolbox of algorithms and libraries has been defined to support end-user analysis and provide users with a reference guide.

Deliverable 6.4 Methodologies for trustworthiness estimation

The problem with measuring security is that it usually depends much more on aspects that are unknown about the system (e.g., unknown vulnerabilities) and about the potential attackers than on what is known about it. To make the process feasible, the alternative is to focus on estimating trustworthiness based on evidence regarding specific security characteristics or behaviours, considering that the security concerns of a large and complex system should not be addressed individually or in an ad hoc manner, as this may result in insufficient solutions. The use of security assessment techniques and tools (e.g., testing, analysis, vulnerability and attack injection, etc.) to provide a degree of trustworthiness on the security of the components of the infrastructure, and on how resistant they are to malicious attempts, is thus a key aspect. The results obtained allow the adjustment of the quality of protection established from the provider's point of view, thus providing a realistic measure of what level of security can be promised.

Deliverable D6.1 (Requirements and Coordinated Security Strategy) defined a coordinated strategy for achieving the required levels of security for the EUBra-BIGSEA infrastructure. Such strategy was intended to guide the research, development and integration of the security solutions along the project. In practice, the main objective of that document was to define a global security solution able to deal with the security objectives of the project: the provisioning of Authentication, Authorization and Accounting (AAA); the assurance of the security properties of the cloud and Big Data services; and the protection of the data privacy. The result was a list of 30 high level requirements whose implementation provides a secure environment for the infrastructure, the application developers and even the end users of the applications running inside the framework.

Deliverable D6.3 (Techniques and tools to assess the security of Cloud and Data Services) focused on the second aspect of the EUBra-BIGSEA security approach and on the respective requirements: techniques and tools for assessing the security of cloud and Big Data services. The requirements defined in D6.1 include the security assessment of key infrastructure components, the benchmarking of Intrusion Detection Systems, and the proposal of an approach to estimate the trustworthiness of the system. In practice, D6.3 presented the techniques and tools to be used for supporting such assessments, including the concepts involved in a trust relationship and on trustworthiness assessment.

The present deliverable (D6.4) presents the results of the application of the techniques and tools introduced in D6.3, and discusses the main trustworthiness observations. The results presented contribute to the other work packages of the project by providing relevant information to support the identification of better configurations in terms of security, the potential mitigation of some of the vulnerabilities identified, and the estimation of the level of trustworthiness of the components assessed. Overall, results show weaknesses in the components evaluated, which leads to the estimation of a high level of trustworthiness (on a scale including four levels: very low, low, high, and very high).

Deliverable 3.5 EUBra-BIGSEA QoS Infrastructure services final version

This document describes the final version of the infrastructure services that enable the EUBra-BIGSEA ecosystem. These services include: (i) tools for predicting big data application performance, (ii) mechanisms for horizontal and vertical elasticity, and (iii) approaches based on pro-activity and on optimization to perform horizontal and vertical scaling of running big data applications, as well as load balancing of the infrastructure.

Deliverable 6.2 AAA provisioning services and mechanisms

The EUBra-BIGSEA project aims at developing cloud services empowering Big Data analytics to ease the development of massive data processing applications. To this end, the project requires research into efficient mechanisms to ensure privacy and security, on top of a QoS-aware layer for the smart and rapid provisioning of resources in a cloud-based environment.

The security concerns of a large and complex system should not be addressed individually or in an ad hoc manner, as this may result in inadequate solutions. This is even more important for complex systems such as the one being developed in EUBra-BIGSEA. A coordinated strategy that makes it possible to achieve the appropriate level of security is therefore mandatory. Such a strategy, already discussed in the previous Deliverable D6.1, guides the research, development and integration of the security solutions along the project.

One of the pillars of this strategy corresponds to the inclusion of AAA (Authentication, Authorisation and Accounting) solutions into the EUBra-BIGSEA platform, including two distinct AAA blocks: 1) the EUBra-BIGSEA iAA Service, to provide infrastructure-level AA (Authentication and Authorization) functionalities to infrastructure managers and application developers/providers; and 2) the EUBra-BIGSEA Applications AAAaaS (Authentication, Authorization and Accounting as a Service), focused on the authentication and authorization of the end users of applications hosted in the EUBra-BIGSEA platform.

This document presents the two aforementioned AAA blocks, which have been developed and integrated within the scope of the EUBra-BIGSEA framework and, combined together, provide the AAA services required for operating the EUBra-BIGSEA applications and underlying infrastructure. As discussed throughout the document, these two blocks share several architectural similarities, despite serving distinct purposes. They were both implemented according to a common modular design, which allows them to share several common components (in order to reduce software development and maintenance costs) and to adopt adequate cloud-based deployment and lifecycle management strategies.

Deliverable 4.3 Security and privacy extensions for the big and fast data eco-system

EUBra-BIGSEA aims at covering general requirements of multiple application areas, although it will be showcased in the treatment of massive connected society information, and particularly in traffic recommendation.

This document reports on the extensions developed for the integrated Big and Fast data eco-system to cope with security and data privacy concerns. The integrated fast and Big Data eco-system represents the central component devoted to data management aspects (i.e., access, analytics/mining, and quality) of the EUBra-BIGSEA platform. Developers and end-users will exploit it to execute data analytics and mining operations over different types of data sources. These data can contain sensitive or personal information that can identify individuals. Additionally, data source owners can define policies for the usage of certain information that must be enforced. Either way, data privacy and anonymity techniques are required to guarantee that the data policies are respected while using and processing the data within the Big Data ecosystem technologies. Besides data privacy, the data platform should also ensure that access to the technologies and the data is provided only to authorized users. In this regard, a common authorization/authentication mechanism is exploited by the services and technologies involved in the ecosystem.

Deliverable 5.3 Report on programming abstractions

The programming abstractions layer is a central component of the EUBra-BIGSEA platform. It provides the enabling components to transparently implement the application scenarios on top of the Big Data layer in the project. In this document, we focus on the description of the latest version of the software architecture components that enable the development of modules and libraries (building blocks) that hide the intricacies of the data layer from the applications. In particular, we describe how the programming frameworks are integrated with the QoS infrastructure and how security is provided in the definition of the applications, and we present the current use case implementations built with these components.

 

Deliverable 7.5 Routes for People application

The Routes for People application, the core of this Deliverable D7.5, is a visual demonstrator of the integration of the results of the processing algorithms developed in WP7 for Sentiment Analysis, Crowdedness prediction, Traffic congestion estimation and Route recommendation. Routes for People is a web application that runs on top of the elastic infrastructure of BIGSEA and enables users to find the best transportation routes in Brussels, Madrid, Valencia and Helsinki in Europe and in Belo Horizonte, Campina Grande, Curitiba, Fortaleza and Rio de Janeiro in Brazil.

All code from Routes for People is open source and accessible on GitHub at https://github.com/eubr-bigsea?utf8=%E2%9C%93&q=rfp&type=&language= (repositories rfp-lb, rfp-web, rfp-db, docker-opentripplanner, and rfp-marathon-launch-scripts). A set of use cases and functional and non-functional requirements has been carefully analysed to guide the validation of the system. The application integrates with the security mechanisms of WP6 and the data integration from WP4, and uses the results of data processed with algorithms implemented in WP5’s programming models. The application is the result of intense cooperation among the partners: the web application was developed by UPV, using the data collected by UPV, UFMG, UFCG and UTFPR, preprocessed by the CMCC, UNICAMP and POLIMI data quality and privacy services. It uses the Authentication and Authorisation services of UC and integrates processing algorithms from UFCG and UFMG built on BSC’s and UFMG’s programming models, which rely on an infrastructure managed by UPV with the prediction models of POLIMI and UFMG.

As Deliverable D7.5 is of type “demo”, an extended video showing the different functionalities of the application has been recorded and uploaded to YouTube at https://www.youtube.com/watch?v=JoxZE2CaO44. The application is freely accessible at http://routes4tp.i3m.upv.es/ (Site A). Furthermore, for the latest updates, bug fixes, and features, please check our development environment at http://forward.i3m.upv.es:9090/ (Site B).

Deliverable 6.3 Techniques and tools to assess the security of Cloud and Data Services

The main problem faced by system administrators nowadays is protection against unauthorized access or corruption due to malicious actions. In fact, due to the impressive growth of the Internet, security has become a vital concern in any information infrastructure, especially in Cloud computing. Unfortunately, security is still commonly misunderstood, which leads to systems and components being deployed with critical vulnerabilities.

The assurance of the security properties is a concern that is transversal to all the cloud layers. This way, security assessment (e.g., testing, analysis, vulnerability and attack injection, etc.) must be performed to provide a degree of trustworthiness on the security of all the components, and to understand how resistant they are to malicious attempts. The information obtained allows the adjustment of the quality of protection established from the provider’s point of view, thus obtaining a realistic measure of what level of security can be promised.

The goal of this deliverable is to present techniques and tools for assessing the security properties of the software components used to support the EUBra-BIGSEA platform. In practice, we focus on researching assessment techniques targeting the services provided and also the infrastructure beneath, taking inspiration from existing techniques in security assessment. This includes testing and analysis techniques (both static and dynamic), and the use of vulnerability and attack injection (techniques that are useful to assess the existing security protection systems deployed). It also includes techniques that can aggregate the outputs of such assessments and transform them into evidence that provides users with a degree of trustworthiness on the security of the services and an estimation of how resilient they are against malicious attacks.

D6.1 (Requirements and Coordinated Security Strategy) defined a strategy to achieve the required level of security with regard to the EUBra-BIGSEA infrastructure. Such strategy was intended to guide the research, development and integration of the security solutions along the project. In practice, the main objective of that document was to define a global security solution able to deal with the security objectives of the project: the provisioning of Authentication, Authorization and Accounting (AAA); the assurance of the security properties of the cloud and Big Data services; and the protection of the data privacy. The result is a list of 30 high-level requirements whose implementation will provide a secure environment for the infrastructure, for the application developers and even for the end users of the applications running inside the framework.

This deliverable (D6.3) focuses on the requirements related to the assurance of security properties. In practice, techniques are presented to: assess the robustness and security of the EUBra-BIGSEA application development services; assess the security of application containers, Cloud Management Frameworks (CMFs), and virtualization infrastructures; benchmark Intrusion Detection Systems (IDSs); test the behavior of NoSQL databases; and provide an overall trustworthiness characterization of Cloud infrastructures. Note that the results of the application of such techniques are not presented, as those will be addressed later in deliverable D6.5. Those results will contribute to the other work packages of the project by supporting the identification of the best configuration of the components in terms of security, the mitigation of existing vulnerabilities in those components, the definition of a potential intrusion prevention strategy, and the provision of an indicator of the level of trustworthiness of the overall platform.

Deliverable 3.4 EUBra-BIGSEA QoS infrastructure services intermediate version

This document describes the first release of the infrastructure services of the EUBra-BIGSEA platform. This deliverable puts together results from previous deliverables, such as the monitoring system, performance models, and the contextualization service, to provide the first complete version of the EUBra-BIGSEA QoS-aware infrastructure. In the current version, the platform enables users to provide data processing applications that will be profiled and modeled in a pre-production phase and then deployed for either asynchronous execution (e.g., driven by external triggers such as the availability of new data) or periodic execution. In both cases, the platform will be able to estimate the initial amount of resources needed to run the application within the specified deadlines and to trigger adaptation of the running infrastructure to match the resources needed to satisfy the deadlines.
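The idea of profiling an application and then estimating the initial resources needed for a deadline can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's performance model: the model form T(n) = a + b / n (a fixed serial part plus a part that shrinks with the number of cores), the function names and the profile values are all hypothetical.

```python
def fit_serial_parallel(profile):
    """Least-squares fit of T(n) = a + b / n to (cores, seconds) samples."""
    xs = [1.0 / n for n, _ in profile]
    ys = [t for _, t in profile]
    m = len(profile)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def min_cores_for_deadline(profile, deadline_s, max_cores=64):
    """Smallest core count whose predicted time meets the deadline, or None."""
    a, b = fit_serial_parallel(profile)
    for n in range(1, max_cores + 1):
        if a + b / n <= deadline_s:
            return n
    return None  # the deadline cannot be met within max_cores


# Made-up pre-production profile: (cores, measured seconds).
profile = [(1, 1000.0), (2, 550.0), (4, 325.0)]
print(min_cores_for_deadline(profile, deadline_s=240))
```

The estimate only gives a starting allocation; a run-time adaptation mechanism is still needed because the actual execution can deviate from the profiled behaviour.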

Deliverable 7.4 Toolbox for GES³ Data Second Release

The EUBra-BIGSEA project aims at developing a set of cloud services empowering big data analytics to ease the development of massive data processing applications. EUBra-BIGSEA will develop models, predictive and reactive cloud infrastructure QoS techniques, efficient and scalable big data operators and a privacy and quality analysis framework, exposed to several programming environments. EUBra-BIGSEA aims at covering general requirements of multiple application areas, although it will be showcased in the treatment of massive connected society information, and particularly in traffic recommendation.

The project starts with the analysis of the use case scenarios that will be used for demonstration, but considering those requirements in a broader way. EUBra-BIGSEA is an API-centric project whose main objective is to create a sustainable international (European and Brazilian) cooperation activity in the area of cloud services for big data analytics. In particular, T7.2 aims at improving efficiency and throughput of data scientists and data curators.

The Acquisition and Engineering of Georeferenced Environmental, Stationary, Streaming and Social (GES³) data (Task 7.2) is related to Use Case 1 (UC1) - Data Acquisition (D7.1). In particular, these data come from sources that are related to urban traffic and cover four main data types: stationary data, dynamic spatial data, environmental data, and social network data. Although the EUBra-BIGSEA pilot was initially planned for the data of the city of Curitiba, where the pilot case is being constructed, the EUBra-BIGSEA framework will be applicable to some extent to other scenarios.

Task 7.4 deals with the creation of predictive models from the aforementioned GES³ data sources, in order to understand and anticipate traffic and public transportation service scenarios in Brazilian and European cities. These models are based on a specific set of supervised data mining and machine learning techniques, such as linear and logistic regression, support vector machines and Gaussian processes. Similarly to the descriptive model algorithms implemented in the first release of the Toolbox for GES³ Data (described in D7.3), the implemented code of this second release will be used as a proof of concept in different WPs to, for example, evaluate the infrastructure (WP3), data services (WP4), the expressiveness of programming abstractions (WP5) and the identification of security and privacy concerns (WP6). It will also be used to implement the use cases (WP7). Next steps include (indirect) integration with resource allocation and evaluation of workload (WP3) and improvements in the implementation (converting pending prototypes to WP4 and WP5 technologies).
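As a small illustration of one of the supervised techniques mentioned above, the Python sketch below trains a one-feature logistic regression by gradient descent to separate congested from free-flowing conditions. It is not code from the toolbox: the function names, the learning rate, the number of epochs and the sample data (a normalized traffic load with a congestion label) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log-loss
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of congestion for load x."""
    return sigmoid(w * x + b)


# Made-up training data: normalized traffic load and congestion label.
loads = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
congested = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(loads, congested)
print(predict_proba(w, b, 0.9) > 0.5, predict_proba(w, b, 0.1) > 0.5)
```

On real data, such a model would be trained on many features and large volumes, which is where the parallel programming models and the elastic infrastructure of the platform come into play.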

Together with Task 7.3, Task 7.4 will provide the toolbox needed to implement the complex analytics scenarios of Routes for People Use Case (Task 7.5).

Deliverable 5.1 - EUBra-BIGSEA Software Architecture

The purpose of this report is the design of the software architecture of the EUBra-BIGSEA platform. The document describes the overall functioning and interactions between the platform components, and serves as a development roadmap for the developers of the project.

This document summarizes the architectural design to be implemented in the course of the EUBra-BIGSEA project.

It highlights the interrelations among the different work packages involved in the decisions adopted, and also outlines the reasoning behind the choices made.

Deliverable 2.2 - User communities engagement and dissemination strategy

The scope of this report is to provide an overview of the user communities targeted by EUBra-BIGSEA as relevant stakeholders to facilitate the sustainability of the project on the one hand, and on the other hand to lay the ground for the technology transfer of the EUBra-BIGSEA assets. It leverages the analysis made in D2.1 to identify the stakeholders, classify them, and determine the right messaging to convey to engage them.

The first outcome of this document is the definition of a clear Engagement and Dissemination strategy targeting the different EUBra-BIGSEA User Communities. The second outcome is a set of individual engagement plans including activities, tools and expected impact for each of the identified communities. The Engagement strategy and the plans targeting the user communities are an important step to maximise the impact of the project. The outcomes will be the basis for the work that will be carried out in WP8 to define the exploitation and sustainability plans.

Deliverable 8.1 - Assets, Market Analysis & Stakeholder Synergy Plan

Big Data analytics promises both the private and the public sector valuable insight that can create competitive advantages, foster innovation and scientific discovery, and drive efficiency and progress across multiple domains and industries. EUBra-BIGSEA demonstrates the value of using cloud services in applications with high social and business impact, addressing a main scenario of high interest for both Europe and Brazil: the processing of massive data coming from highly connected societies. It also demonstrates the value of developing Big Data services for capturing, federating and annotating data on the order of PB on top of efficient programming models. These Big Data services impose multiple challenges on resource provision, performance, Quality of Service and privacy for the cloud infrastructure and services that support Big Data applications.

This report is the first one of WP8 with the purpose of providing an initial analysis of the exploitable assets, and defining a stakeholder engagement plan targeting the customer segments for both the individual assets and new tools and services, including messages and formats best suited for them. With the focus on the general uptake of technologies, this document analyses the market landscape of the relevant sectors for EUBra-BIGSEA to position the preliminary identified assets.

Deliverable 7.3 - Toolbox for GES3 Data Initial Release

This document deals with the creation of descriptive models from the GES3 data sources, in order to understand the dynamics of traffic and public transportation services in Brazilian cities. Furthermore, it describes the elaboration of a toolbox containing descriptive models, together with their implementation, deployment and application in the smart cities context.

The models apply a specific set of unsupervised data mining and machine learning techniques: clustering, association rules, feature extraction, and commonly used summarization and aggregation of data. The results from Task 7.3 are the first obtained using the GES3 data and computation-intensive algorithms. The implemented code has therefore been used as a proof of concept in different WPs, for example to evaluate the infrastructure and the expressiveness of the programming abstractions, to identify security and privacy concerns, and to realize the use cases. Next steps include (indirect) integration with resource allocation, evaluation of the workload, and improvements to the implementation. Together with Task 7.4, Task 7.3 will provide the toolbox needed to implement the complex analytics scenarios of the Routes for People Use Case (Task 7.5).

Deliverable 7.2 - GES3 Data Integration

The EUBra-BIGSEA project aims at developing a set of cloud services empowering Big Data analytics to ease the development of massive data processing applications. EUBra-BIGSEA will develop models, predictive and reactive cloud infrastructure QoS techniques, efficient and scalable Big Data operators, and a privacy and quality analysis framework, exposed to several programming environments.

The Acquisition and Engineering of Georeferenced Environmental, Stationary, Streaming and Social (GES3) data (Task 7.2) is related to Use Case 1 (UC1), Data Acquisition (D7.1). In particular, these data come from sources related to urban traffic and cover four main data types: stationary data, dynamic spatial data, environmental data, and social network data. Although the EUBra-BIGSEA pilot was initially planned around the data of the city of Curitiba, where the pilot case is being constructed, the EUBra-BIGSEA framework will be applicable (at least partially) to other scenarios. Therefore, the data integration covers the general problem of mechanisms for collecting, cleaning, transforming and integrating all the listed data sources, in order to understand the dynamics of traffic and public transportation services in Brazilian cities.

After a description of all data sources, the integration process has gone through the following steps:

  1. Data sources within the same theme (such as official sources) were integrated;
  2. Data of different types (such as stationary and dynamic spatial data) were integrated;
  3. The issues found were classified as data quality, entity matching or data mining problems;
  4. Mechanisms were identified to improve the integration and quality of the data for the final user.

Deliverable 5.2 - Programming abstractions design

This document describes the implementation of the programming model prototypes developed as a part of the EUBra-BIGSEA platform. The programming models offer the tools to abstract the data services to the user scenarios and execute them on the QoS infrastructure. COMPSs and Apache Spark are the two available frameworks for the porting of the scenarios. This document, together with the description of the software components available in the project’s repository, realizes the milestone MS13 First release of the programming layer.

  • COMPSs applications can be written in sequential Java, Python or C/C++, and make use of other higher-level software components, such as OPHIDIA workflows. Sequential code is instrumented with data flow information that COMPSs uses to infer parallelism. COMPSs is platform agnostic and deals both with the execution and the negotiation with the computing infrastructure to request the necessary resources for the execution of the workflows. In this project, COMPSs has been extended to create a Mesos framework and to support NoSQL storage. Additional dependencies are easily coded inside COMPSs jobs through the use of Docker containers.

 

  • Lemonade (Live Exploration and Mining Of Non-trivial Amount of Data from Everywhere) is a visual platform for distributed computing, aimed at enabling the implementation, experimentation, testing and deployment of data processing and machine learning applications. It provides developers with high-level abstractions, called operations, to build processing workflows using a graphical web interface. Lemonade currently generates Spark code, and it will be extended to support COMPSs workflows during the second year. Lemonade provides (or will provide) many operations typically used for Extraction, Transformation and Loading (ETL), including Data transformation, Machine Learning, Statistical analysis, Text processing and Data visualization. Lemonade is formed by a set of components which together provide the whole functionality.

Deliverable 3.3 - BIGSEA QoS infrastructure services initial version

This document describes the cloud services to be used by the other components in the EUBra-BIGSEA project to execute the processing workload. The document covers the implementation details of the architecture depicted in D3.1 QoS Monitoring System Architecture.

The document describes how to deploy a fully functional cluster, how to access it and how the components are integrated. It includes instructions for tuning the BIGSEA cloud services recipes so that they can be deployed on different IaaS platforms, as well as the security mechanisms for deploying frameworks.

Finally, it covers the QoS Monitoring system, based on Monasca, describing how to implement probes and deploy agents.

Deliverable 3.2 - Big Data Application Performance models and run-time Optimization Policies

This document describes the performance methods and tools to predict the execution time of big data applications that have been developed during the first year of EUBra-BIGSEA. Moreover, this report introduces the optimization models which will be the core of horizontal elasticity policies. Optimization-based policies will trigger the configuration of the cloud infrastructure providing QoS guarantees for big data applications execution while minimising resource usage costs.

The ultimate goal of the optimization based policies that will be developed within EUBra-BIGSEA is to determine the system configuration leading to the minimum cost, embedding performance prediction methods in an optimization framework. Performance models will be kept alive at run-time, and will be integrated within an optimization algorithm, to provide the proactive-based approach with the insights to perform the dynamic adjustment of the system configuration.
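As an illustration only, the idea of embedding a performance prediction model in a cost-minimising search can be sketched as follows. This is a minimal sketch under stated assumptions, not the optimization model of the deliverable: the function names, the toy performance model and the flat per-node price are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def cheapest_configuration(
    predict_time: Callable[[int], float],
    deadline_s: float,
    max_nodes: int,
    price_per_node: float,
) -> Optional[Tuple[int, float]]:
    """Return (nodes, cost) for the cheapest cluster size that meets the deadline.

    predict_time is a hypothetical performance model mapping a number of
    nodes to a predicted execution time in seconds. Cost is assumed to be
    a flat price per node, which is a simplification for illustration.
    """
    best: Optional[Tuple[int, float]] = None
    for nodes in range(1, max_nodes + 1):
        # Skip configurations whose predicted time violates the QoS deadline.
        if predict_time(nodes) > deadline_s:
            continue
        cost = nodes * price_per_node
        if best is None or cost < best[1]:
            best = (nodes, cost)
    return best  # None when no configuration meets the deadline


def toy_model(nodes: int) -> float:
    """Assumed model: 60 s of serial work plus 1200 s of perfectly parallel work."""
    return 60.0 + 1200.0 / nodes


if __name__ == "__main__":
    print(cheapest_configuration(toy_model, deadline_s=300.0, max_nodes=20, price_per_node=0.5))
```

At run-time, a policy of this kind would re-evaluate the search whenever the performance model is updated, and resize the infrastructure accordingly.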

Deliverable 3.1 - QoS Monitoring System Architecture

The Quality of Service (QoS) architecture is the computational core of the EUBra-BIGSEA platform. The performance of data analytics applications running on the EUBra-BIGSEA platform is profiled in advance, so that a QoS guarantee can be defined based on the performance requirements.

This document describes the QoS Monitoring System Architecture as well as the software architecture of other cloud-service related components and their interactions. The purpose of this report on the QoS Monitoring System Architecture is to define the software components that will collect the execution data from the cloud architecture, as well as the main components that intervene in the full process of deployment, configuration, contextualization and execution. Each component is described in terms of its external interfaces and dependencies on other components.

The project identifies three types of workloads: (i) persistent, (ii) periodic batch and (iii) interactive jobs, which will be served by different schedulers. Persistent jobs will be served by the Marathon scheduler, periodic jobs by the Chronos scheduler, and interactive jobs through interactive shells (e.g. Spark shells). Those schedulers will deploy frameworks that will embed the executable services and negotiate the resources with Mesos.
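The mapping between workload types and schedulers described above can be summarised in a small dispatch table. The sketch below is purely illustrative: the enum, the dictionary and the function name are hypothetical and do not correspond to any EUBra-BIGSEA, Marathon, Chronos or Mesos API.

```python
from enum import Enum


class Workload(Enum):
    """The three workload types identified by the project."""
    PERSISTENT = "persistent"
    PERIODIC_BATCH = "periodic batch"
    INTERACTIVE = "interactive"


# Mapping taken from the text; the string labels are illustrative only.
SCHEDULER_FOR = {
    Workload.PERSISTENT: "Marathon",
    Workload.PERIODIC_BATCH: "Chronos",
    Workload.INTERACTIVE: "interactive shell",
}


def select_scheduler(workload: Workload) -> str:
    """Return the name of the scheduler that serves the given workload type."""
    return SCHEDULER_FOR[workload]


if __name__ == "__main__":
    for workload in Workload:
        print(f"{workload.value} jobs -> {select_scheduler(workload)}")
```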

Deliverable 2.3 - Preliminary Action Plan Report

Cloud computing, Big Data technologies, the Internet of Things (IoT), 5G communications and cyber security are the building blocks of the digital economy. The uptake of cloud computing and virtualised infrastructures plays an essential role in enabling the transition towards a distributed global community, enhancing collaborative work and tackling the challenges of big data.
The cooperation between Europe and Brazil seeks to sustain and enhance social and economic conditions, increase competitiveness, create jobs, and address common global challenges in areas such as energy, international cyber policy, sustainable development, climate change and the environment.
Policy collaboration to date has included work on identifying barriers that may preclude the adoption of cloud-based services in Europe and in Brazil, and on identifying concrete joint initiatives to minimise such barriers. Current EU-Brazil collaboration is expected to advance cloud-centric applications for big data and to facilitate policy coordination between the EU and Brazil.

EUBra-BIGSEA is committed to making a significant contribution to the cooperation between Europe and Brazil in the area of advanced cloud services for Big Data applications. EUBra-BIGSEA facilitates the integration of European and Brazilian technologies and experiences to bring forward scientific innovation through a use case scenario approach that is important for both Europe and Brazil. Furthermore, it examines the current challenges and the research and innovation opportunities, and highlights the relevant EU-Brazil joint initiatives in the areas addressed.


Deliverable 7.1 - End-User Requirements Elicitation

This document presents the requirement analysis of the massively connected society use case that will be used for demonstration. The use case deals with traffic data analysis and the requirements cover both the perspective of the data processors and the final end-users.
The requirements analysis process has been implemented in four major steps:

  1. Preparation of a questionnaire distributed among data analysis developers to identify the requirements and data sources.
  2. Analysis of the general project scenario, which is split into three use cases: Data Acquisition, Descriptive Models and Predictive Models.
  3. Identification of User Stories: individual descriptions of whole interactions of the users with the system, per use case.
  4. Identification of 25 functional and non-functional requirements from the User Stories per Use Case, used to identify 18 technical requirements.

The requirements from the Use Cases serve to guide the implementation of the applications and services that will consume the Big Data services of the EUBra-BIGSEA platform. The 18 technical requirements directly concern the EUBra-BIGSEA platform developers, and address functionalities such as: integration of external data sources, Bag of Tasks, QoS-bounded submission, self-adaptive elasticity, privacy annotation, authentication, and data privacy protection.

Deliverable 6.1 - Requirements and Coordinated Security Strategy

This document identifies the security requirements that will drive the implementation of the EUBra-BIGSEA global security solution addressing: (i) the provisioning of Authentication, Authorization and Accounting (AAA), (ii) the assurance of the security properties of the cloud and Big Data services, and (iii) the protection of the data privacy.

The 30 high-level requirements will ensure the design and implementation of a secure environment for the infrastructure, for the application developers and even for the end users of the applications running inside the framework. The defined solution includes two distinct AAA blocks:

  • A EUBra-BIGSEA Infrastructure AAA Service, to provide the AAA functionalities to infrastructure managers and application developers/providers;
  • EUBra-BIGSEA Applications AAAaaS, to serve the end users of applications hosted in the EUBra-BIGSEA.

The document also reviews the state of the art and covers the security assessment of key infrastructure components and the development of solutions for the issues uncovered, the benchmarking and improvement of intrusion detection systems, and the proposal of metrics to characterize the trustworthiness of the system. It further defines two distinct privacy control barriers, responsible for protecting the anonymity of both the raw data to be used and the data resulting from the predictive and descriptive models built.

Deliverable 4.1 - Design of the integrated big and fast data eco-system

Starting from the end-user requirements highlighted in deliverable D7.1, this document defines the architecture of the integrated fast and Big Data ecosystem, which represents the central data management component of the EUBra-BIGSEA platform.

The proposed architecture integrates multiple classes of big data systems. Two different aspects of the proposed architecture are:

  • A comprehensive evaluation and assessment of the big data tools available in the general landscape, from the standpoints of data storage, access, analytics and mining.
  • A deep analysis of the data sources in terms of data model, formats, volume, metadata, and functional needs.

Other key features highlighted in this document are the integration of different classes of big/fast data tools to address multifaceted use case requirements, and the dynamicity and elasticity of the environment, together with a secured-by-design ecosystem.
The proposed architecture joins all these elements in a cloud environment aiming at providing, to some extent, a general approach to deal with the high social impact use cases and scenarios like the one proposed in the project.

Deliverable 2.1 - Communication Strategy And Web Platform Development

This document defines the overall communication strategy of the project, outlining in particular the channels and tools that have been and will be selected for communicating to and engaging with each specific target stakeholder. The objective of the communication and dissemination strategy is to support the project goals through an effective communication and engagement approach, which ensures wide promotion and high visibility of the innovation and benefits that EUBra-BIGSEA provides to its stakeholders.