EUBra-BIGSEA provides a Position Paper for the Cloudscape Brazil 2016 Conference, offering insight into its Cloud QoS for Big Data Applications. Read below the contribution from the European Project Coordinator, Ignacio Blanquer (Universidad Politécnica de Valencia), and his Brazilian counterpart, Wagner Meira Jr. (Universidade Federal de Minas Gerais).
Big Data applications normally deal with dynamic streams of data that require periodic processing. This is indeed the case for public transport data, which require the repeated processing of massive data sets within a deadline. Allocating the right amount of resources to complete the processing according to an SLA is a challenge: the goal is to minimize cost without compromising Quality of Service. Resource management frameworks tend to overestimate requirements in both memory and computing.
Who stands to benefit and how:
EUBra-BIGSEA is an API-centric project for providing cloud services to data analytics applications. It is therefore aimed at application developers and data scientists, who will be able to run their processing code efficiently on a self-adjusting platform. As its main use case, EUBra-BIGSEA targets public transportation analysis, and it will develop an application that lets citizens plan their journeys and municipalities evaluate their transportation network.
EUBra-BIGSEA is a European-Brazilian collaboration on Big Data scientific research through cloud-centric applications. It will develop a programming interface for building data analytics applications that leverage proactive and reactive elasticity policies on top of a cloud-agnostic platform. EUBra-BIGSEA has defined a cloud QoS software architecture able to deal with the three types of jobs that have been identified through the requirement analysis: long-running high-availability jobs, periodic and deadline-bounded batch jobs, and interactive tasks. It uses a container model to embed client software dependencies and integrates several schedulers to address the three types of jobs, namely Marathon, Chronos and YARN.
This way, the same resource infrastructure is managed by a single service, which removes the need to statically partition the data centre and adjusts the active resources to the workload. This enables not only mixing multiple heterogeneous workloads but also allocating resources through local policies, thanks to a software component called EC3. EUBra-BIGSEA instruments a Mesos framework with an elastic provisioning system that will allocate and reconfigure the Mesos slaves on demand.
A service implements proactive policies to determine the amount of resources that should be assigned to a specific job to meet a given deadline. This service learns from past executions and readjusts the request in subsequent iterations. The reactive policies will allocate more resources if the job is about to miss the deadline. The use of containerised applications facilitates the overcommitment of memory by the Mesos framework, by means of a Cloud Virtual Machine Automated Procurement system (CLOUDVAMP), which over-commits real memory. Finally, the system will be monitored by means of OpenStack MONASCA, which will trigger resource reallocation when the metrics cross a given threshold.
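The interplay between the proactive and reactive policies described above can be illustrated with a minimal Python sketch. It is not the EUBra-BIGSEA implementation: the `PastRun` record, the function names, the safety margin and the assumption that throughput scales linearly with the number of workers are all hypothetical simplifications chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PastRun:
    """One observed execution of a periodic job (hypothetical record)."""
    workers: int        # resources assigned to the run
    work_units: float   # input size, e.g. number of records processed
    duration_s: float   # observed wall-clock time in seconds


def estimate_rate(history: Sequence[PastRun]) -> float:
    """Work units processed per worker per second, averaged over past runs."""
    total_work = sum(r.work_units for r in history)
    worker_seconds = sum(r.workers * r.duration_s for r in history)
    return total_work / worker_seconds


def proactive_workers(history: Sequence[PastRun], work_units: float,
                      deadline_s: float, safety: float = 1.2) -> int:
    """Proactive policy: size the request before the job starts.

    Assumes linear scaling; `safety` is an illustrative margin.
    """
    rate = estimate_rate(history)
    needed = work_units / (rate * deadline_s)
    return max(1, math.ceil(needed * safety))


def reactive_workers(current_workers: int, done_fraction: float,
                     elapsed_s: float, deadline_s: float) -> int:
    """Reactive policy: scale up if the projected finish misses the deadline."""
    if done_fraction <= 0:
        return current_workers  # no progress data yet, nothing to project
    projected_total_s = elapsed_s / done_fraction
    if projected_total_s <= deadline_s:
        return current_workers  # on track, keep the allocation
    time_left_s = deadline_s - elapsed_s
    if time_left_s <= 0:
        return current_workers  # deadline already passed, scaling cannot help
    # Time still needed at the current speed, squeezed into the time left.
    remaining_s = projected_total_s - elapsed_s
    return math.ceil(current_workers * remaining_s / time_left_s)
```

Feeding each completed run back into `history` is what lets the proactive estimate improve over successive iterations, while the reactive function only corrects a run that is already in progress.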
The platform's main use case is the periodic execution of a data processing algorithm over fresh data coming from the activity of the previous day and earlier. This processing involves a series of steps that must be carried out in a given order and completed by a specific deadline. Data must be retrieved and preprocessed to infer routes and vehicle usage. Models for delay estimation must be trained with the new data before the rush hour, but not so early that the data used is out of date. Therefore, this constitutes a clear use case for deadline-based QoS periodic jobs.
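The trade-off between finishing before the rush hour and not starting too early can be made concrete by planning the ordered steps backwards from the deadline. The following Python sketch is only an illustration: the step names, the durations and the helper functions are hypothetical and are not taken from the project.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Step = Tuple[str, timedelta]  # (step name, estimated duration)


def latest_start_schedule(deadline: datetime, steps: List[Step]):
    """Walk the ordered steps backwards from the deadline.

    Returns (name, latest_start, latest_finish) for each step, in
    execution order, so that the last step ends exactly at the deadline.
    """
    schedule = []
    finish = deadline
    for name, duration in reversed(steps):
        start = finish - duration
        schedule.append((name, start, finish))
        finish = start
    schedule.reverse()
    return schedule


def is_feasible(schedule, data_ready_at: datetime) -> bool:
    """The plan works only if the first step can wait for the fresh data."""
    first_start = schedule[0][1]
    return first_start >= data_ready_at


# Illustrative values only: a 06:00 deadline and three ordered steps.
deadline = datetime(2016, 1, 2, 6, 0)
steps = [
    ("retrieve", timedelta(minutes=30)),
    ("preprocess", timedelta(minutes=60)),
    ("train_delay_model", timedelta(minutes=90)),
]
plan = latest_start_schedule(deadline, steps)
for name, start, finish in plan:
    print(name, start.time(), finish.time())
```

Starting each step as late as the plan allows keeps the training data as recent as possible; if `is_feasible` returns `False`, the job needs more resources (shorter step durations) rather than an earlier start.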