The combination of data and technology is having a strong impact on the way we live. The world is getting smarter thanks to the large quantity of data collected and analysed from Big Data sources.
In this scenario, the EUBra-BIGSEA project aims to develop a cloud platform for Big Data management and exploitation. In particular, the project has designed cloud services that empower Big Data analytics and thus support the development of data processing applications. These services have been developed taking into account Big Data issues (i.e., data volume, variety, velocity and veracity), QoS, privacy and security constraints.
This deliverable focuses on Data Quality (DQ), a fundamental ingredient for effectively exploiting Big Data. Data quantity creates real value only when combined with data quality: good decisions and actions result from correct, reliable and complete data. Methods and techniques able to evaluate the quality of the available data are therefore needed. However, most contributions in the literature address structured data, so new algorithms have to be designed to meet the novel requirements raised by variety, volume and velocity. These methods are provided by the Data Quality-aware Service (DQaS), which is composed of two main modules:
● DQaaS (Data Quality as a Service): it provides a descriptive view of the quality of the sources, helping analytics applications understand which data are relevant and useful to consider in more advanced analyses;
● EMaaS (Entity Matching as a Service): it supports data integration by providing approaches for entity matching.
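As a rough illustration of what a "descriptive view of the quality of the sources" can contain, the sketch below computes one common quality indicator, per-attribute completeness (the fraction of records carrying a non-missing value), over a list of records. It is a minimal sketch under stated assumptions, not the DQaaS implementation: the function name `completeness_profile`, the record format (Python dictionaries) and the rule for what counts as missing are all hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical record format: one dictionary per record, attribute -> value.
Record = Dict[str, Optional[Any]]


def completeness_profile(records: List[Record]) -> Dict[str, float]:
    """Return, for each attribute, the fraction of records with a non-missing value.

    Hypothetical helper (assumption, not the DQaaS API): a value counts as
    missing if the key is absent, the value is None, or it is an empty or
    whitespace-only string.
    """
    if not records:
        return {}
    # Collect every attribute that appears in at least one record.
    attributes = sorted({key for record in records for key in record})
    profile: Dict[str, float] = {}
    for attr in attributes:
        present = 0
        for record in records:
            value = record.get(attr)
            if value is None:
                continue
            if isinstance(value, str) and not value.strip():
                continue
            present += 1
        profile[attr] = present / len(records)
    return profile


if __name__ == "__main__":
    # Illustrative data only.
    sample = [
        {"stop_id": "A1", "lat": -19.9, "lon": -43.9},
        {"stop_id": "A2", "lat": None, "lon": -43.8},
        {"stop_id": "A3", "lat": -19.8},
    ]
    print(completeness_profile(sample))
```

An analytics application could read such a profile to decide, for instance, whether an attribute is complete enough to be used in a later analysis; other quality dimensions (e.g., accuracy or timeliness) would require different checks.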
This deliverable describes these two modules, illustrating their functionalities and performance.