DQaaS (Data Quality-as-a-Service) provides information about the quality of a requested dataset. Data quality helps applications and users understand the degree to which a dataset is suitable for their goals. In particular, for a given dataset, the service (i) offers access to a set of quality metrics that are evaluated periodically and (ii) allows applications and users to define and assess personalized quality metrics.
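The difference between a predefined metric and a personalized one can be illustrated with a minimal Python sketch. It is not the DQaaS API: the names `completeness`, `speed_in_range`, and `assess`, the `speed` field, and the 0 to 120 range are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

# A record maps field names to values; None or "" marks a missing value.
Record = Dict[str, Optional[str]]
Metric = Callable[[List[Record]], float]


def completeness(records: List[Record]) -> float:
    """Predefined-style metric: fraction of field values that are not missing."""
    total = sum(len(r) for r in records)
    if total == 0:
        return 1.0  # convention chosen here: an empty dataset has nothing missing
    filled = sum(1 for r in records for v in r.values() if v not in (None, ""))
    return filled / total


def speed_in_range(records: List[Record]) -> float:
    """Personalized-style metric (hypothetical): fraction of records whose
    'speed' field parses as a number between 0 and 120."""
    if not records:
        return 1.0

    def ok(value: Optional[str]) -> bool:
        try:
            return 0.0 <= float(value) <= 120.0
        except (TypeError, ValueError):
            return False  # missing or non-numeric value

    return sum(1 for r in records if ok(r.get("speed"))) / len(records)


def assess(records: List[Record], metrics: Dict[str, Metric]) -> Dict[str, float]:
    """Evaluate every registered metric on the same dataset."""
    return {name: metric(records) for name, metric in metrics.items()}
```

In this sketch a user registers a personalized metric simply by adding another function to the dictionary passed to `assess`, for example `assess(records, {"completeness": completeness, "speed_in_range": speed_in_range})`.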
DQaaS is designed for Big Data and therefore addresses volume and velocity requirements. In particular, the algorithms have been developed on architectures that support parallelization, and when applications or users request real-time quality analyses, only a sample of the data is considered. These choices aim to reduce the impact that such a service can have on system performance.
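The idea of answering a real-time request from a sample rather than from the full dataset can be sketched as follows. The description above does not say which sampling strategy DQaaS uses; reservoir sampling (Algorithm R) is swapped in here as an assumption because it works in a single pass over a stream of unknown length. The function names and the `seed` parameter are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < k:
                sample[j] = item  # replace with probability k / (i + 1)
    return sample


def estimate_completeness(stream: Iterable[Optional[T]], k: int,
                          seed: Optional[int] = None) -> float:
    """Estimate the fraction of non-missing values from a sample of size k."""
    sample = reservoir_sample(stream, k, seed)
    if not sample:
        return 1.0  # convention chosen here for an empty stream
    return sum(1 for v in sample if v is not None) / len(sample)
```

The estimate trades accuracy for response time: a larger `k` gives a tighter estimate at the cost of more memory and computation per request.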
Politecnico di Milano (POLIMI) designed and developed the tool and its data quality assessment algorithms. Current effort is dedicated to improving the interaction between users or applications and the data quality service. Developed under the EUBra-BIGSEA mandate, the tool can be used by data scientists or software developers to understand the quality of the datasets they are considering for their big data applications. It is an adaptive preprocessing service that can easily be used on various big data platforms.