DQaaS - Data Quality-as-a-Service
DQaaS (Data Quality-as-a-Service) is a service that aims to provide information about the quality of a requested dataset. Data quality information helps applications and users understand the degree to which a dataset is suitable for their goals. In particular, for a given dataset, the service (i) offers access to a set of quality metrics that are evaluated periodically and (ii) allows applications and users to define and assess personalized quality metrics.
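To make the idea of a personalized quality metric more concrete, the sketch below treats a metric as a plain function from a list of records to a score between 0 and 1. It is only an illustration under assumptions: the names `completeness`, `range_validity`, and `assess`, as well as the `speed` field in the usage note, are hypothetical and are not taken from the DQaaS interface.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical record and metric types, used only for this illustration.
Record = Dict[str, Optional[str]]
Metric = Callable[[List[Record]], float]


def completeness(records: List[Record]) -> float:
    """Fraction of field values that are neither None nor an empty string."""
    values = [v for r in records for v in r.values()]
    if not values:
        return 1.0
    return sum(v not in (None, "") for v in values) / len(values)


def range_validity(field: str, low: float, high: float) -> Metric:
    """Build a user-defined metric: fraction of records whose `field`
    parses as a number inside [low, high]."""
    def metric(records: List[Record]) -> float:
        if not records:
            return 1.0
        ok = 0
        for r in records:
            try:
                ok += low <= float(r.get(field)) <= high
            except (TypeError, ValueError):
                # Missing or non-numeric values count as invalid.
                pass
        return ok / len(records)
    return metric


def assess(records: List[Record], metrics: Dict[str, Metric]) -> Dict[str, float]:
    """Evaluate every named metric on the same records."""
    return {name: m(records) for name, m in metrics.items()}
```

A caller could then combine a general-purpose metric with one of its own, for example `assess(records, {"completeness": completeness, "speed_validity": range_validity("speed", 0, 150)})`.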
DQaaS is designed for Big Data, so it addresses volume and velocity requirements. In particular, the algorithms will be developed on architectures that support parallelization, and when applications or users request real-time quality analyses, only a sample of the data will be considered. These choices aim to reduce the impact that such a service can have on system performance.
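The sampling idea can be sketched as follows. The text does not say which sampling strategy DQaaS uses; reservoir sampling is chosen here purely for illustration because it draws a fixed-size uniform sample in a single pass over a stream of unknown length. The function names `reservoir_sample` and `estimate_on_sample` are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw a uniform random sample of up to k items in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


def estimate_on_sample(
    stream: Iterable[T],
    metric: Callable[[List[T]], float],
    k: int,
    seed: Optional[int] = None,
) -> float:
    """Estimate a quality metric on a sample instead of the full dataset."""
    return metric(reservoir_sample(stream, k, seed))
```

The estimate trades accuracy for latency: a larger `k` gives a value closer to the one computed on the full dataset, at the cost of more work per request.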
- DQaaS is still under development; some preliminary tests have been conducted in an academic environment.
- As of the end of April 2017, a first release is available on GitHub.
- At the moment, DQaaS takes as input the data sources available for the EUBra-BIGSEA use case. Open data can also be used.
- The results are expressed in terms of data quality dimensions.