Administrators & cloud service providers
Open Source communities & application developers
Data Analytic development framework
PRIVAaaS is a set of libraries and services for controlling and reducing data leakage in Big Data processing. It protects sensitive information processed by data analytics platforms using multiple types of anonymization techniques. The innovation is the three-level anonymization process, which adapts existing anonymization techniques to better handle the trade-off between anonymization and data utility. Because PRIVAaaS is policy-based, it improves compliance with privacy laws and regulations throughout the anonymization process. It also provides a component based on re-identification risk, which helps to avoid data re-identification once data is made available for external visualization. The partners involved are the University of Campinas (Brazil) and the University of Coimbra (Portugal).
Scientific sector – universities and research centres that develop/maintain data analytics platforms (Ophidia – CMCC and LEMONADE - UFMG)
User scenario
PRIVAaaS provides data privacy protection for data analytics platforms while affecting data utility as little as possible, which preserves the quality of the platform's results. It also calculates the re-identification risk before data is made publicly available, so that the anonymization level can be increased when needed. This helps to avoid data leakage and data re-identification, even under privacy attacks.
A real user scenario is the integration of PRIVAaaS into a data analytics platform (Ophidia/Lemonade). The platform receives data sets from different sources as input for mining. These data sets contain personally identifiable information (PII) and sensitive information that can raise privacy concerns. Anonymization policies, based on privacy laws and regulations and on the requirements of the data source owners, are provided for these data sets. Before data is loaded into the platform, i.e., in the ETL (Extract, Transform, Load) process, anonymization is applied using the less restrictive rules of the policies. This gives a lower level of anonymization and higher data utility, which matters because the data is going to be mined, and the higher the data utility, the better the mining results. Then, before data mining results leave the platform, the re-identification risk (i.e., the risk that records from the resulting data set are re-identified using external information) is calculated. The risk is compared with a threshold and, if necessary, the anonymization level is increased until the risk meets the threshold.
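The risk check at the end of this scenario can be illustrated with a minimal Python sketch. It is not the PRIVAaaS implementation or its API: the record layout, the generalization rules, the function names and the risk measure (one divided by the size of the smallest group of records sharing the same quasi-identifiers) are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical records: (age, zip_code, diagnosis). Age and ZIP code are treated
# as quasi-identifiers; diagnosis is the attribute kept intact for mining.
RECORDS = [
    (34, "13083", "flu"),
    (36, "13084", "asthma"),
    (37, "13085", "flu"),
    (52, "13101", "diabetes"),
    (58, "13102", "flu"),
    (59, "13109", "asthma"),
]


def generalize(record, level):
    """Coarsen the quasi-identifiers; a higher level keeps less detail."""
    age, zip_code, diagnosis = record
    if level == 0:
        return (str(age), zip_code, diagnosis)
    width = 10 * 2 ** (level - 1)            # age buckets: 10, 20, 40, ... years
    low = (age // width) * width
    age_range = f"{low}-{low + width - 1}"
    keep = max(len(zip_code) - level, 0)     # mask one more ZIP digit per level
    masked_zip = zip_code[:keep] + "*" * (len(zip_code) - keep)
    return (age_range, masked_zip, diagnosis)


def reidentification_risk(records):
    """Assumed risk measure: 1 / size of the smallest quasi-identifier group."""
    groups = Counter((age, zip_code) for age, zip_code, _ in records)
    return 1.0 / min(groups.values())


def anonymize_until_safe(records, threshold, start_level=0, max_level=5):
    """Raise the anonymization level until the risk meets the threshold."""
    for level in range(start_level, max_level + 1):
        candidate = [generalize(r, level) for r in records]
        risk = reidentification_risk(candidate)
        if risk <= threshold:
            return level, risk, candidate
    raise ValueError("threshold not reachable within max_level")


if __name__ == "__main__":
    level, risk, released = anonymize_until_safe(RECORDS, threshold=0.2)
    print(f"released at level {level} with risk {risk:.3f}")
    for row in released:
        print(row)
```

In this sketch, `start_level` stands in for the less restrictive rules applied during ETL, and the loop corresponds to increasing the anonymization level before results leave the platform. A real deployment would take the levels and the threshold from the anonymization policy rather than from hard-coded values.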
Case studies of PRIVAaaS integrated with Ophidia showed that the benefit of using PRIVAaaS is increased data privacy protection for this platform.
Downloadable from: https://github.com/eubr-bigsea/PRIVAaaS
Supporting documentation is in the README.md file: https://github.com/eubr-bigsea/PRIVAaaS/blob/master/README.md
PRIVAaaS is open-source and free software.
The user needs knowledge of anonymization techniques and anonymization policies, as well as of the operation of the data analytics platform into which PRIVAaaS will be integrated.
Regina Moraes: regina@ft.unicamp.br
View related publications
--> BASSO, T.; MORAES, R. ; ANTUNES, N. ; VIEIRA, M. ; SANTOS, W. ; MEIRA JR., W. . PRIVAaaS: privacy approach for a distributed cloud-based data analytics platforms. In: International Workshop On Assured Cloud Computing And QoS Aware Big Data, 2017, Madrid. 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017.
--> FERREIRA, A. ; BASSO, T. ; SILVA, H. ; MORAES, R. . PRIVA: a policy-based anonymization library for cloud and big data platform. In: XVIII Workshop de Testes e Tolerância a Falhas, 2017, Belém. XXXV Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos.
--> MATSUNAGA, R. ; RICARTE, I. ; BASSO, T. ; MORAES, R. . Towards an Ontology-Based definition of Data Anonymization Policy for Cloud Computing and Big Data. In: International Workshop on Recent Advances in the Dependability Assessment of Complex Systems, 2017, Denver. 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2017.
--> SILVA, H. ; BASSO, T. ; MORAES, R. . Privacy and data mining: evaluating the impact of data anonymization on classification algorithms. In: 13th European Dependable Computing Conference, 2017, Geneva. 13th European Dependable Computing Conference, 2017.
--> BASSO, T.; MATSUNAGA, R. ; MORAES, R. ; ANTUNES, N. . Challenges on Anonymity, Privacy and Big Data. In: Workshop on Dependability in Evolving Systems, 2016, Cali. 7th Latin-American Symposium on Dependable Computing, 2016.