Improved Estimation of Attribute Privacy Disclosure Risk Using Machine Learning
DOI: https://doi.org/10.64290/bima.v9i2A.1087
Keywords: Attribute vulnerability to disclosure attack, Data augmentation, Privacy-preserving anonymization, Privacy-preserving data publishing
Abstract
Personal data is widely used in predictive modelling and data analytics across domains such as healthcare. Privacy-Preserving Data Publishing (PPDP) has emerged as a set of techniques for protecting privacy when sharing data with analysts or researchers; it aims to balance the privacy of users with the utility of the dataset. Anonymization is one of the most popular approaches to privacy protection when publishing data, but it often leads to a loss of data utility when applied uniformly across all attributes without considering the specific vulnerability of each attribute. A previously proposed approach uses machine learning to estimate the contribution of individual attributes to disclosure attacks; however, that approach is computationally intensive and insensitive to minority cases. This paper proposes to improve the assessment of attribute vulnerability to disclosure attacks, and to reduce computational overhead, by combining adaptive outlier management, data augmentation with a Conditional Tabular Generative Adversarial Network (CTGAN), class balancing with the Synthetic Minority Over-sampling Technique for Nominal data (SMOTE-N), and stratified sampling. The proposed approach reduces computational overhead by over 38% and improves sensitivity to vulnerable attributes by up to 20.69%.
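The abstract names SMOTE-N balancing and stratified sampling without describing them. The following stdlib-only Python sketch illustrates those two generic techniques on toy nominal records. SMOTE-N follows the description by Chawla et al.: neighbours are found among minority rows using the Value Difference Metric (VDM), and each synthetic value is the majority vote of a seed row and its k nearest neighbours. Stratified sampling keeps a fixed fraction of each class. This is not the authors' implementation: the function names, toy records, and the parameters `k`, `fraction`, and `seed` are hypothetical, and the CTGAN augmentation and adaptive outlier management steps are omitted.

```python
import random
from collections import Counter, defaultdict

def vdm_tables(rows, labels):
    """For each feature, map value -> [P(class | value) for each class]."""
    classes = sorted(set(labels))
    tables = []
    for f in range(len(rows[0])):
        counts = defaultdict(Counter)  # feature value -> class counts
        for row, y in zip(rows, labels):
            counts[row[f]][y] += 1
        tables.append({
            value: [c[cls] / sum(c.values()) for cls in classes]
            for value, c in counts.items()
        })
    return tables

def vdm_distance(a, b, tables):
    """Value Difference Metric (exponent 1), summed over all features."""
    return sum(
        sum(abs(pa - pb) for pa, pb in zip(table[a[f]], table[b[f]]))
        for f, table in enumerate(tables)
    )

def smote_n(rows, labels, minority, n_new, k=3, seed=0):
    """SMOTE-N style oversampling (hypothetical helper): each synthetic row
    takes, per feature, the majority value among a random minority row and
    its k nearest minority neighbours under the VDM."""
    rng = random.Random(seed)
    tables = vdm_tables(rows, labels)  # VDM statistics use all classes
    pool = [r for r, y in zip(rows, labels) if y == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(pool))
        base = pool[i]
        others = [r for j, r in enumerate(pool) if j != i]
        others.sort(key=lambda r: vdm_distance(base, r, tables))
        group = [base] + others[:k]
        # Counter.most_common breaks ties by first occurrence, so the seed
        # row's value wins any tie it is part of.
        synthetic.append(tuple(
            Counter(r[f] for r in group).most_common(1)[0][0]
            for f in range(len(base))
        ))
    return synthetic

def stratified_sample(rows, labels, fraction, seed=0):
    """Keep about `fraction` of each class (at least one row per class)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        m = max(1, round(fraction * len(members)))
        for row in rng.sample(members, m):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels

# Toy nominal records (hypothetical): age band, area, marital status.
rows = [
    ("18-30", "urban", "single"), ("18-30", "urban", "married"),
    ("31-50", "rural", "married"), ("31-50", "urban", "married"),
    ("51+", "rural", "widowed"), ("51+", "urban", "married"),
    ("18-30", "rural", "single"), ("51+", "rural", "married"),
    ("31-50", "rural", "single"), ("18-30", "urban", "single"),
]
labels = ["no"] * 7 + ["yes"] * 3

synthetic = smote_n(rows, labels, minority="yes", n_new=4, k=2)
balanced_rows = rows + synthetic
balanced_labels = labels + ["yes"] * len(synthetic)
# Each class now has 7 rows, so each keeps max(1, round(0.5 * 7)) == 4 rows.
sample_rows, sample_labels = stratified_sample(
    balanced_rows, balanced_labels, fraction=0.5)
print(Counter(sample_labels))
```

Balancing before stratifying keeps the minority class represented in the reduced sample, which is one plausible way the two steps can cut computation without losing sensitivity to minority cases; the exact ordering and parameters used in the paper are not given in the abstract.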