Anonymization on Refining Partition: Same Privacy, More Utility

Hong Zhu, Shengli Tian, Genyuan Du, Meiyi Xie

In privacy preserving data publishing, to reduce the correlation loss between sensitive attribute (SA) and non-sensitive attributes(NSAs) caused by anonymization methods (such as generalization, anatomy, slicing and randomization, etc.), the records with same NSAs values should be divided into same blocks to meet the anonymizing demands of ℓ-diversity. However, there are often many blocks (of the initial partition), in which there are more than ℓ records with different SA values, and the frequencies of different SA values are uneven. Therefore, anonymization on the initial partition causes more correlation loss. To reduce the correlation loss as far as possible, in this paper, an optimizing model is first proposed. Then according to the optimizing model, the refining partition of the initial partition is generated, and anonymization is applied on the refining partition. Although anonymization on refining partition can be used on top of any existing partitioning method to reduce the correlation loss, we demonstrate that a new partitioning method tailored for refining partition could further improve data utility. An experimental evaluation shows that our approach could efficiently reduce correlation loss.