计算机学报

	Chinese Journal of Computers Full Text
Title	Privacy Preserving Approaches for Multiple Sensitive Attributes in Data Publishing
Authors	YANG Xiao-Chun, WANG Ya-Zhe, WANG Bin, YU Ge
Address	(School of Information Science and Engineering, Northeastern University, Shenyang 110004)
Year	2008
Issue	No. 4 (pp. 574–587)
Abstract & Background	Abstract: Current privacy preserving data publishing techniques concentrate on tables with only one sensitive attribute. However, most real-world applications involve multiple sensitive attributes, and directly applying existing single-sensitive-attribute privacy preserving techniques often causes unexpected disclosure of private information. This paper first discusses the problem of securely publishing data that contain multiple sensitive attributes, and then proposes a multi-dimensional bucket grouping approach based on the idea of lossy join, called Multi-Sensitive Bucketization (MSB). To avoid exhaustive search, three linear-time greedy MSB algorithms are proposed: the maximal-bucket first algorithm (MBF), the maximal single-dimension-capacity first algorithm (MSDCF), and the maximal multi-dimension-capacity first algorithm (MMDCF). In addition, a weighted MSB approach is proposed to account for differences among the published data. Experimental results on real-world datasets show that the additional information loss of the proposed MSB methods is no more than 0.04 and the suppression ratios are below 0.06, while the weighted MSB approach guarantees a publishing ratio of more than 70%.

Keywords: data publishing; data privacy; multi-sensitive attributes; lossy join; l-diversity

Background: This research is supported by the Program for New Century Excellent Talents in University under grant No. NCET-06-0290, the National Natural Science Foundation of China under grant No. 60503036, and the Fok Ying Tung Education Foundation under grant No. 104027. This paper focuses on privacy preserving data publishing. The research group has done extensive work on designing highly efficient and practical privacy preserving data publishing algorithms, as well as on other data privacy techniques such as privacy preserving data outsourcing and location privacy.
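The abstract's claim that single-sensitive-attribute techniques can leak private information when several sensitive attributes are published together can be illustrated with a small Python sketch. The group below, the column order (Disease, Salary), and the helper names `is_distinct_l_diverse` and `infer` are hypothetical; the threat model (an adversary who already knows one of the victim's sensitive values) is an assumption made for illustration and is not necessarily the paper's exact model.

```python
# Hypothetical published group with two sensitive attributes,
# in the assumed column order (Disease, Salary). Values are invented.
GROUP = [
    ("flu", 3000),
    ("cancer", 5000),
    ("hiv", 3000),
]


def is_distinct_l_diverse(group, attr, l):
    """Distinct l-diversity on ONE attribute: at least l distinct values."""
    return len({row[attr] for row in group}) >= l


def infer(group, known_attr, known_value, target_attr):
    """Candidate values of `target_attr` for a victim whose `known_attr`
    the adversary already knows (assumed background knowledge)."""
    return {row[target_attr] for row in group if row[known_attr] == known_value}


if __name__ == "__main__":
    # Each sensitive attribute, checked on its own, is 2-diverse...
    print(is_distinct_l_diverse(GROUP, 0, 2), is_distinct_l_diverse(GROUP, 1, 2))
    # ...yet knowing the victim's salary (5000) pins down the disease.
    print(infer(GROUP, 1, 5000, 0))
```

Both per-attribute checks pass, but an adversary who knows the victim's salary is 5000 is left with exactly one candidate disease, which is the kind of disclosure a per-attribute check cannot detect.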
Privacy preserving data publishing is an important branch of data privacy. The linking attack, one of the main causes of private information disclosure, was first formalized by L. Sweeney and P. Samarati, who proposed the k-anonymity model to prevent it. Generalization and suppression are the standard ways to achieve k-anonymity. Much subsequent work has aimed to design efficient, scalable, and flexible k-anonymity algorithms that balance the trade-off between privacy and data utility as well as possible; classical examples include Bayardo and Agrawal's k-Optimize algorithm, the Incognito algorithm, and the Mondrian algorithm. Many studies have also examined circumstances under which the k-anonymity model is ineffective, and have proposed complementary models such as l-diversity, t-closeness, and m-confidentiality. More recently, several lossy-join-based methods have been proposed; their main advantage is that they preserve the accuracy of the published data better than generalization- and suppression-based techniques. However, all current privacy preserving data publishing techniques concentrate on tables with only one sensitive attribute, whereas in practice most real-world applications contain multiple sensitive attributes. Existing single-sensitive-attribute techniques cannot guarantee the security of sensitive information when there are multiple sensitive attributes. In this work, the authors first describe the multi-sensitive-attribute private data publishing problem and then present several efficient algorithms to solve it.
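The multi-dimensional bucket grouping idea can be sketched in Python as follows. This is a simplified greedy in the spirit of maximal-bucket first (MBF), not the paper's algorithm. It assumes that records are bucketed by their full vector of sensitive values, that a valid group takes one record from each of l buckets whose vectors differ pairwise on every sensitive attribute, and that leftover records are suppressed. The function names `compatible` and `greedy_bucket_grouping` and the sample records are hypothetical. Re-sorting the buckets every round keeps the code short but does not achieve the linear running time reported in the abstract.

```python
from collections import defaultdict

# Hypothetical records: (record id, Disease, Salary). Values are invented.
RECORDS = [
    ("r1", "flu", 3000),
    ("r2", "flu", 3000),
    ("r3", "cancer", 5000),
    ("r4", "hiv", 4000),
    ("r5", "cancer", 4000),
    ("r6", "flu", 5000),
    ("r7", "flu", 3000),
]


def compatible(vec, chosen):
    """True if `vec` differs from every chosen vector on EVERY sensitive attribute."""
    return all(all(a != b for a, b in zip(vec, other)) for other in chosen)


def greedy_bucket_grouping(records, sensitive_idx, l):
    """Simplified largest-bucket-first grouping (an assumption, not the paper's MBF).

    Returns (groups, suppressed): each group holds l records whose sensitive
    vectors are pairwise distinct in every dimension; the rest are suppressed.
    """
    # One bucket per distinct multi-dimensional sensitive-value vector.
    buckets = defaultdict(list)
    for rec in records:
        buckets[tuple(rec[i] for i in sensitive_idx)].append(rec)

    groups = []
    while True:
        # Visit non-empty buckets from largest to smallest (stable for ties).
        order = sorted((v for v in buckets if buckets[v]),
                       key=lambda v: len(buckets[v]), reverse=True)
        chosen = []
        for vec in order:
            if compatible(vec, chosen):
                chosen.append(vec)
                if len(chosen) == l:
                    break
        if len(chosen) < l:
            break  # greedy step cannot form another valid group
        # Take one record from each chosen bucket to form the group.
        groups.append([buckets[v].pop() for v in chosen])

    suppressed = [rec for bucket in buckets.values() for rec in bucket]
    return groups, suppressed


if __name__ == "__main__":
    groups, suppressed = greedy_bucket_grouping(RECORDS, sensitive_idx=(1, 2), l=2)
    for g in groups:
        print([r[0] for r in g])
    print("suppressed:", [r[0] for r in suppressed])
```

On the sample data the sketch forms three groups of two records and suppresses r6, a suppression ratio of 1/7. Drawing from the largest buckets first is a heuristic meant to avoid ending up with many records of one frequent sensitive vector and no compatible partners for them.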