计算机学报

	Chinese Journal of Computers
Title	Privacy Preserving Naive Bayes Classification
Authors	ZHANG Peng 1), TANG Shi-Wei 2)
Address	1) Beijing Research Institute, China Telecom Corporation Limited, Beijing 100035; 2) School of Electronics Engineering and Computer Science, Peking University, Beijing 100871
Year	2007
Issue	No. 8, pp. 1267-1276
Abstract & Background	Abstract: Privacy preserving data mining aims to discover accurate patterns without precise access to the original data. This paper focuses on privacy preserving classification and presents a privacy preserving Naive Bayes classification approach based on data randomization and feature reconstruction. An ERRPH (Extended Randomized Response with Partial Hiding) method and a TRR (Transforming Randomized Response) method are presented for enumerated data and numerical data, respectively. A privacy preserving Naive Bayes classification algorithm is then implemented based on these methods. Theoretical analyses show that the approach provides better privacy, accuracy, efficiency, and applicability, and experiments verify its effectiveness.
Keywords: data mining; privacy preservation; Naive Bayes classification; data randomization; feature reconstruction
Background: This work is supported by the National Natural Science Foundation of China under grant No. 60403041 and the Dissertation Foundation of Beijing Municipal Science and Technology Commission under grant No. ZZ6027. There is growing concern about the privacy implications of data mining, and preserving privacy during the mining process has become one of the most important topics in the field. An essential property of data mining is that patterns discovered from large amounts of data usually depend on aggregate, statistical information rather than on individual data records. Consequently, privacy preserving data mining, which aims to discover accurate patterns without precise access to the original data, has become a new research direction. In this paper, the authors present a privacy preserving Naive Bayes classification approach based on data randomization and feature reconstruction. An ERRPH method and a TRR method are presented for enumerated data and numerical data, respectively, and a privacy preserving Naive Bayes classification algorithm is then implemented based on these methods.
Theoretical analyses and experimental results show that the approach provides better privacy, accuracy, efficiency, and applicability. The research group has been working on privacy preserving data mining since 2004 and has published about ten papers, covering the scheme, workflow, and evaluation measures for privacy preserving data mining, privacy preserving association rule mining approaches, privacy preserving classification approaches, and the effect of attribute correlation on privacy preservation.
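The randomize-then-reconstruct idea summarized above can be illustrated with a minimal sketch for enumerated attributes. It is not the authors' ERRPH method (no partial hiding is modeled) and it does not cover TRR for numerical data; it swaps in a generic randomized-response scheme, assuming each attribute value is kept with probability `p` and otherwise replaced by a uniform draw from its domain, and assuming class labels are left unrandomized. The function names `randomize`, `reconstruct`, `train`, and `classify` are hypothetical. The key point is that Naive Bayes only needs per-class value frequencies, and those can be estimated from the randomized data without recovering any individual record.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: a generic randomized-response scheme for enumerated
# attributes, NOT the authors' ERRPH method. Each value is kept with
# probability p; otherwise it is replaced by a uniform draw from the domain.


def randomize(value, domain, p, rng):
    """Keep the true value with probability p, else draw uniformly from domain."""
    if rng.random() < p:
        return value
    return rng.choice(domain)


def reconstruct(observed_freq, domain, p):
    """Recover the original distribution pi from observed frequencies q.

    Under the scheme above, q(v) = p * pi(v) + (1 - p) / k for a domain of
    size k, so pi(v) = (q(v) - (1 - p) / k) / p. Estimates made negative by
    sampling noise are clipped to a small floor, then renormalized.
    """
    k = len(domain)
    raw = {v: (observed_freq.get(v, 0.0) - (1 - p) / k) / p for v in domain}
    clipped = {v: max(x, 1e-6) for v, x in raw.items()}
    total = sum(clipped.values())
    return {v: x / total for v, x in clipped.items()}


def train(records, labels, domains, p):
    """Estimate class priors and reconstructed P(attribute value | class).

    Assumption: attribute values are randomized but class labels are not.
    """
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: n_c / n for c, n_c in class_counts.items()}
    cond = {}  # (class, attribute index) -> reconstructed distribution
    for c, n_c in class_counts.items():
        rows = [r for r, y in zip(records, labels) if y == c]
        for a, domain in enumerate(domains):
            counts = Counter(r[a] for r in rows)
            freq = {v: counts[v] / n_c for v in domain}
            cond[(c, a)] = reconstruct(freq, domain, p)
    return priors, cond


def classify(x, priors, cond):
    """Naive Bayes rule: argmax over c of log P(c) + sum_a log P(x_a | c)."""
    def score(c):
        return math.log(priors[c]) + sum(
            math.log(cond[(c, a)][v]) for a, v in enumerate(x))
    return max(priors, key=score)
```

Because the replacement value is drawn uniformly from the whole domain, the observed distribution is an affine function of the original one and can be inverted in closed form, as `reconstruct` does; a scheme with non-uniform replacement or hidden values would instead have to invert its own distortion matrix. A smaller `p` gives stronger privacy for each record but noisier reconstructed frequencies, so more records are needed for the same classification accuracy.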