US 7,475,085 B2
Method and apparatus for privacy preserving data mining by restricting attribute choice
Charu C. Aggarwal, Mohegan Lake, N.Y. (US); and Nagui Halim, Yorktown Heights, N.Y. (US)
Assigned to International Business Machines Corporation, Armonk, N.Y. (US)
Filed on Apr. 04, 2006, as Appl. No. 11/397,297.
Prior Publication US 2007/0233711 A1, Oct. 04, 2007
Int. Cl. G06F 17/30 (2006.01)
U.S. Cl. 707—101  [707/1; 707/2; 707/9; 707/100; 707/103 R] 21 Claims
 
1. A method of generating at least one output data set from at least one input data set for use in association with a data mining process, the input data set comprising at least one entry including each of a plurality of attributes, comprising the steps of:
determining at least one relevance coefficient for at least a subset of the plurality of attributes;
selecting at least one relevant attribute of the at least one input data set based at least in part on the at least one relevance coefficient; and
generating the at least one output data set from the at least one input data set;
wherein the at least one output data set comprises at least one entry not including at least one of the plurality of attributes; and
wherein the at least one entry of the output data set has the at least one relevant attribute of the at least one input data set;
wherein the at least one relevance coefficient is computed using a quantitative measure of an effect on the data mining process of a deletion of at least the given attribute from each entry of the input data set.
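
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes, purely for illustration, that the data mining process is a leave-one-out 1-nearest-neighbour classifier and that the "quantitative measure" is the drop in its accuracy when an attribute is deleted from every entry. The function names, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def loo_1nn_accuracy(rows: Sequence[Record], labels: Sequence[str],
                     attrs: Sequence[str]) -> float:
    """Stand-in data mining process (an assumption): leave-one-out
    accuracy of a 1-nearest-neighbour classifier using only `attrs`."""
    if not attrs or len(rows) < 2:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = -1, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum((row[a] - other[a]) ** 2 for a in attrs)
            if d < best_d:
                best_d, best_j = d, j
        if labels[best_j] == labels[i]:
            correct += 1
    return correct / len(rows)


def relevance_coefficients(rows: Sequence[Record], labels: Sequence[str],
                           attrs: Sequence[str]) -> Dict[str, float]:
    """Relevance of an attribute = loss in mining quality when that
    attribute is deleted from every entry of the input data set."""
    baseline = loo_1nn_accuracy(rows, labels, attrs)
    return {
        a: baseline - loo_1nn_accuracy(rows, labels,
                                       [b for b in attrs if b != a])
        for a in attrs
    }


def restrict_attributes(rows: Sequence[Record], labels: Sequence[str],
                        attrs: Sequence[str], threshold: float = 0.0
                        ) -> Tuple[List[Record], List[str]]:
    """Select attributes whose coefficient exceeds `threshold` (a
    hypothetical cut-off) and build the output data set from them."""
    coeffs = relevance_coefficients(rows, labels, attrs)
    keep = [a for a in attrs if coeffs[a] > threshold]
    output = [{a: row[a] for a in keep} for row in rows]
    return output, keep


# Hypothetical input: "x" separates the two classes, "noise" does not.
ROWS = [
    {"x": 0.0, "noise": 0.50}, {"x": 0.1, "noise": 0.10},
    {"x": 0.2, "noise": 0.30}, {"x": 1.0, "noise": 0.12},
    {"x": 1.1, "noise": 0.52}, {"x": 1.2, "noise": 0.32},
]
LABELS = ["a", "a", "a", "b", "b", "b"]

if __name__ == "__main__":
    print(relevance_coefficients(ROWS, LABELS, ["x", "noise"]))
    print(restrict_attributes(ROWS, LABELS, ["x", "noise"]))
```

In this sketch each output entry omits the attributes judged irrelevant and retains the selected ones, which mirrors the two "wherein" conditions on the output data set. Any other mining process or quality measure could be substituted for `loo_1nn_accuracy`.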