| US 7,599,893 B2 | ||
| Methods and systems for feature selection in machine learning based on feature contribution and model fitness | ||
| Marina Sapir, Mamaroneck, N.Y. (US); Faisal M. Khan, New Rochelle, N.Y. (US); David A. Verbel, New York, N.Y. (US); and Olivier Saidi, Greenwich, Conn. (US) | ||
| Assigned to Aureon Laboratories, Inc., Yonkers, N.Y. (US) | ||
| Filed on May 22, 2006, as Appl. No. 11/438,789. | ||
| Claims priority of provisional application 60/726,809, filed on Oct. 13, 2005. | ||
| Prior Publication US 2007/0112716 A1, May 17, 2007 | ||
| Int. Cl. G06F 15/18 (2006.01) | ||
| U.S. Cl. 706—12 [706/14; 706/20; 706/47; 382/128; 382/129; 382/133; 382/134; 600/300; 600/301] | 26 Claims |

| 1. A method for selecting features for a final prediction rule predictive of an outcome with respect to a medical condition, said method comprising:
performing with a computer-implemented machine learning tool:
(a) generating a prediction rule based on training data for a cohort of patients whose outcomes with respect to said medical condition are at least partially known, wherein for each patient the data comprises measurements for a set of features and the outcome with respect to said medical condition for said patient to the extent known, wherein in a first iteration of (a) said set of features includes n features with n greater than or equal to 3 with n being decremented by one in each subsequent iteration of (a);
(b) determining a fitness value for said prediction rule, wherein said determining a fitness value comprises summing a concordance index (CI) of said prediction rule with a product of a sensitivity and a specificity of said prediction rule;
(c) determining a value of contribution to said prediction rule for each of said features in said set of features;
(d) removing a feature from consideration from said set of features based on the values of contribution, wherein the feature having the lowest value of contribution is removed;
(e) iterating (a)-(d) in order to produce n prediction rules and n fitness values; and
(f) selecting, based on the fitness values for said n prediction rules, one of said n prediction rules as said final prediction rule predictive of the outcome with respect to said medical condition, wherein of said n prediction rules said final prediction rule has the highest predictive ability with respect to the outcome with respect to said medical condition as indicated by said fitness values; and
evaluating data for a patient with a computer implementation of said final prediction rule to produce a value predictive of the patient's outcome with respect to said medical condition.
|
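The loop recited in steps (a)-(f) can be illustrated with a minimal Python sketch. It is not the patented implementation, and several pieces are assumptions made only for illustration: outcomes are treated as fully known binary labels (the claim also allows partially known outcomes), the learner `fit_rule` is a hypothetical stand-in that weights each feature by its covariance with the outcome, a feature's contribution is taken to be the absolute value of its weight, the sensitivity/specificity cutoff is the mean score, and ties in fitness go to the earliest (largest) rule. All function names and the toy data are invented for the example.

```python
from itertools import combinations

def fit_rule(X, y, features):
    """Hypothetical stand-in learner: weight = covariance of feature with outcome."""
    n = len(y)
    y_mean = sum(y) / n
    weights = {}
    for f in features:
        col = [row[f] for row in X]
        c_mean = sum(col) / n
        weights[f] = sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y)) / n
    return weights

def score(weights, row):
    """Apply the linear prediction rule to one patient's measurements."""
    return sum(w * row[f] for f, w in weights.items())

def concordance_index(scores, y):
    """Share of (event, non-event) pairs ranked correctly; tied scores count 0.5."""
    num = den = 0.0
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue
        den += 1
        hi, lo = (i, j) if y[i] > y[j] else (j, i)
        if scores[hi] > scores[lo]:
            num += 1
        elif scores[hi] == scores[lo]:
            num += 0.5
    return num / den if den else 0.5

def sens_spec(scores, y, threshold):
    """Sensitivity and specificity at a cutoff; assumes both classes are present."""
    tp = sum(1 for s, t in zip(scores, y) if t == 1 and s > threshold)
    tn = sum(1 for s, t in zip(scores, y) if t == 0 and s <= threshold)
    pos = sum(y)
    return tp / pos, tn / (len(y) - pos)

def select_features(X, y, features):
    features = list(features)
    history = []  # one (fitness, feature list, weights) entry per iteration
    while features:
        weights = fit_rule(X, y, features)                       # step (a)
        scores = [score(weights, row) for row in X]
        threshold = sum(scores) / len(scores)                    # assumed cutoff
        sens, spec = sens_spec(scores, y, threshold)
        fitness = concordance_index(scores, y) + sens * spec     # step (b)
        history.append((fitness, list(features), weights))
        contribution = {f: abs(w) for f, w in weights.items()}   # step (c)
        features.remove(min(features, key=contribution.get))     # step (d)
    best = max(history, key=lambda h: h[0])                      # step (f)
    return best, history

# Toy cohort (invented): six patients, three features, binary outcome.
X = [[1, 0, 2], [2, 0, 1], [3, 12, 2], [4, 0, 1], [5, 0, 3], [6, 13, 3]]
y = [0, 0, 0, 1, 1, 1]
best, history = select_features(X, y, [0, 1, 2])
for fitness, feats, _ in history:
    print(feats, round(fitness, 4))
print("selected features:", best[1])

def predict(weights, row):
    """Evaluate the selected rule on a new patient (final step of the claim)."""
    return score(weights, row)
```

On this toy cohort the loop starts with n = 3 features and yields three rules, one each for 3, 2, and 1 features. The noisy feature at index 1 has the lowest contribution and is dropped first. Because fitness is the sum CI + sensitivity × specificity, it ranges from 0 to 2 in this sketch.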