| US 7,464,068 B2 | ||
| System and method for continuous diagnosis of data streams | ||
| Wei Fan, New York, N.Y. (US); Haixun Wang, Tarrytown, N.Y. (US); and Philip S. Yu, Chappaqua, N.Y. (US) | ||
| Assigned to International Business Machines Corporation, Armonk, N.Y. (US) | ||
| Filed on Jun. 30, 2004, as Appl. No. 10/880,913. | ||
| Prior Publication US 2006/0010093 A1, Jan. 12, 2006 | ||
| Int. Cl. G06F 17/30 (2006.01) | ||
| U.S. Cl. 707—1 [707/5] | 9 Claims |

| 1. An apparatus for facilitating the mining of time-evolving data streams, said apparatus comprising: an input arrangement for accepting a data stream comprising unlabeled data; and an arrangement for determining an amount of drifts in the data stream comprising unlabeled data; said determining arrangement: employs a signature profile of an inductive model in determining an amount of drifts in the data stream; reconstructs the inductive model via actively acquiring true labels for a small sample of the unlabeled data in the data stream in order to estimate loss, wherein the inductive model is reconstructed if the estimated loss is more than an empirically determined threshold; and employs statistical measures to estimate the error rate of the inductive model; wherein reconstruction of an original decision tree comprises at least one of: updating a class probability distribution in leaf nodes in the tree; and extending leaf nodes in the tree. |
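
The loss-estimation step in claim 1 (acquire true labels for a small sample, estimate the loss, reconstruct if it exceeds a threshold) can be illustrated with a minimal sketch. This is not the patented implementation. The names `estimate_loss`, `needs_reconstruction`, and `acquire_label` are hypothetical, as are the use of 0-1 loss and the binomial standard error as the "statistical measure". The signature-profile step and the decision-tree updates are not shown.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Example = Sequence[float]


def estimate_loss(
    model: Callable[[Example], int],
    stream_batch: Sequence[Example],
    acquire_label: Callable[[Example], int],
    sample_size: int,
    rng: random.Random,
) -> Tuple[float, float]:
    """Estimate the 0-1 loss of `model` from a small labeled sample.

    `acquire_label` stands in for whatever process supplies true labels
    (an assumption; the claim does not say how labels are obtained).
    Returns (mean loss, binomial standard error of that mean).
    """
    n = min(sample_size, len(stream_batch))
    if n == 0:
        raise ValueError("need at least one example to estimate loss")
    # Draw the small sample without replacement from the unlabeled batch.
    sample = rng.sample(list(stream_batch), n)
    # Labels are requested only for the sampled examples.
    errors = [1.0 if model(x) != acquire_label(x) else 0.0 for x in sample]
    mean = sum(errors) / n
    std_err = math.sqrt(mean * (1.0 - mean) / n)
    return mean, std_err


def needs_reconstruction(estimated_loss: float, threshold: float) -> bool:
    """Reconstruct only when the estimated loss exceeds the threshold.

    The threshold is assumed to be chosen empirically by the caller.
    """
    return estimated_loss > threshold
```

In use, a caller would run `estimate_loss` on each incoming batch and pass the resulting mean to `needs_reconstruction`. The standard error indicates how far the small-sample estimate can be trusted before paying for more labels.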