| US 7,567,188 B1 | ||
| Policy based tiered data deduplication strategy | ||
| Matthew J. Anglin, Tucson, Ariz. (US); David M. Cannon, Tucson, Ariz. (US); Colin S. Dawson, Tucson, Ariz. (US); and Howard N. Martin, Tucson, Ariz. (US) | ||
| Assigned to International Business Machines Corporation, Armonk, N.Y. (US) | ||
| Filed on Apr. 10, 2008, as Appl. No. 12/100,695. | ||
| Int. Cl. H03M 7/46 (2006.01) | ||
| U.S. Cl. 341—63 [707/202] | 25 Claims |

| 1. A method for applying a deduplication strategy to a data object based on a data storage policy, comprising:
defining a plurality of data storage policies for a deduplication pool, each data storage policy containing settings including a maximum reference count for data chunks;
classifying the data object within a selected data storage policy;
dividing the data object into a plurality of data chunks, each data chunk having reference count data to track a number of references thereto;
storing each data chunk of the data object in the deduplication pool if the selected data storage policy does not allow deduplication of the data object; and
performing deduplication on the data object if the selected data storage policy allows deduplication of the data object, including for each data chunk of the data object:
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if a previously stored identical copy of the data chunk does not exist in the deduplication pool,
updating the reference count data of and creating a pointer to a previously stored identical copy of the data chunk if the previously stored identical copy of the data chunk exists in the deduplication pool and has a reference count less than the selected data storage policy maximum reference count, and
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if each previously stored identical copy of the data chunk existing within the deduplication pool contains a reference count equal to or greater than the selected data storage policy maximum reference count.
|
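The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Policy`, `StoredChunk`, `DedupPool`, and `store_object` are hypothetical. Fixed-size chunking, SHA-256 digests as the test for "identical copy", and indexing chunks stored under a non-deduplicating policy so that later objects may reference them are all assumptions that the claim does not specify.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical data storage policy: dedup on/off plus a per-chunk reference cap."""
    name: str
    allow_dedup: bool
    max_ref_count: int


@dataclass
class StoredChunk:
    data: bytes
    ref_count: int = 1  # initialized to 1 when the chunk is first stored


@dataclass
class DedupPool:
    chunks: list = field(default_factory=list)  # every stored copy, addressed by position
    index: dict = field(default_factory=dict)   # digest -> ids of identical stored copies

    def _store_new(self, data, digest):
        """Store a fresh copy and initialize its reference count."""
        self.chunks.append(StoredChunk(data))
        chunk_id = len(self.chunks) - 1
        self.index.setdefault(digest, []).append(chunk_id)
        return chunk_id

    def store_object(self, data, policy, chunk_size=4):
        """Return the list of chunk ids (pointers) that make up the object."""
        pointers = []
        # Assumption: fixed-size chunking; the claim only says "dividing".
        for offset in range(0, len(data), chunk_size):
            piece = data[offset:offset + chunk_size]
            digest = hashlib.sha256(piece).hexdigest()

            if not policy.allow_dedup:
                # Policy forbids deduplication: always store the chunk itself.
                pointers.append(self._store_new(piece, digest))
                continue

            # Look for an identical copy whose reference count is below the cap.
            reusable = next(
                (cid for cid in self.index.get(digest, [])
                 if self.chunks[cid].ref_count < policy.max_ref_count),
                None,
            )
            if reusable is None:
                # No copy exists, or every copy has reached the cap: store a new one.
                pointers.append(self._store_new(piece, digest))
            else:
                # Reuse the existing copy: bump its count and point to it.
                self.chunks[reusable].ref_count += 1
                pointers.append(reusable)
        return pointers


# Two illustrative tiers: one that never deduplicates, one capped at 2 references.
critical = Policy("critical", allow_dedup=False, max_ref_count=1)
standard = Policy("standard", allow_dedup=True, max_ref_count=2)

pool = DedupPool()
first = pool.store_object(b"AAAAAAAA", standard)  # second chunk points at the first
second = pool.store_object(b"AAAA", standard)     # cap reached, so a new copy is stored
third = pool.store_object(b"AAAA", critical)      # policy forbids dedup, so a new copy
```

In this sketch a lower `max_ref_count` trades storage space for redundancy: once a stored copy reaches the cap, the next identical chunk is written again, so fewer objects depend on any single stored copy.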