US 7,567,188 B1
Policy based tiered data deduplication strategy
Matthew J. Anglin, Tucson, Ariz. (US); David M. Cannon, Tucson, Ariz. (US); Colin S. Dawson, Tucson, Ariz. (US); and Howard N. Martin, Tucson, Ariz. (US)
Assigned to International Business Machines Corporation, Armonk, N.Y. (US)
Filed on Apr. 10, 2008, as Appl. No. 12/100,695.
Int. Cl. H03M 7/46 (2006.01)
U.S. Cl. 341—63  [707/202] 25 Claims
 
1. A method for applying a deduplication strategy to a data object based on a data storage policy, comprising:
defining a plurality of data storage policies for a deduplication pool, each data storage policy containing settings including a maximum reference count for data chunks;
classifying the data object within a selected data storage policy;
dividing the data object into a plurality of data chunks, each data chunk having reference count data to track a number of references thereto;
storing each data chunk of the data object in the deduplication pool if the selected data storage policy does not allow deduplication of the data object; and
performing deduplication on the data object if the selected data storage policy allows deduplication of the data object, including for each data chunk of the data object:
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if a previously stored identical copy of the data chunk does not exist in the deduplication pool,
updating the reference count data of and creating a pointer to a previously stored identical copy of the data chunk if the previously stored identical copy of the data chunk exists in the deduplication pool and has a reference count less than the selected data storage policy maximum reference count, and
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if each previously stored identical copy of the data chunk existing within the deduplication pool contains a reference count equal to or greater than the selected data storage policy maximum reference count.
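
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, only one possible reading of the claim language. The names Policy, Chunk, DedupPool, store_object, and chunk_size are hypothetical. Fixed-size chunking and SHA-256 fingerprints are assumptions, since the claim does not say how objects are divided or how identical copies are located. The sketch also assumes that chunks stored under a non-deduplicating policy remain eligible as targets for later deduplicated objects, which the claim leaves open.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical data storage policy with the settings named in the claim."""
    name: str
    allow_dedup: bool
    max_ref_count: int


@dataclass
class Chunk:
    data: bytes
    ref_count: int = 1  # initialized when the chunk is first stored


@dataclass
class DedupPool:
    chunks: list = field(default_factory=list)  # stored Chunk objects
    index: dict = field(default_factory=dict)   # digest -> positions in chunks

    def _store(self, data: bytes) -> int:
        """Store a new physical copy and return its position in the pool."""
        self.chunks.append(Chunk(data))
        pos = len(self.chunks) - 1
        digest = hashlib.sha256(data).hexdigest()
        self.index.setdefault(digest, []).append(pos)
        return pos

    def store_object(self, data: bytes, policy: Policy, chunk_size: int = 4) -> list:
        """Return the list of chunk positions (pointers) that make up the object."""
        # Assumption: fixed-size chunking; the claim does not specify a method.
        pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        pointers = []
        for piece in pieces:
            if not policy.allow_dedup:
                # Policy forbids deduplication: always store the chunk.
                pointers.append(self._store(piece))
                continue
            digest = hashlib.sha256(piece).hexdigest()
            for pos in self.index.get(digest, []):
                existing = self.chunks[pos]
                if existing.data == piece and existing.ref_count < policy.max_ref_count:
                    # An identical copy exists below the policy limit: reference it.
                    existing.ref_count += 1
                    pointers.append(pos)
                    break
            else:
                # No copy exists, or every copy is at or above the limit.
                pointers.append(self._store(piece))
        return pointers


pool = DedupPool()
gold = Policy("gold", allow_dedup=True, max_ref_count=2)
first = pool.store_object(b"AAAABBBB", gold)   # two new chunks are stored
second = pool.store_object(b"AAAA", gold)      # points at the existing chunk
third = pool.store_object(b"AAAA", gold)       # limit reached, new copy stored
```

In this reading, the maximum reference count caps how many objects depend on any single physical copy, so a policy with a lower limit keeps more redundant copies of the same chunk in the pool.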