Data de-duplication, also called single-instance storage or intelligent compression, is a process that eliminates redundant copies of data and thereby reduces storage overhead. It works by ensuring that only one unique instance of a piece of data is kept on the storage media; every redundant copy is replaced with a pointer to that single stored copy. For example, suppose your company emails the same 1 MB file to all 200 of its employees. Without de-duplication, storing every instance of that file takes up 200 MB; with data de-duplication, only one instance of the file is saved and the other copies become pointers to it. Data de-duplication can take place at the target or at the source. In target-based de-duplication, backups are sent across the network to remote disk-based hardware, and the redundant data is removed there. In source-based de-duplication, redundant blocks are removed before the data is sent to the backup target, at either the server or the client level.
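To make the pointer idea and the 1 MB / 200 employee example concrete, here is a minimal, hypothetical sketch of a file-level single-instance store. It assumes SHA-256 digests as content fingerprints and uses in-memory dictionaries in place of disk; the names `SingleInstanceStore` and `bytes_sent` are illustrative only and do not correspond to any product's API. Real systems typically work on blocks or chunks rather than whole files and must also account for metadata and hash collisions, which this sketch ignores.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical toy store: each unique blob is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # digest -> the single stored copy
        self.pointers: dict[str, str] = {}  # logical file name -> digest (the "pointer")

    def put(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()  # assumed fingerprint function
        # Only store the bytes if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.pointers[name] = digest

    def get(self, name: str) -> bytes:
        # Follow the pointer back to the one stored copy.
        return self.blobs[self.pointers[name]]

    def logical_bytes(self) -> int:
        # Space the files would occupy without de-duplication.
        return sum(len(self.blobs[d]) for d in self.pointers.values())

    def physical_bytes(self) -> int:
        # Space actually used by unique content (pointer overhead ignored).
        return sum(len(b) for b in self.blobs.values())


def bytes_sent(files: dict[str, bytes], source_side: bool) -> int:
    """Hypothetical model of bytes crossing the network for one backup run.

    Target-side: every file is shipped and the target de-duplicates on arrival.
    Source-side: duplicates are dropped on the client before anything is sent.
    Hashes and other metadata sent over the wire are ignored here.
    """
    if not source_side:
        return sum(len(d) for d in files.values())
    seen: set[str] = set()
    sent = 0
    for data in files.values():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            sent += len(data)
    return sent


# The 1 MB attachment (taken here as 1,000,000 bytes) mailed to 200 employees.
attachment = b"x" * 1_000_000
mailbox = {f"employee{i:03d}/attachment.bin": attachment for i in range(200)}

store = SingleInstanceStore()
for name, data in mailbox.items():
    store.put(name, data)

print(store.logical_bytes())                    # 200000000
print(store.physical_bytes())                   # 1000000
print(bytes_sent(mailbox, source_side=False))   # 200000000
print(bytes_sent(mailbox, source_side=True))    # 1000000
```

The same lookup-then-store-or-point logic applies per block in block-level de-duplication; only the unit being fingerprinted changes.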
There are also two different techniques for de-duplicating data: inline and post-processing. Which one is appropriate depends on the backup environment. Inline de-duplication examines data as it enters the backup system and removes redundancies before the data is written. This method requires less backup storage, but because the work happens on the write path it can often cause bottlenecks. Post-processing de-duplication is an asynchronous process that eliminates redundant data after it has been written to storage.
Although data de-duplication resembles compression, the two work in different ways. Data de-duplication searches for redundant copies of data, whereas compression uses an algorithm to reduce the number of bits needed to represent the data.
Data de-duplication offers many advantages, including a potentially smaller data footprint, longer retention periods, fewer tape backups, faster recovery times, and lower bandwidth consumption. Because the volume of data keeps growing, data de-duplication allows better utilization of both storage devices and network bandwidth. These gains can translate into savings on equipment, floor space, and energy. Savings can often be realized at the application level as well, for example in email and other data management applications.
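The trade-off between inline and post-processing de-duplication described above can be sketched as follows. This is a hypothetical illustration, not any vendor's implementation: it assumes fixed 4 KiB blocks (real products may use other sizes or variable-size chunking), SHA-256 fingerprints, and in-memory structures standing in for disk. The point it shows is where the hashing happens and how much space is occupied at peak.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    # Cut the backup stream into fixed-size blocks.
    return [data[i:i + size] for i in range(0, len(data), size)]


def inline_ingest(blocks: list[bytes]) -> tuple[dict[str, bytes], int]:
    """Hypothetical inline path: de-duplicate each block before it is written.

    Returns (store, peak bytes on disk). Hashing sits on the write path,
    which is where the bottleneck risk comes from.
    """
    store: dict[str, bytes] = {}
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block
    peak = sum(len(b) for b in store.values())  # the store only grows, so final size is the peak
    return store, peak


def post_process_ingest(blocks: list[bytes]) -> tuple[dict[str, bytes], int]:
    """Hypothetical post-processing path: write everything first, de-duplicate later.

    Returns (store, peak bytes on disk).
    """
    landing_area = list(blocks)               # raw write with no hashing on the write path
    peak = sum(len(b) for b in landing_area)  # the full copy sits on disk before the pass runs
    store: dict[str, bytes] = {}
    for block in landing_area:                # later clean-up pass (asynchronous in practice)
        store.setdefault(hashlib.sha256(block).hexdigest(), block)
    landing_area.clear()                      # raw copies are released after de-duplication
    return store, peak


backup = (b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE) * 50  # 100 blocks, only 2 unique
blocks = split_blocks(backup)
inline_store, inline_peak = inline_ingest(blocks)
post_store, post_peak = post_process_ingest(blocks)

print(len(blocks), len(inline_store), len(post_store))  # 100 2 2
print(inline_peak, post_peak)                            # 8192 409600
```

Both paths end with the same two unique blocks; the difference is that post-processing temporarily needs room for the full, un-de-duplicated backup, while inline pays the hashing cost at write time.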