Dedupe or Not Dedupe: That is the Question!
Posted by Boris Yurchenko on February 22, 2018

When planning or improving an IT infrastructure, one of the most difficult challenges is choosing an approach that requires as few changes as possible when the environment scales up later. Keeping this in mind is really important, as almost every environment eventually reaches the point where the need for growth becomes obvious. With hyperconvergence being popular these days, managing large setups with fewer resources involved becomes quite easy.

Today I will deal with data deduplication analysis. Data deduplication is a technique that helps to avoid storing identical data blocks more than once. Basically, during the deduplication process, incoming data is analyzed, and unique data blocks, or byte patterns, are identified and written to the storage array. The analysis is continuous: every subsequent data block is compared to the patterns already stored. If a match is found, the system stores a small reference to the original data block instead of storing the block again. In small environments, this usually makes little difference, yet in those with dozens or hundreds of VMs, the same patterns can occur numerous times. Thus, data deduplication allows storing more information in the same amount of physical storage than traditional data storage methods do. This can be achieved in several ways, one of which is StarWind LSFS (Log Structured File System), which offers inline deduplication of data on LSFS-powered virtual storage devices.
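To make the idea more tangible, here is a minimal Python sketch of block-level deduplication. It is only an illustration of the general technique and not how StarWind LSFS is implemented: the `DedupStore` class, the 4 KB block size, and the use of SHA-256 fingerprints to recognize identical blocks are all my assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration only


class DedupStore:
    """Toy in-memory block store with inline deduplication (hypothetical)."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}        # fingerprint -> contents of a unique block
        self.logical_bytes = 0  # total bytes the clients asked to store

    def write(self, data):
        """Store data and return the list of block references for it."""
        references = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).digest()
            # Keep the block only if this pattern has not been seen before;
            # otherwise the existing copy is reused.
            self.blocks.setdefault(fingerprint, block)
            references.append(fingerprint)
        self.logical_bytes += len(data)
        return references

    def read(self, references):
        """Rebuild the original data from its block references."""
        return b"".join(self.blocks[ref] for ref in references)

    def physical_bytes(self):
        """Bytes actually occupied by unique blocks."""
        return sum(len(block) for block in self.blocks.values())

    def dedup_ratio(self):
        """Logical size divided by physical size (1.0 means no savings)."""
        physical = self.physical_bytes()
        return self.logical_bytes / physical if physical else 1.0
```

For example, writing two 8 KB "VM images" that share their first 4 KB block stores only three unique blocks instead of four, so 16 KB of logical data occupies 12 KB. Note that the sketch trusts the hash to identify identical blocks and keeps everything in memory, which a real storage system would not do.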