Investigate hash-based change detection for long unchanged runs in a buffer
Constructing buffer diffs appears to be memory-bandwidth limited when wider SIMD registers are used. When the two buffers being compared are identical, bandwidth is split between two streams of data read at the same rate; when they differ, two streams are read and two are written. In theory, the memory traffic in the first case could be reduced to a single stream of data, which is read, hashed, and compared with a hash computed during the last diff cycle. For example, using 32B hashes to summarize 4KB blocks, determining that nothing has changed would require reading about 4.03KB of data (one 4KB block plus one 32B stored hash) instead of 8KB. The downsides of hash-based change detection are that computing the hashes is extra work whether or not a block has changed, and that incorrectly speculating that a 4KB block is unchanged requires retrying that block with the standard diff method.
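A minimal sketch of the idea, to make the control flow concrete. It is illustrative only: the names `block_hash` and `detect_changes` are hypothetical, BLAKE2b is an arbitrary stand-in because no hash function has been chosen, and a plain byte comparison stands in for the standard diff method. Python is used for brevity; it shows the logic, not the bandwidth behavior.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB blocks, as in the example above
HASH_SIZE = 32     # 32B hash per block


def block_hash(block):
    # Assumption: BLAKE2b truncated to 32 bytes; any suitable hash would do.
    return hashlib.blake2b(block, digest_size=HASH_SIZE).digest()


def detect_changes(current, previous, prev_hashes):
    """Return (indices of changed blocks, hashes of the current blocks).

    prev_hashes holds the per-block hashes saved by the last diff cycle.
    """
    changed = []
    new_hashes = []
    for index, offset in enumerate(range(0, len(current), BLOCK_SIZE)):
        block = current[offset:offset + BLOCK_SIZE]
        digest = block_hash(block)
        new_hashes.append(digest)
        if index < len(prev_hashes) and digest == prev_hashes[index]:
            # Speculation held: the block is treated as unchanged and the
            # previous buffer is never read (hash collisions are ignored here).
            continue
        # Speculation failed: retry this block with the standard method,
        # represented here by a direct byte comparison.
        if block != previous[offset:offset + BLOCK_SIZE]:
            changed.append(index)
    return changed, new_hashes
```

The returned hashes would be kept for the next diff cycle, so each unchanged block costs one 4KB read plus one 32B hash lookup. Whether the hashing cost is lower than the bandwidth saved is exactly what this issue proposes to investigate.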