div255 implementation is incorrect
Submitted by Jan Schmidt
Created attachment 299452
test div255 app
div255 implements this algorithm everywhere:
d = (s + 128 + ((s + 128) >> 8)) >> 8
which produces a result that is off by one, compared with truncating integer division by 255, for roughly half of the 0..65535 input range.
A correct implementation is:
d = (s + 1 + (s >> 8)) >> 8
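A minimal sketch for checking both formulas against Python's truncating `s // 255`. It is separate from the attached test app, and the function names are hypothetical. The first formula is written with the inner shift applied to `s + 128` only, which is an assumption about the intended grouping.

```python
def div255_old(s):
    # Formula as reported, with the inner shift parenthesised explicitly
    # (an assumption about the intended grouping).
    t = s + 128
    return (t + (t >> 8)) >> 8


def div255_new(s):
    # Replacement formula proposed in the report.
    return (s + 1 + (s >> 8)) >> 8


def count_mismatches(f, limit=65535):
    # Count inputs in 0..limit-1 where f disagrees with truncating s // 255.
    return sum(1 for s in range(limit) if f(s) != s // 255)


if __name__ == "__main__":
    print("old formula mismatches:", count_mismatches(div255_old))
    print("new formula mismatches:", count_mismatches(div255_new))
```

The default `limit` stops at 65534 on purpose: at s = 65535 the replacement formula evaluates to 256, while 65535 // 255 is 257, so this sketch only checks it for inputs below that value.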
A Python test app and a fix that implements the new algorithm are attached.
Attachment 299452, "test div255 app":