x86 ubsan fixes
Fixes some undefined behavior in the x86 assembly code. I've split these off from the generic fixes because they might have a small performance impact, though in very light benchmarking the difference appears to be within about 1%. Even though we now perform the unaligned loads via memcpy/memmove, gcc (as you'd hope) does not actually emit those as library calls, so any differences should really come down to register scheduling.
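For reviewers unfamiliar with the pattern, here is a minimal sketch of the kind of change involved. It assumes the undefined behavior being fixed is a misaligned pointer dereference (what `-fsanitize=alignment` reports), and the helper names `load32`/`store32` are hypothetical, not taken from this patch. Copying through a correctly aligned local with a fixed-size `memcpy` is well defined, and gcc typically lowers it to a single `mov` on x86.

```c
#include <stdint.h>
#include <string.h>

/* Before (UB when p is not 4-byte aligned, flagged by ubsan):
 *     uint32_t v = *(const uint32_t *)p;
 */

/* Hypothetical helper: read 4 bytes from a possibly misaligned address.
   The fixed-size memcpy into an aligned local is well defined. */
static uint32_t load32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: write 4 bytes to a possibly misaligned address. */
static void store32(void *p, uint32_t v) {
    memcpy(p, &v, sizeof v);
}
```

memmove would take the place of memcpy wherever the source and destination ranges can overlap.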