solve the fence2k problem
With enough uptime, a uint32_t fence value will eventually wrap around. We need to audit and fix fence comparisons so they handle this.
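For reference, a minimal sketch of a wraparound-safe comparison, assuming the two values being compared are never more than 2^31 apart. The helper names (fence_after, fence_signaled, fence_selfcheck) are made up for illustration and are not existing functions in the driver. The cast of an out-of-range unsigned value to int32_t is implementation-defined in C, but it behaves as two's-complement on the compilers we care about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if fence `a` was issued strictly after `b`.
 * Only valid while the two values are less than 2^31 apart. */
static inline bool fence_after(uint32_t a, uint32_t b)
{
    /* Unsigned subtraction wraps modulo 2^32; reading the difference
     * as a signed value gives the relative ordering. */
    return (int32_t)(a - b) > 0;
}

/* Hypothetical helper: true if `fence` has been reached, given the
 * last completed fence value `completed`. */
static inline bool fence_signaled(uint32_t completed, uint32_t fence)
{
    return (int32_t)(completed - fence) >= 0;
}

/* Small self-check around the wrap point. */
static inline void fence_selfcheck(void)
{
    /* 0x00000002 comes after 0xfffffffe once the counter has wrapped. */
    assert(fence_after(0x00000002u, 0xfffffffeu));
    /* A naive `completed >= fence` would get this one wrong. */
    assert(fence_signaled(0x00000002u, 0xfffffffeu));
    assert(!fence_signaled(0xfffffffeu, 0x00000002u));
}
```

The audit would then mostly be a matter of finding plain `<` / `>=` comparisons on fence values and routing them through helpers like these.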
Bonus question: I wonder if the CP deals with this properly? I guess in the worst case we get one GPU hang per wraparound.