Optimization: discontinuous VS/PLBU command buffer

Currently we build the VS/PLBU command buffer in a dynamic array, then copy it to a GPU buffer before submit. But VS/PLBU has a continue command, which can be used to chain separate command buffers as needed, and that saves the copy.

Here are the steps:

create a GPU bo and write the VS/PLBU commands directly into it from the beginning
when it's full, create a new bo and end the previous bo with a continue command pointing to the new one
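
A rough sketch of these two steps in C, just to show the chaining logic. It is not the lima code: the bo is simulated with a heap-allocated chunk, the va comes from a fake allocator that only hands out increasing addresses, and `CMD_CONTINUE_TAG`, `struct cmd` and all function names are made up for illustration (the real continue command encoding is not shown here).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_CMDS 4                  /* tiny on purpose; last slot is reserved */
#define CMD_CONTINUE_TAG 0xC0000000u  /* hypothetical tag, NOT the real encoding */

struct cmd { uint32_t lo, hi; };      /* hypothetical 64-bit command word pair */

struct chunk {                        /* stand-in for one GPU bo */
    uint32_t va;                      /* simulated GPU virtual address */
    unsigned count;                   /* commands written so far */
    struct cmd cmds[CHUNK_CMDS];
    struct chunk *next;
};

struct cmd_stream {
    struct chunk *first, *cur;
    uint32_t next_va;                 /* fake allocator: va only grows */
};

static struct chunk *chunk_create(struct cmd_stream *s)
{
    struct chunk *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    c->va = s->next_va;
    s->next_va += 0x1000;
    return c;
}

static int stream_init(struct cmd_stream *s, uint32_t base_va)
{
    s->next_va = base_va;
    s->first = s->cur = chunk_create(s);
    return s->first ? 0 : -1;
}

/* Append one command; when only the reserved slot is left, chain a new chunk. */
static int stream_emit(struct cmd_stream *s, uint32_t lo, uint32_t hi)
{
    struct chunk *c = s->cur;

    if (c->count == CHUNK_CMDS - 1) {
        struct chunk *n = chunk_create(s);
        if (!n)
            return -1;
        assert(n->va > c->va);        /* jump must go forward in va */
        /* Reserved last slot holds the continue command pointing at the new chunk. */
        c->cmds[c->count++] = (struct cmd){ n->va, CMD_CONTINUE_TAG };
        c->next = n;
        s->cur = c = n;
    }
    c->cmds[c->count++] = (struct cmd){ lo, hi };
    return 0;
}
```

The point of reserving the last slot is that there is always room for the continue command, so commands are written in place and nothing has to be copied at submit time.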

From some experiments, I found:

the next bo's va must be bigger than the current one's, so the continue command can only jump forward, not backward
vs/plbu_cmd_start/end are set to the first and last command by va; since the bos' va are increasing, start/end also describe the range of the whole VS/PLBU command buffer, and there are some holes in this range
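
To make the second point concrete, here is a small C sketch of how the start/end range and the holes fall out of a chain of bos. Everything in it is an assumption for illustration: `struct seg`, `struct range` and `cmd_range` are hypothetical names, and `end` is taken as the address just past the last command, which may not match what the hardware registers expect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One chained bo: where it sits in va and how many bytes of commands it holds. */
struct seg { uint32_t va; uint32_t used; };

/* Range covering the whole chain, plus the bytes inside it that hold no commands. */
struct range { uint32_t start, end, holes; };

/* Returns 0 on success, -1 if the chain is empty or does not go forward in va. */
static int cmd_range(const struct seg *segs, size_t n, struct range *out)
{
    uint32_t covered = 0;

    if (n == 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        /* Each bo must start at or after the end of the previous one. */
        if (i > 0 && segs[i].va < segs[i - 1].va + segs[i - 1].used)
            return -1;
        covered += segs[i].used;
    }

    out->start = segs[0].va;                          /* first command by va */
    out->end = segs[n - 1].va + segs[n - 1].used;     /* assumed: one past the last command */
    out->holes = (out->end - out->start) - covered;   /* gaps between the bos */
    return 0;
}
```

For example, two bos at 0x1000 (0x100 bytes used) and 0x3000 (0x80 bytes used) give a range of 0x1000 to 0x3080 with 0x1f00 bytes of holes in it.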

Recording this here in case someone wants to continue the work before I have time.