gcc10 can emit single-precision register names if the right operand modifier is not used
This results in the assembler rejecting the code:
/tmp/ccEG4QpI.s:646: Error: VFP/Neon double precision register expected -- 'vtbl.8 d3,{d0,d1},s8'
/tmp/ccEG4QpI.s:678: Error: invalid instruction shape -- 'vmul.f32 d0,d0,s8'
Therefore add the %P qualifier to request double-precision registers, since the 'w' constraint means the variable could be stored in s0..s14, and GCC defaults to printing those operands as s0..s14. Note that those registers also map to d0..d7.
The generated output is exactly the same as with gcc9, and the code now also compiles with gcc10.
This is not well documented in the GCC docs, and there is a ticket for that: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84343
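For illustration only, a minimal sketch of the pattern described above. The helper name mul2 and the portable fallback branch are hypothetical and not taken from the patched code; the sketch assumes a 32-bit ARM target with NEON for the inline-assembly path. Writing the operands as %P0/%P1 asks GCC to print the D-register name (dN), whereas a bare %0/%1 with the 'w' constraint may be printed as sN and rejected by the assembler.

```c
#if defined(__arm__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: multiply two pairs of floats lane by lane. */
static void mul2(const float a[2], const float b[2], float out[2])
{
#if defined(__arm__) && defined(__ARM_NEON)
    float32x2_t va = vld1_f32(a);
    float32x2_t vb = vld1_f32(b);
    /* %P0 and %P1 request the double-precision register name (dN).
     * A bare %0 or %1 here may be printed as sN by gcc10. */
    __asm__("vmul.f32 %P0, %P0, %P1" : "+w"(va) : "w"(vb));
    vst1_f32(out, va);
#else
    /* Portable fallback so the sketch also builds on non-ARM hosts. */
    out[0] = a[0] * b[0];
    out[1] = a[1] * b[1];
#endif
}
```

Called with a = {1.5f, 2.0f} and b = {2.0f, 4.0f}, the helper stores 3.0f and 8.0f in out on either code path.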
Signed-off-by: Khem Raj <raj.khem@gmail.com>