h264parse: Fixate to upstream format when possible
Submitted by Edward Hervey
If downstream can support either stream-format, we should avoid fixating
to a stream-format different from upstream.
The string field fixation is only done on the first structure since we
already truncated the caps earlier.
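As a rough illustration of the preference described above, here is a
standalone sketch. It is not the h264parse code and does not use the
GStreamer API; pick_stream_format() and its arguments are hypothetical
names. It assumes the formats downstream accepts are given as a plain
list in preference order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of h264parse): choose the stream-format
 * to fixate to.
 *   downstream:   formats downstream accepts, in preference order
 *   n_downstream: number of entries in downstream
 *   upstream:     format delivered by upstream, or NULL if unknown */
static const char *
pick_stream_format (const char *const *downstream, size_t n_downstream,
    const char *upstream)
{
  size_t i;

  if (n_downstream == 0)
    return NULL;

  /* Prefer the upstream format whenever downstream accepts it */
  if (upstream != NULL) {
    for (i = 0; i < n_downstream; i++) {
      if (strcmp (downstream[i], upstream) == 0)
        return downstream[i];
    }
  }

  /* Otherwise fall back to downstream's first choice */
  return downstream[0];
}
```

With downstream listing "avc" then "byte-stream" and upstream delivering
"byte-stream", this sketch returns "byte-stream" rather than "avc".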
This avoids the (sub-optimal) case where we get byte-stream/nal as input,
downstream supports both byte-stream and avc but needs au alignment ...
and we would end up fixating to avc/au (instead of the more