mesa issues
https://gitlab.freedesktop.org/lima/mesa/-/issues
2020-03-27T02:13:36Z

https://gitlab.freedesktop.org/lima/mesa/-/issues/137
Optimization: discontinuous VS/PLBU command buffer
2020-03-27T02:13:36Z
Qiang Yu

Currently we build the VS/PLBU command buffer in a dynamic array, then copy it to a GPU buffer before submit. But VS/PLBU has a continue command, which can be used to chain separate command buffers as needed and so saves the copy.
Here are the steps:
1. create a GPU bo to hold the VS/PLBU commands generated from the beginning
2. when it's full, create a new one and point the previous bo to it with a continue command
From some experiments, I found:
1. the next bo's va must be bigger than the current one's, so the jump is forward, not backward
2. vs/plbu_cmd_start/end are set to the first and last command by va; as the bos' va is incremental, this is also the range of the VS/PLBU command buffer, and there are some holes in this range
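A rough Python model of the chaining scheme described above, just to make the bookkeeping concrete. Everything here is hypothetical: the `CmdBo`/`CmdStream` names, the tiny `BO_SIZE`, the allocator that leaves gaps between bos, and the `"continue"` placeholder (the real continue encoding is not given in this issue). Treating cmd_end as the address just past the last command is also an assumption.

```python
from dataclasses import dataclass, field

BO_SIZE = 64           # bytes per command bo (tiny, for illustration only)
CMD_SIZE = 8           # each VS/PLBU command is two 32-bit words
CONTINUE = "continue"  # placeholder; the real encoding is not given here


@dataclass
class CmdBo:
    va: int
    cmds: list = field(default_factory=list)

    def full(self) -> bool:
        # Keep the last slot free for the continue command.
        return (len(self.cmds) + 1) * CMD_SIZE >= BO_SIZE


class CmdStream:
    def __init__(self, first_va: int):
        self.next_va = first_va
        self.bos = [self._alloc()]

    def _alloc(self) -> CmdBo:
        # Hypothetical allocator: hands out increasing VAs and leaves a
        # gap between bos, mimicking the holes in the address range.
        bo = CmdBo(self.next_va)
        self.next_va += 2 * BO_SIZE
        return bo

    def emit(self, cmd) -> None:
        cur = self.bos[-1]
        if cur.full():
            new = self._alloc()
            assert new.va > cur.va  # the jump must go forward
            cur.cmds.append((CONTINUE, new.va))
            self.bos.append(new)
            cur = new
        cur.cmds.append(cmd)

    def cmd_range(self):
        # Assumed meaning of cmd_start/cmd_end: VA of the first command
        # and the address just past the last command.
        last = self.bos[-1]
        return self.bos[0].va, last.va + len(last.cmds) * CMD_SIZE
```

Emitting ten commands into a stream that starts at 0x1000 fills the first bo with seven commands plus the continue entry, and the reported range then spans both bos including the unused gap between them.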
Recording this here; someone may continue the work before I have time.

https://gitlab.freedesktop.org/lima/mesa/-/issues/134
Move issues from here to mesa/mesa
2020-02-18T01:49:58Z
Erico Nunes

We now have 2 places where mesa issues are being reported, here and https://gitlab.freedesktop.org/mesa/mesa/issues .
I think mesa/mesa is preferred, so mesa issues here should be discouraged.
I already proposed an update to the status page assuming this.
I wonder if there is a way to disable further issues here?
Or maybe a way to import the remaining ones to mesa/mesa and disable the Issues section here completely?
I guess other projects in the lima group (such as linux) are a separate discussion, so just to be clear I'm mostly concerned about mesa here.

https://gitlab.freedesktop.org/lima/mesa/-/issues/130
best/fastest way how to render video
2020-01-28T14:57:16Z
Michal Lazo

Hi
I would like to ask the lima devs what the best way is to pass and render video.
I have an Amlogic board whose VPU can decode video, and as output we can get a dma_buf in some formats.
I think Mali must support some formats that should be straightforward to render (for lowest power and best performance).
I know that, for example, Qt/GStreamer can support multi-plane dma_bufs, NV12 or maybe some others.
So what do you suggest as the best solution?

https://gitlab.freedesktop.org/lima/mesa/-/issues/129
Index draw command stream optimization
2020-04-21T01:12:33Z
Qiang Yu

The current index draw command stream is not efficient: for example, with index array [0, 1000, 2], it needs 1001 VS executions and space for 1001 varying outputs.
But some dump results show we can cut this overhead with optimizations in the command stream:
```
/* ============ VS CMD STREAM BEGIN ============= */
/* 0x10010400 (0x00000000) */ 0x10018300 0x30030000 /* UNIFORMS_ADDRESS: address: 0x10018300, size: 48 */
/* 0x10010408 (0x00000008) */ 0x10000280 0x40050000 /* SHADER_ADDRESS: address: 0x10000280, size: 80 */
/* 0x10010410 (0x00000010) */ 0x00201000 0x10000040 /* SHADER_INFO: prefetch: disabled, size: 80 */
/* 0x10010418 (0x00000018) */ 0x00000000 0x10000042 /* VARYING_ATTRIBUTE_COUNT: nr_vary: 1, nr_attr: 1 */
/* 0x10010420 (0x00000020) */ 0x00000003 0x10000041 /* UNKNOWN_1 */
/* 0x10010428 (0x00000028) */ 0x10018340 0x20020000 /* ATTRIBUTES_ADDRESS: address: 0x10018340, size: 1 */
/* 0x10010430 (0x00000030) */ 0x10018350 0x20020008 /* VARYINGS_ADDRESS: address: 0x10018350, size: 1 */
/* 0x10010438 (0x00000038) */ 0x03000001 0x00000000 /* DRAW: num: 3, index_draw: true */
/* 0x10010440 (0x00000040) */ 0x00000000 0x60000000 /* UNKNOWN_2 */
/* 0x10010448 (0x00000048) */ 0x10018360 0x20020000 /* ATTRIBUTES_ADDRESS: address: 0x10018360, size: 1 */
/* 0x10010450 (0x00000050) */ 0x10018370 0x20020008 /* VARYINGS_ADDRESS: address: 0x10018370, size: 1 */
/* 0x10010458 (0x00000058) */ 0x01000001 0x00000000 /* DRAW: num: 1, index_draw: true */
/* 0x10010460 (0x00000060) */ 0x00000000 0x60000000 /* UNKNOWN_2 */
/* 0x10010468 (0x00000068) */ 0x00018000 0x50000000 /* SEMAPHORE_END: index_draw enabled */
/* ============ VS CMD STREAM END =============== */
```
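The dump above issues two draws, one with num 3 and one with num 1, which is consistent with splitting the index array [0, 1000, 2] into the vertex ranges 0..2 and 1000..1000. A minimal Python sketch of such a split follows; the function name and the `max_gap` heuristic are assumptions for illustration, since the dump does not say how the blob decides where to cut.

```python
def split_index_draws(indices, max_gap=16):
    """Group indices into (first_vertex, count) ranges.

    Indices closer than max_gap to the end of the current range extend
    it; anything further away starts a new range. max_gap is a made-up
    tuning knob, not something taken from the dump.
    """
    ranges = []
    for idx in sorted(set(indices)):
        if ranges:
            first, count = ranges[-1]
            last = first + count - 1
            if idx - last <= max_gap:
                ranges[-1] = (first, idx - first + 1)
                continue
        ranges.append((idx, 1))
    return ranges
```

For [0, 1000, 2] this gives the ranges (0, 3) and (1000, 1), so 4 VS executions in total instead of 1001.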
[log.index-1000](/uploads/b76a68f585d05996eb939ab9250aaed2/log.index-1000)

https://gitlab.freedesktop.org/lima/mesa/-/issues/100
Implement 3D textures
2019-08-19T05:35:00Z
Vasily Khoruzhick

@alyssa suggested that the unknown_3_1/unknown_3_2 fields can be the depth of the texture - it's set to 1 for 2D textures.
We still need to figure out the format of the instruction; I'd guess it's just a different sampler type, not 0 (for 2D) or 31 (for cube).
Unfortunately the blob doesn't support 3D textures (GL_OES_texture_3D is not exposed), so it has to be reverse-engineered by trial and error.

https://gitlab.freedesktop.org/lima/mesa/-/issues/94
GP complex instruction results cannot be spilled/moved
2019-07-30T21:12:04Z
Connor Abbott

I couldn't get exp2/log2 to work, so I started to reverse-engineer a little bit what these magic complex opcodes are doing.
Overall, it's actually quite similar to what's described [here](https://github.com/envytools/envytools/blob/0d91b8bcef3ceb47ff0b114025d301edb790d472/nvhw/sfu_tab.c) for nvidia. `complex2` multiplies the inputs, sometimes combined with adding a strange offset (I couldn't figure out why), so with the way the blob uses it, it's effectively squaring the input. Each of the complex opcodes looks up polynomial coefficients in a different table, and `complex1` computes the rest of the polynomial and does the output exponent correction. I suspect that the table entries are more than 32 bits, and that the two different `complex1` sources actually receive two different parts of the table entry. `preexp2` and `postlog2` convert to/from a fixed-point format which makes doing the exponent correction easier (again similar to nvidia). I suspect there are similar shenanigans going on with `preexp2`, since in my tests it sometimes would return identical values for two different inputs, hence probably different uses of `preexp2` are getting different values to compensate for 32 bits not being enough. I haven't gotten the details nailed down, but I don't think we really have to.
Now, from this description, it should be clear that `preexp2` and the table-lookup opcodes are doing something quite weird. There's the further issue that `complex1` produces something that isn't supposed to be interpreted as a floating-point value in log2 mode, it's a fixed-point value that's supposed to be post-processed by `postlog2`. So sometimes it produces what would be an "invalid" floating-point value that would never be produced otherwise, i.e. either a denormalized value or a NaN with a non-standard payload. These get flushed to 0 and the standard NaN respectively when you try to do anything floating-point-y, and since a move in the add or mul slots is just adding -0 or multiplying by 1 respectively, a move between `complex1` and `postlog2` will break things. And of course, the same issue exists with a move between `preexp2` and anything, and a LUT opcode and anything. And `preexp2` and LUT opcodes are already magically producing multiple values anyways.
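To illustrate why a move is destructive here, the sketch below models a floating-point "move" on raw 32-bit patterns in Python. It is only a model of the behaviour described above, not of the actual GP hardware: the function name, the choice to keep the sign bit when flushing, and the canonical NaN value 0x7FC00000 are all assumptions.

```python
CANONICAL_NAN = 0x7FC00000  # assumed "standard NaN" bit pattern


def fp_move(bits: int) -> int:
    """Model a move done as `x + (-0)` or `x * 1` on a 32-bit pattern."""
    sign = bits & 0x80000000
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x007FFFFF
    if exponent == 0 and mantissa != 0:
        # Denormal: flushed to zero (keeping the sign is an assumption).
        return sign
    if exponent == 0xFF and mantissa != 0:
        # NaN with any payload: replaced by the standard NaN.
        return CANONICAL_NAN
    # Normal values, zeros and infinities pass through unchanged.
    return bits
```

A fixed-point intermediate such as 0x00001234 looks like a denormal and comes out as 0, and 0x7F812345 looks like a NaN with a payload and comes out as the canonical NaN, while an ordinary float such as 0x3F800000 (1.0) survives. That is the sense in which a move between `complex1` and `postlog2` loses information.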
So, there are a few nodes we absolutely can't insert a move after:
- `preexp2`
- `*_impl`
- `complex1` when consumed by `postlog2`
Technically we can for `complex2`, but since `complex2` sometimes has `preexp2` as a source it sometimes has to be scheduled right before `complex1`. **All in all, we almost always have to make sure that these instructions occur in the same exact sequence they do in the blob.**
Some of these nodes we can easily guarantee to succeed if we schedule them first, namely `preexp2` (it's always a max node when scheduled that doesn't increase register pressure) and `*_impl` (it's in the complex slot, hence unaffected by max-node reservations). We're not so lucky with `complex2`, but I think we can add some extra reservation logic so that when we schedule `complex1` we reserve an extra next-max slot to be used by `complex2`. The biggest problem is guaranteeing `complex1` can succeed, which seems quite difficult. Maybe a better way would be to first try to schedule it, and then if it doesn't succeed, turn the `postlog2` into a move, put `postlog2` back on the ready list, and carry on to try again.

https://gitlab.freedesktop.org/lima/mesa/-/issues/71
PLBU commandstream analysis
2020-01-04T21:42:17Z
Priit Laes

Based on some analysis of a bunch of [PLBU dumps](https://gist.github.com/plaes/e51001ff94e0f73f8554bbd137297344), I have noticed some patterns that might help.
Firstly, each command seems to consist of two 32-bit integers. The top 4 bits of the second integer seem to specify the opcode of the operation, giving 16 possible operations:
* 0x0 - OP_DRAW
* 0x1 - OP_WRITE_REG ?? Write constant to register
* 0x2 - OP_ARRAY_ADDRESS
* 0x3 - OP_BLOCK_STRIDE
* 0x5 - OP_END_CMDSTREAM
* 0x6 - OP_SEMAPHORE
* 0x7 - OP_SCISSOR
* 0x8 - OP_RSW_VERTEX_ARRAY
* 0x9 - ??? (Similar to OP_WRITE_REG)
* 0xA - ??? (Similar to OP_WRITE_REG), always followed by 0xD operation
* 0xD - ??? unknown, always follow list of 0xA operations
* 0xF - ??? CMD_CONTINUE in luc's limadriver headers
I have not seen 0x4, 0xB, 0xC, 0xE and 0xF yet, though.
`0x1`, `0x9` and `0xa` seem to be really similar, with command format as `0xX00001??`, which means that it could be related to some kind of register/memory/store access.
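As a sketch of the decoding described so far, the Python snippet below pulls the opcode out of the top 4 bits of the opcode-carrying word and, for the `0xX00001??` style commands, the low byte. The table only repeats the tentative names from the list above; the function name and the returned tuple layout are made up for illustration.

```python
# Tentative names from the list above; unlisted opcodes are unknown.
PLBU_OPCODES = {
    0x0: "OP_DRAW",
    0x1: "OP_WRITE_REG",
    0x2: "OP_ARRAY_ADDRESS",
    0x3: "OP_BLOCK_STRIDE",
    0x5: "OP_END_CMDSTREAM",
    0x6: "OP_SEMAPHORE",
    0x7: "OP_SCISSOR",
    0x8: "OP_RSW_VERTEX_ARRAY",
    0xF: "CMD_CONTINUE",
}


def decode(word: int):
    """Return (opcode, name, low_byte) for the opcode-carrying word."""
    opcode = (word >> 28) & 0xF
    name = PLBU_OPCODES.get(opcode, "unknown")
    low_byte = word & 0xFF  # the `??` part of 0xX00001??
    return opcode, name, low_byte
```

For example, `decode(0x90000103)` yields opcode 0x9 with low byte 0x03, and `decode(0xD0000000)` yields opcode 0xD with low byte 0.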
Also, based on certain calls, 0x9 and 0xa seem to target the same memory locations, as the arguments are the same (from the `clear.single_buffer.second_draw.no_glClear` or `scissor.frame2` dumps):
```
('0x90000103', '0x10000080')
('0x90000104', '0x10000084')
('0x90000107', '0x10000088')
('0x90000108', '0x1000008c')
('0x90000105', '0x10000090')
('0x90000106', '0x10000094')
('0xa0000103', '0x10000080')
('0xa0000104', '0x10000084')
('0xa0000107', '0x10000088')
('0xa0000108', '0x1000008c')
('0xa0000105', '0x10000090')
('0xa0000106', '0x10000094')
```
Also, analysing the `scissor.frame1` and `scissor.frame2` dumps, I noticed the following:
```
frame1:
...
('0xa0000103', '0x10000080')
('0xa0000104', '0x10000084')
('0xa0000107', '0x10000088')
('0xa0000108', '0x1000008c')
('0xa0000105', '0x10000090')
('0xa0000106', '0x10000094')
('0xd0000000', '0x00000000')
frame2:
('0x90000103', '0x10000080')
('0x90000104', '0x10000084')
('0x90000107', '0x10000088')
('0x90000108', '0x1000008c')
('0x90000105', '0x10000090')
('0x90000106', '0x10000094')
...
('0xa0000103', '0x10000080')
('0xa0000104', '0x10000084')
('0xa0000107', '0x10000088')
('0xa0000108', '0x1000008c')
('0xa0000105', '0x10000090')
('0xa0000106', '0x10000094')
('0xd0000000', '0x00000000')
```
Based on this data, it seems that 0xa is a store operation to GPU memory, which could then mean that 0xd is a flush or sync command, and 0x9 is a load from memory.
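The pairing can be checked mechanically. The Python sketch below takes the 0x9 and 0xa tuples from the `scissor.frame2` listing above, maps the lower 28 bits of the first word to the address argument for each opcode, and compares the two maps. The helper name `targets` is made up, and whether these really are loads and stores remains a guess.

```python
frame2 = [
    ("0x90000103", "0x10000080"), ("0x90000104", "0x10000084"),
    ("0x90000107", "0x10000088"), ("0x90000108", "0x1000008c"),
    ("0x90000105", "0x10000090"), ("0x90000106", "0x10000094"),
    ("0xa0000103", "0x10000080"), ("0xa0000104", "0x10000084"),
    ("0xa0000107", "0x10000088"), ("0xa0000108", "0x1000008c"),
    ("0xa0000105", "0x10000090"), ("0xa0000106", "0x10000094"),
    ("0xd0000000", "0x00000000"),
]


def targets(cmds, opcode):
    """Map the low 28 bits of each matching command to its argument."""
    out = {}
    for head, arg in cmds:
        word = int(head, 16)
        if word >> 28 == opcode:
            out[word & 0x0FFFFFFF] = int(arg, 16)
    return out


op9 = targets(frame2, 0x9)
opa = targets(frame2, 0xA)
same_targets = op9 == opa
```

With this data `same_targets` is true: each of the six 0x9 commands has a 0xa counterpart with the same low bits and the same address.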