After an initial batch of fixes, Display Stream Compression works flawlessly on SDM845 with command-mode panels. Unfortunately, the same cannot be said for SM8[12]50-based boards/phones.
Even a second set of fixes (which doesn't affect SDM845) doesn't address the issues that we're seeing. On SM8150 the output is garbled, and it is unclear whether DPU is sending the wrong thing or the panel is configured wrongly to interpret it.
This is what it looks like when booting into the framebuffer console until it switches to msmdrmfb. The first keypress starts modetest -M msm -v -s 32:1096x2560, the second is Ctrl+C to clear the screen and exit (note how it blanks the whole screen, which would have been half the screen before https://lore.kernel.org/linux-arm-msm/20221213232207.113607-5-marijn.suijten@somainline.org/).
Note that I am also finalizing my patches that bring INTF_TE support to DPU; these have been part of my tree at various stages of testing DSC but have not had any effect. As mentioned above, they do fix tearing and odd framerates on all DPU >= 5.0.0 platforms, including a Sony Xperia 5 (SM8150) with a regular non-DSC command-mode panel (same for SM6125 and a bunch of other boards/devices).
Thanks! I can also take a look myself if someone gives me hints on the magic and unnecessarily-private status registers; I may simply be missing something stupid or the DSI clocks/timings may be totally off.
We will finalize the DSC helper patches that are on the list first and then re-test this, as that fixes the math and calculations. Hence, for this bug, let's wait until we have R-b on the following series first:
What's worse, the patches bluntly remove dsc->slice_count for being wrong and pretend slice_per_pkt = 1, whereas the panels discussed in this issue both have slice_per_pkt = 2. Both those panels have the fortunate coincidence that dsc->slice_count == slice_per_pkt == 2, so I cannot take those fixes as they break it again.

EDIT: Testing points out that, after pulling in the rest of these fixes, all these panels can operate with slice_per_pkt = 1. I assume having 2 slices per packet was simply found to be better / more efficient, hence it's set to 2 in DT?
Instead, I propose to create a separate series that:
Collects all the DSC fixes specific to slice_count together, instead of scattering them across multiple series;
Gets rid of faulty dsc->slice_count usage everywhere;
Introduces slice_per_pkt handling, maybe together with the "fixes" (i.e. replacing the parameter) for proper atomicity (EDIT: may not be needed for now);
Additionally there should be some validation that slice_count * slice_width == hdisplay.
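For that last validation point, a minimal sketch of what I have in mind (the helper name and placement are illustrative, not existing msm code; it only relies on the drm_dsc_config and drm_display_mode fields):

```c
#include <drm/display/drm_dsc.h>
#include <drm/drm_modes.h>

/* Illustrative check: reject DSC configs whose slices do not exactly cover
 * the active width of the mode they are paired with. */
static int dsc_validate_slice_geometry(const struct drm_dsc_config *dsc,
				       const struct drm_display_mode *mode)
{
	if (!dsc->slice_count || !dsc->slice_width)
		return -EINVAL;

	/* All slices of one line together must span exactly hdisplay. */
	if (dsc->slice_count * dsc->slice_width != mode->hdisplay)
		return -EINVAL;

	return 0;
}
```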
I am willing to prepare and send that series (taking original authorship/reported-by's) as I have the ability to test on slice_per_pkt = 2.
What is the right pclk/byteclk?
While drm/msm/dsi: Adjust pclk rate for compression nicely reduces (title suggestion!) the clocks on the Xperia XZ3, it results in not being able to reach 60Hz on the Xperia 1 and has noticeable corruption at the top of the screen. Without that patch, at least in 2.5k mode, the panel reaches its full 60Hz potential. I have yet to work out the right timing for its 4k mode.
There have been mentions of an "(in?)efficiency factor" downstream to adjust the clocks rather than manually taking porches into account, as cmdmode doesn't use porches.
We have a similar issue on a "Samsung SOFEF01" panel (no DSC, just cmdmode): it works fine at 60Hz on the Sony Xperia 10 II (SM6125) but results in the same drop to 30 fps on the Sony Xperia 5 (sibling of the Xperia 1, same board on SM8150). I bumped the porches (copied from another panel) simply to get the clocks to run higher and reach 60Hz, but that isn't a proper fix: https://github.com/SoMainline/linux/commit/085ecdc32a04d471f10c6721e8ba07b3cf9947a8
I will reply to that patch with my findings and see where we can go next.
Thanks for the summary and tests and also your support.
I am fine with replacing hdisplay /= 3; with DIV_ROUND_UP(hdisplay, 3). We can fix that up in the original patch itself.
So are you saying that your display is broken only after that?
It seems, from your explanation, that your display was always working with slice_per_pkt as 2 while setting slice_count to 1. That was always working incorrectly, then, so it's not so much a fix in the display driver code as an unfortunate misinterpretation of those terms in the panel driver, because you did not have a proper reference to use while making your panel driver changes.
So I would not really say that the fixes are all over the place; rather, what matters is defining what the fix is and where it is being fixed. If the panel driver was working with incorrect parameters so far, we cannot really say that the DPU driver was wrong all this while; it was rather an incorrect interpretation of the DSC parameters by the panel driver. Of course, the original series needs to take some of the blame for not making the distinction between those parameters clear, and I need to take some blame for acking that fix of yours alone without looking into what was programmed in your panel driver.
I noted in the series that you referred to "slice_per_pkt" as a QC hardware-specific thing. This is not true. As I wrote in the email, please refer to Figure 40 ("One Line Containing One Packet with Data from One or More Compressed Slices") and Figure 41 ("One Line Containing More than One Compressed Pixel Stream Packet") of the DSI 1.3 spec. It explains this in detail. So this should not be treated or thought of as a QC-specific term or concept.
So I think let's not account for slice_per_pkt in the existing series; if all the panel drivers are working even with slice_per_pkt as 1 and the correct slice_count, let them continue to work that way.
Once slice_per_pkt support is properly added, we will fix up all the math wherever relevant in the same series. We will check if any of our panels support slice_per_pkt as 2 so that we don't have to depend on your validation for that.
Let me know if you have any concerns with that.
Regarding the pixel clock, I had pointed out to Jessica that her math was slightly wrong.
The fps being computed is incorrect
int fps = DIV_ROUND_UP(pclk_rate, mode->htotal * mode->vtotal);
This uses the reduced h_active to compute the fps again, but that's wrong. The fps should still be computed with drm_mode_refresh() with the full h_total. Let me know if that fixes the issue for you and we will upload that as well in the existing series.
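In code, the change would look something like this (using drm_mode_vrefresh(), which computes the refresh rate from the full, unmodified timings; the wrapper is purely illustrative):

```c
#include <drm/drm_modes.h>

/* Illustrative: derive fps from the original, uncompressed mode timings
 * instead of re-deriving it from the DSC-adjusted pclk_rate and htotal. */
static int dsi_get_fps(const struct drm_display_mode *mode)
{
	/* was (per the quoted line above):
	 * int fps = DIV_ROUND_UP(pclk_rate, mode->htotal * mode->vtotal);
	 */
	return drm_mode_vrefresh(mode);
}
```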
So are you saying that your display is broken only after that?
No. I am saying that this change makes the corruption look different (whole panel updates, instead of just the top as shown in the video). Then, when the third bullet-point is applied, everything is fixed.
It seems, from your explanation, that your display was always working with slice_per_pkt as 2 while setting slice_count to 1.
Not entirely; allow me to clarify. But first, note again that the patches to get the Xperia 1 and Xperia 5 II working (the phones used in the original report) are unrelated to slice_count! I just happened to find this discrepancy along the way.
On the Sony Xperia XZ3, we erroneously set slice_count = 1 because slice-per-pkt downstream is 1. It should have been 2, because hdisplay / slice_width = 2 = slice_count, but it worked nevertheless. I even wrote in https://lore.kernel.org/linux-arm-msm/20221221231943.1961117-3-marijn.suijten@somainline.org/ (the patch that was "wrong", though not as wrong as the original) that both were miraculously working.
That was always working incorrectly, then
Exactly, it was working incorrectly on the Xperia 1 because there slice_count == slice_per_pkt == 2. However, if all the broken DPU/DSI code that pretends dsc->slice_count is slice_per_pkt is removed, my Xperia 1 works too (with Jessica's patch we're assuming slice_per_pkt = 1 for now).
so it's not so much a fix in the display driver code as an unfortunate misinterpretation of those terms in the panel driver, because you did not have a proper reference to use while making your panel driver changes.
The slice_count = 1 -> 2 change is definitely a fix in my panel code. However, the wrong use of dsc->slice_count upstream, where slice_per_pkt should have been used, is definitely driver code. Your team already sent two fixes (and more need to come!), so I wouldn't say it's "not so much a fix in the display driver code"...
We should bundle these in the same series and make the fix complete. I can help with that if you want.
So I would not really say that the fixes are all over the place; rather, what matters is defining what the fix is and where it is being fixed.
They are spread across two series and missing a few other wrong uses of dsc->slice_count.
If the panel driver was working with incorrect parameters so far, we cannot really say that the DPU driver was wrong all this while; it was rather an incorrect interpretation of the DSC parameters by the panel driver. Of course, the original series needs to take some of the blame for not making the distinction between those parameters clear
Same as above.
and I need to take some blame for acking that fix of yours alone without looking into what was programmed in your panel driver.
For the record, all of the original DSC patches were doing this wrong. My patch was just one extra case on top that assumed slice_per_pkt downstream was equivalent to dsc->slice_count upstream. It has no relevance to what my panel driver was doing; everyone was just making the wrong assumption that slice_per_pkt == dsc->slice_count during porting.
I noted in the series that you referred to "slice_per_pkt" as a QC hardware-specific thing. This is not true. As I wrote in the email, please refer to Figure 40 ("One Line Containing One Packet with Data from One or More Compressed Slices") and Figure 41 ("One Line Containing More than One Compressed Pixel Stream Packet") of the DSI 1.3 spec. It explains this in detail. So this should not be treated or thought of as a QC-specific term or concept.
Thanks, I'll take a closer look at the DSI spec.
So I think let's not account for slice_per_pkt in the existing series; if all the panel drivers are working even with slice_per_pkt as 1 and the correct slice_count, let them continue to work that way.
Sounds good, but we should first fix/remove all wrong uses of dsc->slice_count (and add some extra validation to it, as suggested) before I review/ack the patches, as said above.
Once slice_per_pkt support is properly added, we will fix up all the math wherever relevant in the same series. We will check if any of our panels support slice_per_pkt as 2 so that we don't have to depend on your validation for that.
Sounds good, though having more validation is always better (my panel seems to accept both, so I don't know how valid any of the testing is).
Regarding the pixel clock, I had pointed out to Jessica that her math was slightly wrong.
The fps being computed is incorrect
int fps = DIV_ROUND_UP(pclk_rate, mode->htotal * mode->vtotal);
This uses the reduced h_active to compute the fps again, but that's wrong. The fps should still be computed with drm_mode_refresh() with the full h_total. Let me know if that fixes the issue for you and we will upload that as well in the existing series.
Are there any panels that require slices_per_pkt > 1?
If we ignore slices_per_pkt for now and later want to introduce it, is it going to be a localized change, or is it going to be spread all over the DSI-DSC code?
I saw your comment on the drm/msm/dsi: Adjust pclk rate for compression patch:
Note that I cannot get the 4k mode working at 60Hz on one of my panels
(30Hz works with minor corruption), regardless of this patch.
So, just to check my understanding: the Sony Xperia XZ3 is working with the changes you mentioned above, but the pclk reduction is exposing an issue with the Sony Xperia 1?
I'm a little confused as to whether you mean that the patch is causing the regression or just exposing the issue, since the wording in your previous comment in the MR says:
drm/msm/dsi: Adjust pclk rate for compression... results in not being able to reach 60Hz on the Xperia 1
But the comment on the patch says that the issue happens "regardless of this patch"
@jesszhan the device (Xperia 1) and panel in question have two modes:
4k mode: Always stuck at 30 fps and slight corruption, with or without your patch;
2.5k mode: Stuck at 30 fps and slight corruption with your adjust pclk rate for compression patch; after dropping it, the panel runs at 60 fps without corruption.
I've discussed this with Abhinav and we think this pclk issue is due to upstream not having the concept of transfer time for command mode panels.
Currently in upstream, we don't distinguish command vs video mode as command mode doesn't have the concept of porches like you mentioned above.
However, in downstream, command mode takes transfer time into consideration when calculating pixel and mdss clock rates. The transfer time adjustment is defined in the device tree downstream (here is an example DTSI), which would not fly upstream.
If we want to add upstream support for transfer time, we can try to calculate this transfer time value by doing transfer_time = 1 / vrefresh_rate then adding that into the pclk_rate calculation. However, to test that change, I will have to find a way to measure FPS on my test device (which I will look into... would appreciate any suggestions).
FWIW, the current math is correct for DSC. This pclk calculation issue seems to be specifically related to command mode panels not taking transfer time into account.
(Also, feel free to keep this bug open to track this pclk issue.)
Small update with v3, because the above seems to be quite confusing. This is on the Sony Xperia 1, while testing with modetest -M msm -s 32:#0 -v and including the output of /d/clk/disp_cc_mdss_pclk0_clk/clk_rate:
It's been said that transfer time might not have been taken into account yet, so I'd like to investigate how much of an impact that has on these DSC calculations and whether we can make them work properly from the get-go.
modetest reports 60Hz refreshing, confirmed by dsi_isr firing at 60Hz in /proc/interrupts
We calculated the expected clock rate (for 1096x2560) with and without DSC enabled based on hdisplay information given in the DTSI, and it seems that with DSC enabled, the correct pclk rate should be 67.8 MHz, which is close to the pclk rate you reported above. We also weren't sure why, for a command mode panel, the reduced pclk rate is affecting the FPS.
Just wondering, what rate is your tearcheck interrupt firing at? Can you add a log in dpu_encoder_phys_cmd_te_rd_ptr_irq() and let me know? It might be possible that the tearcheck irq could be throttling the FPS.
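Something along these lines would do (illustrative only; last_te is a local static for the experiment, not an existing field):

```c
#include <linux/ktime.h>
#include <linux/printk.h>

/* Log the time between successive tearcheck read-pointer interrupts, so the
 * effective TE rate can be read back from dmesg. Call this from
 * dpu_encoder_phys_cmd_te_rd_ptr_irq(). */
static void log_te_rd_ptr_interval(void)
{
	static ktime_t last_te;
	ktime_t now = ktime_get();

	pr_info_ratelimited("te_rd_ptr delta: %lld us\n",
			    ktime_us_delta(now, last_te));
	last_te = now;
}
```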
Finally, can you elaborate on what you meant by this note (e.g. which DSI ISR specifically are you referring to):
modetest reports 60Hz refreshing, confirmed by dsi_isr firing at 60Hz in /proc/interrupts
@jesszhan can you share the calculations that you used, so that everyone can validate this work?
Then I can also reproduce that for the 1644x3840 mode, which is also broken.
the correct pclk rate should be 67.8 MHz, which is close to the pclk rate you reported above
That is the rate I reported with your patch that breaks the panel, so this does not confirm that the calculation in your patch is correct.
Just wondering, what rate is your tearcheck interrupt firing at? Can you add a log in dpu_encoder_phys_cmd_te_rd_ptr_irq() and let me know? It might be possible that the tearcheck irq could be throttling the FPS.
Will let you know once I unshelve this device but in short: yes that is expected. If we are too slow sending data to the panel (pclk too low), the panel will wait another frame (AFAIK, instead of sending TE as soon as the whole frame has arrived) and hence trigger at exactly half the rate?
Finally, can you elaborate on what you meant by this note:
modetest reports 60Hz refreshing, confirmed by dsi_isr firing at 60Hz in /proc/interrupts
When in 1644x3840 mode only white garbage is shown on the display, but the DSI host and panel seem to successfully agree on a 60 FPS signal (otherwise I would expect 30Hz, as said above).
e.g. which DSI ISR specifically are you referring to
Edited my above comment to add the math for the 1096x2560 case. The values I used were taken from the panel driver and DTSI. The downstream calculations don't take porches into account, so they will match the resulting pclk_rate if you remove the porches from the upstream math. Please let me know if you have any questions about it.
Can you also try applying this patch (which adds a bump_ratio factor to the pclk_rate adjustment) and see at what bump_ratio value the issue is fixed for you? I would recommend starting at bump_ratio = 105.
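For reference, the knob amounts to something like this (a sketch, not the patch itself; bump_ratio is a debug-only module parameter):

```c
#include <linux/kernel.h>
#include <linux/moduleparam.h>

/* Debug-only knob: inflate the DSC-adjusted pclk by a percentage so we can
 * find the point at which the panel keeps up. */
static unsigned int bump_ratio = 105;	/* percent */
module_param(bump_ratio, uint, 0644);

static unsigned long dsi_bump_pclk(unsigned long pclk_rate)
{
	return mult_frac(pclk_rate, bump_ratio, 100);
}
```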
Hi @jesszhan, thanks for all this info: it seems newer/different downstream kernels include the math in a clearer form than my old 4.14 codebase.
My replies will be fragmented and will skip parts, as the big solution appears to be DSI burst mode below. We still have a few points to figure out for properly calculating the pclk, though.
Your current pclk patch only scales the hdisplay portion of htotal by "dividing" it by 3 (via hdisplay * bpp / (bits_per_component * 3)). That means the porches are currently left intact, which we can assume are used in place of packet_overhead (hardcoded to 56 in the kernel you linked).
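Restating that adjustment as code for my own understanding (purely illustrative, not the patch itself; drm_dsc_config::bits_per_pixel is U6.4 fixed point, hence the shift):

```c
#include <linux/kernel.h>
#include <drm/display/drm_dsc.h>
#include <drm/drm_modes.h>

/* Compress only the active width (24 bpp in, 8 bpp out on these panels) and
 * leave the porches untouched. */
static unsigned long dsc_adjusted_pclk_hz(const struct drm_display_mode *mode,
					  const struct drm_dsc_config *dsc)
{
	int target_bpp = dsc->bits_per_pixel >> 4;
	int new_hdisplay = DIV_ROUND_UP(mode->hdisplay * target_bpp,
					dsc->bits_per_component * 3);
	int new_htotal = mode->htotal - mode->hdisplay + new_hdisplay;

	return (unsigned long)new_htotal * mode->vtotal * drm_mode_vrefresh(mode);
}
```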
Do you want to keep that as it is, or use the new calculation in dsi_host.c immediately?
That's where the 67.8MHz vs 67.9MHz discrepancy on the panel must come from in your comment above. However, per the code shared, a pclk of 56.8MHz matches neither; can you explain?
A few other notes:
pixel_clk_khz must be in Hz, otherwise the value you calculated is 56 GHz;
In the last snippet of your calculation you seem to have used the value of min_bitclk_hz in place of pclk_rate_hz, skipping parts of the downstream kernel;
The way downstream calculates this makes absolutely no sense!
min_bitclk_hz = 336691200
dsi_transfer_time_us = 14000  # or DEFAULT_MDP_TRANSFER_TIME
frame_time_us = 16666
pclk_rate_hz = min_bitclk_hz * dsi_transfer_time_us / frame_time_us  # 282820608
pixel_clk_hz = pclk_rate_hz * dsi_lanes / 24  # 47136768
Even if we were to use downstream qcom,mdss-dsi-panel-clockrate the values don't come close:
min_bitclk_hz = 336691200
dsi_transfer_time_us = 14000  # or DEFAULT_MDP_TRANSFER_TIME
frame_time_us = 16666
clk_rate_hz = 976285440  # qcom,mdss-dsi-panel-clockrate
dsi_transfer_time_us = frame_time_us * min_bitclk_hz // clk_rate_hz
pclk_rate_hz = min_bitclk_hz * dsi_transfer_time_us / frame_time_us  # 116102503
pixel_clk_hz = pclk_rate_hz * dsi_lanes / 24  # 19350417
What is the difference between pclk_rate and pixel_clk?
With bits_per_line you're referring to the number of DSI lanes, right?
Finally, do you know how to really work with qcom,mdss-dsi-panel-clockrate, or should I just take clktree dumps on downstream to see how close we can get? Mostly to confirm that the mainline pclk goes as low as downstream's, instead of wasting power because of, for example, hardcoding unused porches in drm_display_mode.
Your current pclk patch only scales the hdisplay portion of htotal by "dividing" it by 3 (via hdisplay * bpp / (bits_per_component * 3)). That means the porches are currently left intact, which we can assume are used in place of packet_overhead (hardcoded to 56 in the kernel you linked).
Correct, we can use the inclusion of the porches as a way to make up for the fact that both packet_overhead and transfer_time are not taken into consideration in the upstream math.
Do you want to keep that as it is, or use the new calculation in dsi_host.c immediately?
I plan to keep the pclk calculations I have in the current pclk patch as is, since it matches the downstream math (sans porches).
That's where the 67.8MHz vs 67.9MHz discrepancy on the panel must come from in your comment above. However, per the code shared, a pclk of 56.8MHz matches neither; can you explain?
The 56.8MHz is from calculating the pclk_rate without porches while the 67.8MHz was derived by calculating the pclk_rate with porches.
The 67.8MHz vs 67.9MHz discrepancy is due to rounding -- in my hand calculations, I used the exact value of hdisplay * bpp / (3 * bpc) while the code does a DIV_ROUND_UP().
pixel_clk_khz must be in Hz, otherwise the value you calculated is 56 GHz;
Yes, the final value I have in the snippets will be in Hz. Sorry if the variable name caused confusion -- I kept the *_khz suffix to make it easier to compare with the reference code, but left out the final conversion into khz that happens later in the code.
In the last snippet of your calculation you seem to have used the value of min_bitclk_hz in place of pclk_rate_hz, skipping parts of the downstream kernel
Yes, this is because we are skipping the transfer_time calculations since upstream also doesn't take transfer_time into consideration.
What is the difference between pclk_rate and pixel_clk
pclk_rate would be the clock per data lane (as the initial min_bitclk_hz has a divide by num_data_lanes in the calculation) while pixel_clk is just the clock value.
With bits_per_line you're referring to the number of DSI lanes, right?
The variable name seems to not include the "per lanes" aspect of the value, but yes the value itself measures the bits per line per DSI data lane.
Finally, do you know how to really work with qcom,mdss-dsi-panel-clockrate, or should I just take clktree dumps on downstream to see how close we can get?
The qcom,mdss-dsi-panel-clockrate node was specifically for the RFI feature, to override the clock rate calculations for a specific board/product. We currently skip that part of the calculation, as we don't support RFI in upstream.
You're free to take clk dumps on downstream and compare, but I think implementing something equivalent upstream would be out of the scope of this patch/series.
The 56.8MHz is from calculating the pclk_rate without porches while the 67.8MHz was derived by calculating the pclk_rate with porches.
10MHz, or ±20%, is quite a lot for these small porches. I don't get anywhere close when embedding downstream's packet_overhead=56 either, and neither value makes the panel happy.
Is transfer_time going to significantly bump the clocks then?
The variable name seems to not include the "per lanes" aspect of the value, but yes the value itself measures the bits per line per DSI data lane.
Never mind, it is indeed per line as you are calculating the number of bits per chunk here, and then multiplying that by the number of slices which equals one scanline width.
pclk_rate would be the clock per data lane (as the initial min_bitclk_hz has a divide by num_data_lanes in the calculation) while pixel_clk is just the clock value.
Yes, it is clear that pclk_rate is multiplied by 4 again to construct pixel_clk (meaning that pclk_rate ticks for all lanes at once, but pixel_clk ticks once for every lane), but also that pclk_rate is still in bits, whereas pixel_clk has been divided by a bits-per-pixel value, and strangely it is the bpp of the original picture (RGB888 = 24), not the bpp of the DSC signal (bpp = 8)?
It seems to just be an oddly named variable in the end since it is never actually used anywhere else once it has been used to obtain pixel_clk.
You're free to take clk dumps on downstream and compare, but I think implementing something equivalent upstream would be out of the scope of this patch/series.
No way should we have hardcoded clock values like that upstream; that is the inverse of what I'm asking/suggesting. Instead, there seems to be no trivial way to calculate pixel_clk out of qcom,mdss-dsi-panel-clockrate the way downstream does it, meaning I'd have to boot downstream kernels and take clk dumps to figure out what it ended up with.
This value would only be used to compare and confirm how close upstream is on the various panels.
Note: I'm using the default frame_threshold_time_us. As with my previous calculation, there might be some 0.1 MHz difference due to rounding.
pixel_clk ticks once for every lane), but also that pclk_rate is still in bits, whereas pixel_clk has been divided by a bits-per-pixel value, and strangely it is the bpp of the original picture (RGB888 = 24), not the bpp of the DSC signal (bpp = 8)?
The reason we're dividing by 24 here is that pclk should be derived from the byte clk, so we need to convert min_bitclk_hz to bytes by dividing by 24.
This value would only be used to compare and confirm how close upstream is on the various panels
Sure, just wanted to note that qcom,mdss-dsi-panel-clockrate is for a specific feature and not necessarily something that can be used as a reference value of what the pclk rate should be.
Note: I'm using the default frame_threshold_time_us. As with my previous calculation, there might be some 0.1 MHz difference due to rounding.
Thank you, I'll cross-check this with my sources ASAP to make sure we're doing the same thing up- and downstream.
The reason we're dividing by 24 here is that pclk should be derived from the byte clk, so we need to convert min_bitclk_hz to bytes by dividing by 24.
The question is why you're dividing by 24, and not by 8, given that the signal is DSC and contains 8 bits per pixel (a 3x reduction). Is it to cope with the bitclk/byteclk later multiplying by dsi_get_bpp() again? Is it a spec thing that the pclk can be that much slower?
Also, it is quite interesting that mainline calculates it in reverse: pclk first, then the byteclk (and bitclk) derived later (which indeed does the opposite of what you do here: byte_clk = pclk * bpp / (8 * lanes)).
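Roughly, the mainline ordering looks like this (illustrative names, not copied from dsi_host.c):

```c
#include <drm/drm_modes.h>

/* Mainline order: derive pclk from the mode first, then the per-lane byte
 * clock from it. */
static void dsi_calc_clk_rates(const struct drm_display_mode *mode,
			       int bpp, int lanes,
			       unsigned long *pclk_rate,
			       unsigned long *byte_clk_rate)
{
	*pclk_rate = mode->clock * 1000UL;		/* mode->clock is in kHz */
	*byte_clk_rate = *pclk_rate * bpp / (8 * lanes);
}
```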
Sure, just wanted to note that qcom,mdss-dsi-panel-clockrate is for a specific feature and not necessarily something that can be used as a reference value of what the pclk rate should be.
That is why I am asking, and intending to check (by booting downstream), whether it is put to use or is a bogus value, with a calculation akin to the above being used instead.
The question is why you're dividing by 24, and not by 8, given that the signal is DSC and contains 8 bits per pixel (a 3x reduction). Is it to cope with the bitclk/byteclk later multiplying by dsi_get_bpp() again? Is it a spec thing that the pclk can be that much slower?
Got it, sorry for the confusion.
The reason we're keeping the source bits per pixel in the denominator is because using the target bits per pixel in the numerator is somewhat of a simplification of the pclk math.
So in general, pclk = bit_clk * num_lanes / bits_per_pixel.
In the case of uncompressed pclk, this calculation becomes pclk = (source_bits_per_pixel * h_active * v_total * fps) * num_lanes / source_bits_per_pixel, and for the compressed case h_active / 3 is plugged in instead of h_active.
(Note that source_bits_per_pixel cancels out for both the compressed and uncompressed calculations above.)
The downstream code simplifies this math by switching out the source_bits_per_pixel in the numerator with target_bits_per_pixel to account for compression instead of plugging in h_active / 3 like we did above.
@jesszhan Thanks; the downstream math (you call that "simplified"??) makes much more sense than hardcoding a magic h_total / 3 (EDIT: it is _down_stream that uses /3; upstream properly uses source/target_bits_per_pixel): it is not like you're sending 3 times fewer pixels; instead, the ratio between source_bits_per_pixel and target_bits_per_pixel is 3:1, but that does not always need to be true? After all, dsc->bpp is a variable to the system, and so is dsi_get_bpp.
This also means that the pixel clock now no longer ticks once per pixel, but only a third as often. I found the following snippet in the DSC spec:
The DSC bitstream and decoding process are designed to facilitate decoding of 3 pixels/clock
in practical hardware decoder implementations. Hardware encoder implementations are possible
at 1 pixel/clock. Decoder implementations can be designed to process one, three, or perhaps other
numbers of pixels per clock. Coding in Native 4:2:0 or 4:2:2 mode enables implementations with
approximately double the throughput (e.g., 2 pixels/clock for an encoder or 6 pixels/clock for
a decoder). Additional throughput, in terms of pixels per clock, might be obtained by encoding
and decoding multiple slices in parallel, which is facilitated by using multiple slices per line.
Does that imply that our decoder should always process three pixels per clocktick?
Note that your example seems to have lost num_lanes when you plugged bit_clk into the pclk calculation: that should cancel out again in the pclk. And s/vtotal/v_active.
@jesszhan now I'm puzzled too. At least about this piece: pclk = bit_clk * num_lanes / bits_per_pixel. From what I understand, pclk = htotal * vtotal * vrefresh, no hidden math here.
Then this line is incorrect: pclk = (source_bits_per_pixel * h_active * v_total * fps) * num_lanes / source_bits_per_pixel. There should be no * num_lanes.
Next, I don't understand this phrase:
The downstream code simplifies this math by switching out the source_bits_per_pixel in the numerator with target_bits_per_pixel to account for compression instead of plugging in h_active / 3 like we did above.
Is the target_bits_per_pixel equal to the FP value of dsc->bpp ? Or is it some other bits_per_pixel value?
Then this line is incorrect: pclk = (source_bits_per_pixel * h_active * v_total * fps) * num_lanes / source_bits_per_pixel. There should be no * num_lanes.
Yeah, as I pointed out, there should have been a / num_lanes when bit_clk was plugged into this equation, so that both would cancel out.
Is the target_bits_per_pixel equal to the FP value of dsc->bpp ? Or is it some other bits_per_pixel value?
Yes. The source is a "fake" RGB888 bpp=24 format (the input plane to our DSC block can be anything else afaik, and/or maybe this is the "panel/DrIC" format after it decompresses the signal), while we transfer bpp=8 for DSC (not sure how much that means given that the compressed signal isn't per pixel, just that it gives a nice input for a fixed data size).
We were able to recreate the FPS drop you reported for the 2.5k mode but were able to greatly improve the FPS by enabling burst mode. Overall, it seems that the reduction of the pclk rate has exposed some issue with the upstream driver -- the DSC clock adjustment in itself isn't wrong (as you can see with the math above).
FWIW, after looking at internal documentation and downstream code, it seems that burst mode is enabled by default, so it's good to set either way.
Hi Jessica. It seems you have found the missing piece of the puzzle. I agree that downstream sets this bit unconditionally as well, but I did move it out of dsi_update_dsc_timing() into the cmd-mode path of dsi_timing_setup() because it applies to uncompressed streams as well, and tested it on a variety of devices/panels and modes:
| Device | Mode | DSC | Result with pclk patch | Result with DSI burst-mode |
| --- | --- | --- | --- | --- |
| Xperia 5 | 1080x2520@60 | NO | 30Hz, tear at the top | Fluctuating at 30-40Hz, but tear at the top is gone! (this is without the enlarged porches mentioned in the respective patch) |
| Xperia XZ3 | 1440x2880@60 | YES | 60Hz, no tearing | 60Hz, no tearing |
| Xperia 1 | 1096x2560@60 | YES | 30Hz, tear at the top | 60Hz, no tearing |
| Xperia 1 | 1644x3840@60 | YES | 30Hz, tear at the top | 60Hz, no tearing |
| Xperia 5 II | 1080x2520@60 | YES | Fluctuating at 40-48Hz | 59.3Hz, no tearing |
| Xperia 5 II | 1080x2520@120 | YES | Fluctuating at 45-52Hz | Fluctuating at 116-118Hz |
| Xperia 5 III | 1080x2520@60 | YES (v1.2) | Fluctuating at 48-52Hz | 60Hz |
| Xperia 5 III | 1080x2520@120 | YES (v1.2) | Fluctuating at 48-59Hz, tearing at the top | Stable 119.2Hz |
Note that this might be the first time you see me mention the Xperia 5 III: it uses an identical panel to the Xperia 5 II, namely Samsung's sofef03-m DrIC with amb609vp01 panel/glass/assembly (and the results are unexpectedly identical).
So all in all, only the Xperia 5 II seems to still struggle a little bit in 120Hz mode with this patch, but overall this solves everything.
Your patch raises a few follow-up questions (for my understanding):
Is there any power implication in setting burst mode?
Are there any interesting bits previously set in this register (by e.g. the bootloader) that require its value to be read back? Otherwise I think we should forcefully overwrite the register so as not to leave any unknowns in there.
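To make that second question concrete, the two options would look roughly like this (register and bit names taken from dsi.xml.h as far as I can tell; purely illustrative, not the actual patch):

```c
u32 data;

/* Option 1: read-modify-write, preserving whatever the bootloader left in
 * the register. */
data = dsi_read(msm_host, REG_DSI_CMD_MODE_MDP_CTRL2);
data |= DSI_CMD_MODE_MDP_CTRL2_BURST_MODE;
dsi_write(msm_host, REG_DSI_CMD_MODE_MDP_CTRL2, data);

/* Option 2: overwrite the register completely, so no unknown bits survive. */
dsi_write(msm_host, REG_DSI_CMD_MODE_MDP_CTRL2, DSI_CMD_MODE_MDP_CTRL2_BURST_MODE);
```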
Hi everyone. First of all, thanks for all your amazing efforts on drm/msm.
I have been playing with my Xiaomi Mi 11 Ultra recently, which has an SM8350. It features a 10-bit 120Hz display panel using DSC v1.1, but I encountered really weird issues with the latest Linux kernel.
After enabling the panel driver, the device can't even boot. It will reboot to Qualcomm crashdump mode at some point in the boot process.
At first I thought it could be an issue with my broken panel driver, so I changed the compatible string in DT to match another DSC panel driver, panel-visionox-r66451 from linux-next. However, it still goes into crashdump mode.
I managed to get a memory dump with the edl.py tool and extracted dmesg from it. It looks like it's stuck somewhere in dpu_encoder_trigger_kickoff_pending().
The only read happening in dpu_hw_ctl_clear_pending_flush() is of the CTL_FLUSH register, for a tracing event. That register should be readable right in the middle of a kickoff. It does seem to be the first register that is touched within a commit, though. AFAIK synchronous external aborts typically indicate unclocked/ungated access, somehow?
I also have a 1080x2520@120Hz panel on SM8350, and it works flawlessly with next-20230724 (all modules builtin, though, and with lots of my own commits on top), but I can try on an as-barebones-as-possible -next if needed.
Note that I'll likely close this issue soon as all problems discussed so far have been resolved (and pclk is "pending"). The next issue is DSC on dual-DSI active-CTL hardware, which is a totally different beast.
Closing as "all" the issues on SM8[1234]50 discussed here have been fixed! Will probably create new issues to figure out dual-DSI + active-CTL + DSC if I cannot figure it out on my own. And anticipating discussed pclk improvements will be provided at some point in the future.
EDIT: I'm also slowly but surely preparing the corresponding panel drivers for submission and inclusion in the mainline kernel, now that DPU/DSI is ready for it.