After an initial batch of fixes, Display Stream Compression works flawlessly on SDM845 with command-mode panels. Unfortunately, the same cannot be said for SM8[12]50-based boards/phones.
Even a second set of fixes (which doesn't affect SDM845) doesn't address the issues that we're seeing. On SM8150 the output is garbled, and it is unclear whether DPU is sending the wrong thing or the panel is configured wrongly to interpret it.
This is what it looks like when booting into the framebuffer console until it switches to msmdrmfb. The first keypress starts modetest -M msm -v -s 32:1096x2560, the second is Ctrl+C to clear the screen and exit (note how it blanks the whole screen, which would have been half the screen before https://lore.kernel.org/linux-arm-msm/20221213232207.113607-5-marijn.suijten@somainline.org/).
Note that I am also finalizing my patches that bring INTF_TE support to DPU; these have been part of my tree at various stages of testing DSC but have not had any effect. As mentioned above, they do fix tearing and odd framerates on all DPU >= 5.0.0 platforms, including a Sony Xperia 5 (SM8150) with a regular non-DSC command-mode panel (same for SM6125 and a bunch of other boards/devices).
Thanks! I can also take a look myself if someone gives me hints on the magic and unnecessarily-private status registers; I may simply be missing something stupid or the DSI clocks/timings may be totally off.
We will finalize the DSC helper patches that are on the list first and then re-test this, as that fixes the math and calculations. Hence, for this bug, let's wait until we have R-b on the following series first:
What's worse, the patches bluntly remove dsc->slice_count for being wrong and pretend slice_per_pkt = 1, whereas the panels discussed in this issue both have slice_per_pkt = 2. Both those panels have the fortunate coincidence that dsc->slice_count == slice_per_pkt == 2, so I cannot take those fixes as they break it again.

EDIT: Testing points out that, after pulling in the rest of these fixes, all these panels can operate with slice_per_pkt = 1. I assume having 2 slices per packet was simply found to be better / more efficient, hence it's set to 2 in DT?
Instead, I propose to create a separate series that:
Collects all the DSC fixes specific to slice_count together, instead of scattering them across multiple series;
Gets rid of faulty dsc->slice_count usage everywhere;
Introduces slice_per_pkt handling, maybe together with the "fixes" (i.e. replacing the parameter) for proper atomicity (EDIT: may not be needed for now);
Additionally there should be some validation that slice_count * slice_width == hdisplay.
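For that last validation point, a minimal sketch of what I have in mind (the helper name and placement are illustrative, not existing msm code; it only relies on the drm_dsc_config and drm_display_mode fields):

```c
#include <drm/display/drm_dsc.h>
#include <drm/drm_modes.h>

/* Illustrative check: reject DSC configs whose slices do not exactly cover
 * the active width of the mode they are paired with. */
static int dsc_validate_slice_geometry(const struct drm_dsc_config *dsc,
				       const struct drm_display_mode *mode)
{
	if (!dsc->slice_count || !dsc->slice_width)
		return -EINVAL;

	/* All slices of one line together must span exactly hdisplay. */
	if (dsc->slice_count * dsc->slice_width != mode->hdisplay)
		return -EINVAL;

	return 0;
}
```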
I am willing to prepare and send that series (taking original authorship/reported-by's) as I have the ability to test on slice_per_pkt = 2.
What is the right pclk/byteclk?
While drm/msm/dsi: Adjust pclk rate for compression nicely reduces (title suggestion!) the clocks on the Xperia XZ3, it results in not being able to reach 60Hz on the Xperia 1 and has noticeable corruption at the top of the screen. Without that patch, at least in 2.5k mode, the panel reaches its full 60Hz potential. I have yet to work out the right timing for its 4k mode.
There have been mentions of an "(in?)efficiency factor" downstream to adjust the clocks rather than manually taking porches into account, as cmdmode doesn't use porches.
We have a similar issue on a "Samsung SOFEF01" panel (no DSC, just cmdmode): it works fine at 60Hz on the Sony Xperia 10 II (SM6125) but results in the same drop to 30 fps on the Sony Xperia 5 (sibling of the Xperia 1, same board on SM8150). I bumped the porches (copied from another panel) simply to get the clocks to run higher and reach 60Hz, but that isn't a proper fix: https://github.com/SoMainline/linux/commit/085ecdc32a04d471f10c6721e8ba07b3cf9947a8
I will reply to that patch with my findings and see where we can go next.
Thanks for the summary and tests and also your support.
I am fine with replacing hdisplay /= 3; with DIV_ROUND_UP(hdisplay, 3). We can fix that up in the original patch itself.
So are you saying that your display is broken only after that?
It seems, from your explanation, that your display was always working with slice_per_pkt as 2 while setting slice_count to 1. That was always working incorrectly, then, so it's not so much a fix in the display driver code as an unfortunate misinterpretation of those terms in the panel driver, because you did not have a proper reference to use while making your panel driver changes.
So I would not really say that the fixes are all over the place; rather, what matters is defining what the fix is and where it is being fixed. If the panel driver was working with incorrect parameters so far, we cannot really say that the DPU driver was wrong all this while; it was rather an incorrect interpretation of the DSC parameters by the panel driver. Of course, the original series needs to take some of the blame for not making the distinction between those parameters clear, and I need to take some blame for acking that fix of yours alone without looking into what was programmed in your panel driver.
I noted in the series that you referred to "slice_per_pkt" as a QC hardware-specific thing. This is not true. As I wrote in the email, please refer to Figure 40 ("One Line Containing One Packet with Data from One or More Compressed Slices") and Figure 41 ("One Line Containing More than One Compressed Pixel Stream Packet") of the DSI 1.3 spec. It explains this in detail. So this should not be treated or thought of as a QC-specific term or concept.
So I think let's not account for slice_per_pkt in the existing series; if all the panel drivers are working even with slice_per_pkt as 1 and the correct slice_count, let them continue to work that way.
Once slice_per_pkt support is properly added, we will fix up all the math wherever relevant in the same series. We will check if any of our panels support slice_per_pkt as 2 so that we don't have to depend on your validation for that.
Let me know if you have any concerns with that.
Regarding the pixel clock, I had pointed out to Jessica that her math was slightly wrong.
The fps being computed is incorrect
int fps = DIV_ROUND_UP(pclk_rate, mode->htotal * mode->vtotal);
This uses the reduced h_active to compute the fps again, but that's wrong. The fps should still be computed with drm_mode_refresh() with the full h_total. Let me know if that fixes the issue for you and we will upload that as well in the existing series.
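In code, the change would look something like this (using drm_mode_vrefresh(), which computes the refresh rate from the full, unmodified timings; the wrapper is purely illustrative):

```c
#include <drm/drm_modes.h>

/* Illustrative: derive fps from the original, uncompressed mode timings
 * instead of re-deriving it from the DSC-adjusted pclk_rate and htotal. */
static int dsi_get_fps(const struct drm_display_mode *mode)
{
	/* was (per the quoted line above):
	 * int fps = DIV_ROUND_UP(pclk_rate, mode->htotal * mode->vtotal);
	 */
	return drm_mode_vrefresh(mode);
}
```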
So are you saying that your display is broken only after that?
No. I am saying that this change makes the corruption look different (whole panel updates, instead of just the top as shown in the video). Then, when the third bullet-point is applied, everything is fixed.
It seems, from your explanation, that your display was always working with slice_per_pkt as 2 while setting slice_count to 1.
Not entirely; allow me to clarify. But first, note again that the patches to get the Xperia 1 and Xperia 5 II working (the phones used in the original report) are unrelated to slice_count! I just happened to find this discrepancy along the way.
On the Sony Xperia XZ3, we erroneously set slice_count = 1 because slice-per-pkt downstream is 1. It should have been 2, because hdisplay / slice_width = 2 = slice_count, but it worked nevertheless. I even wrote in https://lore.kernel.org/linux-arm-msm/20221221231943.1961117-3-marijn.suijten@somainline.org/ (the patch that was "wrong", though not as wrong as the original) that both were miraculously working.
That was always working incorrectly, then
Exactly, it was working incorrectly on the Xperia 1 because there slice_count == slice_per_pkt == 2. However, if all the broken DPU/DSI code that pretends dsc->slice_count is slice_per_pkt is removed, my Xperia 1 works too (with Jessica's patch we're assuming slice_per_pkt = 1 for now).
so it's not so much a fix in the display driver code as an unfortunate misinterpretation of those terms in the panel driver, because you did not have a proper reference to use while making your panel driver changes.
The slice_count = 1 -> 2 change is definitely a fix in my panel code. However, the wrong use of dsc->slice_count upstream, where slice_per_pkt should have been used, is definitely driver code. Your team already sent two fixes (and more need to come!), so I wouldn't say it's "not so much a fix in the display driver code"...
We should bundle these in the same series and make the fix complete. I can help with that if you want.
So I would not really say that the fixes are all over the place; rather, what matters is defining what the fix is and where it is being fixed.
They are spread across two series and missing a few other wrong uses of dsc->slice_count.
If the panel driver was working with incorrect parameters so far, we cannot really say that the DPU driver was wrong all this while; it was rather an incorrect interpretation of the DSC parameters by the panel driver. Of course, the original series needs to take some of the blame for not making the distinction between those parameters clear
Same as above.
and I need to take some blame for acking that fix of yours alone without looking into what was programmed in your panel driver.
For the record, all of the original DSC patches were doing this wrong. My patch was just one extra case on top that assumed slice_per_pkt downstream was equivalent to dsc->slice_count upstream. It has no relevance to what my panel driver was doing; everyone was just making the wrong assumption that slice_per_pkt == dsc->slice_count during porting.
I noted in the series that you referred to "slice_per_pkt" as a QC hardware-specific thing. This is not true. As I wrote in the email, please refer to Figure 40 ("One Line Containing One Packet with Data from One or More Compressed Slices") and Figure 41 ("One Line Containing More than One Compressed Pixel Stream Packet") of the DSI 1.3 spec. It explains this in detail. So this should not be treated or thought of as a QC-specific term or concept.
Thanks, I'll take a closer look at the DSI spec.
So I think let's not account for slice_per_pkt in the existing series; if all the panel drivers are working even with slice_per_pkt as 1 and the correct slice_count, let them continue to work that way.
Sounds good, but we should first fix/remove all wrong uses of dsc->slice_count (and add some extra validation to it, as suggested) before I review/ack the patches, as said above.
Once slice_per_pkt support is properly added, we will fix up all the math wherever relevant in the same series. We will check if any of our panels support slice_per_pkt as 2 so that we don't have to depend on your validation for that.
Sounds good, though having more validation is always better (my panel seems to accept both, so I don't know how valid any of the testing is).
Regarding the pixel clock, I had pointed out to Jessica that her math was slightly wrong.
The fps being computed is incorrect
int fps = DIV_ROUND_UP(pclk_rate, mode->htotal * mode->vtotal);
This uses the reduced h_active to compute the fps again, but that's wrong. The fps should still be computed with drm_mode_refresh() with the full h_total. Let me know if that fixes the issue for you and we will upload that as well in the existing series.
Are there any panels that require slices_per_pkt > 1?
If we ignore slices_per_pkt for now and later want to introduce it, is it going to be a localized change, or is it going to be spread all over the DSI-DSC code?
I saw your comment on the drm/msm/dsi: Adjust pclk rate for compression patch:
Note that I cannot get the 4k mode working at 60Hz on one of my panels
(30Hz works with minor corruption), regardless of this patch.
So, just to check my understanding: the Sony Xperia XZ3 is working with the changes you mentioned above, but the pclk reduction is exposing an issue with the Sony Xperia 1?
I'm a little confused as to whether you mean that the patch is causing the regression or just exposing the issue, since the wording in your previous comment in the MR says:
drm/msm/dsi: Adjust pclk rate for compression... results in not being able to reach 60Hz on the Xperia 1
But the comment on the patch says that the issue happens "regardless of this patch"
@jesszhan the device (Xperia 1) and panel in question have two modes:
4k mode: Always stuck at 30 fps and slight corruption, with or without your patch;
2.5k mode: Stuck at 30 fps and slight corruption with your adjust pclk rate for compression patch; after dropping it, the panel runs at 60 fps without corruption.
I've discussed this with Abhinav and we think this pclk issue is due to upstream not having the concept of transfer time for command mode panels.
Currently in upstream, we don't distinguish command vs video mode as command mode doesn't have the concept of porches like you mentioned above.
However, in downstream, command mode takes transfer time into consideration when calculating pixel and mdss clock rates. The transfer time adjustment is defined in the device tree downstream (here is an example DTSI), which would not fly upstream.
If we want to add upstream support for transfer time, we can try to calculate this transfer time value by doing transfer_time = 1 / vrefresh_rate then adding that into the pclk_rate calculation. However, to test that change, I will have to find a way to measure FPS on my test device (which I will look into... would appreciate any suggestions).
FWIW, the current math is correct for DSC. This pclk calculation issue seems to be specifically related to command mode panels not taking transfer time into account.
(Also, feel free to keep this bug open to track this pclk issue.)
Small update with v3, because the above seems to be quite confusing. This is on the Sony Xperia 1, while testing with modetest -M msm -s 32:#0 -v and including the output of /d/clk/disp_cc_mdss_pclk0_clk/clk_rate:
It's been said that transfer time might not have been taken into account yet, so I'd like to investigate how much of an impact that has on these DSC calculations and whether we can make them work properly from the get-go.
modetest reports 60Hz refreshing, confirmed by dsi_isr firing at 60Hz in /proc/interrupts
We calculated the expected clock rate (for 1096x2560) with and without DSC enabled based on hdisplay information given in the DTSI, and it seems that with DSC enabled, the correct pclk rate should be 67.8 MHz, which is close to the pclk rate you reported above. We also weren't sure why, for a command mode panel, the reduced pclk rate is affecting the FPS.
Just wondering, what rate is your tearcheck interrupt firing at? Can you add a log in dpu_encoder_phys_cmd_te_rd_ptr_irq() and let me know? It might be possible that the tearcheck irq could be throttling the FPS.
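Something along these lines would do (illustrative only; last_te is a local static for the experiment, not an existing field):

```c
#include <linux/ktime.h>
#include <linux/printk.h>

/* Log the time between successive tearcheck read-pointer interrupts, so the
 * effective TE rate can be read back from dmesg. Call this from
 * dpu_encoder_phys_cmd_te_rd_ptr_irq(). */
static void log_te_rd_ptr_interval(void)
{
	static ktime_t last_te;
	ktime_t now = ktime_get();

	pr_info_ratelimited("te_rd_ptr delta: %lld us\n",
			    ktime_us_delta(now, last_te));
	last_te = now;
}
```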
Finally, can you elaborate on what you meant by this note (e.g. which DSI ISR specifically are you referring to):
modetest reports 60Hz refreshing, confirmed by dsi_isr firing at 60Hz in /proc/interrupts
@jesszhan can you share the calculations that you used, so that everyone can validate this work?
Then I can also reproduce that for the 1644x3840 mode, which is also broken.
the correct pclk rate should be 67.8 MHz, which is close to the pclk rate you reported above
That is the rate I reported with your patch that breaks the panel, so this does not confirm that the calculation in your patch is correct.
Just wondering, what rate is your tearcheck interrupt firing at? Can you add a log in dpu_encoder_phys_cmd_te_rd_ptr_irq() and let me know? It might be possible that the tearcheck irq could be throttling the FPS.
Will let you know once I unshelve this device but in short: yes that is expected. If we are too slow sending data to the panel (pclk too low), the panel will wait another frame (AFAIK, instead of sending TE as soon as the whole frame has arrived) and hence trigger at exactly half the rate?
Finally, can you elaborate on what you meant by this note:
modetest reports 60Hz refreshing, confirmed by dsi_isr firing at 60Hz in /proc/interrupts
When in 1644x3840 mode only white garbage is shown on the display, but the DSI host and panel seem to successfully agree on a 60 FPS signal (otherwise I would expect 30Hz, as said above).
e.g. which DSI ISR specifically are you referring to
Edited my above comment to add the math for the 1096x2560 case. The values I used were taken from the panel driver and DTSI. The downstream calculations don't take porches into account, so they will match the resulting pclk_rate if you remove the porches from the upstream math. Please let me know if you have any questions about it.
Can you also try applying this patch (which adds a bump_ratio factor to the pclk_rate adjustment) and see at what bump_ratio value the issue is fixed for you? I would recommend starting at bump_ratio = 105.
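For reference, the knob amounts to something like this (a sketch, not the patch itself; bump_ratio is a debug-only module parameter):

```c
#include <linux/kernel.h>
#include <linux/moduleparam.h>

/* Debug-only knob: inflate the DSC-adjusted pclk by a percentage so we can
 * find the point at which the panel keeps up. */
static unsigned int bump_ratio = 105;	/* percent */
module_param(bump_ratio, uint, 0644);

static unsigned long dsi_bump_pclk(unsigned long pclk_rate)
{
	return mult_frac(pclk_rate, bump_ratio, 100);
}
```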
Hi @jesszhan, thanks for all this info: it seems newer/different downstream kernels include the math in a clearer form than my old 4.14 codebase.
My replies will be fragmented and will skip parts, as the big solution appears to be DSI burst mode below. We still have a few points to figure out for properly calculating the pclk, though.
Your current pclk patch only scales the hdisplay portion of htotal by "dividing" it by 3 (via hdisplay * bpp / (bits_per_component * 3)). That means the porches are currently left intact, which we can assume are used in place of packet_overhead (hardcoded to 56 in the kernel you linked).
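Restating that adjustment as code for my own understanding (purely illustrative, not the patch itself; drm_dsc_config::bits_per_pixel is U6.4 fixed point, hence the shift):

```c
#include <linux/kernel.h>
#include <drm/display/drm_dsc.h>
#include <drm/drm_modes.h>

/* Compress only the active width (24 bpp in, 8 bpp out on these panels) and
 * leave the porches untouched. */
static unsigned long dsc_adjusted_pclk_hz(const struct drm_display_mode *mode,
					  const struct drm_dsc_config *dsc)
{
	int target_bpp = dsc->bits_per_pixel >> 4;
	int new_hdisplay = DIV_ROUND_UP(mode->hdisplay * target_bpp,
					dsc->bits_per_component * 3);
	int new_htotal = mode->htotal - mode->hdisplay + new_hdisplay;

	return (unsigned long)new_htotal * mode->vtotal * drm_mode_vrefresh(mode);
}
```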
Do you want to keep that as it is, or use the new calculation in dsi_host.c immediately?
That's where the 67.8MHz vs 67.9MHz discrepancy on the panel must come from in your comment above. However, per the code shared, a pclk of 56.8MHz matches neither; can you explain?
A few other notes:
pixel_clk_khz must be in Hz, otherwise the value you calculated is 56 GHz;
In the last snippet of your calculation you seem to have used the value of min_bitclk_hz in place of pclk_rate_hz, skipping parts of the downstream kernel;
The way downstream calculates this makes absolutely no sense!
min_bitclk_hz = 336691200
dsi_transfer_time_us = 14000  # or DEFAULT_MDP_TRANSFER_TIME
frame_time_us = 16666
pclk_rate_hz = min_bitclk_hz * dsi_transfer_time_us / frame_time_us  # 282820608
pixel_clk_hz = pclk_rate_hz * dsi_lanes / 24  # 47136768
Even if we were to use downstream qcom,mdss-dsi-panel-clockrate the values don't come close:
min_bitclk_hz = 336691200
dsi_transfer_time_us = 14000  # or DEFAULT_MDP_TRANSFER_TIME
frame_time_us = 16666
clk_rate_hz = 976285440  # qcom,mdss-dsi-panel-clockrate
dsi_transfer_time_us = frame_time_us * min_bitclk_hz // clk_rate_hz
pclk_rate_hz = min_bitclk_hz * dsi_transfer_time_us / frame_time_us  # 116102503
pixel_clk_hz = pclk_rate_hz * dsi_lanes / 24  # 19350417
What is the difference between pclk_rate and pixel_clk?
With bits_per_line you're referring to the number of DSI lanes, right?
Finally, do you know how to really work with qcom,mdss-dsi-panel-clockrate, or should I just take clktree dumps on downstream to see how close we can get? Mostly to confirm that the mainline pclk goes as low as downstream's, instead of wasting power because of, for example, hardcoding unused porches in drm_display_mode.
Your current pclk patch only scales the hdisplay portion of htotal by "dividing" it by 3 (via hdisplay * bpp / (bits_per_component * 3)). That means the porches are currently left intact, which we can assume are used in place of packet_overhead (hardcoded to 56 in the kernel you linked).
Correct, we can use the inclusion of the porches as a way to make up for the fact that both packet_overhead and transfer_time are not taken into consideration in the upstream math.
Do you want to keep that as it is, or use the new calculation in dsi_host.c immediately?
I plan to keep the pclk calculations I have in the current pclk patch as is, since it matches the downstream math (sans porches).
That's where the 67.8MHz vs 67.9MHz discrepancy on the panel must come from in your comment above. However, per the code shared, a pclk of 56.8MHz matches neither; can you explain?
The 56.8MHz is from calculating the pclk_rate without porches while the 67.8MHz was derived by calculating the pclk_rate with porches.
The 67.8MHz vs 67.9MHz discrepancy is due to rounding -- in my hand calculations, I used the exact value of hdisplay * bpp / (3 * bpc) while the code does a DIV_ROUND_UP().
pixel_clk_khz must be in Hz, otherwise the value you calculated is 56 GHz;
Yes, the final value I have in the snippets will be in Hz. Sorry if the variable name caused confusion -- I kept the *_khz suffix to make it easier to compare with the reference code, but left out the final conversion into khz that happens later in the code.
In the last snippet of your calculation you seem to have used the value of min_bitclk_hz in place of pclk_rate_hz, skipping parts of the downstream kernel
Yes, this is because we are skipping the transfer_time calculations since upstream also doesn't take transfer_time into consideration.
What is the difference between pclk_rate and pixel_clk
pclk_rate would be the clock per data lane (as the initial min_bitclk_hz has a divide by num_data_lanes in the calculation) while pixel_clk is just the clock value.
With bits_per_line you're referring to the number of DSI lanes, right?
The variable name seems to not include the "per lanes" aspect of the value, but yes the value itself measures the bits per line per DSI data lane.
Finally, do you know how to really work with qcom,mdss-dsi-panel-clockrate, or should I just take clktree dumps on downstream to see how close we can get?
The qcom,mdss-dsi-panel-clockrate node was specifically for the RFI feature, to override the clock rate calculations for a specific board/product. We currently skip that part of the calculation, as we don't support RFI in upstream.
You're free to take clk dumps on downstream and compare, but I think implementing something equivalent upstream would be out of the scope of this patch/series.
The 56.8MHz is from calculating the pclk_rate without porches while the 67.8MHz was derived by calculating the pclk_rate with porches.
10MHz, or ±20%, is quite a lot for these small porches. I don't get anywhere close when embedding downstream's packet_overhead=56 either, and neither value makes the panel happy.
Is transfer_time going to significantly bump the clocks then?
The variable name seems to not include the "per lanes" aspect of the value, but yes the value itself measures the bits per line per DSI data lane.
Never mind, it is indeed per line as you are calculating the number of bits per chunk here, and then multiplying that by the number of slices which equals one scanline width.
pclk_rate would be the clock per data lane (as the initial min_bitclk_hz has a divide by num_data_lanes in the calculation) while pixel_clk is just the clock value.
Yes, it is clear that pclk_rate is multiplied by 4 again to construct pixel_clk (meaning that pclk_rate ticks for all lanes at once, but pixel_clk ticks once for every lane), but also that pclk_rate is still in bits, whereas pixel_clk has been divided by a bits-per-pixel value, and strangely it is the bpp of the original picture (RGB888 = 24), not the bpp of the DSC signal (bpp = 8)?
It seems to just be an oddly named variable in the end since it is never actually used anywhere else once it has been used to obtain pixel_clk.
You're free to take clk dumps on downstream and compare, but I think implementing something equivalent upstream would be out of the scope of this patch/series.
No way should we have hardcoded clock values like that upstream; that is the inverse of what I'm asking/suggesting. Instead, there seems to be no trivial way to calculate pixel_clk out of qcom,mdss-dsi-panel-clockrate the way downstream does it, meaning I'd have to boot downstream kernels and take clk dumps to figure out what it ended up with.
This value would only be used to compare and confirm how close upstream is on the various panels.
Note: I'm using the default frame_threshold_time_us. As with my previous calculation, there might be some 0.1 MHz difference due to rounding.
pixel_clk ticks once for every lane), but also that pclk_rate is still in bits, whereas pixel_clk has been divided by a bits-per-pixel value, and strangely it is the bpp of the original picture (RGB888 = 24), not the bpp of the DSC signal (bpp = 8)?
The reason we're dividing by 24 here is that pclk should be derived from the byte clk, so we need to convert min_bitclk_hz to bytes by dividing by 24.
This value would only be used to compare and confirm how close upstream is on the various panels
Sure, just wanted to note that qcom,mdss-dsi-panel-clockrate is for a specific feature and not necessarily something that can be used as a reference value of what the pclk rate should be.
Note: I'm using the default frame_threshold_time_us. As with my previous calculation, there might be some 0.1 MHz difference due to rounding.
Thank you, I'll cross-check this with my sources ASAP to make sure we're doing the same thing up- and downstream.
The reason we're dividing by 24 here is that pclk should be derived from the byte clk, so we need to convert min_bitclk_hz to bytes by dividing by 24.
The question is why you're dividing by 24, and not by 8, given that the signal is DSC and contains 8 bits per pixel (a 3x reduction). Is it to cope with the bitclk/byteclk later multiplying by dsi_get_bpp() again? Is it a spec thing that the pclk can be that much slower?
Also, it is quite interesting that mainline calculates it in reverse: pclk first, then the byteclk (and bitclk) derived later (which indeed does the opposite of what you do here: byte_clk = pclk * bpp / (8 * lanes)).
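Roughly, the mainline ordering looks like this (illustrative names, not copied from dsi_host.c):

```c
#include <drm/drm_modes.h>

/* Mainline order: derive pclk from the mode first, then the per-lane byte
 * clock from it. */
static void dsi_calc_clk_rates(const struct drm_display_mode *mode,
			       int bpp, int lanes,
			       unsigned long *pclk_rate,
			       unsigned long *byte_clk_rate)
{
	*pclk_rate = mode->clock * 1000UL;		/* mode->clock is in kHz */
	*byte_clk_rate = *pclk_rate * bpp / (8 * lanes);
}
```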
Sure, just wanted to note that qcom,mdss-dsi-panel-clockrate is for a specific feature and not necessarily something that can be used as a reference value of what the pclk rate should be.
That is why I am asking, and intending to check (by booting downstream), whether it is put to use or is a bogus value, with a calculation akin to the above being used instead.
The question is why you're dividing by 24, and not by 8, given that the signal is DSC and contains 8 bits per pixel (a 3x reduction). Is it to cope with the bitclk/byteclk later multiplying by dsi_get_bpp() again? Is it a spec thing that the pclk can be that much slower?
Got it, sorry for the confusion.
The reason we're keeping the source bits per pixel in the denominator is because using the target bits per pixel in the numerator is somewhat of a simplification of the pclk math.
So in general, pclk = bit_clk * num_lanes / bits_per_pixel.
In the case of uncompressed pclk, this calculation becomes pclk = (source_bits_per_pixel * h_active * v_total * fps) * num_lanes / source_bits_per_pixel, and for the compressed case h_active / 3 is plugged in instead of h_active.
(Note that source_bits_per_pixel cancels out for both the compressed and uncompressed calculations above.)
The downstream code simplifies this math by switching out the source_bits_per_pixel in the numerator with target_bits_per_pixel to account for compression instead of plugging in h_active / 3 like we did above.
@jesszhan Thanks; the downstream math (you call that "simplified"??) makes much more sense than hardcoding a magic h_total / 3 (EDIT: it is _down_stream that uses /3; upstream properly uses source/target_bits_per_pixel): it is not like you're sending 3 times fewer pixels; instead, the ratio between source_bits_per_pixel and target_bits_per_pixel is 3:1, but that does not always need to be true? After all, dsc->bpp is a variable to the system, and so is dsi_get_bpp.
This also means that the pixel clock now no longer ticks once per pixel, but only a third as often. I found the following snippet in the DSC spec:
The DSC bitstream and decoding process are designed to facilitate decoding of 3 pixels/clock
in practical hardware decoder implementations. Hardware encoder implementations are possible
at 1 pixel/clock. Decoder implementations can be designed to process one, three, or perhaps other
numbers of pixels per clock. Coding in Native 4:2:0 or 4:2:2 mode enables implementations with
approximately double the throughput (e.g., 2 pixels/clock for an encoder or 6 pixels/clock for
a decoder). Additional throughput, in terms of pixels per clock, might be obtained by encoding
and decoding multiple slices in parallel, which is facilitated by using multiple slices per line.
Does that imply that our decoder should always process three pixels per clocktick?
Note that your example seems to have lost num_lanes when you plugged bit_clk into the pclk calculation: that should cancel out again in the pclk. And s/vtotal/v_active.
@jesszhan now I'm puzzled too. At least about this piece: pclk = bit_clk * num_lanes / bits_per_pixel. From what I understand, pclk = htotal * vtotal * vrefresh, no hidden math here.
Then this line is incorrect: pclk = (source_bits_per_pixel * h_active * v_total * fps) * num_lanes / source_bits_per_pixel. There should be no * num_lanes.
Next, I don't understand this phrase:
The downstream code simplifies this math by switching out the source_bits_per_pixel in the numerator with target_bits_per_pixel to account for compression instead of plugging in h_active / 3 like we did above.
Is the target_bits_per_pixel equal to the FP value of dsc->bpp ? Or is it some other bits_per_pixel value?
Then this line is incorrect: pclk = (source_bits_per_pixel * h_active * v_total * fps) * num_lanes / source_bits_per_pixel. There should be no * num_lanes.
Yeah, as I pointed out, there should have been a / num_lanes when bit_clk was plugged into this equation, so that both would cancel out.
Is the target_bits_per_pixel equal to the FP value of dsc->bpp ? Or is it some other bits_per_pixel value?
Yes. The source is a "fake" RGB888 bpp=24 format (the input plane to our DSC block can be anything else afaik, and/or maybe this is the "panel/DrIC" format after it decompresses the signal), while we transfer bpp=8 for DSC (not sure how much that means given that the compressed signal isn't per pixel, just that it gives a nice input for a fixed data size).
We were able to recreate the FPS drop you reported for the 2.5k mode but were able to greatly improve the FPS by enabling burst mode. Overall, it seems that the reduction of the pclk rate has exposed some issue with the upstream driver -- the DSC clock adjustment in itself isn't wrong (as you can see with the math above).
FWIW, after looking at internal documentation and downstream code, it seems that burst mode is enabled by default, so it's good to set either way.
Hi Jessica. It seems you have found the missing piece of the puzzle. I agree that downstream sets this bit unconditionally as well, but I did move it out of dsi_update_dsc_timing() into the cmd-mode path of dsi_timing_setup() because it applies to uncompressed streams as well, and tested it on a variety of devices/panels and modes:
| Device | Mode | DSC | Result with pclk patch | Result with DSI burst-mode |
| --- | --- | --- | --- | --- |
| Xperia 5 | 1080x2520@60 | NO | 30Hz, tear at the top | Fluctuating at 30-40Hz, but tear at the top is gone! (this is without the enlarged porches mentioned in the respective patch) |
| Xperia XZ3 | 1440x2880@60 | YES | 60Hz, no tearing | 60Hz, no tearing |
| Xperia 1 | 1096x2560@60 | YES | 30Hz, tear at the top | 60Hz, no tearing |
| Xperia 1 | 1644x3840@60 | YES | 30Hz, tear at the top | 60Hz, no tearing |
| Xperia 5 II | 1080x2520@60 | YES | Fluctuating at 40-48Hz | 59.3Hz, no tearing |
| Xperia 5 II | 1080x2520@120 | YES | Fluctuating at 45-52Hz | Fluctuating at 116-118Hz |
| Xperia 5 III | 1080x2520@60 | YES (v1.2) | Fluctuating at 48-52Hz | 60Hz |
| Xperia 5 III | 1080x2520@120 | YES (v1.2) | Fluctuating at 48-59Hz, tearing at the top | Stable 119.2Hz |
Note that this might be the first time you see me mention the Xperia 5 III: it uses an identical panel to the Xperia 5 II, namely Samsung's sofef03-m DrIC with amb609vp01 panel/glass/assembly (and the results are unexpectedly identical).
So all in all, only the Xperia 5 II seems to still struggle a little bit in 120Hz mode with this patch, but overall this solves everything.
Your patch raises a few follow-up questions (for my understanding):
Is there any power implication in setting burst mode?
Are there any interesting bits previously set in this register (by e.g. the bootloader) that require its value to be read back? Otherwise I think we should forcefully overwrite the register so as not to leave any unknowns in there.
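To make that second question concrete, the two options would look roughly like this (register and bit names taken from dsi.xml.h as far as I can tell; purely illustrative, not the actual patch):

```c
u32 data;

/* Option 1: read-modify-write, preserving whatever the bootloader left in
 * the register. */
data = dsi_read(msm_host, REG_DSI_CMD_MODE_MDP_CTRL2);
data |= DSI_CMD_MODE_MDP_CTRL2_BURST_MODE;
dsi_write(msm_host, REG_DSI_CMD_MODE_MDP_CTRL2, data);

/* Option 2: overwrite the register completely, so no unknown bits survive. */
dsi_write(msm_host, REG_DSI_CMD_MODE_MDP_CTRL2, DSI_CMD_MODE_MDP_CTRL2_BURST_MODE);
```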
Hi everyone. First of all, thanks for all your amazing efforts on drm/msm.
I have been playing with my Xiaomi Mi 11 Ultra recently, which has an SM8350. It features a 10-bit 120Hz display panel using DSC v1.1, but I encountered really weird issues with the latest Linux kernel.
After enabling the panel driver, the device can't even boot. It will reboot to Qualcomm crashdump mode at some point in the boot process.
At first I thought it could be an issue with my broken panel driver, so I changed the compatible string in DT to match another DSC panel driver, panel-visionox-r66451 from linux-next. However, it still goes into crashdump mode.
I managed to get a memory dump with the edl.py tool and extracted dmesg from it. It looks like it's stuck somewhere in dpu_encoder_trigger_kickoff_pending().
The only read happening in dpu_hw_ctl_clear_pending_flush() is of the CTL_FLUSH register, for a tracing event. That register should be readable right in the middle of a kickoff. It does seem to be the first register that is touched within a commit, though. AFAIK synchronous external aborts typically indicate unclocked/ungated access, somehow?
I also have a 1080x2520@120Hz panel on SM8350, and it works flawlessly with next-20230724 (all modules builtin, though, and with lots of my own commits on top), but I can try on an as-barebones-as-possible -next if needed.
Note that I'll likely close this issue soon as all problems discussed so far have been resolved (and pclk is "pending"). The next issue is DSC on dual-DSI active-CTL hardware, which is a totally different beast.
Closing as "all" the issues on SM8[1234]50 discussed here have been fixed! Will probably create new issues to figure out dual-DSI + active-CTL + DSC if I cannot figure it out on my own. And anticipating discussed pclk improvements will be provided at some point in the future.
EDIT: I'm also slowly but surely preparing the corresponding panel drivers for submission and inclusion in the mainline kernel, now that DPU/DSI is ready for it.