panfrost: Rip out madvise()

Closed Alyssa Rosenzweig requested to merge alyssa/mesa:panfrost/de-madvise into main

madvise is hurting performance significantly. Why? It incurs overhead in 5 different places!

  1. Every time we allocate from the BO cache, a madvise ioctl.
  2. ...and an mmap because we can't leave BOs mmaped if they're marked MADV_DONTNEED due to panfrost kernel badness.
  3. Every time we free to the BO cache, another madvise ioctl.
  4. ...and a munmap for the above issue.
  5. Every time we use BOs from the BO cache from the CPU, overhead in the kernel for MMU page fault handling to map in the freshly mmap'd BO.

It's a LOT more efficient to leave BOs mmap'd in the BO cache without madvising.

Furthermore, madvise() is not solving real problems for us.

  1. The mechanism is for intended to map to GL_APPLE_object_purgeable, but nobody uses that and it's not hooked up to Gallium.
  2. The madvise() integration is for the BO cache, but the BO cache already has an LRU eviction mechanism. As long as the application continues to render, idle BOs will be freed after ~1 second.
  3. The only real failure case of the LRU cache is on Android, where your app can be backgrounded but your memory preserved. In that case, in low memory situations, the kernel MIGHT be able to clean your backgrounded BO cache to avoid killing your process. But a theoretical use case for Android GLES is not worth taking performance over across the board. The proper implementaiton of this would involve userspace handling a "low memory" event and cleaning up itself, not speculatively/expansively marking the BO cache constantly even in the happy path.
  4. The mechanism is GL-only. It has no place in a properly constructed Vulkan construction. There are real memory footprint problems to be solved, yes, but if our solution can never work on Vulkan, it's not a good solution. ESPECIALLY so on Android, the one place where madvise might help more than it hurts, which is becoming dramatically more Vulkan.
  5. For Linux, if the panfrost shrinker kicks in, it's already game over for you. If you don't have much memory, you probably don't have much CPU either. And the shrinker is expensive... when it kicks in even on the bigger Chromebooks, it often noticeably freezes the desktop, leading users to report that event as a bug report citing the Purging messages in dmesg as "proof" of the bug. But those users aren't wrong: the shrinking behaviour is wrong. Because of the short window of LRU eviction, anything that the shrinker reaps from a continuously rendering process is less than 2 seconds old. That means the application/Mesa will (almost immediately) reallocate all of that memory, undoing the benefit of the shrinker, but losing the benefit of the BO cache, slowing down rendering. But then, the addition allocations will trigger the same low memory condition as there was a few frames ago, triggering the shrinker again, and so on, bringing the system to a crawl until a bigger hammer (like the OOM killer) kicks in.

Given that madvise() doesn't seem to do much for us, and is causing a LOT of different problems ranging from horrible performance to instability, I think we should remove the Mesa usage and see what happens. Worst case we revert.

Dolphin trace performance improved, since Dolphin is heavily CPU bound. Optimizing Dolphin to be playable on Panfrost is what prompted this... This isn't about synthetic benchmarks.

But if you like synthetic benchmarks...

WebGL Aquarium at 5000 fish from 17fps to 24fps fish.

Some drawoverhead subscores doubled. Yes, doubled. Yes, really.

I've been trying to rip out this code since Jan 2021. In the past 26 months I have not found a single good technical reason to keep this other than Android politics.

Signed-off-by: Alyssa Rosenzweig

Merge request reports