Skip to content

llvmpipe: fixed race condition in lp_rast_destroy that causes a crash on windows

Dario Mylonopoulos requested to merge ramenguy99/mesa:main into main

This merge requests fixes issue #7099 (closed). As pointed out in the issue there is a race condition happening on windows when destroying the llvmpipe rasterizer that often causes an hard crash (null pointer dereference) in the rasterizer worker threads.

The current synchronization mechanism uses the same semaphore to signal to the main thread that a work iteration (rasterization of a scene) is completed and to signal that the worker threads exited. However, with the current implementation the work iteration semaphore is never waited on after submitting a scene for rasterization, causing its value to grow unbounded during multiple invocations. On destruction the main thread attempts to wait for the worker threads to exit, but the wait function immediately returns since the semaphore's value has already been incremented multiple times.

One option would be to have the main thread wait after every invocation of the rasterizer. In fact, the function lp_rast_finish does exactly this, but it is never called. My assumption is that this currently does not happen because it would stall the main thread while the workers run asynchronously. Synchronization is currently happening directly on a fence on the scene which signals when rasterization is complete. Waiting after every submission could be implemented by adding this call at Line 1131 of lp_rast.c:

   else {
      /* threaded rendering! */
      lp_scene_enqueue(rast->full_scenes, scene);

      /* signal the threads that there's work to do */
      for (unsigned i = 0; i < rast->num_threads; i++) {
         util_semaphore_signal(&rast->tasks[i].work_ready);
      }
      /* New: synchronize here */
      lp_rast_finish(rast);
   }

The fix implemented in this MR is much simpler, instead of reusing the same semaphore for multiple purposes, a new semaphore per worker thread is allocated on windows to signal exiting. This fixes the crash by ensuring that worker threads exit safely before freeing up their resources.

The current implementation could be further cleaned up by removing the work_done semaphore which is currently never waited on and the unused wait function.

Edited by Dario Mylonopoulos

Merge request reports