
pps: reduce cpu overhead

Chia-I Wu requested to merge olv/mesa:pps-overhead into main

The last commit affects all data sources (freedreno, intel, and panfrost). The first 4 commits are specific to freedreno.

I configured pps-producer to sample at a 1ms period. On qcom sc7180, pps-producer took >15% of CPU time when running on the little cores and >10% when running on the big cores. perf reported:

-   98.44%     0.14%  pps-producer  pps-producer         [.] main
   - 98.30% main
      - 97.66% pps::GpuDataSource::trace_callback
         - 87.86% pps::GpuDataSource::trace
            - 54.11% pps::FreedrenoDriver::dump_perfcnt
               - 49.69% pps::FreedrenoDriver::collect_countables
                    37.34% pps::FreedrenoDriver::Countable::collect
                    4.23% cfree
                  + 3.21% operator new
                    1.80% pps::FreedrenoDriver::collect_countables
                    1.41% memcpy
                    0.65% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create
               + 3.97% msm_pipe_get_param
            + 26.74% __sched_setscheduler
            + 2.79% pps::add_samples
            + 0.85% __sched_getparam
            + 0.76% __sched_getscheduler
              0.75% pps::GpuDataSource::trace
              0.63% perfetto::TraceWriterImpl::NewTracePacket
         + 8.19% __nanosleep
           0.52% pps::GpuDataSource::trace_callback

Other than the first commit, this MR aims to get rid of __sched_setscheduler for all data sources. It does so by creating an RT thread to do the sampling while keeping the main thread non-RT (edited: this is specific to freedreno; for intel and panfrost, the call is simply removed). It also paves the way for the main thread to wake up at a lower frequency, which can reduce the time spent in __nanosleep.
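To illustrate the idea of a dedicated RT sampling thread, here is a minimal sketch. It is not the code in this MR: `SamplingThread` and all of its members are hypothetical names, the counter increment stands in for the real counter dump, and it assumes a POSIX system. The point is that the scheduling policy is set once when the thread starts, rather than being switched around every sample, so the per-sample __sched_setscheduler cost goes away.

```cpp
#include <pthread.h>
#include <sched.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper, not part of pps: owns a worker thread that samples
// at a fixed period while the calling (main) thread stays non-RT.
class SamplingThread {
public:
    explicit SamplingThread(std::chrono::microseconds period) : period_(period) {}
    ~SamplingThread() { stop(); }

    void start() {
        assert(!worker_.joinable());  // starting twice is a caller bug
        running_.store(true);
        worker_ = std::thread([this] { run(); });
    }

    void stop() {
        running_.store(false);
        if (worker_.joinable())
            worker_.join();
    }

    uint64_t samples() const { return samples_.load(); }
    bool is_realtime() const { return realtime_.load(); }

private:
    void run() {
        // Request SCHED_FIFO once for this thread only. This can fail
        // without the right privileges; the sketch then keeps sampling
        // under the default policy and records that it is not RT.
        sched_param param{};
        param.sched_priority = sched_get_priority_min(SCHED_FIFO);
        realtime_.store(
            pthread_setschedparam(pthread_self(), SCHED_FIFO, &param) == 0);

        auto next = std::chrono::steady_clock::now();
        while (running_.load()) {
            samples_.fetch_add(1);  // placeholder for dumping the counters
            next += period_;
            std::this_thread::sleep_until(next);
        }
    }

    std::chrono::microseconds period_;
    std::atomic<bool> running_{false};
    std::atomic<bool> realtime_{false};
    std::atomic<uint64_t> samples_{0};
    std::thread worker_;
};
```

With something like this, the main thread only needs to call `start()`, wake up at whatever lower frequency suits it to collect what the worker produced, and call `stop()` at teardown.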

Edited by Chia-I Wu
