The last commit affects all data sources (freedreno, intel, and panfrost). The first four commits are specific to freedreno.
I configured pps-producer to sample at a 1 ms period. On qcom sc7180, pps-producer took >15% of CPU time when running on the little cores and >10% when running on the big cores. perf reported:

    -   98.44%     0.14%  pps-producer  pps-producer  [.] main
       - 98.30% main
          - 97.66% pps::GpuDataSource::trace_callback
             - 87.86% pps::GpuDataSource::trace
                - 54.11% pps::FreedrenoDriver::dump_perfcnt
                   - 49.69% pps::FreedrenoDriver::collect_countables
                        37.34% pps::FreedrenoDriver::Countable::collect
                        4.23% cfree
                      + 3.21% operator new
                        1.80% pps::FreedrenoDriver::collect_countables
                        1.41% memcpy
                        0.65% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create
                   + 3.97% msm_pipe_get_param
                + 26.74% __sched_setscheduler
                + 2.79% pps::add_samples
                + 0.85% __sched_getparam
                + 0.76% __sched_getscheduler
                  0.75% pps::GpuDataSource::trace
                  0.63% perfetto::TraceWriterImpl::NewTracePacket
             + 8.19% __nanosleep
               0.52% pps::GpuDataSource::trace_callback

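
The __sched_setscheduler, __sched_getparam, and __sched_getscheduler entries under pps::GpuDataSource::trace in the profile suggest that the scheduling policy is queried and switched around each sample. For illustration, here is a minimal sketch, assuming Linux and POSIX threads, of the alternative: set the policy once on a dedicated sampling thread and leave the calling thread alone. The names sample_counters, make_current_thread_rt, sampling_loop, and run_sampler are hypothetical and are not pps APIs; this is not the code in this MR.

```cpp
#include <pthread.h>
#include <sched.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical stand-in for the driver's counter dump; not a pps API.
static std::atomic<unsigned> g_samples{0};
static void sample_counters() { g_samples.fetch_add(1, std::memory_order_relaxed); }

// Try to promote the calling thread to SCHED_FIFO, once. Returns false when
// the process lacks the privilege (e.g. no CAP_SYS_NICE); the thread then
// keeps running under its current policy.
static bool make_current_thread_rt()
{
   sched_param param{};
   param.sched_priority = sched_get_priority_min(SCHED_FIFO);
   return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param) == 0;
}

// Sampling loop: the policy is set once before the loop, so no scheduler
// syscalls are issued per sample.
static void sampling_loop(std::chrono::microseconds period,
                          const std::atomic<bool> &stop)
{
   if (!make_current_thread_rt())
      std::fprintf(stderr, "SCHED_FIFO unavailable, sampling without RT priority\n");

   auto next = std::chrono::steady_clock::now();
   while (!stop.load(std::memory_order_relaxed)) {
      sample_counters();
      next += period;
      std::this_thread::sleep_until(next);
   }
}

// Runs the sampler for `duration` and returns the number of samples taken.
// The calling thread never changes its own scheduling policy.
static unsigned run_sampler(std::chrono::microseconds period,
                            std::chrono::milliseconds duration)
{
   assert(period.count() > 0);
   g_samples = 0;
   std::atomic<bool> stop{false};
   std::thread sampler([&] { sampling_loop(period, stop); });
   std::this_thread::sleep_for(duration);
   stop = true;
   sampler.join();
   return g_samples.load();
}
```

Calling run_sampler(std::chrono::microseconds(1000), std::chrono::milliseconds(50)) mirrors the 1 ms period used for the measurement above. The sketch falls back to the default policy when SCHED_FIFO cannot be acquired, so it also runs unprivileged.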
Other than the first commit, this MR aims to get rid of __sched_setscheduler for all data sources. It does so by creating an RT thread to do the sampling and keeping the main thread non-RT (edited: this is specific to freedreno; for intel and panfrost, the call is simply removed). It also paves the way for the main thread to wake up at a lower frequency, which can reduce the time in