pps: reduce cpu overhead
The last commit affects all data sources (freedreno, intel, and panfrost). The first 4 commits are specific to freedreno.
I configured pps-producer to sample at a 1ms period. On qcom sc7180, pps-producer took >15% of CPU time when running on the little cores and >10% when running on the big cores. perf reported:
- 98.44% 0.14% pps-producer pps-producer [.] main
  - 98.30% main
    - 97.66% pps::GpuDataSource::trace_callback
      - 87.86% pps::GpuDataSource::trace
        - 54.11% pps::FreedrenoDriver::dump_perfcnt
          - 49.69% pps::FreedrenoDriver::collect_countables
              37.34% pps::FreedrenoDriver::Countable::collect
              4.23% cfree
            + 3.21% operator new
              1.80% pps::FreedrenoDriver::collect_countables
              1.41% memcpy
              0.65% std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create
          + 3.97% msm_pipe_get_param
        + 26.74% __sched_setscheduler
        + 2.79% pps::add_samples
        + 0.85% __sched_getparam
        + 0.76% __sched_getscheduler
          0.75% pps::GpuDataSource::trace
          0.63% perfetto::TraceWriterImpl::NewTracePacket
      + 8.19% __nanosleep
        0.52% pps::GpuDataSource::trace_callback
Other than the first commit, this MR aims to get rid of __sched_setscheduler for all data sources. It does so by creating an RT thread to do the sampling and keeping the main thread non-RT (edit: this is specific to freedreno; for intel and panfrost, the call is simply removed). It also paves the way for the main thread to wake up at a lower frequency, which can reduce the time spent in __nanosleep.
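For readers unfamiliar with the pattern, here is a minimal sketch of the idea. It is not the code in this MR. A dedicated thread sets its scheduling policy once at startup and then samples periodically, so the main thread never has to switch policies per sample. The names `SamplingThread`, `is_realtime`, and `samples` are hypothetical and do not exist in pps. The sketch assumes a POSIX system. If the process lacks the privilege to use SCHED_FIFO, the thread simply keeps its default policy.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>

#include <pthread.h>
#include <sched.h>

// Hypothetical helper (not part of pps): calls `sample` once per `period` on
// a dedicated thread. Only that thread asks for an RT policy, and only once.
class SamplingThread {
public:
   SamplingThread(std::chrono::microseconds period, std::function<void()> sample)
      : period_(period), sample_(std::move(sample)), thread_([this] { run(); })
   {
   }

   ~SamplingThread() { stop(); }

   // Ask the sampling loop to exit and wait for it. Safe to call twice.
   void stop()
   {
      running_.store(false);
      if (thread_.joinable())
         thread_.join();
   }

   // True if the sampling thread managed to switch to SCHED_FIFO.
   bool is_realtime() const { return realtime_.load(); }

   // Number of completed calls to `sample`.
   uint64_t samples() const { return count_.load(); }

private:
   void run()
   {
      assert(period_ > std::chrono::microseconds::zero());

      // One scheduler call for the lifetime of the thread, instead of one
      // (or more) per sample. Failure is tolerated: it usually means the
      // process is not allowed to use RT policies.
      sched_param param{};
      param.sched_priority = sched_get_priority_min(SCHED_FIFO);
      realtime_.store(
         pthread_setschedparam(pthread_self(), SCHED_FIFO, &param) == 0);

      auto next = std::chrono::steady_clock::now();
      while (running_.load()) {
         sample_();
         count_.fetch_add(1);
         // Sleep until an absolute deadline so the period does not drift.
         next += period_;
         std::this_thread::sleep_until(next);
      }
   }

   std::chrono::microseconds period_;
   std::function<void()> sample_;
   std::atomic<bool> running_{true};
   std::atomic<bool> realtime_{false};
   std::atomic<uint64_t> count_{0};
   std::thread thread_; // declared last so it starts after the other members
};
```

A caller would construct `SamplingThread` with a 1ms period and a callback that reads the counters. Its own thread stays on the default policy and is free to wake up less often, for example only to flush the accumulated samples.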