
freedreno: Add some go-fast

Rob Clark requested to merge robclark/mesa:fd/go-fast into master

I came into possession of a collection of (mostly) Android game traces and discovered a few with patterns that we handled poorly. In general they amount to lots of small blits and/or compute jobs, with the GPU running at a low operating performance point (OPP). Using the WIP perfetto support (!9901 (merged)), it is easy to see that the overhead of a submit per blit/compute job is hurting us, keeping the GPU starved enough that its utilization doesn't get high enough to trigger a move to higher OPPs.

So this MR adds racing stripes, a big spoiler, and a fat exhaust.

It boils down to three main changes which build upon each other:

  1. Userspace fences: this cuts down substantially on the number of ioctls per frame, in particular getting rid of CPU_PREP:NOSYNC ioctls when allocating staging transfer buffers from the bo-cache.
  2. Deferred submits and submit merging: I've tried a few different approaches to reduce the number of submits per frame. The challenge is always that when we are building cmdstream, we often don't know in what order we will be flushing it. The solution is to defer submits that have no externally visible effects (no fence, etc.) and merge them into a later submit. Thanks to the userspace fencing, we can easily tell what we need to flush for, e.g., readpix. (Note that I started out with per-fd_pipe deferred submit lists, but to handle cases where (for example) one context is used for texture uploads and another for draws using those textures, preserving the order is more straightforward with a single per-fd_device list.)
  3. Async submit-queue: this moves the overhead of the submit ioctl off of the driver thread, and again uses the userspace fence tracking to know where CPU access to a bo needs to synchronize with the submit-queue.
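
To make the interaction between the three changes more concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not the driver code: `Buffer`, `Submit`, `Device`, `defer`, `flush` and `cpu_prep` are hypothetical names invented for illustration, the "ioctl" is just a counter, and the simulated GPU is assumed to finish work the moment it is submitted.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Buffer:
    """Toy bo: remembers the userspace fence of the last submit that used it."""
    name: str
    last_fence: int = 0


@dataclass
class Submit:
    """Toy submit: some commands plus the buffers they reference."""
    cmds: list
    bos: list
    fence: int = 0


class Device:
    """Hypothetical per-device state; not Mesa's actual fd_device."""

    def __init__(self):
        self.next_fence = 0        # last userspace fence number handed out
        self.deferred = []         # single per-device list keeps global order
        self.submitted_fence = 0   # highest fence handed to the "kernel"
        self.ioctl_count = 0       # number of simulated submit ioctls
        self.queue = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        # Async submit-queue: the (simulated) ioctl runs off the driver thread.
        while True:
            merged = self.queue.get()
            self.ioctl_count += 1                   # one ioctl per merged batch
            self.submitted_fence = merged[-1].fence
            self.queue.task_done()

    def defer(self, submit):
        # No externally visible effect yet, so only do userspace bookkeeping.
        self.next_fence += 1
        submit.fence = self.next_fence
        for bo in submit.bos:
            bo.last_fence = submit.fence
        self.deferred.append(submit)
        return submit.fence

    def flush(self):
        # Merge everything deferred so far into a single queued submit.
        if self.deferred:
            merged, self.deferred = self.deferred, []
            self.queue.put(merged)

    def cpu_prep(self, bo):
        # Userspace fence check: no kernel round trip if the bo has no
        # outstanding work. Returns True only if we had to flush and wait.
        if bo.last_fence <= self.submitted_fence:
            return False
        self.flush()
        self.queue.join()   # synchronize with the async submit-queue
        return True


if __name__ == "__main__":
    dev = Device()
    tex = Buffer("tex")
    for i in range(3):
        dev.defer(Submit([f"blit {i}"], [tex]))
    dev.cpu_prep(tex)
    print("simulated submit ioctls:", dev.ioctl_count)
```

In this model three deferred blits touching the same buffer collapse into one simulated submit, and a CPU access to a buffer with no pending work never leaves userspace. A real driver would additionally have to track whether the GPU has actually retired each fence, which this sketch deliberately skips.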

Overall results:

Results: (52/∞ iterations)
./ragnarok_m_eternal_love:       +181.0% +/-   5.4% 
./real_cricket_20:               +171.6% +/-   6.5% 
./world_war_doh:                 +60.7% +/-   6.5% 
./sniper_3d:                     +43.7% +/-   3.7% 
./nba2k20_800:                   +23.0% +/-   1.9% 
./efootball_pes_2021:            +17.0% +/-   2.6% 
./world_of_tanks_blitz:           +7.0% +/-   1.5% 
./manhattan_10:                   +2.0% +/-   0.2% 
./fifa_mobile:                    +1.4% +/-   1.0% 
./lineage_m:                      +1.0% +/-   0.5% 
./manhattan_31:                   +0.7% +/-   0.1% 
./bus_simulator_indonesia:        +0.4% +/-   1.6% 
./lego_legacy:                    +0.3% +/-   0.1% 
./saint_seiya_awakening:          +0.3% +/-   0.3% 
./kartrider_rush:                 +0.3% +/-   0.3% 
./avakin_life:                    +0.3% +/-   0.4% 
./real_commando_secret_mission:   +0.2% +/-   0.2% 
./mobile_legends:                 +0.2% +/-   0.3% 
./raid_shadow_legends:            +0.2% +/-   0.1% 
./standoff_2:                     +0.1% +/-   0.1% 
./aztec_ruins:                    +0.1% +/-   0.2% 
./free_fire:                      +0.1% +/-   0.2% 
./cod_mobile:                     +0.1% +/-   0.2% 
./temple_run_2:                   +0.1% +/-   0.1% 
./real_gangster_crime:            +0.1% +/-   0.3% 
./dragon_ball_legends:            +0.1% +/-   0.2% 
./subway_surfers:                 +0.1% +/-   0.4% 
./extreme_car_driving_simulator:  +0.1% +/-   0.2% 
./google_maps:                    +0.1% +/-   0.1% 
./asphalt_8:                      +0.0% +/-   0.2% 
./temple_run_300:                 +0.0% +/-   0.1% 
./rise_of_kingdoms:               +0.0% +/-   0.4% 
./dragon_raja:                    +0.0% +/-   0.1% 
./one_punch_man:                  +0.0% +/-   0.1% 
./junes_journey:                  +0.0% +/-   0.2% 
./hill_climb_racing:              +0.0% +/-   0.1% 
./aliexpress:                     +0.0% +/-   0.0% 
./klondike_adventures:            +0.0% +/-   0.1% 
./clash_of_clans:                 +0.0% +/-   0.0% 
./whatsapp:                       +0.0% +/-   0.0% 
./messenger_lite:                 +0.0% +/-   0.0% 
./worms_zone_io:                  +0.0% +/-   0.0% 
./clash_royale:                   -0.0% +/-   0.0% 
./happy_color:                    -0.0% +/-   0.0% 
./hay_day:                        -0.0% +/-   0.1% 
./candy_crush_500:                -0.0% +/-   0.0% 
./shadow_fight_2:                 -0.0% +/-   0.2% 
./plants_vs_zombies_2:            -0.0% +/-   0.0% 
./fallout_shelter_online:         -0.0% +/-   0.1% 
./angry_birds_2_1500:             -0.0% +/-   0.3% 
./brawl_stars:                    -0.1% +/-   0.2% 
./arena_of_valor:                 -0.1% +/-   0.3% 
./talking_tom_hero_dash:          -0.1% +/-   0.3% 
./romancing_saga:                 -0.1% +/-   0.2% 
./magic_tiles_3:                  -0.1% +/-   0.3% 
./minecraft:                      -0.2% +/-   1.0% 
./coin_master:                    -0.3% +/-   0.6% 
./trex_200:                       -0.4% +/-   0.4% 
./fate_grand_order:               -0.6% +/-   0.5% 
./egypt_1500:                     -0.7% +/-   0.3% 
./rope_hero_vice_town:            -0.8% +/-   0.5% 
./eight_ball_pool:                -1.0% +/-   0.6% 
./among_us:                       -1.8% +/-   1.0% 
./hearthstone:                    -2.1% +/-   0.5% 
./league_of_legends_wild_rift:    -2.2% +/-   0.9% 
./marvel_contest_of_champions:    -2.4% +/-   2.0% 
./pubg_mobile_lite:               -4.2% +/-   0.2% 
./car_parking_multiplayer:        -4.5% +/-   0.6% 

(The losers seem to mostly be lower-resolution traces which run at ~120 fps with a low GPU frequency. I need to take a closer look at them, but I'm mostly not too concerned.)
