
WIP: CI of freedreno

Emma Anholt requested to merge anholt/mesa:ci-fd into master

This MR introduces CI for the freedreno driver, running the GLES2 CTS on an a306 for future MRs, as a step toward implementing the "Not Rocket Science Rule Of Software Engineering". It depends on !1496 (merged). I'm putting it up because folks on #dri-devel were interested.

TODO:

  • Bring up more than 2 db410cs to bring down runtime
    • (Split 6 ways, the GLES2 jobs take ~2.5 minutes each. If I can bring up a total of 8 boards, freedreno will probably no longer be our limiting factor in the test stage.)
  • Make sure db410cs have automatic docker image storage cleanup to not run out of disk
  • Do a bunch more builds to make sure our test results are reliable.
  • Add cheza to the farm
  • Verify the native ARM container build again (I don't think I've rebuilt it since the last fix-up to its build script)
  • Test containers are ~500M currently, can we shrink them?
  • Document what this test architecture demands of the test HW
    • The kernel has to be mostly stable (a board panicking takes out that job, but doesn't impact future jobs), and GPU reset has to be absolutely stable (so one run's failure doesn't leak into others)
    • docker images are stored on the test devices, and the docker image store can't be located on NFS (docker suggests using).
    • You're running code from randos on the internet, on your boards, with access to /dev/dri, and only containers to protect you. Have a plan for mitigation of the impact there.
    • Networking only requires access to HTTPS from the device, no inbound access necessary
  • Document farm setup better for reproducing with other HW
    • db410c setup info here
    • I'm setting up my runners to be shared to make it pre-merge, but anyone could do something like this with their own HW on their own Mesa with private runners for test automation.
  • Document expectations about this change's impact on other devs. Some of my thoughts:
    • The test stage should take <5 minutes total per MR; no adding a platform to per-MR testing that exceeds that. Note that I'm only running 1/10th of the tests on armhf; this may be a stopgap for making sure slow, limited-quantity devices don't get completely broken.
    • HW testing should produce at most 1 spurious MR test failure per week; if it's flakier than that, that HW is not ready.
    • If a test lab goes down and all of its jobs get stuck pending, you're encouraged to push through an MR that prefixes those HW jobs' names with '.' to disable them, until the lab owner can recover the lab and document what went wrong and how it will be prevented in the future.
  • Debug salt-minion on db410c so we can turn that db410c wiki page into data.
  • Deduplicate cross container's VK-GL-CTS build with llvmpipe CI's
  • @flto is there some nice small, stable a2xx we should be including?
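The sharding and time-budget points in the TODO list can be illustrated with a rough model. This is a hypothetical sketch, not the scripts this MR uses: it assumes each parallel job knows its 1-based index and the total job count (GitLab CI's `parallel:` keyword exposes these as `CI_NODE_INDEX` and `CI_NODE_TOTAL`), that shards are roughly equal length, and that each board runs one job at a time. The `fraction` parameter is an assumed way of modelling "1/10th testing" on armhf, and the case names are placeholders.

```python
"""Hypothetical model of CTS sharding and the per-MR test-stage budget."""
import math


def select_cases(cases, node_index, node_total, fraction=1):
    """Return the test cases this job should run.

    node_index is 1-based (1..node_total), matching CI_NODE_INDEX.
    With fraction > 1, only every fraction-th case of the full list is
    kept first (fraction=10 runs roughly 1/10th of the suite); the
    remaining cases are then dealt round-robin across the jobs.
    """
    if not 1 <= node_index <= node_total:
        raise ValueError("node_index must be in 1..node_total")
    if fraction < 1:
        raise ValueError("fraction must be >= 1")
    subset = cases[::fraction]
    return subset[node_index - 1::node_total]


def wall_clock_minutes(num_jobs, minutes_per_job, num_boards):
    """Lower bound on test-stage wall-clock time for a single MR.

    Assumes equal-length jobs, one job per board at a time, and no other
    MRs competing for the same boards.
    """
    waves = math.ceil(num_jobs / num_boards)
    return waves * minutes_per_job


if __name__ == "__main__":
    # Placeholder case names, not real dEQP test IDs.
    cases = [f"case{i}" for i in range(20)]
    print(select_cases(cases, node_index=1, node_total=6))
    print(select_cases(cases, node_index=2, node_total=6, fraction=10))
    # 6 GLES2 shards of ~2.5 minutes each, on 2 boards vs. 8 boards.
    print(wall_clock_minutes(num_jobs=6, minutes_per_job=2.5, num_boards=2))
    print(wall_clock_minutes(num_jobs=6, minutes_per_job=2.5, num_boards=8))
```

Under these assumptions, two boards need three waves (7.5 minutes) for one MR, which is over the 5-minute budget. With six or more boards, all six shards fit in a single wave. Boards beyond six mainly help when several MRs reach the test stage at the same time.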
