Bare-metal CI
=============

The bare-metal scripts run on a system with gitlab-runner and Docker,
connected to potentially multiple bare-metal boards that run tests of
Mesa.  Currently "fastboot", "ChromeOS Servo", and POE-powered devices are
supported.

In comparison with LAVA, this doesn't involve maintaining a separate
web service with its own job scheduler and replicating jobs between the
two.  It also places more of the board support in Git, instead of
web service configuration.  On the other hand, the serial interactions
and bootloader support are more primitive.

Requirements (fastboot)
-----------------------

This testing requires power control of the DUTs by the gitlab-runner
machine, since this is what we use to reset the system and get back to
a pristine state at the start of testing.

We require access to the console output from the gitlab-runner system,
since that is how we get the final results back from the tests.  You
should probably have the console on a serial connection, so that you
can see bootloader progress.

The boards need to be able to have a kernel/initramfs supplied by the
gitlab-runner system, since Mesa often needs to update the kernel either for new
DRM functionality, or to fix kernel bugs.

The boards must have networking, so that we can extract the dEQP .xml results to
artifacts on GitLab, and so that we can download traces (too large for an
initramfs) for trace replay testing.  Given that we need networking already, and
our deqp/piglit/etc. payload is large, we use NFS from the x86 runner system
rather than an initramfs.

See `src/freedreno/ci/gitlab-ci.yml` for an example of fastboot on DB410c and
DB820c (freedreno-a306 and freedreno-a530).

Requirements (servo)
--------------------

For servo-connected boards, we can use the EC connection for power
control to reboot the board.  However, loading a kernel is not as easy
as fastboot, so we assume your bootloader can do TFTP, and that your
gitlab-runner mounts the runner's tftp directory specific to the board
at /tftp in the container.

Since we're going the TFTP route, we also use NFS root.  This avoids
packing the rootfs and sending it to the board as a ramdisk, which
means we can support larger rootfses (for piglit testing), at the cost
of needing more storage on the runner.

Telling the board about where its TFTP and NFS should come from is
done using dnsmasq on the runner host.  For example, here is a snippet
from the dnsmasq.conf.d in the Google farm, where the gitlab-runner host
is called "servo"::

  dhcp-host=1c:69:7a:0d:a3:d3,10.42.0.10,set:servo

  # Fixed dhcp addresses for my sanity, and setting a tag for
  # specializing other DHCP options
  dhcp-host=a0:ce:c8:c8:d9:5d,10.42.0.11,set:cheza1
  dhcp-host=a0:ce:c8:c8:d8:81,10.42.0.12,set:cheza2

  # Specify the next server, watch out for the double ',,'.  The
  # filename didn't seem to get picked up by the bootloader, so we use
  # tftp-unique-root and mount directories like
  # /srv/tftp/10.42.0.11/jwerner/cheza as /tftp in the job containers.
  tftp-unique-root
  dhcp-boot=tag:cheza1,cheza1/vmlinuz,,10.42.0.10
  dhcp-boot=tag:cheza2,cheza2/vmlinuz,,10.42.0.10

  dhcp-option=tag:cheza1,option:root-path,/srv/nfs/cheza1
  dhcp-option=tag:cheza2,option:root-path,/srv/nfs/cheza2

See `src/freedreno/ci/gitlab-ci.yml` for an example of servo on cheza.  Note
that other servo boards in CI are managed using LAVA.
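
The per-board entries in the dnsmasq snippet above follow a regular pattern,
so they can be generated instead of typed by hand.  The following is a minimal
Python sketch and not part of Mesa's CI scripts: the function name
``dnsmasq_board_lines`` and its parameters are hypothetical, and it assumes the
same layout as the example (kernel at ``<tag>/vmlinuz`` under the TFTP root,
NFS root at ``/srv/nfs/<tag>``).

.. code-block:: python

  def dnsmasq_board_lines(mac, ip, tag, tftp_server,
                          kernel="vmlinuz", nfs_base="/srv/nfs"):
      """Return the dnsmasq lines for one board (hypothetical helper)."""
      return [
          # Fixed DHCP address plus a tag used by the options below.
          f"dhcp-host={mac},{ip},set:{tag}",
          # Boot file and next server; note the double ',,'.
          f"dhcp-boot=tag:{tag},{tag}/{kernel},,{tftp_server}",
          # NFS root path handed to the board.
          f"dhcp-option=tag:{tag},option:root-path,{nfs_base}/{tag}",
      ]


  if __name__ == "__main__":
      for line in dnsmasq_board_lines("a0:ce:c8:c8:d9:5d", "10.42.0.11",
                                      "cheza1", "10.42.0.10"):
          print(line)

Run as a script, this prints the three ``cheza1`` lines from the snippet
above.  The ``tftp-unique-root`` line and the entry for the runner host itself
are global settings and still need to be written once by hand.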

Requirements (POE)
------------------

For boards with 30W or less power consumption, POE can be used for the power
control.  A parts list might look something like:

- x86-64 gitlab-runner machine with a mid-range CPU, and 3+ GB of SSD storage
  per board.  This can host at least 15 boards in our experience.
- Cisco 2960S gigabit ethernet switch with POE.  (Cisco 3750G, 3560G, or 2960G
  were also recommended as reasonably priced hardware, but make sure the name
  ends in G, X, or S.)
- POE splitters to power the boards (you can find ones that go to micro USB,
  USB-C, and 5V barrel jacks at least)
- USB serial cables (Adafruit sells pretty reliable ones)
- A large powered USB hub for all the serial cables
- A pile of ethernet cables

You'll talk to the Cisco for configuration using its USB port, which provides a
serial terminal at 9600 baud.  You need to enable SNMP control, which we'll do
using a "mesaci" community name that the gitlab-runner can use as its
authentication (no password) to configure the switch.  To talk to SNMP on the
switch, you need to put an IP address on the default VLAN (VLAN 1).

Setting that up looks something like:

.. code-block:: console

  Switch>enable
  Password:
  Switch#configure terminal
  Switch(config)#interface Vlan 1
  Switch(config-if)#ip address 10.42.0.2 255.255.0.0
  Switch(config-if)#exit
  Switch(config)#snmp-server community mesaci RW
  Switch(config)#end
  Switch#copy running-config startup-config

With that set up, you should be able to power on/off a port with something like:

.. code-block:: console

  % snmpset -v2c -r 3 -t 30 -cmesaci 10.42.0.2 1.3.6.1.4.1.9.9.402.1.2.1.1.1.1 i 1
  % snmpset -v2c -r 3 -t 30 -cmesaci 10.42.0.2 1.3.6.1.4.1.9.9.402.1.2.1.1.1.1 i 4

Note that the "1.3.6..." SNMP OID changes between switches.  The last digit
above is the interface id (port number).  Searching the web for the right OID
is likely to be easier than working it out from the switch's MIB database.  You
can query the POE status from the switch's serial console using the `show
power inline` command.
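
The two ``snmpset`` invocations above differ only in the port number and the
value written, so a runner-side helper can build them from parameters.  This
is a minimal Python sketch under stated assumptions, not Mesa's actual power
script: the names ``poe_command`` and ``power_cycle`` are hypothetical, the OID
prefix is the one from the example (it differs between switch models), and
reading ``1`` as "on" and ``4`` as "off" is inferred from the order of the two
commands above.

.. code-block:: python

  import subprocess

  # OID prefix copied from the example; check it against your own switch.
  POE_OID_PREFIX = "1.3.6.1.4.1.9.9.402.1.2.1.1.1"
  # Assumption: 1 powers the port on, 4 powers it off.
  POE_ON, POE_OFF = 1, 4


  def poe_command(host, community, port, power_on):
      """Build the snmpset argument list for one switch port."""
      value = POE_ON if power_on else POE_OFF
      return [
          "snmpset", "-v2c", "-r", "3", "-t", "30",
          f"-c{community}", host,
          f"{POE_OID_PREFIX}.{port}", "i", str(value),
      ]


  def power_cycle(host, community, port):
      """Turn the port off and back on; needs snmpset installed."""
      subprocess.run(poe_command(host, community, port, False), check=True)
      subprocess.run(poe_command(host, community, port, True), check=True)

For example, ``poe_command("10.42.0.2", "mesaci", 1, True)`` yields the same
arguments as the first command line shown above.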

Other than that, follow the dnsmasq/TFTP/NFS setup described for servo boards
above.

See `src/broadcom/ci/gitlab-ci.yml` and `src/nouveau/ci/gitlab-ci.yml` for
examples of POE for Raspberry Pi 3/4, and Jetson Nano.

Setup
-----

Each board will be registered in freedesktop.org GitLab.  You'll want
something like this to register a fastboot board:

.. code-block:: console

  sudo gitlab-runner register \
       --url https://gitlab.freedesktop.org \
       --registration-token $1 \
       --name MY_BOARD_NAME \
       --tag-list MY_BOARD_TAG \
       --executor docker \
       --docker-image "alpine:latest" \
       --docker-volumes "/dev:/dev" \
       --docker-network-mode "host" \
       --docker-privileged \
       --non-interactive

For a servo board, you'll need to also volume mount the board's NFS
root dir at /nfs and TFTP kernel directory at /tftp.

The registration token has to come from a freedesktop.org GitLab admin
going to https://gitlab.freedesktop.org/admin/runners.

The name scheme for Google's lab is google-freedreno-boardname-n, and
our tag is something like google-freedreno-db410c.  The tag is what
identifies a board type so that board-specific jobs can be dispatched
into that pool.

We need privileged mode and the /dev bind mount in order to get at the
serial console and fastboot USB devices (--device arguments don't
apply to devices that show up after container start, which is the case
with fastboot, and the servo serial devices are actually links to
/dev/pts).  We use host network mode so that we can spin up an nginx
server to collect XML results for fastboot.

Once you've added your boards, you're going to need to add a little
more customization in ``/etc/gitlab-runner/config.toml``.  First, add
``concurrent = <number of boards>`` at the top ("we should have up to
this many jobs running managed by this gitlab-runner").  Then for each
board's runner, set ``limit = 1`` ("only 1 job served by this board at a
time").  Finally, add the board-specific environment variables
required by your bare-metal script, something like::

  [[runners]]
    name = "google-freedreno-db410c-1"
    environment = ["BM_SERIAL=/dev/ttyDB410c8", "BM_POWERUP=google-power-up.sh 8", "BM_FASTBOOT_SERIAL=15e9e390", "FDO_CI_CONCURRENT=4"]

The ``FDO_CI_CONCURRENT`` variable should be set to the number of CPU threads on
the board, which is used for auto-tuning of job parallelism.
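
With many boards, writing these per-runner settings by hand gets repetitive.
The following is a minimal Python sketch that renders the fragment discussed
above for one board.  The function name ``runner_fragment`` is hypothetical,
the values are the ones from the example, and the output is meant to be merged
into the entry that ``gitlab-runner register`` already created (which also
holds the URL, token and executor settings), not to replace it.

.. code-block:: python

  def runner_fragment(name, serial, powerup, fastboot_serial, cpu_threads):
      """Render the per-board config.toml settings (hypothetical helper).

      No TOML escaping is done, so values must not contain double quotes.
      """
      env = [
          f"BM_SERIAL={serial}",
          f"BM_POWERUP={powerup}",
          f"BM_FASTBOOT_SERIAL={fastboot_serial}",
          # Number of CPU threads on the board, used to tune parallelism.
          f"FDO_CI_CONCURRENT={cpu_threads}",
      ]
      env_list = ", ".join(f'"{item}"' for item in env)
      return "\n".join([
          "[[runners]]",
          f'  name = "{name}"',
          "  limit = 1",  # only one job on this board at a time
          f"  environment = [{env_list}]",
      ])


  if __name__ == "__main__":
      print(runner_fragment("google-freedreno-db410c-1", "/dev/ttyDB410c8",
                            "google-power-up.sh 8", "15e9e390", 4))

The global ``concurrent`` setting is deliberately left out, since it appears
once at the top of the file instead of once per runner.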

Once you've updated your runners' configs, restart the service with
``sudo service gitlab-runner restart``.

Caching downloads
-----------------

To improve the runtime for downloading traces during traces job runs, you will
want a pass-through HTTP cache.  On your runner box, install nginx:

.. code-block:: console

  sudo apt install nginx libnginx-mod-http-lua

Add the server setup files:

.. literalinclude:: fdo-cache
   :name: /etc/nginx/sites-available/fdo-cache
   :caption: /etc/nginx/sites-available/fdo-cache

.. literalinclude:: uri-caching.conf
   :name: /etc/nginx/snippets/uri-caching.conf
   :caption: /etc/nginx/snippets/uri-caching.conf

Edit the listener addresses in fdo-cache to suit the ethernet interface that
your devices are on.

Enable the site and restart nginx:

.. code-block:: console

  sudo ln -s /etc/nginx/sites-available/fdo-cache /etc/nginx/sites-enabled/fdo-cache
  sudo service nginx restart

  # First download will hit the internet (URL quoted so the shell leaves '?' alone)
  wget 'http://localhost/cache/?uri=https://minio-packet.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo.trace'
  # Second download should be cached.
  wget 'http://localhost/cache/?uri=https://minio-packet.freedesktop.org/mesa-tracie-public/itoral-gl-terrain-demo/demo.trace'

Now, set ``download-url`` in your ``traces-*.yml`` entry to something like
``http://10.42.0.1:8888/cache/?uri=https://minio-packet.freedesktop.org/mesa-tracie-public``
and you should have cached downloads for traces.  Add it to
``FDO_HTTP_CACHE_URI=`` in your ``config.toml`` runner environment lines and you
can use it for cached artifact downloads instead of going all the way to
freedesktop.org on each job.
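
The cached URLs above are formed by prefixing the upstream URL with the cache
endpoint.  As a minimal Python sketch of that composition (the function name
``cached_url`` is hypothetical, and passing the upstream URL through unescaped
is an assumption based on the examples above):

.. code-block:: python

  def cached_url(cache_base, upstream_url):
      """Prefix an upstream URL with the pass-through cache endpoint."""
      # The upstream URL is appended verbatim, as in the examples above.
      return f"{cache_base.rstrip('/')}/cache/?uri={upstream_url}"


  if __name__ == "__main__":
      print(cached_url(
          "http://10.42.0.1:8888",
          "https://minio-packet.freedesktop.org/mesa-tracie-public"))

Run as a script, this prints the ``download-url`` value shown above.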