Cluster improvements prompted by GitLab 16.0
I quickly wrote this on #freedesktop, but I think I should write it down here and ask for comments.
Cc: @daniels @mupuf @emersion @anholt (no commitment required, but some thoughts would be nice)
WHY?
IMO, we have accumulated some technical debt on the current cluster, and given that GitLab 16.0 is almost out, we will need downtime to migrate PostgreSQL anyway. PostgreSQL 12 is being deprecated, so we need to upgrade, which is an offline operation.
So I was thinking, why not think bigger and try to solve the little issues we are having at the moment (in no particular order, just as they come to mind):
- no ipv6 support (see #209 (closed))
- heavy (too heavy?) use of WireGuard in userspace, which probably causes the Ceph processes to kill each other because they are not fast enough
- last time, Equinix asked us to create more servers in DC, like the runners. The current NY data center is under pressure, so migrating to a less pressured data center might help us
- we are still using the hybrid-mode GitLab registry and host all the files on GCP, which incurs quite some monthly costs
- we are running k8s 1.24, EOL 2023-07-28
- we are running everything on Debian 11, which is getting a little outdated (no cgroup v2, no podman by default)
- many deployed components need to be upgraded: cert-manager, nginx, Equinix CCM, kube-vip, Kilo, etc.
IPv6
As mentioned in #209 (closed), we cannot migrate the existing cluster to IPv6; we need to create a new one.
It's not super urgent per se, but we will have to do it eventually IMO.
Wireguard
Encrypting everything is nice, but I actually wonder if it's required: we are currently running in a dedicated facility, so in theory the runners can't snoop on the traffic.
Also, we are currently using the now deprecated userspace WireGuard encryption, which needs to be switched to kernel encryption. The migration itself is simple enough, but it requires taking the cluster down entirely and starting all of the control planes first, before doing the same for the agents.
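For reference, a minimal sketch of what the backend switch could look like on each node, assuming we keep flannel and that `wireguard-native` is the kernel-backed replacement (the file path is the standard k3s config location):

```yaml
# /etc/rancher/k3s/config.yaml -- sketch only; switches flannel from the
# deprecated userspace "wireguard" backend to the kernel-backed one
flannel-backend: "wireguard-native"
```

This is also why the full restart dance above is needed: every node has to come back with the same backend.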
Datacenter migration
The problem with migrating data centers is that we need to update the DNS, because the public IP we use is bound to the NY datacenter. This is more of a political problem than a technical one, but some DNS holders are less likely to update their entries than others (think NetworkManager).
But if we add IPv6, we need to update the main DNS anyway, so maybe that's a good excuse to change it.
While on this topic, I was wondering if we should not request 4 public IPs instead of 2.
We currently have:
- one IP for gitlab and gitlab-pages
- one IP for the rest
But the rest is growing quite a lot, with network-heavy services like S3 and the registry.
So maybe we should have:
- one IP for gitlab and gitlab-pages
- one IP for `s3.fd.o`
- one IP for `registry.fd.o`
- one IP for the rest
hybrid mode gitlab registry
A few weeks (months?) ago, I manually migrated all of the projects that are under a group to the new registry backed by the Postgres DB. This duplicated some of the images on GCP, but not by much (something like 1 TB IIRC).
So all groups now have online GC.
The problem is that we still have quite a few users relying on the old registry implementation, with everything stored on GCP.
I'd like to:
- migrate the already-migrated registry data from GCP to Equinix
- drop all the other registries, telling users to rebuild their images if they need them
Theoretically, everybody should be using ci-templates, so the user registries are not a big issue. Unless they are, but maybe with enough notifications we can pre-migrate the users who request it, and/or get the data back after the fact by keeping a `registry-gcp.fd.o` that points at the GCP data. Users could then just fetch images from that server and push the tags to the new registry.
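To make that last step concrete, something like the following could work for a user, assuming the hypothetical `registry-gcp.fd.o` mirror from above; the project path and tag are made up:

```sh
# Copy one image from the old GCP-backed mirror to the current registry.
# registry-gcp.fd.o is the hypothetical read-only host proposed above;
# the project path and tag are placeholders.
skopeo login registry-gcp.fd.o
skopeo login registry.freedesktop.org
skopeo copy \
  docker://registry-gcp.fd.o/someuser/someproject/ci-image:latest \
  docker://registry.freedesktop.org/someuser/someproject/ci-image:latest
```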
k8s 1.24
https://docs.gitlab.com/charts/installation/cloud/ says that the supported k8s version is now 1.25, so we can leverage that. I don't think there are many breaking changes in 1.26, so we could switch to 1.26 instead.
Having a separate cluster and new IPs would allow us to test everything before actually migrating.
debian 11
I have been experimenting with Fedora CoreOS last week. I managed to get the stable channel running on local machines and on Equinix Metal servers.
The advantages are that it can automatically update itself, and it's an immutable distribution with a recent kernel and recent userspace (podman, docker, systemd...).
various updates
Having a new cluster makes testing way easier, as we can slowly migrate the workloads and update them one by one.
HOW?
IPv6
https://docs.k3s.io/installation/network-options#dual-stack-ipv4--ipv6-networking
But tests are required.
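A rough idea of what the dual-stack setup could look like in the k3s config; the IPv6 prefixes below are the placeholder examples from the k3s docs, the real ones would come from the Equinix allocation:

```yaml
# /etc/rancher/k3s/config.yaml -- dual-stack sketch; the IPv6 prefixes are
# placeholders, the real ones would come from the Equinix IPv6 allocation
cluster-cidr: "10.42.0.0/16,2001:cafe:42::/56"
service-cidr: "10.43.0.0/16,2001:cafe:43::/112"
```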
Wireguard drop
- https://docs.k3s.io/installation/network-options#flannel-options
- https://deploy.equinix.com/developers/docs/metal/layer2-networking/hybrid-bonded-mode/
My idea as of today is to use the Equinix hybrid bonded mode. Given that we might have both the gitlab k8s cluster and the runners in the same datacenter, a bad agent could snoop on the cluster data if it's not encrypted.
But if we add a new layer 2 vlan for in-cluster direct communication, we might be able to segregate the 2 types of servers.
My idea was to use `--flannel-backend=vxlan`, which would create a new VXLAN on top of the layer 2 VLAN... not sure whether that is overkill, or whether it works at all.
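A minimal sketch of that idea, assuming the layer 2 VLAN shows up as a dedicated sub-interface on the hybrid bond (interface name and addresses below are made up):

```yaml
# /etc/rancher/k3s/config.yaml -- sketch; bond0.1000 is a hypothetical VLAN
# sub-interface on the hybrid bond, the address is a placeholder
flannel-backend: "vxlan"
flannel-iface: "bond0.1000"
node-ip: "192.168.100.11"
```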
k8s 1.24
just deploy k3s 1.26 on the new cluster and test
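For the record, a one-liner sketch of pinning the version at install time via the k3s channel (channel name to be double-checked):

```sh
# Pin the k3s channel at install time -- sketch, channel/version to be confirmed
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.26 sh -
```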
Datacenter migration
This one is going to be the hard part once again.
S3 migration
IMO we can leverage https://rook.io/docs/rook/v1.10/Storage-Configuration/Object-Storage-RGW/ceph-object-multisite/
So basically we (rough manifest sketch after the list):
- create a new Ceph cluster on the new k8s cluster
- mark the current S3 storage as multisite-ready
- create the few S3 stores we use in the new cluster as a new zone in the multisite setup
- rinse, wash
- both storages should be magically in sync
- on D-day, we mark the mirror as the main zone and drop the old one
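Roughly, the Rook objects on the new cluster could look like the sketch below. It assumes the existing store has first been converted into a realm/zonegroup (here named `fdo` / `fdo-zonegroup`, both made up); every name, endpoint and pool layout is a placeholder:

```yaml
# Sketch of the multisite objects on the *new* cluster. All names, the pull
# endpoint and the pool settings are placeholders.
apiVersion: ceph.rook.io/v1
kind: CephObjectRealm
metadata:
  name: fdo
  namespace: rook-ceph
spec:
  pull:
    endpoint: http://old-cluster-rgw.example.net:80   # hypothetical endpoint of the current RGW
---
apiVersion: ceph.rook.io/v1
kind: CephObjectZoneGroup
metadata:
  name: fdo-zonegroup
  namespace: rook-ceph
spec:
  realm: fdo
---
apiVersion: ceph.rook.io/v1
kind: CephObjectZone
metadata:
  name: new-dc-zone
  namespace: rook-ceph
spec:
  zoneGroup: fdo-zonegroup
  metadataPool:
    replicated:
      size: 3
  dataPool:
    erasureCoded:
      dataChunks: 2
      codingChunks: 1
---
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: new-dc-store
  namespace: rook-ceph
spec:
  gateway:
    port: 80
    instances: 2
  zone:
    name: new-dc-zone
```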
gitaly migration
Manual, like always:
- create new gitaly pods on the new cluster
- register all the gitaly nodes in both clusters
- migrate the repos through a rake task (see the sketch after this list)
- monitor it so it doesn't break
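For reference, the bulk scheduling could also go through the repository storage moves API instead of rake, whichever ends up being more convenient; the storage names below are placeholders and `$TOKEN` is an admin personal access token:

```sh
# Sketch: schedule moves for every project on one Gitaly storage to another via
# the repository storage moves API. "default" and "gitaly-new" are placeholder
# storage names.
curl --request POST \
  --header "PRIVATE-TOKEN: $TOKEN" \
  --header "Content-Type: application/json" \
  --data '{"source_storage_name": "default", "destination_storage_name": "gitaly-new"}' \
  "https://gitlab.freedesktop.org/api/v4/project_repository_storage_moves"
```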
indico migration
Downtime required, like last time. But it shouldn't be too long, given that the amount of data is pretty limited.
connect the 2 clusters
Last time, I used a manual connection based on IP routes and WireGuard.
I'd like to leverage Submariner to automatically declare the various services between clusters (because doing it manually sucks, as it breaks from time to time).
Note: last time I did not manage to get Submariner working. Maybe it was because of the firewall rules...
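From memory, the flow would be something like the sketch below, assuming the broker lives on the new cluster; the kubeconfig paths, cluster IDs and exported service name are all placeholders:

```sh
# Submariner sketch -- broker on the new cluster, both clusters join it.
# Kubeconfig paths, cluster IDs and the exported service name are made up.
subctl deploy-broker --kubeconfig kubeconfig-new
subctl join --kubeconfig kubeconfig-new broker-info.subm --clusterid new-dc
subctl join --kubeconfig kubeconfig-old broker-info.subm --clusterid old-ny
# export a service from the old cluster so the new one can reach it
subctl export service --kubeconfig kubeconfig-old --namespace gitlab gitaly
```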
Fedora CoreOS as a base distro image
We are currently running Debian 11, and even if it works well, it's getting a little outdated.
We can install Fedora CoreOS via iPXE on Equinix Metal. I am currently experimenting with this at https://gitlab.freedesktop.org/bentiss/helm-gitlab-config/.
There is some more work to do, but a large part of the deployment can be automated in a slightly cleaner way than generating a cloud-init through a Python script.
It is however longer to bootstrap, because we need at least 3 reboots: iPXE boot into the CoreOS live image to be able to install, then a reboot into the installed system, then another reboot after layering the packages (or we can try live layering, but the reboot is way safer).
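For context, the install step from the live environment boils down to something like this (device name and Ignition URL are placeholders):

```sh
# From the live iPXE-booted environment: write FCOS to disk with our Ignition
# config, then reboot into the installed system. Device and URL are made up.
sudo coreos-installer install /dev/sda \
  --ignition-url https://example.freedesktop.org/ignition/k8s-node.ign
sudo systemctl reboot
```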
The automatic updates should probably be deactivated, or carefully tweaked so they don't all happen at the same time. But it could be interesting to have a monthly automatic reboot of all servers if it works.
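A minimal Butane sketch of what taming the updates could look like, assuming we go the Zincati periodic-window route rather than disabling updates entirely (the window below is made up, and each node would get its own slot):

```yaml
# Butane sketch (fcos variant): pin Zincati auto-updates to a weekly window.
# The day/time below are placeholders; each node would get a staggered slot.
variant: fcos
version: 1.5.0
storage:
  files:
    - path: /etc/zincati/config.d/55-updates-strategy.toml
      mode: 0644
      contents:
        inline: |
          [updates]
          strategy = "periodic"

          [[updates.periodic.window]]
          days = [ "Mon" ]
          start_time = "05:00"
          length_minutes = 60
```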
On the plus side, we could also have SELinux on both the runners and the k8s servers, which might help prevent sandbox escapes too.