
platform: avoid routes resync for routes that we don't track

Íñigo Huguet requested to merge ih/rt6_replace_resync into main

Summary

NetworkManager was doing a complete resync of the IPv6 routes cache whenever an external program did ip route change for any route, even if that route was not tracked by NM. This led to 100% CPU usage from NM if those route updates were frequent, as happens with some routing daemons. Fix that.

Purpose

When we receive a Netlink message with a "route change" event, normally we just ignore it if it's for a route that we don't track (e.g. because of the route protocol).

However, it's not that easy if the message has the NLM_F_REPLACE flag, because that means it might be replacing another route. If the kernel has similar routes which are candidates for the replacement, it's hard for NM to guess which one of them is being replaced (the kernel doesn't have a "route ID" or similar field to indicate it). Moreover, the kernel might choose to replace a route that we don't have in the cache, so we know nothing about it.

It is important to note that we cannot just discard Netlink messages for routes that we don't track if they have the NLM_F_REPLACE flag. For example, if we are tracking a route with proto=static, we might receive a replace message changing that route to proto=other_proto_that_we_dont_track. We need to process that message and remove the route from our cache.

As NM doesn't know which route is being replaced, trying to guess would lead to errors that leave the cache in an inconsistent state. Because of that, it just does a cache resync for the routes.

For IPv4 there was already an optimization for this: if the cache doesn't contain any candidate route for the replacement, there are only 2 possible options: either add the new route to the cache, or discard it if we are not interested in it. We don't need a resync for that.
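The decision described above can be sketched as follows. This is only an illustration of the reasoning, not the code from this commit: the enum, the function name and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of handling a route message with NLM_F_REPLACE. */
enum replace_action {
    REPLACE_ACTION_ADD,     /* no candidates, tracked route: add it to the cache */
    REPLACE_ACTION_DISCARD, /* no candidates, untracked route: nothing to do */
    REPLACE_ACTION_RESYNC,  /* candidates exist: we cannot tell which one was replaced */
};

/* n_cached_candidates: how many cached routes could be the replaced one.
 * new_route_is_tracked: whether the incoming route is one we keep in the cache. */
static enum replace_action
decide_on_replace(size_t n_cached_candidates, bool new_route_is_tracked)
{
    if (n_cached_candidates > 0)
        return REPLACE_ACTION_RESYNC;

    return new_route_is_tracked ? REPLACE_ACTION_ADD : REPLACE_ACTION_DISCARD;
}
```

Only the first branch costs a full resync. A routing daemon that replaces its own untracked routes always ends up in the discard branch, as long as none of our cached routes look like candidates.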

This commit extends that optimization to IPv6 routes. There is no reason why it shouldn't work the same way as with IPv4. The optimization only works well as long as we find potential candidate routes the same way the kernel does (comparing the same fields). NM calls this "comparing by WEAK_ID". That caveat is not specific to IPv6, though: it already applies to IPv4 routes.
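To make "comparing by WEAK_ID" more concrete, here is a minimal sketch of a candidate lookup. The struct, the function names and, in particular, the set of compared fields (table, destination, prefix length, metric) are assumptions for illustration; the fields that NM and the kernel really compare may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified IPv6 route. */
struct route6 {
    uint32_t table;
    uint8_t  dst[16];
    uint8_t  plen;
    uint32_t metric;
    uint8_t  proto;   /* not part of the weak ID in this sketch */
    int      ifindex; /* not part of the weak ID in this sketch */
};

/* Two routes share a weak ID if the identifying fields match,
 * regardless of protocol, interface or next hop. */
static bool
route6_weak_id_equal(const struct route6 *a, const struct route6 *b)
{
    return a->table == b->table
        && a->plen == b->plen
        && a->metric == b->metric
        && memcmp(a->dst, b->dst, sizeof(a->dst)) == 0;
}

/* Count the cached routes that the incoming route might be replacing. */
static size_t
route6_count_candidates(const struct route6 *cache, size_t n, const struct route6 *incoming)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (route6_weak_id_equal(&cache[i], incoming))
            count++;
    }
    return count;
}
```

If the count is zero, no cached route can be the target of the replacement, so the resync can be skipped.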

It is worth enabling this optimization because there are routing daemons using custom routing protocols that make tens or hundreds of updates per second. If they use NLM_F_REPLACE, this caused NM to do a resync hundreds of times per second, leading to 100% CPU usage: https://issues.redhat.com/browse/RHEL-26195

An additional but smaller optimization is done in this commit: if we receive a message for a route that we don't track and that doesn't have the NLM_F_REPLACE flag, we can ignore the entire message, thus avoiding the memory allocation of the nmp_object. That nmp_object was going to be ignored later anyway, so it is better to avoid these allocations which, with the routing daemon from the example above, can happen hundreds of times per second.
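A minimal sketch of such an early check, run before any object is allocated. The function names are hypothetical, and the set of tracked protocols is an assumption for illustration only; NLM_F_REPLACE and the RTPROT_* constants come from the Linux UAPI headers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <linux/netlink.h>   /* NLM_F_REPLACE */
#include <linux/rtnetlink.h> /* RTPROT_* */

/* Hypothetical predicate: the exact set of tracked protocols is an
 * assumption made for this sketch. */
static bool
route_proto_is_tracked(uint8_t proto)
{
    return proto <= RTPROT_STATIC || proto == RTPROT_RA || proto == RTPROT_DHCP;
}

/* Returns true if the message can be dropped right away, before parsing
 * it into an object. */
static bool
route_msg_can_skip(uint8_t proto, uint16_t nlmsg_flags)
{
    if (route_proto_is_tracked(proto))
        return false;

    /* An untracked route with NLM_F_REPLACE might be replacing a route
     * that we do track, so it still has to be processed. */
    return (nlmsg_flags & NLM_F_REPLACE) == 0;
}
```

With this check, a daemon that adds or changes its own routes without NLM_F_REPLACE costs NM little more than reading the message header.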

With these changes, the CPU usage when doing ip route replace 300 times/s drops from 100% to 1%. Doing ip route replace as fast as possible, without any rate limiting, still keeps NM at 3% CPU usage on the system that I used for testing.

Resolves: https://issues.redhat.com/browse/RHEL-26195

Checklist

Please read https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/main/CONTRIBUTING.md before opening the merge request. In particular, check that:

  • the subject for all commits is concise and explicative
  • the message for all commits explains the reason for the change
  • the source is properly formatted
  • any relevant documentation is up to date
  • you have added unit tests if applicable
  • the NEWS file is updated when the change deserves to be mentioned, for example for new features, behavior changes, API deprecations, etc.
Edited by Íñigo Huguet
