Skip to content

DBusHash: Recalculate bucket used if the table is rebuilt

Simon McVittie requested to merge smcv/dbus:hash-assertion into master

Hash buckets are simply entries in an array owned by the hash table, so every time the hash table's array of buckets is reallocated, we must invalidate all pointers to buckets and recalculate them to point into the new array of buckets. This was not always done. Luckily, we appear to have avoided causing any actual memory corruption like this.

The only place where we reallocate the array of buckets is in rebuild_table(), which is only called by add_allocated_entry(), which is only called by add_entry(), which is only called by find_generic_function() when create_if_not_found is true. find_generic_function(), in turn, is only called by the table->find_function() implementations.

The table->find_function() implementations have an optional "out" parameter which returns a pointer to the hash bucket in which the returned entry would be found. It is set in find_generic_function() for existing entries, or in add_allocated_entry() if a new entry is created; after that it is returned through callers unchanged until the caller of table->find_function() is reached. The only callers that make use of the "out" parameter in practice are _dbus_hash_iter_lookup(), to populate a DBusHashIter, and the _dbus_hash_table_remove_TYPE() family, to pass it to remove_entry().

We can ignore the _dbus_hash_table_remove_TYPE() family for two reasons: they call the find function with create_if_not_found set to FALSE, which never reallocates the hash table, and they do not store the pointer to the bucket in the long-term. So we only need to consider _dbus_hash_iter_lookup().

It is documented to be unsafe to add hash entries while a DBusHashIter is open, and only adding a hash entry can trigger rebuild_table(); so we can assume that if _dbus_hash_iter_lookup() returns a valid bucket, it remains valid forever.

The remaining case that must be considered is whether reallocation can occur after setting the "out" parameter for the bucket, but before returning it to _dbus_hash_iter_lookup(). We can see that it can: we call rebuild_table() after recalculating the correct bucket. If we do, and it actually causes a rebuild, then we must recalculate the bucket accordingly.

Looking at the worst-case impact of this bug, if it is going to cause any problem, it would only be when _dbus_hash_iter_lookup() is called with create_if_not_found set true. This makes three uses of the bucket: it stores it in the DBusHashTableIter, it calculates the next bucket by finding the offset of the bucket in table->buckets and advancing by one pointer, and it makes an assertion that should be tautologous, enforcing that the next bucket corresponds to what it should.

When running under the AddressSanitizer, which makes allocations in widely spaced regions of memory, on a 32-bit platform, we could (and indeed do) find that the tautologous assertion fails. The current bucket returned from the "out" parameter is a pointer into the old value of table->buckets. If it's far enough before or after the new table->buckets in the address space, then the offset in next_bucket could overflow a 32-bit integer, resulting in the assertion no longer being true.

The second commit of this branch adds extra assertions, which reproduce the bug even without AddressSanitizer.

In production code without assertions, the impact is that the ->bucket and ->next_bucket members of the DBusHashIter can be invalid. They are used in _dbus_hash_iter_next() and _dbus_hash_iter_remove_entry(). However, the only callers of _dbus_hash_iter_lookup() outside test code are in bus/containers.c, and neither calls either of those functions, so we dodge that bullet.

Edited by Simon McVittie

Merge request reports