Skip to content

Improve wifi caching by a sticky set of key APs and a dynamic dBm bin width

Teemu Ikonen requested to merge tpikonen/geoclue:wifi-key-hysteresis into master

Another approach to improving wifi cache performance. A stack of patches, but the relevant changes are in the last two commits.

First, implement hysteresis to the set of APs used in cache keys. Add an AP to the set if it has signal above CACHE_KEY_INCLUDE_DBM (currently -70), but remove only if it drops below CACHE_KEY_EXCLUDE_DBM (currently -120, so APs are dropped only when they disappear).

Second, adjust the quantization factor, i.e. dbm_bin_width for signal values in cache keys as a function of the AP set size.

EDIT: Let's consider a stationary receiver which sees a (more or less) fixed set of APs. The signal received from an AP in each scan varies randomly, but has some mean and variance. We get a cache hit when the signals in the AP set are close to the signals in a previously performed query. If a scan is not close to an existing cached one, we perform a new query and cache it.

Two signals R_1 and R_2 are considered to be close when they fall in to the same signal bin, that is R_1/w = R_2/w, where w is the bin width, R_1, R_2, and w are integers and integer division is used.

When caching, we use the set (b_1, b_2, ..., b_n) of bin numbers b_n = R_n/w as a key (paired with the MACs of the APs, but that's not relevant now) and store a location.

Assume that the AP signals have 99% probability of falling into a range with a width s around their mean and that this range is equal for all APs (note: I haven't verified this equality of signal variances, but YOLO). In the code this width is called CACHE_WIFI_SIGNAL_VARIANCE.

When we run scans for long enough, the APs fill all the bins in range [M_n - s/2, M_n + s/2], where M_n is the mean signal from AP n . The cache will then contain C elements (in the code this is CACHE_TARGET_SIZE).

The probability of a cache hit at any moment is proportional to 1/C, but the 'constant' of proportionality goes up as the cache is filled.

The number of APs visible is of course dependent on location. It is very large on city centers and smaller on suburban and rural regions. When the number of observed APs is large, the effect of a single AP signal value on location becomes smaller. When there are just few APs, the signal values are more important, although triangulation with wifi signal values does not seem to work too well.

Independently of the number of visible APs, we want the cache hit probability to stay constant, which means a constant C.

There are two ways of limiting the number of cache elements occupied. One is to limit the number N of (AP, signal) pairs used as cache keys, an approach I tried in !113 (closed). This has the problem that detecting movement depends on signals of the chosen APs changing deterministically. In practice the signal strength seems to be dominated by random variation.

The second way is to dynamically change the definition of signal closeness (or bin width) as a function of the number of visible (or eligible) APs, which is what this MR is about.

The number of cache elements occupied after a 'long' time (the cache is 'full' so that a cache miss is unlikely) is

C = B^N

where B is the number of signal bins which a single AP has occupied with high (99%) probability

With an AP signal range s and bin width w, we have

B = s/w + 1

so that

C = (s/w + 1)^N

Solving for w gets us

w(N) = \frac{s}{\exp(\log(C)/N) - 1}

which after rounding is the definition of dbm_bin_width used in this MR.

Edited by Teemu Ikonen

Merge request reports