  1. Jan 29, 2023
    • block, bfq: inject I/O to underutilized actuators · 2d31c684
      Davide Zini authored
      
      The main service scheme of BFQ for sync I/O is serving one sync
      bfq_queue at a time, for a while. In particular, BFQ enforces this
      scheme when it deems this necessary to boost throughput or to
      preserve service guarantees. Unfortunately, when BFQ enforces this
      policy, only one actuator at a time gets served for a while,
      because each bfq_queue contains I/O for only one actuator. The
      other actuators may remain underutilized.
      
      Actually, BFQ may serve (inject) extra I/O, taken from other
      bfq_queues, in parallel with that of the in-service queue. This
      injection mechanism may provide the ground for dealing also with
      the above actuator-underutilization problem. Yet BFQ does not take
      the actuator load into account when choosing which queue to pick
      extra I/O from. In addition, BFQ may happen to inject extra I/O
      only when the in-service queue is temporarily empty.
      
      In view of these facts, this commit extends the
      injection mechanism in such a way that the latter:
      (1) takes into account also the actuator load;
      (2) checks such a load on each dispatch, and injects I/O for an
          underutilized actuator, if there is one and there is I/O for it.
      
      To perform the check in (2), this commit introduces a load
      threshold, currently set to 4.  A linear scan of each actuator is
      performed, until an actuator is found for which the following two
      conditions hold: the load of the actuator is below the threshold,
      and there is at least one non-in-service queue that contains I/O
      for that actuator. If such a pair (actuator, queue) is found, then
      the head request of that queue is returned for dispatch, instead
      of the head request of the in-service queue.
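
      The scan in (2) can be pictured with a minimal, self-contained C
      sketch. The names and data layout below (struct actuator_state,
      BFQ_ACT_LOAD_THRESHOLD, find_underutilized_actuator) are
      illustrative assumptions of this write-up, not the actual BFQ code
      or data structures.

      #include <stdbool.h>
      #include <stddef.h>

      #define BFQ_ACT_LOAD_THRESHOLD 4   /* load threshold mentioned above */

      /* Hypothetical per-actuator bookkeeping. */
      struct actuator_state {
              int  load;            /* requests queued in the drive for this actuator */
              bool has_waiting_io;  /* some non-in-service bfq_queue has I/O for it */
      };

      /*
       * Return the index of the first actuator whose load is below the
       * threshold and for which some non-in-service queue holds I/O, or
       * -1 if none qualifies. I/O for that actuator would then be
       * injected instead of dispatching from the in-service queue.
       */
      static int find_underutilized_actuator(const struct actuator_state *act,
                                             size_t num_actuators)
      {
              for (size_t i = 0; i < num_actuators; i++) {
                      if (act[i].load < BFQ_ACT_LOAD_THRESHOLD &&
                          act[i].has_waiting_io)
                              return (int)i;
              }
              return -1; /* every actuator is busy enough, or has no extra I/O */
      }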
      
      We have set the threshold, empirically, to the minimum possible
      value for which an actuator is fully utilized, or close to being
      fully utilized. By doing so, injected I/O 'steals' as few
      drive-queue slots as possible from the in-service queue. This
      reduces as much as possible the probability that the service of
      I/O from the in-service bfq_queue gets delayed because of slot
      exhaustion, i.e., because all the slots of the drive queue are
      filled with I/O injected from other queues (NCQ provides for 32
      slots).
      
      This new mechanism also counters actuator underutilization in the
      case of asymmetric configurations of bfq_queues: namely, when
      there are few bfq_queues containing I/O for some actuators and
      many bfq_queues containing I/O for other actuators, or when the
      bfq_queues containing I/O for some actuators have lower weights
      than the other bfq_queues.
      
      Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Davide Zini <davidezini2@gmail.com>
      Link: https://lore.kernel.org/r/20230103145503.71712-8-paolo.valente@linaro.org
      
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2d31c684
  2. Dec 15, 2022
  3. Nov 02, 2022
  4. Nov 01, 2022
  5. Aug 22, 2022
  6. Jun 27, 2022
  7. Feb 18, 2022
  8. Feb 17, 2022
  9. Aug 18, 2021
  10. Mar 25, 2021
  11. Jan 25, 2021
  12. Aug 18, 2020
    • bfq: fix blkio cgroup leakage v4 · 2de791ab
      Dmitry Monakhov authored
      
      Changes from v1:
          - update commit description with proper ref-accounting justification
      
      Commit db37a34c ("block, bfq: get a ref to a group when adding it to a service tree")
      introduced a leak of bfq_group and blkcg_gq objects because of a
      get/put imbalance.
      In fact, the whole idea of the original commit is wrong, because a
      bfq_group entity cannot disappear under us: it is referenced by its
      child bfq_queues' entities from here:
       -> bfq_init_entity()
          ->bfqg_and_blkg_get(bfqg);
          ->entity->parent = bfqg->my_entity
      
       -> bfq_put_queue(bfqq)
          FINAL_PUT
          ->bfqg_and_blkg_put(bfqq_group(bfqq))
          ->kmem_cache_free(bfq_pool, bfqq);
      
      So the parent entity cannot disappear while a child entity is in
      the tree, and child entities already have proper protection.
      This patch reverts commit db37a34c ("block, bfq: get a ref to a group when adding it to a service tree").
      
      bfq_group leak trace caused by bad commit:
      -> blkg_alloc
         -> bfq_pq_alloc
           -> bfqg_get (+1)
      ->bfq_activate_bfqq
        ->bfq_activate_requeue_entity
          -> __bfq_activate_entity
             ->bfq_get_entity
               ->bfqg_and_blkg_get (+1)  <==== : Note1
      ->bfq_del_bfqq_busy
        ->bfq_deactivate_entity+0x53/0xc0 [bfq]
          ->__bfq_deactivate_entity+0x1b8/0x210 [bfq]
            -> bfq_forget_entity(is_in_service = true)
      	 entity->on_st_or_in_serv = false   <=== :Note2
      	 if (is_in_service)
      	     return;  ==> do not touch reference
      -> blkcg_css_offline
       -> blkcg_destroy_blkgs
        -> blkg_destroy
         -> bfq_pd_offline
          -> __bfq_deactivate_entity
                 if (!entity->on_st_or_in_serv) /* true, because of (Note2) */
                     return false;
       -> bfq_pd_free
          -> bfqg_put() (-1, but bfqg->ref == 2) because of (Note2)
      So bfq_group and blkcg_gq will leak forever, see the test case below.
      
      ##TESTCASE_BEGIN:
      #!/bin/bash
      
      max_iters=${1:-100}
      #prep cgroup mounts
      mount -t tmpfs cgroup_root /sys/fs/cgroup
      mkdir /sys/fs/cgroup/blkio
      mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
      
      # Prepare blkdev
      grep blkio /proc/cgroups
      truncate -s 1M img
      losetup /dev/loop0 img
      echo bfq > /sys/block/loop0/queue/scheduler
      
      grep blkio /proc/cgroups
      for ((i=0;i<max_iters;i++))
      do
          mkdir -p /sys/fs/cgroup/blkio/a
          echo 0 > /sys/fs/cgroup/blkio/a/cgroup.procs
          dd if=/dev/loop0 bs=4k count=1 of=/dev/null iflag=direct 2> /dev/null
          echo 0 > /sys/fs/cgroup/blkio/cgroup.procs
          rmdir /sys/fs/cgroup/blkio/a
          grep blkio /proc/cgroups
      done
      ##TESTCASE_END:
      
      Fixes: db37a34c ("block, bfq: get a ref to a group when adding it to a service tree")
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2de791ab
  13. Feb 03, 2020
  14. Jan 22, 2020
  15. Sep 06, 2019
  16. Apr 30, 2019
  17. Apr 10, 2019
    • block, bfq: fix use after free in bfq_bfqq_expire · eed47d19
      Paolo Valente authored
      
      The function bfq_bfqq_expire() invokes the function
      __bfq_bfqq_expire(), and the latter may free the in-service bfq-queue.
      If this happens, then no other instruction of bfq_bfqq_expire() must
      be executed, or a use-after-free will occur.
      
      Based on the assumption that __bfq_bfqq_expire() invokes
      bfq_put_queue() on the in-service bfq-queue exactly once, the queue is
      assumed to be freed if its refcounter is equal to one right before
      invoking __bfq_bfqq_expire().
      
      But, since commit 9dee8b3b ("block, bfq: fix queue removal from
      weights tree"), this assumption is false: __bfq_bfqq_expire() may
      also invoke bfq_weights_tree_remove(), and, since that same
      commit, the latter function may itself invoke bfq_put_queue(). So
      __bfq_bfqq_expire() may invoke bfq_put_queue() twice, and this is
      the actual case in which the in-service queue may happen to be
      freed.
      
      To address this issue, this commit moves the check on the refcounter
      of the queue right around the last bfq_put_queue() that may be invoked
      on the queue.
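
      The shape of the fix can be shown with a hedged, self-contained C
      sketch: snapshot the reference counter right before the last put
      that may free the queue, and never touch the queue afterwards if
      that put was the final one. The struct and helpers below are
      illustrative assumptions, not the actual bfq_bfqq_expire() code.

      #include <stdbool.h>
      #include <stdlib.h>

      /* Hypothetical, stripped-down queue with a plain reference counter. */
      struct queue {
              int ref;
              int state;
      };

      static void queue_put(struct queue *q)
      {
              if (--q->ref == 0)
                      free(q); /* queue is gone; no field may be touched afterwards */
      }

      static void expire(struct queue *q)
      {
              /* Check the counter right around the last put on the queue. */
              bool freed = (q->ref == 1);

              queue_put(q);
              if (freed)
                      return; /* q may have been freed: do not dereference it */

              q->state = 0; /* safe only because q is known to still exist */
      }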
      
      Fixes: 9dee8b3b ("block, bfq: fix queue removal from weights tree")
      Reported-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
      Reported-by: Douglas Anderson <dianders@chromium.org>
      Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
      Tested-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      eed47d19
  18. Apr 08, 2019
  19. Apr 01, 2019
    • block, bfq: do not idle for lowest-weight queues · fb53ac6c
      Paolo Valente authored
      
      In most cases, it is detrimental for throughput to plug I/O dispatch
      when the in-service bfq_queue becomes temporarily empty (plugging is
      performed to wait for the possible arrival, soon, of new I/O from the
      in-service queue). There is however a case where plugging is needed
      for service guarantees. If a bfq_queue, say Q, has a higher weight
      than some other active bfq_queue, and is sync, i.e., contains sync
      I/O, then, to guarantee that Q does receive a higher share of the
      throughput than other lower-weight queues, it is necessary to plug I/O
      dispatch when Q remains temporarily empty while being served.
      
      For this reason, BFQ performs I/O plugging when some active bfq_queue
      has a higher weight than some other active bfq_queue. But this is
      overkill. In fact, if the in-service bfq_queue actually has a weight
      lower than or equal to that of the other queues, then the queue
      *must not* be guaranteed a higher share of the throughput than the
      other queues. So not plugging I/O cannot cause any harm to the
      queue, and it can boost throughput.
      
      Taking advantage of this fact, this commit does not plug I/O for sync
      bfq_queues with a weight lower than or equal to the weights of the
      other queues. Here is an example of the resulting throughput boost
      with the dbench workload, which is particularly nasty for BFQ. With
      the dbench test in the Phoronix suite, BFQ reaches its lowest total
      throughput with 6 clients on a filesystem with journaling, when the
      journaling daemon has a higher weight than normal processes. Before
      this commit, the total throughput was ~80 MB/sec on a PLEXTOR PX-256M5;
      after this commit it is ~100 MB/sec.
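
      A minimal sketch of the resulting plugging condition, in
      self-contained C; the types, the helper name and the assumption
      that 'others' holds the other active queues are illustrative, not
      the actual bfq_better_to_idle() logic.

      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical minimal view of a queue: its weight and whether it is sync. */
      struct queue {
              unsigned int weight;
              bool         is_sync;
      };

      /*
       * Plug dispatch for the in-service queue only if it is sync and
       * its weight is strictly higher than that of some other active
       * queue; a queue with the lowest (or an equal) weight gains
       * nothing from plugging, so skipping it can only help throughput.
       */
      static bool should_plug_dispatch(const struct queue *in_service,
                                       const struct queue *others, size_t n)
      {
              if (!in_service->is_sync)
                      return false;

              for (size_t i = 0; i < n; i++)
                      if (in_service->weight > others[i].weight)
                              return true;

              return false;
      }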
      
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fb53ac6c
    • block/bfq: fix ifdef for CONFIG_BFQ_GROUP_IOSCHED=y · 42b1bd33
      Konstantin Khlebnikov authored
      
      Replace BFQ_GROUP_IOSCHED_ENABLED with CONFIG_BFQ_GROUP_IOSCHED.
      Code under these ifdefs never worked, something might be broken.
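
      For clarity, a sketch of the guard before and after the change.
      Kconfig exposes enabled options to C code as CONFIG_-prefixed
      macros, which is why the old spelling never matched; the elided
      bodies are placeholders.

      /* Before: this macro is never defined, so the guarded code was
       * always compiled out. */
      #ifdef BFQ_GROUP_IOSCHED_ENABLED
      /* ... group-scheduling-only code, silently dead ... */
      #endif

      /* After: the guard follows CONFIG_BFQ_GROUP_IOSCHED=y. */
      #ifdef CONFIG_BFQ_GROUP_IOSCHED
      /* ... group-scheduling-only code, compiled when the option is on ... */
      #endif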
      
      Fixes: 0471559c ("block, bfq: add/remove entity weights correctly")
      Fixes: 73d58118 ("block, bfq: consider also ioprio classes in symmetry detection")
      Reviewed-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      42b1bd33
  20. Jan 31, 2019
    • block, bfq: fix queue removal from weights tree · 9dee8b3b
      Paolo Valente authored
      
      bfq maintains an ordered list, through a red-black tree, of unique
      weights of active bfq_queues. This list is used to detect whether there
      are active queues with differentiated weights. The weight of a queue is
      removed from the list when both of the following conditions become
      true:

      (1) the bfq_queue is flagged as inactive;
      (2) the queue has no in-flight request any longer.
      
      Unfortunately, in the rare cases where condition (2) becomes true before
      condition (1), the removal fails, because the function to remove the
      weight of the queue (bfq_weights_tree_remove) is rightly invoked in the
      path that deactivates the bfq_queue, but mistakenly invoked *before* the
      function that actually performs the deactivation (bfq_deactivate_bfqq).
      
      This commit moves the invocation of bfq_weights_tree_remove for
      condition (1) to after bfq_deactivate_bfqq. As a consequence of this
      move, it is necessary to add a further reference to the queue when the
      weight of a queue is added, because the queue might otherwise be freed
      before bfq_weights_tree_remove is invoked. This commit adds this
      reference and makes all related modifications.
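
      A hedged sketch of the resulting scheme: take a reference when the
      weight of a queue is added, drop it when the weight is removed,
      and perform the removal for condition (1) only after the
      deactivation. The struct and helper names are illustrative
      assumptions, not the actual BFQ code.

      #include <stdlib.h>

      /* Hypothetical stripped-down queue with a reference counter. */
      struct queue {
              int ref;
              int weight;
      };

      static void queue_get(struct queue *q) { q->ref++; }

      static void queue_put(struct queue *q)
      {
              if (--q->ref == 0)
                      free(q);
      }

      static void weights_tree_add(struct queue *q)
      {
              /* Extra reference: the queue cannot be freed before its
               * weight is removed from the tree again. */
              queue_get(q);
              /* ... insert q->weight into the tree of unique weights ... */
      }

      static void weights_tree_remove(struct queue *q)
      {
              /* ... extract q->weight from the tree ... */
              queue_put(q); /* drop the reference taken at insertion time */
      }

      static void deactivate_queue(struct queue *q)
      {
              /* ... flag the queue as inactive (condition (1)) ... */
              /* Fix: only now, after the deactivation, drop the weight. */
              weights_tree_remove(q);
      }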
      
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9dee8b3b
    • block, bfq: consider also ioprio classes in symmetry detection · 73d58118
      Paolo Valente authored
      
      In asymmetric scenarios, i.e., when some bfq_queue or bfq_group needs to
      be guaranteed a different bandwidth than other bfq_queues or bfq_groups,
      these service guarantees can be provided only by plugging I/O dispatch,
      completely or partially, when the queue in service remains temporarily
      empty. A case where asymmetry is particularly strong is when some active
      bfq_queues belong to a higher-priority class than some other active
      bfq_queues. Unfortunately, this important case is not considered at all
      in the code for detecting asymmetric scenarios. This commit adds the
      missing logic.
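
      Conceptually, the added check amounts to treating a scenario as
      asymmetric also when the active queues do not all belong to the
      same I/O-priority class. A minimal sketch, with an illustrative
      class encoding that is only an assumption of this write-up:

      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical priority classes, highest first. */
      enum prio_class { CLASS_RT, CLASS_BE, CLASS_IDLE };

      struct queue {
              enum prio_class prio_class;
      };

      /* True if the active queues span more than one priority class. */
      static bool classes_differ(const struct queue *active, size_t n)
      {
              for (size_t i = 1; i < n; i++)
                      if (active[i].prio_class != active[0].prio_class)
                              return true;
              return false;
      }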
      
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      73d58118
  21. Jan 14, 2019
    • block, bfq: fix comments on __bfq_deactivate_entity · 5bf85908
      Paolo Valente authored
      
      The comments on function __bfq_deactivate_entity contain two imprecise or
      wrong statements:
      1) The function performs the deactivation of the entity.
      2) The function must be invoked only if the entity is on a service tree.
      
      This commit replaces both statements with the correct ones:
      1) The function updates sched_data and service trees for the entity,
      so as to represent the entity as inactive (which is only part of the steps
      needed for the deactivation of the entity).
      2) The function must be invoked on every entity being deactivated.
      
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5bf85908
  22. Dec 07, 2018
    • block, bfq: fix decrement of num_active_groups · ba7aeae5
      Paolo Valente authored
      
      Since commit '2d29c9f8 ("block, bfq: improve asymmetric scenarios
      detection")', if there are process groups with I/O requests waiting for
      completion, then BFQ tags the scenario as 'asymmetric'. This detection
      is needed for preserving service guarantees (for details, see comments
      on the computation of the variable asymmetric_scenario in the
      function bfq_better_to_idle).
      
      Unfortunately, commit '2d29c9f8 ("block, bfq: improve asymmetric
      scenarios detection")' contains an error exactly in the updating of
      the number of groups with I/O requests waiting for completion: if a
      group has more than one descendant process, then the above number of
      groups, which is renamed from num_active_groups to a more appropriate
      num_groups_with_pending_reqs by this commit, may happen to be wrongly
      decremented multiple times, namely every time one of the descendant
      processes gets all its pending I/O requests completed.
      
      A correct, complete solution should work as follows. Consider a group
      that is inactive, i.e., that has no descendant process with pending
      I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
      is still accounting for this group, because the group still has some
      descendant process with some I/O request still in
      flight. num_groups_with_pending_reqs should be decremented when the
      in-flight request of the last descendant process is finally completed
      (assuming that nothing else has changed for the group in the meantime,
      in terms of composition of the group and active/inactive state of
      child groups and processes). To accomplish this, an additional
      pending-request counter must be added to entities, and must be
      updated correctly.
      
      To avoid this additional field and operations, this commit resorts to
      the following tradeoff between simplicity and accuracy: for an
      inactive group that is still counted in num_groups_with_pending_reqs,
      this commit decrements num_groups_with_pending_reqs when the first
      descendant process of the group remains with no request waiting for
      completion.
      
      This simplified scheme provides a fix to the unbalanced decrements
      introduced by 2d29c9f8. Since this error was also caused by lack
      of comments on this non-trivial issue, this commit also adds related
      comments.
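
      The simplified scheme can be sketched as follows; the counter name
      matches the one introduced above, while the structs and the
      per-group flag are illustrative assumptions, not the actual BFQ
      data structures.

      #include <stdbool.h>

      /* Hypothetical stripped-down entities: a group and one of its queues. */
      struct group {
              bool counted_as_pending; /* counted in num_groups_with_pending_reqs */
      };

      struct queue {
              struct group *parent;
              int pending_reqs; /* requests still waiting for completion */
      };

      static int num_groups_with_pending_reqs;

      /*
       * Simplified scheme: decrement the counter the first time any
       * descendant queue of a counted group is left with no request
       * waiting for completion, instead of waiting until the whole
       * group has none in flight.
       */
      static void on_request_completed(struct queue *q)
      {
              if (q->pending_reqs > 0)
                      q->pending_reqs--;

              if (q->pending_reqs == 0 && q->parent->counted_as_pending) {
                      q->parent->counted_as_pending = false;
                      num_groups_with_pending_reqs--;
              }
      }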
      
      Fixes: 2d29c9f8 ("block, bfq: improve asymmetric scenarios detection")
      Reported-by: Steven Barrett <steven@liquorix.net>
      Tested-by: Steven Barrett <steven@liquorix.net>
      Tested-by: Lucjan Lucjanov <lucjan.lucjanov@gmail.com>
      Reviewed-by: Federico Motta <federico@willer.it>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ba7aeae5
  23. Oct 25, 2018
    • block, bfq: fix asymmetric scenarios detection · 98fa7a3e
      Federico Motta authored
      
      Since commit 2d29c9f8 ("block, bfq: improve asymmetric scenarios
      detection"), a scenario is defined asymmetric when one of the
      following conditions holds:
      - active bfq_queues have different weights
      - one or more groups of entities (bfq_queues or other groups of entities)
        are active
      bfq grants fairness and low latency also in such asymmetric scenarios,
      by plugging the dispatching of I/O if the bfq_queue in service happens
      to be temporarily idle. This plugging may lower throughput, so it is
      important to do it only when strictly needed.
      
      By mistake, in commit '2d29c9f8' ("block, bfq: improve asymmetric
      scenarios detection") the num_active_groups counter was firstly
      incremented and subsequently decremented at any entity (group or
      bfq_queue) weight change.
      
      This is useless, because only transitions from active to inactive and
      vice versa matter for that counter. Unfortunately this is also
      incorrect in the following case: the entity at issue is a bfq_queue
      and it is under weight raising. In fact in this case there is a
      spurious increment of the num_active_groups counter.
      
      This spurious increment may cause scenarios to be wrongly detected as
      asymmetric, thus causing useless plugging and loss of throughput.
      
      This commit fixes this issue by simply removing the above useless and
      wrong increments and decrements.
      
      Fixes: 2d29c9f8 ("block, bfq: improve asymmetric scenarios detection")
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Federico Motta <federico@willer.it>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      98fa7a3e
  24. Oct 13, 2018
    • block, bfq: improve asymmetric scenarios detection · 2d29c9f8
      Federico Motta authored
      
      bfq defines as asymmetric a scenario where an active entity, say E
      (representing either a single bfq_queue or a group of other entities),
      has a higher weight than some other entities.  If the entity E does sync
      I/O in such a scenario, then bfq plugs the dispatch of the I/O of the
      other entities in the following situation: E is in service but
      temporarily has no pending I/O request.  In fact, without this plugging,
      all the times that E stops being temporarily idle, it may find the
      internal queues of the storage device already filled with an
      out-of-control number of extra requests, from other entities. So E may
      have to wait for the service of these extra requests, before finally
      having its own requests served. This may easily break service
      guarantees, with E getting less than its fair share of the device
      throughput.  Usually, the end result is that E gets the same fraction of
      the throughput as the other entities, instead of getting more, according
      to its higher weight.
      
      Yet there are two other more subtle cases where E, even if its weight is
      actually equal to or even lower than the weight of any other active
      entities, may get less than its fair share of the throughput in case the
      above I/O plugging is not performed:
      1. other entities issue larger requests than E;
      2. other entities contain more active child entities than E (or in
         general tend to have more backlog than E).
      
      In the first case, other entities may get more service than E because
      they get larger requests, than those of E, served during the temporary
      idle periods of E.  In the second case, other entities get more service
      because, by having many child entities, they have many requests ready
      for dispatching while E is temporarily idle.
      
      This commit addresses this issue by extending the definition of
      asymmetric scenario: a scenario is asymmetric when
      - active entities representing bfq_queues have differentiated weights,
        as in the original definition
      or (inclusive)
      - one or more entities representing groups of entities are active.
      
      This broader definition makes sure that I/O plugging will be performed
      in all the above cases, provided that there is at least one active
      group.  Of course, this definition is very coarse, so it will trigger
      I/O plugging also in cases where it is not needed, such as, e.g.,
      multiple active entities with just one child each, and all with the same
      I/O-request size.  The reason for this coarse definition is just that a
      finer-grained definition would be rather heavy to compute.
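
      The broadened definition reduces to a simple boolean test,
      sketched here in self-contained C with hypothetical names
      (varied_queue_weights, num_active_groups):

      #include <stdbool.h>

      /*
       * A scenario is treated as asymmetric when active bfq_queues have
       * differentiated weights, or when at least one entity
       * representing a group is active (the coarse but cheap criterion
       * described above).
       */
      static bool scenario_is_asymmetric(bool varied_queue_weights,
                                         unsigned int num_active_groups)
      {
              return varied_queue_weights || num_active_groups > 0;
      }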
      
      On the opposite end, even this new definition does not trigger I/O
      plugging in all cases where there is no active group, and all bfq_queues
      have the same weight.  So, in these cases some unfairness may occur if
      there are asymmetries in I/O-request sizes.  We made this choice because
      I/O plugging may lower throughput, and probably a user that has not
      created any group cares more about throughput than about perfect
      fairness.  At any rate, as for possible applications that may care about
      service guarantees, bfq already guarantees a high responsiveness and a
      low latency to soft real-time applications automatically.
      
      Signed-off-by: Federico Motta <federico@willer.it>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2d29c9f8
  25. Sep 14, 2018
    • block, bfq: correctly charge and reset entity service in all cases · cbeb869a
      Paolo Valente authored
      
      BFQ schedules entities (which represent either per-process queues or
      groups of queues) as a function of their timestamps. In particular, as
      a function of their (virtual) finish times. The finish time of an
      entity is computed as a function of the budget assigned to the entity,
      assuming, tentatively, that the entity, once in service, will receive
      an amount of service equal to its budget. Then, when the entity is
      expired because it finishes to be served, this finish time is updated
      as a function of the actual service received by the entity. This
      allows the entity to be correctly charged with only the service
      received, and then to be correctly re-scheduled.
      
      Yet an entity may receive service also while not being the entity in
      service (in the scheduling environment of its parent entity), for
      several reasons. If the entity remains with no backlog while receiving
      this 'unofficial' service, then it is expired. Also on such an
      expiration, the finish time of the entity should be updated to account
      for only the service actually received by the entity. Unfortunately,
      such an update is not performed for an entity expiring without being
      the entity in service.
      
      In a similar vein, the service counter of the entity in service is
      reset when the entity is expired, to be ready to be used for next
      service cycle. This reset too should be performed also in case an
      entity is expired because it remains empty after receiving service
      while not being the entity in service. But in this case the reset is
      not performed.
      
      This commit performs the above update of the finish time and reset of
      the service received, also for an entity expiring while not being the
      entity in service.
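
      A self-contained sketch of the invariant the commit restores: on
      every expiration, including one that happens while the entity is
      not in service, the finish time is recomputed from the service
      actually received and the service counter is reset. The struct
      layout and the timestamp formula are simplifications assumed for
      illustration only.

      /* Hypothetical scheduling entity with virtual-time bookkeeping. */
      struct entity {
              unsigned long start;    /* virtual start time */
              unsigned long finish;   /* virtual finish time */
              unsigned long budget;   /* service tentatively assigned */
              unsigned long service;  /* service actually received so far */
              unsigned long weight;   /* assumed > 0 */
      };

      /* Finish time as a function of a given amount of service. */
      static unsigned long finish_time(const struct entity *e,
                                       unsigned long amount)
      {
              return e->start + amount / e->weight;
      }

      /*
       * On expiration, charge the entity only the service it actually
       * received and reset its service counter for the next cycle. The
       * fix applies this also when the entity expires while *not* being
       * the entity in service.
       */
      static void expire_entity(struct entity *e)
      {
              e->finish = finish_time(e, e->service);
              e->service = 0;
      }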
      
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cbeb869a
  26. Aug 16, 2018
  27. Jul 09, 2018
    • block, bfq: fix service being wrongly set to zero in case of preemption · 9fae8dd5
      Paolo Valente authored
      
      If
      - a bfq_queue Q preempts another queue, because one request of Q
      arrives in time,
      - but, after this preemption, Q is not the queue that is set in service,
      then Q->entity.service is set to 0 when Q is eventually set in
      service. But Q should have continued receiving service with its old
      budget (which is why preemption has occurred) and its old service.
      
      This commit addresses this issue by resetting service on queue real
      expiration.
      
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9fae8dd5
    • block, bfq: add/remove entity weights correctly · 0471559c
      Paolo Valente authored
      
      To keep I/O throughput high as often as possible, BFQ performs
      I/O-dispatch plugging (aka device idling) only when beneficial exactly
      for throughput, or when needed for service guarantees (low latency,
      fairness). An important case where the latter condition holds is when
      the scenario is 'asymmetric' in terms of weights: i.e., when some
      bfq_queue or whole group of queues has a higher weight, and thus has
      to receive more service, than other queues or groups. Without dispatch
      plugging, lower-weight queues/groups may unjustly steal bandwidth from
      higher-weight queues/groups.
      
      To detect asymmetric scenarios, BFQ checks some sufficient
      conditions. One of these conditions is that active groups have
      different weights. BFQ controls this condition by maintaining a
      special set of unique weights of active groups
      (group_weights_tree). To this purpose, in the function
      bfq_active_insert/bfq_active_extract BFQ adds/removes the weight of a
      group to/from this set.
      
      Unfortunately, the function bfq_active_extract may happen to be
      invoked also for a group that is still active (to preserve the correct
      update of the next queue to serve, see comments in function
      bfq_no_longer_next_in_service() for details). In this case, removing
      the weight of the group makes the set group_weights_tree
      inconsistent. Service-guarantee violations follow.
      
      This commit addresses this issue by moving group_weights_tree
      insertions from their previous location (in bfq_active_insert) into
      the function __bfq_activate_entity, and by moving group_weights_tree
      extractions from bfq_active_extract to when the entity that represents
      a group remains thoroughly idle, i.e., with no request either enqueued
      or dispatched.
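
      A hedged sketch of the new placement: the weight of a group enters
      the set when the group's entity is activated, and leaves it only
      once the group is thoroughly idle. The struct and helper names
      below are illustrative assumptions, not the actual BFQ code.

      #include <stdbool.h>

      /* Hypothetical group entity with just enough state for the example. */
      struct group_entity {
              unsigned int weight;
              unsigned int reqs_enqueued;
              unsigned int reqs_dispatched;
              bool weight_in_tree; /* weight currently in group_weights_tree */
      };

      static void weights_tree_insert(struct group_entity *g)
      {
              g->weight_in_tree = true;  /* ... add g->weight to the set ... */
      }

      static void weights_tree_extract(struct group_entity *g)
      {
              g->weight_in_tree = false; /* ... remove g->weight from the set ... */
      }

      /* Insertion now happens on activation (was: in bfq_active_insert). */
      static void activate_group(struct group_entity *g)
      {
              if (!g->weight_in_tree)
                      weights_tree_insert(g);
      }

      /*
       * Extraction now happens only when the group is thoroughly idle,
       * i.e., has no request either enqueued or dispatched (was: in
       * bfq_active_extract, which may run on a still-active group).
       */
      static void maybe_extract_group_weight(struct group_entity *g)
      {
              if (g->weight_in_tree &&
                  g->reqs_enqueued == 0 && g->reqs_dispatched == 0)
                      weights_tree_extract(g);
      }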
      
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0471559c