Skip to content
Snippets Groups Projects
  1. Dec 27, 2022
  2. Dec 23, 2022
  3. Dec 21, 2022
  4. Dec 20, 2022
  5. Dec 18, 2022
    • Paul E. McKenney's avatar
      rcu: Don't assert interrupts enabled too early in boot · 3f6c3d29
      Paul E. McKenney authored
      
      The rcu_poll_gp_seq_end() and rcu_poll_gp_seq_end_unlocked() both check
      that interrupts are enabled, as they normally should be when waiting for
      an RCU grace period.  Except that it is legal to wait for grace periods
      during early boot, before interrupts have been enabled for the first time,
      and polling for grace periods is required to work during this time.
      This can result in false-positive lockdep splats in the presence of
      boot-time-initiated tracing.
      
      This commit therefore conditions those interrupts-enabled checks on
      rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by
      which time interrupts have been enabled.
      
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@kernel.org>
      Tested-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      3f6c3d29
  6. Dec 16, 2022
  7. Dec 15, 2022
  8. Dec 14, 2022
  9. Dec 13, 2022
  10. Dec 12, 2022
  11. Dec 10, 2022
    • Eduard Zingerman's avatar
      bpf: use check_ids() for active_lock comparison · 4ea2bb15
      Eduard Zingerman authored
      An update for verifier.c:states_equal()/regsafe() to use check_ids()
      for active spin lock comparisons. This fixes the issue reported by
      Kumar Kartikeya Dwivedi in [1] using technique suggested by Edward Cree.
      
      W/o this commit the verifier might be tricked to accept the following
      program working with a map containing spin locks:
      
        0: r9 = map_lookup_elem(...)  ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1.
        1: r8 = map_lookup_elem(...)  ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2.
        2: if r9 == 0 goto exit       ; r9 -> PTR_TO_MAP_VALUE.
        3: if r8 == 0 goto exit       ; r8 -> PTR_TO_MAP_VALUE.
        4: r7 = ktime_get_ns()        ; Unbound SCALAR_VALUE.
        5: r6 = ktime_get_ns()        ; Unbound SCALAR_VALUE.
        6: bpf_spin_lock(r8)          ; active_lock.id == 2.
        7: if r6 > r7 goto +1         ; No new information about the state
                                      ; is derived from this check, thus
                                      ; produced verifier states differ only
                                      ; in 'insn_idx'.
        8: r9 = r8                    ; Optionally make r9.id == r8.id.
        --- checkpoint ---            ; Assume is_state_visisted() creates a
                                      ; checkpoint here.
        9: bpf_spin_unlock(r9)        ; (a,b) active_lock.id == 2.
                                      ; (a) r9.id == 2, (b) r9.id == 1.
       10: exit(0)
      
      Consider two verification paths:
      (a) 0-10
      (b) 0-7,9-10
      
      The path (a) is verified first. If checkpoint is created at (8)
      the (b) would assume that (8) is safe because regsafe() does not
      compare register ids for registers of type PTR_TO_MAP_VALUE.
      
      [1] https://lore.kernel.org/bpf/20221111202719.982118-1-memxor@gmail.com/
      
      
      
      Reported-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Suggested-by: default avatarEdward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/r/20221209135733.28851-6-eddyz87@gmail.com
      
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      4ea2bb15
    • Eduard Zingerman's avatar
      bpf: states_equal() must build idmap for all function frames · 5dd9cdbc
      Eduard Zingerman authored
      verifier.c:states_equal() must maintain register ID mapping across all
      function frames. Otherwise the following example might be erroneously
      marked as safe:
      
      main:
          fp[-24] = map_lookup_elem(...)  ; frame[0].fp[-24].id == 1
          fp[-32] = map_lookup_elem(...)  ; frame[0].fp[-32].id == 2
          r1 = &fp[-24]
          r2 = &fp[-32]
          call foo()
          r0 = 0
          exit
      
      foo:
        0: r9 = r1
        1: r8 = r2
        2: r7 = ktime_get_ns()
        3: r6 = ktime_get_ns()
        4: if (r6 > r7) goto skip_assign
        5: r9 = r8
      
      skip_assign:                ; <--- checkpoint
        6: r9 = *r9               ; (a) frame[1].r9.id == 2
                                  ; (b) frame[1].r9.id == 1
      
        7: if r9 == 0 goto exit:  ; mark_ptr_or_null_regs() transfers != 0 info
                                  ; for all regs sharing ID:
                                  ;   (a) r9 != 0 => &frame[0].fp[-32] != 0
                                  ;   (b) r9 != 0 => &frame[0].fp[-24] != 0
      
        8: r8 = *r8               ; (a) r8 == &frame[0].fp[-32]
                                  ; (b) r8 == &frame[0].fp[-32]
        9: r0 = *r8               ; (a) safe
                                  ; (b) unsafe
      
      exit:
       10: exit
      
      While processing call to foo() verifier considers the following
      execution paths:
      
      (a) 0-10
      (b) 0-4,6-10
      (There is also path 0-7,10 but it is not interesting for the issue at
       hand. (a) is verified first.)
      
      Suppose that checkpoint is created at (6) when path (a) is verified,
      next path (b) is verified and (6) is reached.
      
      If states_equal() maintains separate 'idmap' for each frame the
      mapping at (6) for frame[1] would be empty and
      regsafe(r9)::check_ids() would add a pair 2->1 and return true,
      which is an error.
      
      If states_equal() maintains single 'idmap' for all frames the mapping
      at (6) would be { 1->1, 2->2 } and regsafe(r9)::check_ids() would
      return false when trying to add a pair 2->1.
      
      This issue was suggested in the following discussion:
      https://lore.kernel.org/bpf/CAEf4BzbFB5g4oUfyxk9rHy-PJSLQ3h8q9mV=rVoXfr_JVm8+1Q@mail.gmail.com/
      
      
      
      Suggested-by: default avatarAndrii Nakryiko <andrii.nakryiko@gmail.com>
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/r/20221209135733.28851-4-eddyz87@gmail.com
      
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      5dd9cdbc
    • Eduard Zingerman's avatar
      bpf: regsafe() must not skip check_ids() · 7c884339
      Eduard Zingerman authored
      
      The verifier.c:regsafe() has the following shortcut:
      
      	equal = memcmp(rold, rcur, offsetof(struct bpf_reg_state, parent)) == 0;
      	...
      	if (equal)
      		return true;
      
      Which is executed regardless old register type. This is incorrect for
      register types that might have an ID checked by check_ids(), namely:
       - PTR_TO_MAP_KEY
       - PTR_TO_MAP_VALUE
       - PTR_TO_PACKET_META
       - PTR_TO_PACKET
      
      The following pattern could be used to exploit this:
      
        0: r9 = map_lookup_elem(...)  ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1.
        1: r8 = map_lookup_elem(...)  ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2.
        2: r7 = ktime_get_ns()        ; Unbound SCALAR_VALUE.
        3: r6 = ktime_get_ns()        ; Unbound SCALAR_VALUE.
        4: if r6 > r7 goto +1         ; No new information about the state
                                      ; is derived from this check, thus
                                      ; produced verifier states differ only
                                      ; in 'insn_idx'.
        5: r9 = r8                    ; Optionally make r9.id == r8.id.
        --- checkpoint ---            ; Assume is_state_visisted() creates a
                                      ; checkpoint here.
        6: if r9 == 0 goto <exit>     ; Nullness info is propagated to all
                                      ; registers with matching ID.
        7: r1 = *(u64 *) r8           ; Not always safe.
      
      Verifier first visits path 1-7 where r8 is verified to be not null
      at (6). Later the jump from 4 to 6 is examined. The checkpoint for (6)
      looks as follows:
        R8_rD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0)
        R9_rwD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0)
        R10=fp0
      
      The current state is:
        R0=... R6=... R7=... fp-8=...
        R8=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0)
        R9=map_value_or_null(id=1,off=0,ks=4,vs=8,imm=0)
        R10=fp0
      
      Note that R8 states are byte-to-byte identical, so regsafe() would
      exit early and skip call to check_ids(), thus ID mapping 2->2 will not
      be added to 'idmap'. Next, states for R9 are compared: these are not
      identical and check_ids() is executed, but 'idmap' is empty, so
      check_ids() adds mapping 2->1 to 'idmap' and returns success.
      
      This commit pushes the 'equal' down to register types that don't need
      check_ids().
      
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/r/20221209135733.28851-2-eddyz87@gmail.com
      
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7c884339
    • Daniel Bristot de Oliveira's avatar
      tracing/osnoise: Add preempt and/or irq disabled options · b5dce200
      Daniel Bristot de Oliveira authored
      The osnoise workload runs with preemption and IRQs enabled in such
      a way as to allow all sorts of noise to disturb osnoise's execution.
      hwlat tracer has a similar workload but works with irq disabled,
      allowing only NMIs and the hardware to generate noise.
      
      While thinking about adding an options file to hwlat tracer to
      allow the system to panic, and other features I was thinking
      to add, like having a tracepoint at each noise detection, it
      came to my mind that is easier to make osnoise and also do
      hardware latency detection than making hwlat "feature compatible"
      with osnoise.
      
      Other points are:
       - osnoise already has an independent cpu file.
       - osnoise has a more intuitive interface, e.g., runtime/period vs.
         window/width (and people often need help remembering what it is).
       - osnoise: tracepoints
       - osnoise stop options
       - osnoise options file itself
      
      Moreover, the user-space side (in rtla) is simplified by reusing the
      existing osnoise code.
      
      Finally, people have been asking me about using osnoise for hw latency
      detection, and I have to explain that it was sufficient but not
      necessary. These options make it sufficient and necessary.
      
      Adding a Suggested-by Clark, as he often asked me about this
      possibility.
      
      Link: https://lkml.kernel.org/r/d9c6c19135497054986900f94c8e47410b15316a.1670623111.git.bristot@kernel.org
      
      
      
      Cc: Suggested-by: Clark Williams <williams@redhat.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarDaniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      b5dce200
    • Daniel Bristot de Oliveira's avatar
      tracing/osnoise: Add PANIC_ON_STOP option · 1603dda4
      Daniel Bristot de Oliveira authored
      Often the latency observed in a CPU is not caused by the work being done
      in the CPU itself, but by work done on another CPU that causes the
      hardware to stall all CPUs. In this case, it is interesting to know
      what is happening on ALL CPUs, and the best way to do this is via
      crash dump analysis.
      
      Add the PANIC_ON_STOP option to osnoise/timerlat tracers. The default
      behavior is having this option off. When enabled by the user, the system
      will panic after hitting a stop tracing condition.
      
      This option was motivated by a real scenario that Juri Lelli and I
      were debugging.
      
      Link: https://lkml.kernel.org/r/249ce4287c6725543e6db845a6e0df621dc67db5.1670623111.git.bristot@kernel.org
      
      
      
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarDaniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      1603dda4
Loading