  1. Dec 14, 2022
    • tracing/probes: Add symstr type for dynamic events · b26a124c
      Masami Hiramatsu (Google) authored
      Add 'symstr' type for storing the kernel symbol as a string data
      instead of the symbol address. This allows us to filter the
      events by wildcard symbol name.
      
      e.g.
        # echo 'e:wqfunc workqueue.workqueue_execute_start symname=$function:symstr' >> dynamic_events
        # cat events/eprobes/wqfunc/format
        name: wqfunc
        ID: 2110
        format:
        	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
        	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
        	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
        	field:int common_pid;	offset:4;	size:4;	signed:1;
      
        	field:__data_loc char[] symname;	offset:8;	size:4;	signed:1;
      
        print fmt: " symname=\"%s\"", __get_str(symname)
      
      Note that there is already a 'symbol' type, which only changes the
      print format (so it still stores the symbol address in the tracing
      ring buffer). The 'symstr' type, on the other hand, stores the actual
      "symbol+offset/size" data as a string.
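Because the stored value is a plain string, wildcard filtering reduces to glob matching on that string. The following is a minimal user-space sketch of that idea, not the kernel's filter code: `format_symstr()` and `symstr_matches()` are hypothetical helpers, the hexadecimal `0x` rendering of offset and size is an assumption, and POSIX `fnmatch()` stands in for the kernel's own glob matcher.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: render a symbol as "symbol+offset/size".
 * Printing offset and size as 0x-prefixed hex is an assumption.
 */
static int format_symstr(char *buf, size_t len, const char *name,
			 unsigned long offset, unsigned long size)
{
	return snprintf(buf, len, "%s+0x%lx/0x%lx", name, offset, size);
}

/*
 * Hypothetical helper: true when the stored string matches a shell-style
 * wildcard. fnmatch() is a user-space stand-in for the kernel matcher.
 */
static bool symstr_matches(const char *pattern, const char *symstr)
{
	return fnmatch(pattern, symstr, 0) == 0;
}
```

With a 'symbol' field the ring buffer would hold only an address, so a pattern such as `blk_mq_*` would have nothing textual to match against.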
      
      Link: https://lore.kernel.org/all/166679930847.1528100.4124308529180235965.stgit@devnote3/
      
      
      
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    • kprobes: kretprobe events missing on 2-core KVM guest · 3b7ddab8
      wuqiang authored
      The default value of maxactive is num_possible_cpus() on nonpreemptible
      systems. On a 2-core system, only 2 kretprobe instances are allocated
      by default, so the 2 instances for an execve kretprobe are very likely
      to be used up by a pipelined command.
      
      Here's the testcase: a shell script was added to crontab, and the content
      of the script is:
      
        #!/bin/sh
        do_something_magic `tr -dc a-z < /dev/urandom | head -c 10`
      
      cron triggers a series of program executions (4 times every hour). Event
      loss is then normally noticed after 3-4 hours of testing.
      
      The issue is caused by a burst of execve requests. The best number of
      kretprobe instances differs case by case and is the user's responsibility
      to determine, but num_possible_cpus() is an inadequate default,
      especially for systems with a small number of CPUs.
      
      This patch makes the logic used for preemptible kernels the default, which
      raises the minimum of maxactive to 10 on nonpreemptible systems.
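The effect of the change on small systems can be sketched with a user-space model. This is an illustration only: the function names are hypothetical, and the preemptible-kernel formula `max(10, 2 * num_possible_cpus())` is an assumption about the logic the message refers to, inferred from the stated minimum of 10.

```c
#include <stdbool.h>

/*
 * Assumed preemptible-kernel default: at least 10 instances,
 * otherwise twice the number of possible CPUs.
 */
static unsigned int maxactive_preempt(unsigned int possible_cpus)
{
	unsigned int twice = 2 * possible_cpus;

	return twice > 10 ? twice : 10;
}

/* Default before the patch: depends on whether the kernel is preemptible. */
static unsigned int maxactive_before(unsigned int possible_cpus, bool preemptible)
{
	return preemptible ? maxactive_preempt(possible_cpus) : possible_cpus;
}

/* Default after the patch: the preemptible formula is always used. */
static unsigned int maxactive_after(unsigned int possible_cpus)
{
	return maxactive_preempt(possible_cpus);
}
```

Under these assumptions a nonpreemptible 2-core guest goes from 2 instances to 10, which leaves headroom for a burst of execve calls.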
      
      Link: https://lore.kernel.org/all/20221110081502.492289-1-wuqiang.matt@bytedance.com/
      
      
      
      Signed-off-by: wuqiang <wuqiang.matt@bytedance.com>
      Reviewed-by: Solar Designer <solar@openwall.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
  2. Dec 10, 2022
  3. Nov 24, 2022
  4. Oct 18, 2022
    • tracing/histogram: Update document for KEYS_MAX size · a635beea
      Zheng Yejian authored
      
      After commit 4f36c2d8 ("tracing: Increase tracing map KEYS_MAX size"),
      'keys' supports up to three fields.
      
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Cc: stable@vger.kernel.org
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20221017103806.2479139-1-zhengyejian1@huawei.com
      
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
    • attr: use consistent sgid stripping checks · ed5a7047
      Christian Brauner authored
      
      Currently setgid stripping in file_remove_privs()'s should_remove_suid()
      helper is inconsistent with other parts of the vfs. Specifically, it only
      raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP, but not if the
      inode isn't in the caller's groups and the caller isn't privileged over the
      inode. We already require the latter in setattr_prepare() and
      setattr_copy(), so all filesystems implement this requirement implicitly
      because they have to use setattr_{prepare,copy}() anyway.
      
      But the inconsistency shows up in setgid stripping bugs for overlayfs in
      xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
      generic/687). For example, we test whether suid and setgid stripping works
      correctly when performing various write-like operations as an unprivileged
      user (fallocate, reflink, write, etc.):
      
      echo "Test 1 - qa_user, non-exec file $verb"
      setup_testfile
      chmod a+rws $junk_file
      commit_and_check "$qa_user" "$verb" 64k 64k
      
      The test basically creates a file with 6666 permissions. While the file has
      the S_ISUID and S_ISGID bits set, it does not have S_IXGRP set. On a
      regular filesystem like xfs what will happen is:
      
      sys_fallocate()
      -> vfs_fallocate()
         -> xfs_file_fallocate()
            -> file_modified()
               -> __file_remove_privs()
                  -> dentry_needs_remove_privs()
                     -> should_remove_suid()
                  -> __remove_privs()
                     newattrs.ia_valid = ATTR_FORCE | kill;
                     -> notify_change()
                        -> setattr_copy()
      
      In should_remove_suid() we can see that ATTR_KILL_SUID is raised
      unconditionally because the file in the test has S_ISUID set.
      
      But we also see that ATTR_KILL_SGID won't be set because while the file
      is S_ISGID it is not S_IXGRP (see above) which is a condition for
      ATTR_KILL_SGID being raised.
      
      So by the time we call notify_change() we have attr->ia_valid set to
      ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
      ATTR_KILL_SUID is set and does:
      
      ia_valid = attr->ia_valid |= ATTR_MODE;
      attr->ia_mode = (inode->i_mode & ~S_ISUID);
      
      which means that when we call setattr_copy() later we will definitely
      update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.
      
      Now we call into the filesystem's ->setattr() inode operation which will
      end up calling setattr_copy(). Since ATTR_MODE is set we will hit:
      
      if (ia_valid & ATTR_MODE) {
              umode_t mode = attr->ia_mode;
              vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
              if (!vfsgid_in_group_p(vfsgid) &&
                  !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                      mode &= ~S_ISGID;
              inode->i_mode = mode;
      }
      
      and since the caller in the test is neither capable nor in the group of the
      inode the S_ISGID bit is stripped.
      
      But assume the file isn't suid. Then ATTR_KILL_SUID won't be raised, which
      has the consequence that neither the setgid nor the suid bit is stripped,
      even though the setgid bit should be stripped because the inode isn't in
      the caller's groups and the caller isn't privileged over the inode.
      
      If overlayfs is in the mix things become a bit more complicated and the bug
      shows up more clearly. When, e.g., ovl_setattr() is hit from
      ovl_fallocate()'s call to file_remove_privs(), ATTR_KILL_SUID and
      ATTR_KILL_SGID might be raised. But because the check in notify_change()
      questions the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
      stripped, the S_ISGID bit isn't removed even though it should be:
      
      sys_fallocate()
      -> vfs_fallocate()
         -> ovl_fallocate()
            -> file_remove_privs()
               -> dentry_needs_remove_privs()
                  -> should_remove_suid()
               -> __remove_privs()
                  newattrs.ia_valid = ATTR_FORCE | kill;
                  -> notify_change()
                     -> ovl_setattr()
                        // TAKE ON MOUNTER'S CREDS
                        -> ovl_do_notify_change()
                           -> notify_change()
                        // GIVE UP MOUNTER'S CREDS
           // TAKE ON MOUNTER'S CREDS
           -> vfs_fallocate()
              -> xfs_file_fallocate()
                 -> file_modified()
                    -> __file_remove_privs()
                       -> dentry_needs_remove_privs()
                          -> should_remove_suid()
                       -> __remove_privs()
                          newattrs.ia_valid = ATTR_FORCE | kill;
                          -> notify_change()
      
      The fix for all of this is to make file_remove_privs()'s
      should_remove_suid() helper perform the same checks as we already
      require in setattr_prepare() and setattr_copy(), and to have
      notify_change() not pointlessly require S_IXGRP again. The extra check
      doesn't make any sense in the first place because the caller must
      calculate the flags via should_remove_suid() anyway, which would raise
      ATTR_KILL_SGID.
      
      While we're at it we move should_remove_suid() from inode.c to attr.c
      where it belongs with the rest of the iattr helpers, especially since it
      returns ATTR_KILL_S{G,U}ID flags. We also rename it to
      setattr_should_drop_suidgid() to better reflect that it indicates both
      setuid and setgid bit removal and also that it returns attr flags.
      
      Running xfstests with this doesn't report any regressions. We should really
      try and use consistent checks.
      
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
  5. Oct 10, 2022
  6. Oct 06, 2022
  7. Sep 29, 2022
  8. Sep 27, 2022
  9. Sep 09, 2022
  10. Aug 26, 2022
  11. Aug 22, 2022
  12. Jul 30, 2022
    • rv/monitor: Add the wwnr monitor · ccc319dc
      Daniel Bristot de Oliveira authored
      Per task wakeup while not running (wwnr) monitor.
      
      This model is broken because a task can be running on the
      processor without being set as RUNNABLE. Think about a task about to
      sleep:
      
      1:      set_current_state(TASK_UNINTERRUPTIBLE);
      2:      schedule();
      
      And then imagine an IRQ happening between lines one and two,
      waking the task up. BOOM, the wakeup will happen while the task is
      running.
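The race above can be replayed against a tiny user-space model of the monitor. This is a sketch under assumptions: the event names `SWITCH_IN`, `SWITCH_OUT` and `WAKEUP` and the `struct wwnr_monitor` type are hypothetical, and the model only flags a wakeup that arrives while the task is running.

```c
#include <stdbool.h>

/* Hypothetical per-task events fed to the model. */
enum wwnr_event { SWITCH_IN, SWITCH_OUT, WAKEUP };

/* Hypothetical monitor state for a single task. */
struct wwnr_monitor {
	bool running;  /* task is currently on a CPU */
	bool violated; /* a wakeup was seen while the task was running */
};

/* Feed one event to the model and record a violation if it occurs. */
static void wwnr_handle(struct wwnr_monitor *m, enum wwnr_event ev)
{
	switch (ev) {
	case SWITCH_IN:
		m->running = true;
		break;
	case SWITCH_OUT:
		m->running = false;
		break;
	case WAKEUP:
		if (m->running)
			m->violated = true;
		break;
	}
}
```

In the scenario from the message the task has been switched in, sets its state, and is woken by the IRQ before it reaches schedule(), so the event order is `SWITCH_IN` then `WAKEUP` and the model reports a violation.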
      
      Q: So why do we need this model?
      A: To test the reactors.
      
      Link: https://lkml.kernel.org/r/473c0fc39967250fdebcff8b620311c11dccad30.1659052063.git.bristot@kernel.org
      
      
      
      Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Gabriele Paoloni <gpaoloni@redhat.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Tao Zhou <tao.zhou@linux.dev>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-trace-devel@vger.kernel.org
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • rv/monitor: Add the wip monitor · 10bde81c
      Daniel Bristot de Oliveira authored
      The wakeup in preemptive (wip) monitor verifies whether
      wakeup events always take place with preemption disabled:
      
                           |
                           |
                           v
                         #==================#
                         H    preemptive    H <+
                         #==================#  |
                           |                   |
                           | preempt_disable   | preempt_enable
                           v                   |
          sched_waking   +------------------+  |
        +--------------- |                  |  |
        |                |  non_preemptive  |  |
        +--------------> |                  | -+
                         +------------------+
      
      The wakeup event always takes place with preemption disabled because
      of the scheduler synchronization. However, because the preempt_count
      and its trace event are not atomic with regard to interrupts, some
      inconsistencies might happen.
      
      The documentation illustrates one of these cases.
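The diagram above can be written down as a transition table. The sketch below is a user-space illustration with hypothetical identifiers, not the generated kernel monitor; any transition absent from the diagram is treated as a violation.

```c
#include <stdbool.h>
#include <stddef.h>

/* States and events taken from the diagram; identifiers are hypothetical. */
enum wip_state { PREEMPTIVE, NON_PREEMPTIVE, WIP_STATES };
enum wip_event { PREEMPT_DISABLE, PREEMPT_ENABLE, SCHED_WAKING, WIP_EVENTS };

#define WIP_INVALID (-1)

/* Next state for each (state, event) pair; WIP_INVALID marks a violation. */
static const int wip_next[WIP_STATES][WIP_EVENTS] = {
	[PREEMPTIVE]     = { NON_PREEMPTIVE, WIP_INVALID, WIP_INVALID },
	[NON_PREEMPTIVE] = { WIP_INVALID, PREEMPTIVE, NON_PREEMPTIVE },
};

/* Run a trace from the initial state; false means the model was violated. */
static bool wip_accepts(const enum wip_event *events, size_t n)
{
	int state = PREEMPTIVE;

	for (size_t i = 0; i < n; i++) {
		state = wip_next[state][events[i]];
		if (state == WIP_INVALID)
			return false;
	}
	return true;
}
```

A trace of `PREEMPT_DISABLE`, `SCHED_WAKING`, `PREEMPT_ENABLE` is accepted, while a `SCHED_WAKING` seen in the preemptive state is rejected.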
      
      Link: https://lkml.kernel.org/r/c98ca678df81115fddc04921b3c79720c836b18f.1659052063.git.bristot@kernel.org
      
      
      
      Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Gabriele Paoloni <gpaoloni@redhat.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Tao Zhou <tao.zhou@linux.dev>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-trace-devel@vger.kernel.org
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • Documentation/rv: Add deterministic automata instrumentation documentation · b6172b51
      Daniel Bristot de Oliveira authored
      Add the da_monitor_instrumentation.rst. It describes the basics
      of RV monitor instrumentation.
      
      Link: https://lkml.kernel.org/r/0557d5c68e2fc252f2643c2cc5295a67e2b73277.1659052063.git.bristot@kernel.org
      
      
      
      Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Gabriele Paoloni <gpaoloni@redhat.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Tao Zhou <tao.zhou@linux.dev>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-trace-devel@vger.kernel.org
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • Documentation/rv: Add deterministic automata monitor synthesis documentation · d57aff24
      Daniel Bristot de Oliveira authored
      Add da_monitor_synthesis.rst, which introduces some concepts behind the
      Deterministic Automata (DA) monitor synthesis and interface.
      
      Link: https://lkml.kernel.org/r/7873bdb7b2e5d2bc0b2eb6ca0b324af9a0ba27a0.1659052063.git.bristot@kernel.org
      
      
      
      Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Gabriele Paoloni <gpaoloni@redhat.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Tao Zhou <tao.zhou@linux.dev>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-trace-devel@vger.kernel.org
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • Documentation/rv: Add deterministic automaton documentation · 4041b9bb
      Daniel Bristot de Oliveira authored
      Add documentation about the deterministic automaton and its possible
      representations (formal, graphic, .dot and C).
      
      Link: https://lkml.kernel.org/r/387edaed87630bd5eb37c4275045dfd229700aa6.1659052063.git.bristot@kernel.org
      
      
      
      Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Gabriele Paoloni <gpaoloni@redhat.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Tao Zhou <tao.zhou@linux.dev>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-trace-devel@vger.kernel.org
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • Documentation/rv: Add a basic documentation · ff0aaf67
      Daniel Bristot de Oliveira authored
      Add the runtime-verification.rst document, explaining the basics of RV
      and how to use the interface.
      
      Link: https://lkml.kernel.org/r/4be7d1a88ab1e2eb0767521e1ab52a149a154bc4.1659052063.git.bristot@kernel.org
      
      
      
      Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Gabriele Paoloni <gpaoloni@redhat.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Tao Zhou <tao.zhou@linux.dev>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-trace-devel@vger.kernel.org
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
  13. Jul 24, 2022
  14. Jul 06, 2022
  15. Jun 30, 2022
  16. Jun 29, 2022
  17. May 27, 2022
  18. Apr 27, 2022
  19. Apr 02, 2022
  20. Mar 18, 2022
  21. Feb 24, 2022
  22. Feb 11, 2022
  23. Jan 17, 2022
  24. Jan 14, 2022
  25. Jan 13, 2022
    • tracing: Add test for user space strings when filtering on string pointers · 77360f9b
      Steven Rostedt authored
      Pingfan reported that the following causes a fault:
      
        echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter
        echo 1 > events/syscalls/sys_enter_openat/enable
      
      The reason is that the trace event filter treats the user space pointer
      defined by "filename" as a normal pointer to compare against the "cpu"
      string. The following bug happened:
      
       kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0001) - permissions violation
       PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867
       Oops: 0001 [#1] PREEMPT SMP PTI
       CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:strlen+0x0/0x20
       Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11
             48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8
             48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
       RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246
       RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000
       RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60
       RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000
       R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380
       R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0
       FS:  00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
       Call Trace:
        filter_pred_pchar+0x18/0x40
        filter_match_preds+0x31/0x70
        ftrace_syscall_enter+0x27a/0x2c0
        syscall_trace_enter.constprop.0+0x1aa/0x1d0
        do_syscall_64+0x16/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7fa861d88664
      
      The above happened because the kernel tried to access user space directly
      and triggered a "supervisor read access in kernel mode" fault. Worse yet,
      the memory might not even be loaded yet, and a SEGFAULT could happen as
      well. The same could be true for kernel space accesses.
      
      To be even more robust, test both kernel and user space strings. If the
      string fails to read, then simply have the filter fail.
      
      Note, TASK_SIZE is used to determine if the pointer is user or kernel space
      and the appropriate strncpy_from_kernel/user_nofault() function is used to
      copy the memory. For some architectures, the comparison to TASK_SIZE may
      always pick user space or kernel space. If it gets it wrong, the only
      consequence is that the filter will fail to match. In the future, this
      needs to be fixed to have the event denote which should be used. But
      failing a filter is much better than panicking the machine, and that can
      be solved later.
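The TASK_SIZE comparison described above amounts to a single address test. The following user-space sketch only models that decision: `classify_str_ptr()` and `enum str_source` are hypothetical names, the boundary is passed in as a parameter because its value is architecture specific, and no memory is actually copied.

```c
#include <stdint.h>

/* Which nofault copy routine the model would select for a string pointer. */
enum str_source { FROM_USER, FROM_KERNEL };

/*
 * Hypothetical helper: addresses below the task size boundary are treated
 * as user space, everything else as kernel space. On architectures where
 * the address spaces overlap this can pick the wrong side, in which case
 * the copy fails and the filter simply does not match.
 */
static enum str_source classify_str_ptr(uint64_t addr, uint64_t task_size)
{
	return addr < task_size ? FROM_USER : FROM_KERNEL;
}
```

Taking 0x00007ffffffff000 as an assumed x86_64 boundary, the faulting address 0x00007fffaae8ef60 from the oops classifies as user space, while a kernel address such as 0xffff8fc1c02d601c from the register dump classifies as kernel space.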
      
      Link: https://lore.kernel.org/all/20220107044951.22080-1-kernelfans@gmail.com/
      Link: https://lkml.kernel.org/r/20220110115532.536088fd@gandalf.local.home
      
      
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Reported-by: Pingfan Liu <kernelfans@gmail.com>
      Tested-by: Pingfan Liu <kernelfans@gmail.com>
      Fixes: 87a342f5 ("tracing/filters: Support filtering for char * strings")
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  26. Dec 10, 2021
  27. Nov 26, 2021
  28. Nov 18, 2021