  1. Oct 13, 2023
    • audit,io_uring: io_uring openat triggers audit reference count underflow · 03adc61e
      Dan Clash authored
      An io_uring openat operation can update an audit reference count
      from multiple threads resulting in the call trace below.
      
      A call to io_uring_submit() with a single openat op with a flag of
      IOSQE_ASYNC results in the following reference count updates.
      
      The first part of the system call performs two increments that do not race.
      
      do_syscall_64()
        __do_sys_io_uring_enter()
          io_submit_sqes()
            io_openat_prep()
              __io_openat_prep()
                getname()
                  getname_flags()       /* update 1 (increment) */
                    __audit_getname()   /* update 2 (increment) */
      
      The openat op is queued to an io_uring worker thread, which opens a window
      for a race.  The system call exit performs one decrement.
      
      do_syscall_64()
        syscall_exit_to_user_mode()
          syscall_exit_to_user_mode_prepare()
            __audit_syscall_exit()
              audit_reset_context()
                 putname()              /* update 3 (decrement) */
      
      The io_uring worker thread performs one increment and two decrements.
      These updates can race with the system call decrement.
      
      io_wqe_worker()
        io_worker_handle_work()
          io_wq_submit_work()
            io_issue_sqe()
              io_openat()
                io_openat2()
                  do_filp_open()
                    path_openat()
                      __audit_inode()   /* update 4 (increment) */
                  putname()             /* update 5 (decrement) */
              __audit_uring_exit()
                audit_reset_context()
                  putname()             /* update 6 (decrement) */
      
      The fix is to change the refcnt member of struct filename (the name
      referenced by struct audit_names) from int to atomic_t.
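The accounting above can be replayed in a small userspace sketch. It assumes C11 `<stdatomic.h>` as a stand-in for the kernel's `atomic_t` helpers, with `atomic_fetch_add()`/`atomic_fetch_sub()` playing the roles of `atomic_inc()`/`atomic_dec_and_test()`; `struct name_sketch`, `name_get()`, `name_put()` and `replay_openat_updates()` are hypothetical names, not kernel APIs. With a plain `int`, a concurrent increment (update 4) and decrement (update 3) can both read the same old value so that one of them is lost; losing the increment leaves the count one short and the final `putname()` underflows. An atomic read-modify-write makes each update indivisible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted name object; not the kernel's struct. */
struct name_sketch {
	atomic_int refcnt;
	bool freed;
};

/* Analogue of atomic_inc(): one indivisible read-modify-write. */
static void name_get(struct name_sketch *n)
{
	atomic_fetch_add(&n->refcnt, 1);
}

/*
 * Analogue of atomic_dec_and_test(): returns true only for the caller that
 * drops the last reference, which is then responsible for freeing.
 */
static bool name_put(struct name_sketch *n)
{
	int old = atomic_fetch_sub(&n->refcnt, 1);

	assert(old > 0 && "refcount underflow"); /* cf. the BUG in putname() */
	if (old == 1) {
		n->freed = true;
		return true;
	}
	return false;
}

/*
 * Replays updates 1-6 from the message in one legal order and returns how
 * many puts dropped the last reference (exactly one is expected).
 */
static int replay_openat_updates(struct name_sketch *n)
{
	int last_puts = 0;

	name_get(n);              /* 1: getname_flags() */
	name_get(n);              /* 2: __audit_getname() */
	name_get(n);              /* 4: __audit_inode() in the worker */
	last_puts += name_put(n); /* 3: putname() at syscall exit */
	last_puts += name_put(n); /* 5: putname() in io_openat2() */
	last_puts += name_put(n); /* 6: putname() in __audit_uring_exit() */
	return last_puts;
}
```

Given that updates 1 and 2 happen before the worker starts and the worker performs 4, 5 and 6 in order, every placement of update 3 still ends with exactly one put dropping the last reference.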
      
      kernel BUG at fs/namei.c:262!
      Call Trace:
      ...
       ? putname+0x68/0x70
       audit_reset_context.part.0.constprop.0+0xe1/0x300
       __audit_uring_exit+0xda/0x1c0
       io_issue_sqe+0x1f3/0x450
       ? lock_timer_base+0x3b/0xd0
       io_wq_submit_work+0x8d/0x2b0
       ? __try_to_del_timer_sync+0x67/0xa0
       io_worker_handle_work+0x17c/0x2b0
       io_wqe_worker+0x10a/0x350
      
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
      Fixes: 5bd2182d ("audit,io_uring,io-wq: add some basic audit support to io_uring")
      Signed-off-by: Dan Clash <daclash@linux.microsoft.com>
      Link: https://lore.kernel.org/r/20231012215518.GA4048@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
  2. Aug 30, 2023
  3. Aug 15, 2023
  4. Aug 08, 2023
    • audit: fix possible soft lockup in __audit_inode_child() · b59bc6e3
      Gaosheng Cui authored
      
      Tracefs or debugfs may generate hundreds to thousands of PATH records,
      and that many PATH records may cause a soft lockup.
      
      For example:
        1. CONFIG_KASAN=y && CONFIG_PREEMPTION=n
        2. auditctl -a exit,always -S open -k key
        3. sysctl -w kernel.watchdog_thresh=5
        4. mkdir /sys/kernel/debug/tracing/instances/test
      
      There may be a soft lockup as follows:
        watchdog: BUG: soft lockup - CPU#45 stuck for 7s! [mkdir:15498]
        Kernel panic - not syncing: softlockup: hung tasks
        Call trace:
         dump_backtrace+0x0/0x30c
         show_stack+0x20/0x30
         dump_stack+0x11c/0x174
         panic+0x27c/0x494
         watchdog_timer_fn+0x2bc/0x390
         __run_hrtimer+0x148/0x4fc
         __hrtimer_run_queues+0x154/0x210
         hrtimer_interrupt+0x2c4/0x760
         arch_timer_handler_phys+0x48/0x60
         handle_percpu_devid_irq+0xe0/0x340
         __handle_domain_irq+0xbc/0x130
         gic_handle_irq+0x78/0x460
         el1_irq+0xb8/0x140
         __audit_inode_child+0x240/0x7bc
         tracefs_create_file+0x1b8/0x2a0
         trace_create_file+0x18/0x50
         event_create_dir+0x204/0x30c
         __trace_add_new_event+0xac/0x100
         event_trace_add_tracer+0xa0/0x130
         trace_array_create_dir+0x60/0x140
         trace_array_create+0x1e0/0x370
         instance_mkdir+0x90/0xd0
         tracefs_syscall_mkdir+0x68/0xa0
         vfs_mkdir+0x21c/0x34c
         do_mkdirat+0x1b4/0x1d4
         __arm64_sys_mkdirat+0x4c/0x60
         el0_svc_common.constprop.0+0xa8/0x240
         do_el0_svc+0x8c/0xc0
         el0_svc+0x20/0x30
         el0_sync_handler+0xb0/0xb4
         el0_sync+0x160/0x180
      
      Therefore, we add cond_resched() to __audit_inode_child() to fix it.
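The pattern behind the fix is a voluntary scheduling point in a long-running path on a non-preemptible kernel. The userspace sketch below assumes POSIX `sched_yield()` as a rough stand-in for `cond_resched()` and a hypothetical array-backed list in place of `context->names_list`; `struct names_sketch` and `inode_child_sketch()` are illustrative names only. Each call scans every existing entry before appending, so creating thousands of tracefs files in one `mkdir` costs quadratic work inside a single syscall, and yielding once per call bounds how long the CPU runs without a scheduling point. It does not claim to reproduce the exact placement of `cond_resched()` in the upstream diff.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

#define MAX_NAMES_SKETCH 4096

/* Hypothetical, array-backed stand-in for context->names_list. */
struct names_sketch {
	int ino[MAX_NAMES_SKETCH];
	size_t count;
	size_t yields;
};

/*
 * Hypothetical analogue of one __audit_inode_child() call: look for an
 * existing entry for @ino and append one if none is found.  Every call
 * walks the whole list, so N calls cost O(N^2) comparisons in total.
 */
static size_t inode_child_sketch(struct names_sketch *names, int ino)
{
	sched_yield();		/* stand-in for cond_resched() */
	names->yields++;

	for (size_t i = 0; i < names->count; i++)
		if (names->ino[i] == ino)
			return i;	/* already recorded */

	assert(names->count < MAX_NAMES_SKETCH);
	names->ino[names->count] = ino;
	return names->count++;
}
```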
      
      Fixes: 5195d8e2 ("audit: dynamically allocate audit_names when not enough space is in the names array")
      Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  5. Mar 01, 2023
    • capability: just use a 'u64' instead of a 'u32[2]' array · f122a08b
      Linus Torvalds authored
      
      Back in 2008 we extended the capability bits from 32 to 64, and we did
      it by extending the single 32-bit capability word from one word to an
      array of two words.  It was then obfuscated by hiding the "2" behind two
      macro expansions, with the reasoning being that maybe it gets extended
      further some day.
      
      That reasoning may have been valid at the time, but the last thing we
      want to do is to extend the capability set any more.  And the array of
      values not only causes source code oddities (with loops to deal with
      it), but also results in worse code generation.  It's a lose-lose
      situation.
      
      So just change the 'u32[2]' into a 'u64' and be done with it.
      
      We still have to deal with the fact that the user space interface is
      designed around an array of these 32-bit values, but that was the case
      before too, since the array layouts were different (ie user space
      doesn't use an array of 32-bit values for individual capability masks,
      but an array of 32-bit slices of multiple masks).
      
      So that marshalling of data is actually simplified too, even if it does
      remain somewhat obscure and odd.
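The remaining marshalling can be sketched as follows. `struct user_cap_data_sketch` mirrors the three `__u32` fields of the uapi `struct __user_cap_data_struct` from `<linux/capability.h>`; a version-3 `capget()`/`capset()` call passes an array of two of them, with element 0 carrying bits 0-31 and element 1 bits 32-63 of each mask. `kernel_cap_sketch` and the helper functions are hypothetical, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Same field layout as the uapi struct __user_cap_data_struct. */
struct user_cap_data_sketch {
	uint32_t effective;
	uint32_t permitted;
	uint32_t inheritable;
};

/* Hypothetical in-kernel representation after the change: one u64 per mask. */
typedef struct { uint64_t val; } kernel_cap_sketch;

/* Split three 64-bit masks into low (data[0]) and high (data[1]) slices. */
static void caps_to_user(struct user_cap_data_sketch data[2],
			 kernel_cap_sketch eff, kernel_cap_sketch perm,
			 kernel_cap_sketch inh)
{
	data[0].effective   = (uint32_t)eff.val;
	data[1].effective   = (uint32_t)(eff.val >> 32);
	data[0].permitted   = (uint32_t)perm.val;
	data[1].permitted   = (uint32_t)(perm.val >> 32);
	data[0].inheritable = (uint32_t)inh.val;
	data[1].inheritable = (uint32_t)(inh.val >> 32);
}

/* Reassemble the effective mask from its two 32-bit slices. */
static kernel_cap_sketch effective_from_user(const struct user_cap_data_sketch data[2])
{
	kernel_cap_sketch c = { ((uint64_t)data[1].effective << 32) | data[0].effective };
	return c;
}

/* With a single u64 the comparison is one expression, as in the new cap_isidentical(). */
static int cap_isidentical_sketch(kernel_cap_sketch a, kernel_cap_sketch b)
{
	return a.val == b.val;
}
```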
      
      This was all triggered by my reaction to the new "cap_isidentical()"
      introduced recently.  By just using a saner data structure, it went from
      
      	unsigned __capi;
      	CAP_FOR_EACH_U32(__capi) {
      		if (a.cap[__capi] != b.cap[__capi])
      			return false;
      	}
      	return true;
      
      to just being
      
      	return a.val == b.val;
      
      instead.  Which is rather more obvious both to humans and to compilers.
      
      Cc: Mateusz Guzik <mjguzik@gmail.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. Feb 07, 2023
  7. Jan 19, 2023
    • fs: port xattr to mnt_idmap · 39f60c1c
      Christian Brauner authored
      
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevant on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two,
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
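The type-safety argument can be illustrated with a hypothetical sketch. `struct fs_userns_sketch` and `struct mnt_idmap_sketch` are distinct wrapper types around the same single-range mapping, so passing a filesystem-level namespace to a helper declared to take the mount's idmap draws an incompatible-pointer-type diagnostic from the compiler instead of being silently accepted. None of these names, nor the single-range mapping, describe the kernel's actual `struct mnt_idmap` internals.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_UID_SKETCH UINT32_MAX

/* One contiguous range: ids [first, first + count) map to [lower_first, ...). */
struct idmap_range_sketch {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Distinct types for the two levels, even though they share a layout. */
struct fs_userns_sketch { struct idmap_range_sketch range; };
struct mnt_idmap_sketch { struct idmap_range_sketch range; };

static uint32_t map_range(const struct idmap_range_sketch *r, uint32_t id)
{
	if (id - r->first < r->count)	/* unsigned wrap rejects id < first */
		return r->lower_first + (id - r->first);
	return INVALID_UID_SKETCH;
}

/*
 * Hypothetical mount-level helper.  Passing a struct fs_userns_sketch *
 * here is an incompatible pointer type, which is the point of giving
 * the mount mapping its own type.
 */
static uint32_t mnt_map_uid_sketch(const struct mnt_idmap_sketch *idmap, uint32_t uid)
{
	return map_range(&idmap->range, uid);
}
```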
      
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
  8. Oct 17, 2022
    • audit: unify audit_filter_{uring(), inode_name(), syscall()} · 50979953
      Ankur Arora authored
      
      audit_filter_uring() and audit_filter_inode_name() are substantially
      similar to audit_filter_syscall(). Move the core logic to
      __audit_filter_op(), which can be parametrized for all three.
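The refactoring pattern, one core loop parameterized by the rule list and the value to match with thin wrappers per caller, can be sketched as below. The rule type, the array-based lists and every function name are hypothetical simplifications; the real `__audit_filter_op()` walks RCU-protected lists and evaluates full rule fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified rule: matches one operation number exactly. */
struct rule_sketch {
	unsigned long op;
	int action;
};

struct rule_list_sketch {
	const struct rule_sketch *rules;
	size_t len;
};

/* Shared core: the first matching rule wins; otherwise return @dflt. */
static int filter_op_sketch(const struct rule_list_sketch *list,
			    unsigned long op, int dflt)
{
	for (size_t i = 0; i < list->len; i++)
		if (list->rules[i].op == op)
			return list->rules[i].action;
	return dflt;
}

/* The wrappers differ only in which list and which value they pass. */
static int filter_syscall_sketch(const struct rule_list_sketch *exit_list,
				 unsigned long syscall_nr)
{
	return filter_op_sketch(exit_list, syscall_nr, 0);
}

static int filter_uring_sketch(const struct rule_list_sketch *uring_list,
			       unsigned long uring_op)
{
	return filter_op_sketch(uring_list, uring_op, 0);
}

/* Hypothetical: pick a per-inode bucket, then match on the syscall number. */
static int filter_inode_name_sketch(const struct rule_list_sketch *buckets,
				    size_t nbuckets, unsigned long ino,
				    unsigned long syscall_nr)
{
	return filter_op_sketch(&buckets[ino % nbuckets], syscall_nr, 0);
}
```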
      
      On a Skylakex system, getpid() latency (all results aggregated
      across 12 boot cycles):
      
               Min     Mean    Median   Max      pstdev
               (ns)    (ns)    (ns)     (ns)
      
       -    196.63   207.86  206.60  230.98      (+- 3.92%)
       +    183.73   196.95  192.31  232.49      (+- 6.04%)
      
      Performance counter stats for 'bin/getpid' (3 runs) go from:
          cycles               805.58  (  +-  4.11% )
          instructions        1654.11  (  +-   .05% )
          IPC                    2.06  (  +-  3.39% )
          branches             430.02  (  +-   .05% )
          branch-misses          1.55  (  +-  7.09% )
          L1-dcache-loads      440.01  (  +-   .09% )
          L1-dcache-load-misses  9.05  (  +- 74.03% )
      to:
          cycles               765.37  (  +-  6.66% )
          instructions        1677.07  (  +-  0.04% )
          IPC                    2.20  (  +-  5.90% )
          branches             431.10  (  +-  0.04% )
          branch-misses          1.60  (  +- 11.25% )
          L1-dcache-loads      521.04  (  +-  0.05% )
          L1-dcache-load-misses  6.92  (  +- 77.60% )
      
      (Both aggregated over 12 boot cycles.)
      
      The increased L1-dcache-loads are due to some intermediate values now
      coming from the stack.
      
      The improvement in cycles is due to a slightly denser loop (the list
      parameter in the list_for_each_entry_rcu() exit check now comes from
      a register rather than a constant as before.)
      
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • audit: cache ctx->major in audit_filter_syscall() · 06954599
      Ankur Arora authored
      
      ctx->major contains the current syscall number. This is, of course, a
      constant for the duration of the syscall. Unfortunately, GCC's alias
      analysis cannot prove that it is not modified via a pointer in the
      audit_filter_syscall() loop, and so always loads it from memory.
      
      In and of itself the load isn't very expensive (ops dependent on the
      ctx->major load are only used to determine the direction of control flow
      and have short dependence chains, and in any case the related branches
      get predicted perfectly in the fastpath), but we still cache ctx->major
      in a local for two reasons:
      
      * ctx->major is in the first cacheline of struct audit_context and has
        similar alignment to audit_entry::list. For cases with a lot of audit
        rules, doing this reduces one source of contention from a potentially
        busy cache-set.
      
      * audit_in_mask() (called in the hot loop in audit_filter_syscall())
        does cast manipulation and error checking on ctx->major:
      
           static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
           {
                   int word, bit;

                   if (val > 0xffffffff)
                           return false;

                   word = AUDIT_WORD(val);
                   if (word >= AUDIT_BITMASK_SIZE)
                           return false;

                   bit = AUDIT_BIT(val);

                   return rule->mask[word] & bit;
           }
      
        The clauses related to the rule need to be evaluated in the loop, but
        the rest is unnecessarily re-evaluated for every loop iteration.
        (Note, however, that most of these are cheap ALU ops and the branches
         are perfectly predicted. However, see discussion on cycles
         improvement below for more on why it is still worth hoisting.)
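The effect of the hoisting can be sketched in userspace: the range check and the `AUDIT_WORD()`/`AUDIT_BIT()` arithmetic on the syscall number are done once before the loop, leaving only the per-rule mask test inside it. The `*_SKETCH` macros copy the formulas of the uapi `AUDIT_WORD()`, `AUDIT_BIT()` and `AUDIT_BITMASK_SIZE`; the rule type and `first_matching_rule()` are hypothetical. Note that the upstream patch only caches `ctx->major` in a local and leaves the hoisting itself to GCC, whereas this sketch hoists by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same formulas as the uapi AUDIT_WORD()/AUDIT_BIT()/AUDIT_BITMASK_SIZE. */
#define AUDIT_BITMASK_SIZE_SKETCH 64
#define AUDIT_WORD_SKETCH(nr) ((uint32_t)((nr) / 32))
#define AUDIT_BIT_SKETCH(nr)  (1U << ((nr) - AUDIT_WORD_SKETCH(nr) * 32))

/* Hypothetical rule carrying only the syscall bitmask. */
struct krule_sketch {
	uint32_t mask[AUDIT_BITMASK_SIZE_SKETCH];
};

/* Returns the index of the first rule whose mask contains @major, or -1. */
static int first_matching_rule(const struct krule_sketch *rules, size_t n,
			       unsigned long major)
{
	/* Loop-invariant work on the cached syscall number, done once. */
	if (major > 0xffffffff)
		return -1;
	uint32_t word = AUDIT_WORD_SKETCH(major);
	if (word >= AUDIT_BITMASK_SIZE_SKETCH)
		return -1;
	uint32_t bit = AUDIT_BIT_SKETCH(major);

	/* Only the rule-dependent test remains inside the loop. */
	for (size_t i = 0; i < n; i++)
		if (rules[i].mask[word] & bit)
			return (int)i;
	return -1;
}
```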
      
      On a Skylakex system, the change in getpid() latency (aggregated over
      12 boot cycles):
      
                   Min     Mean  Median     Max       pstdev
                  (ns)     (ns)    (ns)    (ns)
      
       -        201.30   216.14  216.22  228.46      (+- 1.45%)
       +        196.63   207.86  206.60  230.98      (+- 3.92%)
      
      Performance counter stats for 'bin/getpid' (3 runs) go from:
          cycles               836.89  (  +-   .80% )
          instructions        2000.19  (  +-   .03% )
          IPC                    2.39  (  +-   .83% )
          branches             430.14  (  +-   .03% )
          branch-misses          1.48  (  +-  3.37% )
          L1-dcache-loads      471.11  (  +-   .05% )
          L1-dcache-load-misses  7.62  (  +- 46.98% )
      
      to:
          cycles               805.58  (  +-  4.11% )
          instructions        1654.11  (  +-   .05% )
          IPC                    2.06  (  +-  3.39% )
          branches             430.02  (  +-   .05% )
          branch-misses          1.55  (  +-  7.09% )
          L1-dcache-loads      440.01  (  +-   .09% )
          L1-dcache-load-misses  9.05  (  +- 74.03% )
      
      (Both aggregated over 12 boot cycles.)
      
      instructions: we reduce around 8 instructions/iteration because some of
      the computation is now hoisted out of the loop (branch count does not
      change because GCC, for reasons unclear, only hoists the computations
      while keeping the basic-blocks.)
      
      cycles: improve by about 5% (in aggregate and looking at individual run
      numbers.) This is likely because we now waste fewer pipeline resources
      on unnecessary instructions, which allows the control flow to
      speculatively execute further ahead, shortening the execution of the loop
      a little. The final gating factor on the performance of this loop
      remains the long dependence chain due to the linked-list load.
      
      Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  9. Aug 26, 2022
  10. Aug 25, 2022
  11. Aug 16, 2022
  12. Aug 04, 2022
    • audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() · f482aa98
      Peilin Ye authored
      
      Currently @audit_context is allocated twice for io_uring workers:
      
        1. copy_process() calls audit_alloc();
        2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which
           is effectively audit_alloc()) and overwrites @audit_context,
           causing:
      
        BUG: memory leak
        unreferenced object 0xffff888144547400 (size 1024):
      <...>
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<ffffffff8135cfc3>] audit_alloc+0x133/0x210
            [<ffffffff81239e63>] copy_process+0xcd3/0x2340
            [<ffffffff8123b5f3>] create_io_thread+0x63/0x90
            [<ffffffff81686604>] create_io_worker+0xb4/0x230
            [<ffffffff81686f68>] io_wqe_enqueue+0x248/0x3b0
            [<ffffffff8167663a>] io_queue_iowq+0xba/0x200
            [<ffffffff816768b3>] io_queue_async+0x113/0x180
            [<ffffffff816840df>] io_req_task_submit+0x18f/0x1a0
            [<ffffffff816841cd>] io_apoll_task_func+0xdd/0x120
            [<ffffffff8167d49f>] tctx_task_work+0x11f/0x570
            [<ffffffff81272c4e>] task_work_run+0x7e/0xc0
            [<ffffffff8125a688>] get_signal+0xc18/0xf10
            [<ffffffff8111645b>] arch_do_signal_or_restart+0x2b/0x730
            [<ffffffff812ea44e>] exit_to_user_mode_prepare+0x5e/0x180
            [<ffffffff844ae1b2>] syscall_exit_to_user_mode+0x12/0x20
            [<ffffffff844a7e80>] do_syscall_64+0x40/0x80
      
      Then,
      
        3. io_sq_thread() or io_wqe_worker() frees @audit_context using
           audit_free();
        4. do_exit() eventually calls audit_free() again, which is okay
           because audit_free() does a NULL check.
      
      As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and
      redundant audit_free() calls.
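The leak and the fix can be modelled in a few lines. This sketch counts allocations instead of calling the real `audit_alloc()`/`audit_free()`; `struct task_sketch`, `ctx_alloc()`, `ctx_free()` and the two lifetime functions are hypothetical. It shows why overwriting an already-allocated pointer leaks, and why a second free after the pointer has been cleared is harmless.

```c
#include <assert.h>
#include <stdlib.h>

static int live_contexts;	/* allocations not yet freed */

struct task_sketch {
	void *audit_context;
};

static void *ctx_alloc(void)
{
	void *p = malloc(64);

	assert(p != NULL);
	live_contexts++;
	return p;
}

/* NULL-safe, like audit_free(): clears the pointer so a repeat call is a no-op. */
static void ctx_free(struct task_sketch *t)
{
	if (t->audit_context) {
		free(t->audit_context);
		live_contexts--;
		t->audit_context = NULL;
	}
}

/* Before the fix: step 2 overwrites the context allocated in step 1. */
static void worker_lifetime_buggy(struct task_sketch *t)
{
	t->audit_context = ctx_alloc();	/* 1: copy_process() -> audit_alloc() */
	t->audit_context = ctx_alloc();	/* 2: audit_alloc_kernel(); first one leaks */
	ctx_free(t);			/* 3: worker's own audit_free() */
	ctx_free(t);			/* 4: do_exit() -> audit_free(), a no-op */
}

/* After the fix: only copy_process() allocates and only do_exit() frees. */
static void worker_lifetime_fixed(struct task_sketch *t)
{
	t->audit_context = ctx_alloc();	/* copy_process() -> audit_alloc() */
	ctx_free(t);			/* do_exit() -> audit_free() */
}
```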
      
      Fixes: 5bd2182d ("audit,io_uring,io-wq: add some basic audit support to io_uring")
      Suggested-by: Paul Moore <paul@paul-moore.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Link: https://lore.kernel.org/r/20220803222343.31673-1-yepeilin.cs@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. Jun 15, 2022
    • audit: free module name · ef79c396
      Christian Göttsche authored
      
      Reset the type of the record last as the helper `audit_free_module()`
      depends on it.
      
          unreferenced object 0xffff888153b707f0 (size 16):
            comm "modprobe", pid 1319, jiffies 4295110033 (age 1083.016s)
            hex dump (first 16 bytes):
              62 69 6e 66 6d 74 5f 6d 69 73 63 00 6b 6b 6b a5  binfmt_misc.kkk.
            backtrace:
              [<ffffffffa07dbf9b>] kstrdup+0x2b/0x50
              [<ffffffffa04b0a9d>] __audit_log_kern_module+0x4d/0xf0
              [<ffffffffa03b6664>] load_module+0x9d4/0x2e10
              [<ffffffffa03b8f44>] __do_sys_finit_module+0x114/0x1b0
              [<ffffffffa1f47124>] do_syscall_64+0x34/0x80
              [<ffffffffa200007e>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
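The ordering bug is easy to see in a reduced model: a free helper that decides what to release by looking at the record type does nothing once the type has already been reset. `struct ctx_sketch`, the enum values, `dup_name()` and both reset functions are hypothetical; only the dependency of `audit_free_module()` on the record type comes from the message above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { RECORD_NONE_SKETCH, RECORD_KERN_MODULE_SKETCH };

struct ctx_sketch {
	int type;
	char *module_name;	/* kstrdup()'d in the real code */
};

/* Portable strdup() replacement for the sketch. */
static char *dup_name(const char *s)
{
	char *p = malloc(strlen(s) + 1);

	assert(p != NULL);
	return strcpy(p, s);
}

/* Like audit_free_module(): only frees when the record says it is a module. */
static void free_module_sketch(struct ctx_sketch *ctx)
{
	if (ctx->type == RECORD_KERN_MODULE_SKETCH) {
		free(ctx->module_name);
		ctx->module_name = NULL;
	}
}

/* Buggy order: the type is cleared first, so the name is never freed. */
static void reset_buggy(struct ctx_sketch *ctx)
{
	ctx->type = RECORD_NONE_SKETCH;
	free_module_sketch(ctx);
}

/* Fixed order: free type-dependent data first, reset the type last. */
static void reset_fixed(struct ctx_sketch *ctx)
{
	free_module_sketch(ctx);
	ctx->type = RECORD_NONE_SKETCH;
}
```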
      
      Cc: stable@vger.kernel.org
      Fixes: 12c5e81d ("audit: prepare audit_context for use in calling contexts beyond syscalls")
      Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  14. May 17, 2022
    • audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts · 69e9cd66
      Julian Orth authored
      
      Not calling the function for dummy contexts will cause the context to
      not be reset. During the next syscall, this will cause an error in
      __audit_syscall_entry:
      
      	WARN_ON(context->context != AUDIT_CTX_UNUSED);
      	WARN_ON(context->name_count);
      	if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
      		audit_panic("unrecoverable error in audit_syscall_entry()");
      		return;
      	}
      
      These problematic dummy contexts are created via the following call
      chain:
      
             exit_to_user_mode_prepare
          -> arch_do_signal_or_restart
          -> get_signal
          -> task_work_run
          -> tctx_task_work
          -> io_req_task_submit
          -> io_issue_sqe
          -> audit_uring_entry
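The violated invariant is that every io_uring entry must be paired with a reset, whether or not the context is a dummy. A reduced model follows; the states mirror the `AUDIT_CTX_UNUSED` check quoted above, while `struct audit_ctx_sketch`, the `dummy` flag and all functions are hypothetical. The fix, per the subject line, is to run `__audit_uring_exit()` for dummy contexts too.

```c
#include <assert.h>
#include <stdbool.h>

enum ctx_state_sketch { CTX_UNUSED_SKETCH, CTX_SYSCALL_SKETCH, CTX_URING_SKETCH };

struct audit_ctx_sketch {
	enum ctx_state_sketch context;
	bool dummy;		/* nothing will be logged for this context */
	int name_count;
};

static void uring_entry_sketch(struct audit_ctx_sketch *ctx)
{
	ctx->context = CTX_URING_SKETCH;
}

/* Stand-in for __audit_uring_exit(): returns the context to UNUSED. */
static void uring_exit_reset_sketch(struct audit_ctx_sketch *ctx)
{
	ctx->context = CTX_UNUSED_SKETCH;
	ctx->name_count = 0;
}

/* Old behaviour: skip the exit work for dummy contexts. */
static void uring_exit_old(struct audit_ctx_sketch *ctx)
{
	if (!ctx->dummy)
		uring_exit_reset_sketch(ctx);
}

/* Fixed behaviour: reset whenever a context exists, dummy or not. */
static void uring_exit_fixed(struct audit_ctx_sketch *ctx)
{
	if (ctx)
		uring_exit_reset_sketch(ctx);
}

/* Mirrors the check in __audit_syscall_entry(); true means it would warn. */
static bool syscall_entry_would_warn(const struct audit_ctx_sketch *ctx)
{
	return ctx->context != CTX_UNUSED_SKETCH || ctx->name_count != 0;
}
```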
      
      Cc: stable@vger.kernel.org
      Fixes: 5bd2182d ("audit,io_uring,io-wq: add some basic audit support to io_uring")
      Signed-off-by: Julian Orth <ju.orth@gmail.com>
      [PM: subject line tweaks]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  15. Feb 22, 2022
  16. Feb 09, 2022
  17. Nov 22, 2021
  18. Oct 18, 2021
  19. Oct 04, 2021
    • audit: add OPENAT2 record to list "how" info · 571e5c0e
      Richard Guy Briggs authored
      Since the openat2(2) syscall uses a struct open_how pointer to communicate
      its parameters, they are not usefully recorded by the audit SYSCALL
      record's four existing arguments.
      
      Add a new audit record type OPENAT2 that reports the parameters in its
      third argument, struct open_how with fields oflag, mode and resolve.
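A sketch of how the three `struct open_how` fields could be rendered into the record text. `struct open_how_sketch` mirrors the three `__u64` fields of the uapi `struct open_how`; `format_openat2_sketch()` is hypothetical, and its format string is inferred from the example record below (octal `oflag` and `mode`, hex `resolve`) rather than copied from the kernel source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same field layout as the uapi struct open_how. */
struct open_how_sketch {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

/* Formats the OPENAT2 record body; returns the snprintf() result. */
static int format_openat2_sketch(char *buf, size_t len,
				 const struct open_how_sketch *how)
{
	return snprintf(buf, len, "oflag=0%llo mode=0%llo resolve=0x%llx",
			(unsigned long long)how->flags,
			(unsigned long long)how->mode,
			(unsigned long long)how->resolve);
}
```

With `flags = 0100302`, `mode = 0600` and `resolve = 0xa`, the output matches the `type=OPENAT2` line of the example.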
      
      The new record in the context of an event would look like:
      time->Wed Mar 17 16:28:53 2021
      type=PROCTITLE msg=audit(1616012933.531:184): proctitle=
        73797363616C6C735F66696C652F6F70656E617432002F746D702F61756469742D
        7465737473756974652D737641440066696C652D6F70656E617432
      type=PATH msg=audit(1616012933.531:184): item=1 name="file-openat2"
        inode=29 dev=00:1f mode=0100600 ouid=0 ogid=0 rdev=00:00
        obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
        cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
      type=PATH msg=audit(1616012933.531:184):
        item=0 name="/root/rgb/git/audit-testsuite/tests"
        inode=25 dev=00:1f mode=040700 ouid=0 ogid=0 rdev=00:00
        obj=unconfined_u:object_r:user_tmp_t:s0 nametype=PARENT
        cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
      type=CWD msg=audit(1616012933.531:184):
        cwd="/root/rgb/git/audit-testsuite/tests"
      type=OPENAT2 msg=audit(1616012933.531:184):
        oflag=0100302 mode=0600 resolve=0xa
      type=SYSCALL msg=audit(1616012933.531:184): arch=c000003e syscall=437
        success=yes exit=4 a0=3 a1=7ffe315f1c53 a2=7ffe315f1550 a3=18
        items=2 ppid=528 pid=540 auid=0 uid=0 gid=0 euid=0 suid=0
        fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="openat2"
        exe="/root/rgb/git/audit-testsuite/tests/syscalls_file/openat2"
        subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
        key="testsuite-1616012933-bjAUcEPO"
      
      Link: https://lore.kernel.org/r/d23fbb89186754487850367224b060e26f9b7181.1621363275.git.rgb@redhat.com
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      [PM: tweak subject, wrap example, move AUDIT_OPENAT2 to 1337]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  20. Oct 01, 2021
  21. Sep 20, 2021
    • audit: add filtering for io_uring records · 67daf270
      Paul Moore authored
      
      This patch adds basic audit io_uring filtering, using as much of the
      existing audit filtering infrastructure as possible.  In order to do
      this we reuse the audit filter rule's syscall mask for the io_uring
      operation and we create a new filter for io_uring operations as
      AUDIT_FILTER_URING_EXIT/audit_filter_list[7].
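The reuse of the syscall mask can be sketched as a per-filter array of rule lists in which rules on the io_uring exit list are matched against the io_uring opcode rather than a syscall number. `AUDIT_FILTER_URING_EXIT_SKETCH` uses the index 7 stated above; the types and functions are hypothetical, and the mask is simplified to a flat 64-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AUDIT_FILTER_URING_EXIT_SKETCH 7	/* audit_filter_list[7] */
#define NR_FILTERS_SKETCH 8

/* Hypothetical rule: bit n set means the rule applies to number n (n < 64). */
struct mask_rule_sketch {
	uint64_t mask;
};

struct filter_list_sketch {
	const struct mask_rule_sketch *rules;
	size_t len;
};

/* Same matching code for every list; only the number being tested differs. */
static bool list_matches(const struct filter_list_sketch lists[NR_FILTERS_SKETCH],
			 int which, unsigned nr)
{
	if (nr >= 64)
		return false;
	for (size_t i = 0; i < lists[which].len; i++)
		if (lists[which].rules[i].mask & (UINT64_C(1) << nr))
			return true;
	return false;
}

/* io_uring exit filtering: the reused "syscall" mask holds io_uring opcodes. */
static bool uring_op_audited(const struct filter_list_sketch lists[NR_FILTERS_SKETCH],
			     unsigned uring_op)
{
	return list_matches(lists, AUDIT_FILTER_URING_EXIT_SKETCH, uring_op);
}
```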
      
      Thanks to Richard Guy Briggs for his review, feedback, and work on
      the corresponding audit userspace changes.
      
      Acked-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • audit,io_uring,io-wq: add some basic audit support to io_uring · 5bd2182d
      Paul Moore authored
      
      This patch adds basic auditing to io_uring operations, regardless of
      their context.  This is accomplished by allocating audit_context
      structures for the io-wq worker and io_uring SQPOLL kernel threads
      as well as explicitly auditing the io_uring operations in
      io_issue_sqe().  Individual io_uring operations can bypass auditing
      through the "audit_skip" field in the struct io_op_def definition for
      the operation; although great care must be taken so that security
      relevant io_uring operations do not bypass auditing; please contact
      the audit mailing list (see the MAINTAINERS file) with any questions.
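The opt-out mechanism can be sketched as a per-opcode table whose `audit_skip` flag is consulted around each issued operation. The field name `audit_skip` comes from the text above; `struct op_def_sketch`, the table contents and the counting hooks are hypothetical stand-ins for `struct io_op_def`, `audit_uring_entry()` and `audit_uring_exit()`, and which real opcodes set the flag is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

struct op_def_sketch {
	const char *name;
	bool audit_skip;	/* true: bypass auditing for this opcode */
};

/* Hypothetical table, indexed by opcode. */
static const struct op_def_sketch op_defs_sketch[] = {
	{ "nop",    true  },
	{ "openat", false },
};

static int audit_entries, audit_exits;

/* Counting stand-ins for the audit entry/exit hooks. */
static void audit_entry_hook(unsigned op) { (void)op; audit_entries++; }
static void audit_exit_hook(int ret)      { (void)ret; audit_exits++; }

/* Issue path: audit around the operation unless its definition opts out. */
static int issue_sketch(unsigned op)
{
	assert(op < sizeof(op_defs_sketch) / sizeof(op_defs_sketch[0]));
	const struct op_def_sketch *def = &op_defs_sketch[op];
	int ret = 0;	/* the operation itself would run here */

	if (!def->audit_skip)
		audit_entry_hook(op);
	if (!def->audit_skip)
		audit_exit_hook(ret);
	return ret;
}
```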
      
      The io_uring operations are audited using a new AUDIT_URINGOP record,
      an example is shown below:
      
        type=UNKNOWN[1336] msg=audit(1631800225.981:37289):
          uring_op=19 success=yes exit=0 items=0 ppid=15454 pid=15681
          uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0
          subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
          key=(null)
      
      Thanks to Richard Guy Briggs for review and feedback.
      
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • audit: prepare audit_context for use in calling contexts beyond syscalls · 12c5e81d
      Paul Moore authored
      
      This patch cleans up some of our audit_context handling by
      abstracting out the reset and return code fixup handling to dedicated
      functions.  Not only does this help make things easier to read and
      inspect, it allows for easier reuse by future patches.  We also
      convert the simple audit_context->in_syscall flag into an enum which
      can be used by future patches to indicate a calling context other
      than the syscall context.
      
      Thanks to Richard Guy Briggs for review and feedback.
      
      Acked-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  22. Sep 14, 2021
  23. Jun 11, 2021
  24. Jun 09, 2021
  25. May 10, 2021
  26. Mar 22, 2021
    • lsm: separate security_task_getsecid() into subjective and objective variants · 4ebd7651
      Paul Moore authored
      
      Of the three LSMs that implement the security_task_getsecid() LSM
      hook, all three LSMs provide the task's objective security
      credentials.  This turns out to be unfortunate as most of the hook's
      callers seem to expect the task's subjective credentials, although
      a small handful of callers do correctly expect the objective
      credentials.
      
      This patch is the first step towards fixing the problem: it splits
      the existing security_task_getsecid() hook into two variants, one
      for the subjective creds, one for the objective creds.
      
        void security_task_getsecid_subj(struct task_struct *p,
      				   u32 *secid);
        void security_task_getsecid_obj(struct task_struct *p,
      				  u32 *secid);
      
      While this patch does fix all of the callers to use the correct
      variant, in order to keep this patch focused on the callers and to
      ease review, the LSMs continue to use the same implementation for
      both hooks.  The net effect is that this patch should not change
      the behavior of the kernel in any way; it will be up to the later
      LSM-specific patches in this series to change the hook
      implementations and return the correct credentials.
      
      Acked-by: Mimi Zohar <zohar@linux.ibm.com> (IMA)
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  27. Mar 12, 2021
  28. Jan 28, 2021
  29. Jan 24, 2021
    • commoncap: handle idmapped mounts · 71bc356f
      Christian Brauner authored
      When interacting with user namespace and non-user namespace aware
      filesystem capabilities the vfs will perform various security checks to
      determine whether or not the filesystem capabilities can be used by the
      caller, whether they need to be removed and so on. The main
      infrastructure for this resides in the capability codepaths but they are
      called through the LSM security infrastructure even though they are not
      technically an LSM or optional. This extends the existing security hooks
      security_inode_removexattr(), security_inode_killpriv(),
      security_inode_getsecurity() to pass down the mount's user namespace and
      makes them aware of idmapped mounts.
      
      In order to actually get filesystem capabilities from disk the
      capability infrastructure exposes the get_vfs_caps_from_disk() helper.
      For user namespace aware filesystem capabilities a root uid is stored
      alongside the capabilities.
      
      In order to determine whether the caller can make use of the filesystem
      capability or whether it needs to be ignored it is translated according
      to the superblock's user namespace. If it can be translated to uid 0
      according to that id mapping the caller can use the filesystem
      capabilities stored on disk. If we are accessing the inode that holds
      the filesystem capabilities through an idmapped mount we map the root
      uid according to the mount's user namespace. Afterwards the checks are
      identical to non-idmapped mounts: reading filesystem caps from disk
      enforces that the root uid associated with the filesystem capability
      must have a mapping in the superblock's user namespace and that the
      caller is either in the same user namespace or is a descendant of the
      superblock's user namespace. For filesystems that are mountable inside
      user namespaces, the caller can just mount the filesystem and won't
      usually need to idmap it. If they do want to idmap it they can create an
      idmapped mount and mark it with a user namespace they created and which
      is thus a descendant of s_user_ns. For filesystems that are not
      mountable inside user namespaces, the descendant rule is trivially true
      because the s_user_ns will be the initial user namespace.
      
      If the initial user namespace is passed, nothing changes, so non-idmapped
      mounts will see identical behavior as before.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: James Morris <jamorris@linux.microsoft.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
  30. Nov 25, 2020
    • audit: fix macros warnings · ba59eae7
      Alex Shi authored
      
      Some unused macros could cause gcc warnings:
      kernel/audit.c:68:0: warning: macro "AUDIT_UNINITIALIZED" is not used
      [-Wunused-macros]
      kernel/auditsc.c:104:0: warning: macro "AUDIT_AUX_IPCPERM" is not used
      [-Wunused-macros]
      kernel/auditsc.c:82:0: warning: macro "AUDITSC_INVALID" is not used
      [-Wunused-macros]
      
      AUDIT_UNINITIALIZED and AUDITSC_INVALID are still meaningful and should
      be incorporated.
      
      Just remove AUDIT_AUX_IPCPERM.
      
      Thanks for the comments from Richard Guy Briggs and Paul Moore.
      
      Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: linux-audit@redhat.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Paul Moore <paul@paul-moore.com>