Skip to content
Snippets Groups Projects
  1. Jul 03, 2024
  2. Aug 19, 2020
    • Kirill Tkhai's avatar
      ipc: Use generic ns_common::count · 137ec390
      Kirill Tkhai authored
      
      Switch over ipc namespaces to use the newly introduced common lifetime
      counter.
      
      Currently every namespace type has its own lifetime counter which is stored
      in the specific namespace struct. The lifetime counters are used
      identically for all namespaces types. Namespaces may of course have
      additional unrelated counters and these are not altered.
      
      This introduces a common lifetime counter into struct ns_common. The
      ns_common struct encompasses information that all namespaces share. That
      should include the lifetime counter since its common for all of them.
      
      It also allows us to unify the type of the counters across all namespaces.
      Most of them use refcount_t but one uses atomic_t and at least one uses
      kref. Especially the last one doesn't make much sense since it's just a
      wrapper around refcount_t since 2016 and actually complicates cleanup
      operations by having to use container_of() to cast the correct namespace
      struct out of struct ns_common.
      
      Having the lifetime counter for the namespaces in one place reduces
      maintenance cost. Not just because after switching all namespaces over we
      will have removed more code than we added but also because the logic is
      more easily understandable and we indicate to the user that the basic
      lifetime requirements for all namespaces are currently identical.
      
      Signed-off-by: default avatarKirill Tkhai <ktkhai@virtuozzo.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Link: https://lore.kernel.org/r/159644978697.604812.16592754423881032385.stgit@localhost.localdomain
      
      
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      137ec390
  3. May 24, 2019
  4. May 15, 2019
    • Li Rongqing's avatar
      ipc: prevent lockup on alloc_msg and free_msg · d6a2946a
      Li Rongqing authored
      msgctl10 of ltp triggers the following lockup When CONFIG_KASAN is
      enabled on large memory SMP systems, the pages initialization can take a
      long time, if msgctl10 requests a huge block memory, and it will block
      rcu scheduler, so release cpu actively.
      
      After adding schedule() in free_msg, free_msg can not be called when
      holding spinlock, so adding msg to a tmp list, and free it out of
      spinlock
      
        rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
        rcu:     Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505
        rcu:     Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978
        rcu:     (detected by 11, t=35024 jiffies, g=44237529, q=16542267)
        msgctl10        R  running task    21608 32505   2794 0x00000082
        Call Trace:
         preempt_schedule_irq+0x4c/0xb0
         retint_kernel+0x1b/0x2d
        RIP: 0010:__is_insn_slot_addr+0xfb/0x250
        Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff <41> c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48
        RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
        RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57
        RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780
        RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3
        R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73
        R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec
         kernel_text_address+0xc1/0x100
         __kernel_text_address+0xe/0x30
         unwind_get_return_address+0x2f/0x50
         __save_stack_trace+0x92/0x100
         create_object+0x380/0x650
         __kmalloc+0x14c/0x2b0
         load_msg+0x38/0x1a0
         do_msgsnd+0x19e/0xcf0
         do_syscall_64+0x117/0x400
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
        rcu:     Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170
        rcu:     (detected by 14, t=35016 jiffies, g=44237525, q=12423063)
        msgctl10        R  running task    21608 32170  32155 0x00000082
        Call Trace:
         preempt_schedule_irq+0x4c/0xb0
         retint_kernel+0x1b/0x2d
        RIP: 0010:lock_acquire+0x4d/0x340
        Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 <48> c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82
        RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
        RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64
        RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000
        R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
         is_bpf_text_address+0x32/0xe0
         kernel_text_address+0xec/0x100
         __kernel_text_address+0xe/0x30
         unwind_get_return_address+0x2f/0x50
         __save_stack_trace+0x92/0x100
         save_stack+0x32/0xb0
         __kasan_slab_free+0x130/0x180
         kfree+0xfa/0x2d0
         free_msg+0x24/0x50
         do_msgrcv+0x508/0xe60
         do_syscall_64+0x117/0x400
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Davidlohr said:
       "So after releasing the lock, the msg rbtree/list is empty and new
        calls will not see those in the newly populated tmp_msg list, and
        therefore they cannot access the delayed msg freeing pointers, which
        is good. Also the fact that the node_cache is now freed before the
        actual messages seems to be harmless as this is wanted for
        msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the
        info->lock the thing is freed anyway so it should not change things"
      
      Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com
      
      
      Signed-off-by: default avatarLi RongQing <lirongqing@baidu.com>
      Signed-off-by: default avatarZhang Yu <zhangyu31@baidu.com>
      Reviewed-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d6a2946a
  5. Sep 09, 2017
  6. Oct 28, 2016
  7. Aug 02, 2016
  8. Nov 07, 2015
    • Davidlohr Bueso's avatar
      ipc,msg: drop dst nil validation in copy_msg · 5f2a2d5d
      Davidlohr Bueso authored
      
      d0edd852 ("ipc: convert invalid scenarios to use WARN_ON") relaxed the
      nil dst parameter check, originally being a full BUG_ON.  However, this
      check seems quite unnecessary when the only purpose is for
      ceckpoint/restore (MSG_COPY flag):
      
      o The copy variable is set initially to nil, apparently as a way of
        ensuring that prepare_copy is previously called.  Which is in fact done,
        unconditionally at the beginning of do_msgrcv.
      
      o There is no concurrency with 'copy' (stack allocated in do_msgrcv).
      
      Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg) should
      always handled by IS_ERR() family.  Therefore remove this check altogether
      as it can never occur with the current users.
      
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5f2a2d5d
  9. Sep 10, 2015
    • Davidlohr Bueso's avatar
      ipc: convert invalid scenarios to use WARN_ON · d0edd852
      Davidlohr Bueso authored
      
      Considering Linus' past rants about the (ab)use of BUG in the kernel, I
      took a look at how we deal with such calls in ipc.  Given that any errors
      or corruption in ipc code are most likely contained within the set of
      processes participating in the broken mechanisms, there aren't really many
      strong fatal system failure scenarios that would require a BUG call.
      Also, if something is seriously wrong, ipc might not be the place for such
      a BUG either.
      
      1. For example, recently, a customer hit one of these BUG_ONs in shm
         after failing shm_lock().  A busted ID imho does not merit a BUG_ON,
         and WARN would have been better.
      
      2. MSG_COPY functionality of posix msgrcv(2) for checkpoint/restore.
         I don't see how we can hit this anyway -- at least it should be IS_ERR.
          The 'copy' arg from do_msgrcv is always set by calling prepare_copy()
         first and foremost.  We could also probably drop this check altogether.
          Either way, it does not merit a BUG_ON.
      
      3. No ->fault() callback for the fs getting the corresponding page --
         seems selfish to make the system unusable.
      
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d0edd852
  10. Dec 04, 2014
  11. Nov 13, 2013
    • Mathias Krause's avatar
      ipc, msg: fix message length check for negative values · 4e9b45a1
      Mathias Krause authored
      
      On 64 bit systems the test for negative message sizes is bogus as the
      size, which may be positive when evaluated as a long, will get truncated
      to an int when passed to load_msg().  So a long might very well contain a
      positive value but when truncated to an int it would become negative.
      
      That in combination with a small negative value of msg_ctlmax (which will
      be promoted to an unsigned type for the comparison against msgsz, making
      it a big positive value and therefore make it pass the check) will lead to
      two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too
      small buffer as the addition of alen is effectively a subtraction.  2/ The
      copy_from_user() call in load_msg() will first overflow the buffer with
      userland data and then, when the userland access generates an access
      violation, the fixup handler copy_user_handle_tail() will try to fill the
      remainder with zeros -- roughly 4GB.  That almost instantly results in a
      system crash or reset.
      
        ,-[ Reproducer (needs to be run as root) ]--
        | #include <sys/stat.h>
        | #include <sys/msg.h>
        | #include <unistd.h>
        | #include <fcntl.h>
        |
        | int main(void) {
        |     long msg = 1;
        |     int fd;
        |
        |     fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
        |     write(fd, "-1", 2);
        |     close(fd);
        |
        |     msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT);
        |
        |     return 0;
        | }
        '---
      
      Fix the issue by preventing msgsz from getting truncated by consistently
      using size_t for the message length.  This way the size checks in
      do_msgsnd() could still be passed with a negative value for msg_ctlmax but
      we would fail on the buffer allocation in that case and error out.
      
      Also change the type of m_ts from int to size_t to avoid similar nastiness
      in other code paths -- it is used in similar constructs, i.e.  signed vs.
      unsigned checks.  It should never become negative under normal
      circumstances, though.
      
      Setting msg_ctlmax to a negative value is an odd configuration and should
      be prevented.  As that might break existing userland, it will be handled
      in a separate commit so it could easily be reverted and reworked without
      reintroducing the above described bug.
      
      Hardening mechanisms for user copy operations would have catched that bug
      early -- e.g.  checking slab object sizes on user copy operations as the
      usercopy feature of the PaX patch does.  Or, for that matter, detect the
      long vs.  int sign change due to truncation, as the size overflow plugin
      of the very same patch does.
      
      [akpm@linux-foundation.org: fix i386 min() warnings]
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Cc: Pax Team <pageexec@freemail.hu>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>	[ v2.3.27+ -- yes, that old ;) ]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4e9b45a1
  12. May 01, 2013
  13. Mar 08, 2013
  14. Jan 05, 2013
    • Stanislav Kinsbursky's avatar
      ipc: simplify message copying · 51eeacaa
      Stanislav Kinsbursky authored
      
      Remove the redundant and confusing fill_copy().  Also add copy_msg()
      check for error.  In this case exit from the function have to be done
      instead of break, because further code interprets any error as EAGAIN.
      
      Also define copy_msg() for the case when CONFIG_CHECKPOINT_RESTORE is
      disabled.
      
      Signed-off-by: default avatarStanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      51eeacaa
    • Stanislav Kinsbursky's avatar
      ipc: introduce message queue copy feature · 4a674f34
      Stanislav Kinsbursky authored
      
      This patch is required for checkpoint/restore in userspace.
      
      c/r requires some way to get all pending IPC messages without deleting
      them from the queue (checkpoint can fail and in this case tasks will be
      resumed, so queue have to be valid).
      
      To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
      was introduced.  If this flag was specified, then mtype is interpreted as
      number of the message to copy.
      
      If MSG_COPY is set, then kernel will allocate dummy message with passed
      size, and then use new copy_msg() helper function to copy desired message
      (instead of unlinking it from the queue).
      
      Notes:
      
      1) Return -ENOSYS if MSG_COPY is specified, but
         CONFIG_CHECKPOINT_RESTORE is not set.
      
      Signed-off-by: default avatarStanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4a674f34
  15. Nov 20, 2012
    • Eric W. Biederman's avatar
      proc: Usable inode numbers for the namespace file descriptors. · 98f842e6
      Eric W. Biederman authored
      
      Assign a unique proc inode to each namespace, and use that
      inode number to ensure we only allocate at most one proc
      inode for every namespace in proc.
      
      A single proc inode per namespace allows userspace to test
      to see if two processes are in the same namespace.
      
      This has been a long requested feature and only blocked because
      a naive implementation would put the id in a global space and
      would ultimately require having a namespace for the names of
      namespaces, making migration and certain virtualization tricks
      impossible.
      
      We still don't have per superblock inode numbers for proc, which
      appears necessary for application unaware checkpoint/restart and
      migrations (if the application is using namespace file descriptors)
      but that is now allowd by the design if it becomes important.
      
      I have preallocated the ipc and uts initial proc inode numbers so
      their structures can be statically initialized.
      
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      98f842e6
  16. Feb 13, 2012
  17. Dec 09, 2011
  18. Mar 24, 2011
  19. Apr 07, 2009
    • Serge Hallyn's avatar
      namespaces: ipc namespaces: implement support for posix msqueues · 7eafd7c7
      Serge Hallyn authored
      
      Implement multiple mounts of the mqueue file system, and link it to usage
      of CLONE_NEWIPC.
      
      Each ipc ns has a corresponding mqueuefs superblock.  When a user does
      clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
      internal mount of a new mqueuefs sb linked to the new ipc ns.
      
      When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
      mqueuefs superblock.
      
      Posix message queues can be worked with both through the mq_* system calls
      (see mq_overview(7)), and through the VFS through the mqueue mount.  Any
      usage of mq_open() and friends will work with the acting task's ipc
      namespace.  Any actions through the VFS will work with the mqueuefs in
      which the file was created.  So if a user doesn't remount mqueuefs after
      unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
      /dev/mqueue".
      
      If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
      ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
      ipc_ns:1 will be freed, (2) it's superblock will live on until task b
      umounts the corresponding mqueuefs, and vfs actions will continue to
      succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
      the deceased ipc_ns:1.
      
      To make this happen, we must protect the ipc reference count when
      
      a) a task exits and drops its ipcns->count, since it might be dropping
         it to 0 and freeing the ipcns
      
      b) a task accesses the ipcns through its mqueuefs interface, since it
         bumps the ipcns refcount and might race with the last task in the ipcns
         exiting.
      
      So the kref is changed to an atomic_t so we can use
      atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
      through ns = mqueuefs_sb->s_fs_info is protected by the same lock.
      
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7eafd7c7
    • Serge Hallyn's avatar
      namespaces: mqueue ns: move mqueue_mnt into struct ipc_namespace · 614b84cf
      Serge Hallyn authored
      
      Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
      The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
      posix message queue namespaces and the SYSV ipc namespaces.
      
      The sysctl code will be fixed separately in patch 3.  After just this
      patch, making a change to posix mqueue tunables always changes the values
      in the initial ipc namespace.
      
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      614b84cf
  20. Dec 13, 2006
    • Robert P. J. Day's avatar
      [PATCH] getting rid of all casts of k[cmz]alloc() calls · 5cbded58
      Robert P. J. Day authored
      
      Run this:
      
      	#!/bin/sh
      	for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
      	  echo "De-casting $f..."
      	  perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
      	done
      
      And then go through and reinstate those cases where code is casting pointers
      to non-pointers.
      
      And then drop a few hunks which conflicted with outstanding work.
      
      Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Paul Fulghum <paulkf@microgate.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Karsten Keil <kkeil@suse.de>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Jeff Garzik <jeff@garzik.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      5cbded58
  21. Oct 03, 2006
  22. Apr 16, 2005
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4
Loading