Skip to content
Snippets Groups Projects
  1. Sep 13, 2023
  2. Aug 08, 2023
  3. May 25, 2023
  4. Mar 20, 2023
    • Paul Moore's avatar
      selinux: remove the runtime disable functionality · f22f9aaf
      Paul Moore authored
      
      After working with the larger SELinux-based distros for several
      years, we're finally at a place where we can disable the SELinux
      runtime disable functionality.  The existing kernel deprecation
      notice explains the functionality and why we want to remove it:
      
        The selinuxfs "disable" node allows SELinux to be disabled at
        runtime prior to a policy being loaded into the kernel.  If
        disabled via this mechanism, SELinux will remain disabled until
        the system is rebooted.
      
        The preferred method of disabling SELinux is via the "selinux=0"
        boot parameter, but the selinuxfs "disable" node was created to
        make it easier for systems with primitive bootloaders that did not
        allow for easy modification of the kernel command line.
        Unfortunately, allowing for SELinux to be disabled at runtime makes
        it difficult to secure the kernel's LSM hooks using the
        "__ro_after_init" feature.
      
      It is that last sentence, mentioning the '__ro_after_init' hardening,
      which is the real motivation for this change, and if you look at the
      diffstat you'll see that the impact of this patch reaches across all
      the different LSMs, helping prevent tampering at the LSM hook level.
      
      From a SELinux perspective, it is important to note that if you
      continue to disable SELinux via "/etc/selinux/config" it may appear
      that SELinux is disabled, but it is simply in an uninitialized state.
      If you load a policy with `load_policy -i`, you will see SELinux
      come alive just as if you had loaded the policy during early-boot.
      
      It is also worth noting that the "/sys/fs/selinux/disable" file is
      always writable now, regardless of the Kconfig settings, but writing
      to the file has no effect on the system, other than to display an
      error on the console if a non-zero/true value is written.
      
      Finally, in the several years where we have been working on
      deprecating this functionality, there has only been one instance of
      someone mentioning any user visible breakage.  In this particular
      case it was an individual's kernel test system, and the workaround
      documented in the deprecation notice ("selinux=0" on the kernel
      command line) resolved the issue without problem.
      
      Acked-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      f22f9aaf
  5. Mar 01, 2023
    • Linus Torvalds's avatar
      capability: just use a 'u64' instead of a 'u32[2]' array · f122a08b
      Linus Torvalds authored
      
      Back in 2008 we extended the capability bits from 32 to 64, and we did
      it by extending the single 32-bit capability word from one word to an
      array of two words.  It was then obfuscated by hiding the "2" behind two
      macro expansions, with the reasoning being that maybe it gets extended
      further some day.
      
      That reasoning may have been valid at the time, but the last thing we
      want to do is to extend the capability set any more.  And the array of
      values not only causes source code oddities (with loops to deal with
      it), but also results in worse code generation.  It's a lose-lose
      situation.
      
      So just change the 'u32[2]' into a 'u64' and be done with it.
      
      We still have to deal with the fact that the user space interface is
      designed around an array of these 32-bit values, but that was the case
      before too, since the array layouts were different (ie user space
      doesn't use an array of 32-bit values for individual capability masks,
      but an array of 32-bit slices of multiple masks).
      
      So that marshalling of data is actually simplified too, even if it does
      remain somewhat obscure and odd.
      
      This was all triggered by my reaction to the new "cap_isidentical()"
      introduced recently.  By just using a saner data structure, it went from
      
      	unsigned __capi;
      	CAP_FOR_EACH_U32(__capi) {
      		if (a.cap[__capi] != b.cap[__capi])
      			return false;
      	}
      	return true;
      
      to just being
      
      	return a.val == b.val;
      
      instead.  Which is rather more obvious both to humans and to compilers.
      
      Cc: Mateusz Guzik <mjguzik@gmail.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f122a08b
  6. Jan 19, 2023
    • Christian Brauner's avatar
      fs: port vfs{g,u}id helpers to mnt_idmap · 4d7ca409
      Christian Brauner authored
      
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      4d7ca409
    • Christian Brauner's avatar
      fs: port privilege checking helpers to mnt_idmap · 9452e93e
      Christian Brauner authored
      
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      9452e93e
    • Christian Brauner's avatar
      fs: port xattr to mnt_idmap · 39f60c1c
      Christian Brauner authored
      
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      39f60c1c
    • Christian Brauner's avatar
      fs: port ->permission() to pass mnt_idmap · 4609e1f1
      Christian Brauner authored
      
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevent on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area this can be a potential source for
      bugs.
      
      Once the conversion to struct mnt_idmap is done all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the two
      eliminating the possibility of any bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      
      Acked-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner (Microsoft) <brauner@kernel.org>
      4609e1f1
  7. Nov 18, 2022
  8. Oct 28, 2022
    • Gaosheng Cui's avatar
      capabilities: fix potential memleak on error path from vfs_getxattr_alloc() · 8cf0a1bc
      Gaosheng Cui authored
      
      In cap_inode_getsecurity(), we will use vfs_getxattr_alloc() to
      complete the memory allocation of tmpbuf, if we have completed
      the memory allocation of tmpbuf, but failed to call handler->get(...),
      there will be a memleak in below logic:
      
        |-- ret = (int)vfs_getxattr_alloc(mnt_userns, ...)
          |           /* ^^^ alloc for tmpbuf */
          |-- value = krealloc(*xattr_value, error + 1, flags)
          |           /* ^^^ alloc memory */
          |-- error = handler->get(handler, ...)
          |           /* error! */
          |-- *xattr_value = value
          |           /* xattr_value is &tmpbuf (memory leak!) */
      
      So we will try to free(tmpbuf) after vfs_getxattr_alloc() fails to fix it.
      
      Cc: stable@vger.kernel.org
      Fixes: 8db6c34f ("Introduce v3 namespaced file capabilities")
      Signed-off-by: default avatarGaosheng Cui <cuigaosheng1@huawei.com>
      Acked-by: default avatarSerge Hallyn <serge@hallyn.com>
      [PM: subject line and backtrace tweaks]
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      8cf0a1bc
  9. Oct 26, 2022
  10. Dec 05, 2021
    • Christian Brauner's avatar
      fs: support mapped mounts of mapped filesystems · bd303368
      Christian Brauner authored
      In previous patches we added new and modified existing helpers to handle
      idmapped mounts of filesystems mounted with an idmapping. In this final
      patch we convert all relevant places in the vfs to actually pass the
      filesystem's idmapping into these helpers.
      
      With this the vfs is in shape to handle idmapped mounts of filesystems
      mounted with an idmapping. Note that this is just the generic
      infrastructure. Actually adding support for idmapped mounts to a
      filesystem mountable with an idmapping is follow-up work.
      
      In this patch we extend the definition of an idmapped mount from a mount
      that that has the initial idmapping attached to it to a mount that has
      an idmapping attached to it which is not the same as the idmapping the
      filesystem was mounted with.
      
      As before we do not allow the initial idmapping to be attached to a
      mount. In addition this patch prevents that the idmapping the filesystem
      was mounted with can be attached to a mount created based on this
      filesystem.
      
      This has multiple reasons and advantages. First, attaching the initial
      idmapping or the filesystem's idmapping doesn't make much sense as in
      both cases the values of the i_{g,u}id and other places where k{g,u}ids
      are used do not change. Second, a user that really wants to do this for
      whatever reason can just create a separate dedicated identical idmapping
      to attach to the mount. Third, we can continue to use the initial
      idmapping as an indicator that a mount is not idmapped allowing us to
      continue to keep passing the initial idmapping into the mapping helpers
      to tell them that something isn't an idmapped mount even if the
      filesystem is mounted with an idmapping.
      
      Link: https://lore.kernel.org/r/20211123114227.3124056-11-brauner@kernel.org (v1)
      Link: https://lore.kernel.org/r/20211130121032.3753852-11-brauner@kernel.org (v2)
      Link: https://lore.kernel.org/r/20211203111707.3901969-11-brauner@kernel.org
      
      
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      CC: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarSeth Forshee <sforshee@digitalocean.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      bd303368
  11. Dec 03, 2021
  12. Apr 15, 2021
  13. Mar 24, 2021
    • Arnd Bergmann's avatar
      security: commoncap: fix -Wstringop-overread warning · 82e5d8cc
      Arnd Bergmann authored
      
      gcc-11 introdces a harmless warning for cap_inode_getsecurity:
      
      security/commoncap.c: In function ‘cap_inode_getsecurity’:
      security/commoncap.c:440:33: error: ‘memcpy’ reading 16 bytes from a region of size 0 [-Werror=stringop-overread]
        440 |                                 memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32);
            |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The problem here is that tmpbuf is initialized to NULL, so gcc assumes
      it is not accessible unless it gets set by vfs_getxattr_alloc().  This is
      a legitimate warning as far as I can tell, but the code is correct since
      it correctly handles the error when that function fails.
      
      Add a separate NULL check to tell gcc about it as well.
      
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      82e5d8cc
  14. Mar 12, 2021
  15. Jan 28, 2021
    • Miklos Szeredi's avatar
      cap: fix conversions on getxattr · f2b00be4
      Miklos Szeredi authored
      
      If a capability is stored on disk in v2 format cap_inode_getsecurity() will
      currently return in v2 format unconditionally.
      
      This is wrong: v2 cap should be equivalent to a v3 cap with zero rootid,
      and so the same conversions performed on it.
      
      If the rootid cannot be mapped, v3 is returned unconverted.  Fix this so
      that both v2 and v3 return -EOVERFLOW if the rootid (or the owner of the fs
      user namespace in case of v2) cannot be mapped into the current user
      namespace.
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      f2b00be4
  16. Jan 24, 2021
    • Christian Brauner's avatar
      commoncap: handle idmapped mounts · 71bc356f
      Christian Brauner authored
      When interacting with user namespace and non-user namespace aware
      filesystem capabilities the vfs will perform various security checks to
      determine whether or not the filesystem capabilities can be used by the
      caller, whether they need to be removed and so on. The main
      infrastructure for this resides in the capability codepaths but they are
      called through the LSM security infrastructure even though they are not
      technically an LSM or optional. This extends the existing security hooks
      security_inode_removexattr(), security_inode_killpriv(),
      security_inode_getsecurity() to pass down the mount's user namespace and
      makes them aware of idmapped mounts.
      
      In order to actually get filesystem capabilities from disk the
      capability infrastructure exposes the get_vfs_caps_from_disk() helper.
      For user namespace aware filesystem capabilities a root uid is stored
      alongside the capabilities.
      
      In order to determine whether the caller can make use of the filesystem
      capability or whether it needs to be ignored it is translated according
      to the superblock's user namespace. If it can be translated to uid 0
      according to that id mapping the caller can use the filesystem
      capabilities stored on disk. If we are accessing the inode that holds
      the filesystem capabilities through an idmapped mount we map the root
      uid according to the mount's user namespace. Afterwards the checks are
      identical to non-idmapped mounts: reading filesystem caps from disk
      enforces that the root uid associated with the filesystem capability
      must have a mapping in the superblock's user namespace and that the
      caller is either in the same user namespace or is a descendant of the
      superblock's user namespace. For filesystems that are mountable inside
      user namespace the caller can just mount the filesystem and won't
      usually need to idmap it. If they do want to idmap it they can create an
      idmapped mount and mark it with a user namespace they created and which
      is thus a descendant of s_user_ns. For filesystems that are not
      mountable inside user namespaces the descendant rule is trivially true
      because the s_user_ns will be the initial user namespace.
      
      If the initial user namespace is passed nothing changes so non-idmapped
      mounts will see identical behavior as before.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
      
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      71bc356f
    • Tycho Andersen's avatar
      xattr: handle idmapped mounts · c7c7a1a1
      Tycho Andersen authored
      When interacting with extended attributes the vfs verifies that the
      caller is privileged over the inode with which the extended attribute is
      associated. For posix access and posix default extended attributes a uid
      or gid can be stored on-disk. Let the functions handle posix extended
      attributes on idmapped mounts. If the inode is accessed through an
      idmapped mount we need to map it according to the mount's user
      namespace. Afterwards the checks are identical to non-idmapped mounts.
      This has no effect for e.g. security xattrs since they don't store uids
      or gids and don't perform permission checks on them like posix acls do.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
      
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Signed-off-by: default avatarTycho Andersen <tycho@tycho.pizza>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      c7c7a1a1
    • Christian Brauner's avatar
      acl: handle idmapped mounts · e65ce2a5
      Christian Brauner authored
      The posix acl permission checking helpers determine whether a caller is
      privileged over an inode according to the acls associated with the
      inode. Add helpers that make it possible to handle acls on idmapped
      mounts.
      
      The vfs and the filesystems targeted by this first iteration make use of
      posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
      translate basic posix access and default permissions such as the
      ACL_USER and ACL_GROUP type according to the initial user namespace (or
      the superblock's user namespace) to and from the caller's current user
      namespace. Adapt these two helpers to handle idmapped mounts whereby we
      either map from or into the mount's user namespace depending on in which
      direction we're translating.
      Similarly, cap_convert_nscap() is used by the vfs to translate user
      namespace and non-user namespace aware filesystem capabilities from the
      superblock's user namespace to the caller's user namespace. Enable it to
      handle idmapped mounts by accounting for the mount's user namespace.
      
      In addition the fileystems targeted in the first iteration of this patch
      series make use of the posix_acl_chmod() and, posix_acl_update_mode()
      helpers. Both helpers perform permission checks on the target inode. Let
      them handle idmapped mounts. These two helpers are called when posix
      acls are set by the respective filesystems to handle this case we extend
      the ->set() method to take an additional user namespace argument to pass
      the mount's user namespace down.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
      
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      e65ce2a5
    • Christian Brauner's avatar
      capability: handle idmapped mounts · 0558c1bf
      Christian Brauner authored
      In order to determine whether a caller holds privilege over a given
      inode the capability framework exposes the two helpers
      privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former
      verifies that the inode has a mapping in the caller's user namespace and
      the latter additionally verifies that the caller has the requested
      capability in their current user namespace.
      If the inode is accessed through an idmapped mount map it into the
      mount's user namespace. Afterwards the checks are identical to
      non-idmapped inodes. If the initial user namespace is passed all
      operations are a nop so non-idmapped mounts will not see a change in
      behavior.
      
      Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com
      
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarSerge Hallyn <serge@hallyn.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      0558c1bf
  17. Dec 29, 2020
  18. Dec 14, 2020
  19. May 30, 2020
    • Eric W. Biederman's avatar
      exec: Compute file based creds only once · 56305aa9
      Eric W. Biederman authored
      
      Move the computation of creds from prepare_binfmt into begin_new_exec
      so that the creds need only be computed once.  This is just code
      reorganization no semantic changes of any kind are made.
      
      Moving the computation is safe.  I have looked through the kernel and
      verified none of the binfmts look at bprm->cred directly, and that
      there are no helpers that look at bprm->cred indirectly.  Which means
      that it is not a problem to compute the bprm->cred later in the
      execution flow as it is not used until it becomes current->cred.
      
      A new function bprm_creds_from_file is added to contain the work that
      needs to be done.  bprm_creds_from_file first computes which file
      bprm->executable or most likely bprm->file that the bprm->creds
      will be computed from.
      
      The funciton bprm_fill_uid is updated to receive the file instead of
      accessing bprm->file.  The now unnecessary work needed to reset the
      bprm->cred->euid, and bprm->cred->egid is removed from brpm_fill_uid.
      A small comment to document that bprm_fill_uid now only deals with the
      work to handle suid and sgid files.  The default case is already
      heandled by prepare_exec_creds.
      
      The function security_bprm_repopulate_creds is renamed
      security_bprm_creds_from_file and now is explicitly passed the file
      from which to compute the creds.  The documentation of the
      bprm_creds_from_file security hook is updated to explain when the hook
      is called and what it needs to do.  The file is passed from
      cap_bprm_creds_from_file into get_file_caps so that the caps are
      computed for the appropriate file.  The now unnecessary work in
      cap_bprm_creds_from_file to reset the ambient capabilites has been
      removed.  A small comment to document that the work of
      cap_bprm_creds_from_file is to read capabilities from the files
      secureity attribute and derive capabilities from the fact the
      user had uid 0 has been added.
      
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      56305aa9
    • Eric W. Biederman's avatar
      exec: Add a per bprm->file version of per_clear · a7868323
      Eric W. Biederman authored
      There is a small bug in the code that recomputes parts of bprm->cred
      for every bprm->file.  The code never recomputes the part of
      clear_dangerous_personality_flags it is responsible for.
      
      Which means that in practice if someone creates a sgid script
      the interpreter will not be able to use any of:
      	READ_IMPLIES_EXEC
      	ADDR_NO_RANDOMIZE
      	ADDR_COMPAT_LAYOUT
      	MMAP_PAGE_ZERO.
      
      This accentially clearing of personality flags probably does
      not matter in practice because no one has complained
      but it does make the code more difficult to understand.
      
      Further remaining bug compatible prevents the recomputation from being
      removed and replaced by simply computing bprm->cred once from the
      final bprm->file.
      
      Making this change removes the last behavior difference between
      computing bprm->creds from the final file and recomputing
      bprm->cred several times.  Which allows this behavior change
      to be justified for it's own reasons, and for any but hunts
      looking into why the behavior changed to wind up here instead
      of in the code that will follow that computes bprm->cred
      from the final bprm->file.
      
      This small logic bug appears to have existed since the code
      started clearing dangerous personality bits.
      
      History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      
      
      Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      a7868323
  20. May 26, 2020
    • Eric W. Biederman's avatar
      exec: Always set cap_ambient in cap_bprm_set_creds · a4ae32c7
      Eric W. Biederman authored
      
      An invariant of cap_bprm_set_creds is that every field in the new cred
      structure that cap_bprm_set_creds might set, needs to be set every
      time to ensure the fields does not get a stale value.
      
      The field cap_ambient is not set every time cap_bprm_set_creds is
      called, which means that if there is a suid or sgid script with an
      interpreter that has neither the suid nor the sgid bits set the
      interpreter should be able to accept ambient credentials.
      Unfortuantely because cap_ambient is not reset to it's original value
      the interpreter can not accept ambient credentials.
      
      Given that the ambient capability set is expected to be controlled by
      the caller, I don't think this is particularly serious.  But it is
      definitely worth fixing so the code works correctly.
      
      I have tested to verify my reading of the code is correct and the
      interpreter of a sgid can receive ambient capabilities with this
      change and cannot receive ambient capabilities without this change.
      
      Cc: stable@vger.kernel.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Fixes: 58319057 ("capabilities: ambient capabilities")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      a4ae32c7
  21. May 21, 2020
  22. Jul 07, 2019
  23. Jun 11, 2019
  24. May 30, 2019
  25. Feb 25, 2019
  26. Jan 25, 2019
  27. Jan 10, 2019
  28. Jan 08, 2019
  29. Dec 12, 2018
    • Paul Gortmaker's avatar
      security: audit and remove any unnecessary uses of module.h · 876979c9
      Paul Gortmaker authored
      
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.
      
      The advantage in removing such instances is that module.h itself
      sources about 15 other headers; adding significantly to what we feed
      cpp, and it can obscure what headers we are effectively using.
      
      Since module.h might have been the implicit source for init.h
      (for __init) and for export.h (for EXPORT_SYMBOL) we consider each
      instance for the presence of either and replace as needed.
      
      Cc: James Morris <jmorris@namei.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: linux-security-module@vger.kernel.org
      Cc: linux-integrity@vger.kernel.org
      Cc: keyrings@vger.kernel.org
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarJames Morris <james.morris@microsoft.com>
      876979c9
  30. Aug 29, 2018
  31. Aug 11, 2018
    • Eddie.Horng's avatar
      cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias() · 355139a8
      Eddie.Horng authored
      The code in cap_inode_getsecurity(), introduced by commit 8db6c34f
      ("Introduce v3 namespaced file capabilities"), should use
      d_find_any_alias() instead of d_find_alias() do handle unhashed dentry
      correctly. This is needed, for example, if execveat() is called with an
      open but unlinked overlayfs file, because overlayfs unhashes dentry on
      unlink.
      This is a regression of real life application, first reported at
      https://www.spinics.net/lists/linux-unionfs/msg05363.html
      
      
      
      Below reproducer and setup can reproduce the case.
        const char* exec="echo";
        const char *newargv[] = { "echo", "hello", NULL};
        const char *newenviron[] = { NULL };
        int fd, err;
      
        fd = open(exec, O_PATH);
        unlink(exec);
        err = syscall(322/*SYS_execveat*/, fd, "", newargv, newenviron,
      AT_EMPTY_PATH);
        if(err<0)
          fprintf(stderr, "execveat: %s\n", strerror(errno));
      
      gcc compile into ~/test/a.out
      mount -t overlay -orw,lowerdir=/mnt/l,upperdir=/mnt/u,workdir=/mnt/w
      none /mnt/m
      cd /mnt/m
      cp /bin/echo .
      ~/test/a.out
      
      Expected result:
      hello
      Actually result:
      execveat: Invalid argument
      dmesg:
      Invalid argument reading file caps for /dev/fd/3
      
      The 2nd reproducer and setup emulates similar case but for
      regular filesystem:
        const char* exec="echo";
        int fd, err;
        char buf[256];
      
        fd = open(exec, O_RDONLY);
        unlink(exec);
        err = fgetxattr(fd, "security.capability", buf, 256);
        if(err<0)
          fprintf(stderr, "fgetxattr: %s\n", strerror(errno));
      
      gcc compile into ~/test_fgetxattr
      
      cd /tmp
      cp /bin/echo .
      ~/test_fgetxattr
      
      Result:
      fgetxattr: Invalid argument
      
      On regular filesystem, for example, ext4 read xattr from
      disk and return to execveat(), will not trigger this issue, however,
      the overlay attr handler pass real dentry to vfs_getxattr() will.
      This reproducer calls fgetxattr() with an unlinked fd, involkes
      vfs_getxattr() then reproduced the case that d_find_alias() in
      cap_inode_getsecurity() can't find the unlinked dentry.
      
      Suggested-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Acked-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Acked-by: default avatarSerge E. Hallyn <serge@hallyn.com>
      Fixes: 8db6c34f ("Introduce v3 namespaced file capabilities")
      Cc: <stable@vger.kernel.org> # v4.14
      Signed-off-by: default avatarEddie Horng <eddie.horng@mediatek.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      355139a8
  32. May 24, 2018
    • Eric W. Biederman's avatar
      capabilities: Allow privileged user in s_user_ns to set security.* xattrs · b1d749c5
      Eric W. Biederman authored
      
      A privileged user in s_user_ns will generally have the ability to
      manipulate the backing store and insert security.* xattrs into
      the filesystem directly. Therefore the kernel must be prepared to
      handle these xattrs from unprivileged mounts, and it makes little
      sense for commoncap to prevent writing these xattrs to the
      filesystem. The capability and LSM code have already been updated
      to appropriately handle xattrs from unprivileged mounts, so it
      is safe to loosen this restriction on setting xattrs.
      
      The exception to this logic is that writing xattrs to a mounted
      filesystem may also cause the LSM inode_post_setxattr or
      inode_setsecurity callbacks to be invoked. SELinux will deny the
      xattr update by virtue of applying mountpoint labeling to
      unprivileged userns mounts, and Smack will deny the writes for
      any user without global CAP_MAC_ADMIN, so loosening the
      capability check in commoncap is safe in this respect as well.
      
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Acked-by: default avatarSerge Hallyn <serge@hallyn.com>
      Acked-by: default avatarChristian Brauner <christian@brauner.io>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      b1d749c5
Loading