  1. Nov 15, 2021
  2. Nov 14, 2021
  3. Nov 11, 2021
  4. Nov 09, 2021
  5. Nov 06, 2021
  6. Oct 18, 2021
  7. Oct 11, 2021
  8. Sep 22, 2021
  9. Sep 20, 2021
    • init: don't panic if mount_nodev_root failed · 40c8ee67
      Leon Romanovsky authored and Al Viro committed
      
      Attempt to mount 9p file system as root gives the following kernel panic:
      
       9pnet_virtio: no channels available for device root
       Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc1+ #127
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack_lvl+0x45/0x59
        panic+0x1e2/0x44b
        ? __warn_printk+0xf3/0xf3
        ? free_unref_page+0x2d4/0x4a0
        ? trace_hardirqs_on+0x32/0x120
        ? free_unref_page+0x2d4/0x4a0
        mount_root+0x189/0x1e0
        prepare_namespace+0x136/0x165
        kernel_init_freeable+0x3b8/0x3cb
        ? rest_init+0x2e0/0x2e0
        kernel_init+0x19/0x130
        ret_from_fork+0x1f/0x30
       Kernel Offset: disabled
       ---[ end Kernel panic - not syncing: VFS: Unable to mount root "root" (9p), err=-2 ]---
      
      QEMU command line:
       "qemu-system-x86_64 -append root=/dev/root rw rootfstype=9p rootflags=trans=virtio ..."
      
      This error occurs because prepare_namespace() truncates root_device_name
      from "/dev/root" to "root" before calling mount_nodev_root().
      
      As a solution, don't treat errors in mount_nodev_root() as errors that
      require a panic; instead, allow fallback to the mount flow that existed
      before the patch cited in the Fixes tag.
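
      The fallback described above can be sketched in userspace C. This is
      only an illustration, not the kernel code: mount_root_sketch() and both
      stub mount functions are hypothetical names, and the stubs merely return
      fixed values so the control flow is visible.

```c
#include <errno.h>

/* Hypothetical stand-ins for the two mount paths; neither is kernel code. */
static int legacy_mount_calls;

static int nodev_mount_fails(void)
{
    return -ENOENT;             /* mirrors err=-2 from the panic log */
}

static int legacy_mount_ok(void)
{
    legacy_mount_calls++;
    return 0;
}

typedef int (*mount_fn)(void);

/*
 * Try the non-blockdevice path first. A failure there is not fatal:
 * control simply falls through to the older mount flow.
 */
static int mount_root_sketch(mount_fn nodev, mount_fn legacy)
{
    if (nodev && nodev() == 0)
        return 0;
    return legacy();
}
```

      Calling mount_root_sketch(nodev_mount_fails, legacy_mount_ok) returns 0
      and invokes the legacy path exactly once, instead of stopping at the
      first error.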
      
      Fixes: f9259be6 ("init: allow mounting arbitrary non-blockdevice filesystems as root")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • init/do_mounts.c: Harden split_fs_names() against buffer overflow · b51593c4
      Vivek Goyal authored and Al Viro committed
      
      split_fs_names() currently takes a comma-separated list of filesystems
      and converts it into individual filesystem strings. It places these
      strings in the buffer passed by the caller and returns the number of
      strings.
      
      If the caller manages to pass an input string bigger than the buffer,
      we can write beyond the buffer. And if the string just fits the buffer,
      we still write beyond it, as we append a '\0' byte at the end.
      
      Pass the size of the buffer to split_fs_names() and put enough checks
      in place so such buffer overruns cannot occur.
      
      This patch does a few things.
      
      - Add a parameter "size" to split_fs_names(). This specifies size
        of input buffer.
      
      - Use strlcpy() (instead of strcpy()) so that we can't go beyond
        buffer size. If input string "names" is larger than passed in
        buffer, input string will be truncated to fit in buffer.
      
      - Stop appending extra '\0' character at the end and avoid one
        possibility of going beyond the input buffer size.
      
      - Do not use extra loop to count number of strings.
      
      - Previously, if one passed "rootfstype=foo,,bar", split_fs_names()
        would return only 1 string, "foo" (and "bar" would be dropped
        because of the extra ','). After this patch, split_fs_names()
        returns 3 strings ("foo", a zero-sized string, and "bar").
      
        Callers of split_fs_names() have been modified to check for
        zero sized string and skip to next one.
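
      The behaviour listed above can be sketched in userspace C. This is a
      sketch under stated assumptions, not the kernel function: the name
      split_fs_names_sketch() is made up, and snprintf() stands in for the
      kernel's strlcpy() because both truncate and always NUL-terminate.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy at most size - 1 bytes of "names" into "page", then turn every
 * ',' into '\0' in place. Returns the number of strings, counting
 * zero-sized ones, which callers are expected to skip.
 */
static int split_fs_names_sketch(char *page, size_t size, const char *names)
{
    int count = 1;
    char *p = page;

    if (size == 0)
        return 0;

    /* Bounded copy: stands in for strlcpy(page, names, size). */
    snprintf(page, size, "%s", names);

    /* Single pass: split and count at the same time. */
    while (*p++) {
        if (p[-1] == ',') {
            p[-1] = '\0';
            count++;
        }
    }
    return count;
}
```

      With a 16-byte buffer, "foo,,bar" yields 3 strings: "foo", an empty
      string, and "bar". With a 4-byte buffer, "ext4,xfs" is truncated to the
      single string "ext" and nothing is written past the buffer.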
      
      Reported-by: xu xin <xu.xin16@zte.com.cn>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  10. Sep 14, 2021
    • memblock: introduce saner 'memblock_free_ptr()' interface · 77e02cf5
      Linus Torvalds authored
      The boot-time allocation interface for memblock is a mess, with
      'memblock_alloc()' returning a virtual pointer, but then you are
      supposed to free it with 'memblock_free()' that takes a _physical_
      address.
      
      Not only is that all kinds of strange and illogical, but it actually
      causes bugs, when people then use it like a normal allocation function,
      and it fails spectacularly on a NULL pointer:
      
         https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
      
      or just random memory corruption if the debug checks don't catch it:
      
         https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
      
      
      
      I really don't want to apply patches that treat the symptoms, when the
      fundamental cause is this horribly confusing interface.
      
      I started out looking at just automating a sane replacement sequence,
      but because of this mix of virtual and physical addresses, and because
      people have used the "__pa()" macro that can take either a regular
      kernel pointer, or just the raw "unsigned long" address, it's all quite
      messy.
      
      So this just introduces a new saner interface for freeing a virtual
      address that was allocated using 'memblock_alloc()', and that was kept
      as a regular kernel pointer.  And then it converts a couple of users
      that are obvious and easy to test, including the 'xbc_nodes' case in
      lib/bootconfig.c that caused problems.
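
      The idea can be sketched in userspace C. Everything here is a stand-in
      and labeled as such: sketch_pa() fakes the "__pa()" translation with a
      made-up constant offset, and sketch_memblock_free() only records its
      arguments. The assumption being illustrated is that the pointer-based
      wrapper ignores NULL and does the virtual-to-physical translation itself,
      so callers never handle a physical address.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t sketch_phys_addr_t;

/* Hypothetical constant offset between "virtual" and "physical". */
#define SKETCH_PAGE_OFFSET ((uintptr_t)0x1000)

static sketch_phys_addr_t last_freed_base;
static size_t last_freed_size;
static int free_calls;

/* Stand-in for __pa(): virtual pointer to fake physical address. */
static sketch_phys_addr_t sketch_pa(const void *ptr)
{
    return (uintptr_t)ptr - SKETCH_PAGE_OFFSET;
}

/* Stand-in for the physical-address free; it only records the call. */
static void sketch_memblock_free(sketch_phys_addr_t base, size_t size)
{
    last_freed_base = base;
    last_freed_size = size;
    free_calls++;
}

/* Pointer-based wrapper: NULL is a no-op, translation happens inside. */
static void sketch_memblock_free_ptr(void *ptr, size_t size)
{
    if (ptr)
        sketch_memblock_free(sketch_pa(ptr), size);
}
```

      Passing NULL leaves free_calls at 0, which is the case that failed
      spectacularly when callers handed a pointer to the physical-address
      interface.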
      
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Fixes: 40caa127 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. Sep 08, 2021
    • init/bootconfig: Reorder init parameter from bootconfig and cmdline · b66fbbe8
      Masami Hiramatsu authored
      Reorder the init parameters from bootconfig and kernel cmdline
      so that the kernel cmdline is always the last part of the
      parameters, as below.
      
       " -- "[bootconfig init params][cmdline init params]
      
      This change prevents bootconfig init params from overriding
      the init params that the user gives on the command line.
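
      The resulting layout can be sketched in userspace C. This is an
      illustration only: build_init_args_sketch() is a hypothetical helper,
      not the kernel's code, and the single space between the two groups is an
      assumption about how they are joined.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Compose " -- " followed by the bootconfig init params and then the
 * cmdline init params, so that the cmdline part always comes last.
 * Returns 0 on success, -1 if the result does not fit in "out".
 */
static int build_init_args_sketch(char *out, size_t size,
                                  const char *bootconfig_params,
                                  const char *cmdline_params)
{
    const char *sep = (*bootconfig_params && *cmdline_params) ? " " : "";
    int n = snprintf(out, size, " -- %s%s%s",
                     bootconfig_params, sep, cmdline_params);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

      Given the made-up values "runlevel=3" from bootconfig and "runlevel=1"
      from the command line, the result is " -- runlevel=3 runlevel=1". An init
      that honours the last occurrence of an option would then see the
      command-line value.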
      
      Link: https://lkml.kernel.org/r/163077085675.222577.5665176468023636160.stgit@devnote2
      
      
      
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • init: bootconfig: Remove all bootconfig data when the init memory is removed · 40caa127
      Masami Hiramatsu authored
      Since the bootconfig is used only in the init functions,
      it doesn't need to keep the data after boot. Free it when
      the init memory is removed.
      
      Link: https://lkml.kernel.org/r/163077084958.222577.5924961258513004428.stgit@devnote2
      
      
      
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • trap: cleanup trap_init() · 8b097881
      Kefeng Wang authored
      There are some empty trap_init() definitions in different ARCHs. Introduce
      a new weak trap_init() function to clean them up.
      
      Link: https://lkml.kernel.org/r/20210812123602.76356-1-wangkefeng.wang@huawei.com
      
      
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>	[arm32]
      Acked-by: Vineet Gupta						[arc]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>			[powerpc]
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <palmerdabbelt@google.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • init: move usermodehelper_enable() to populate_rootfs() · b234ed6d
      Rasmus Villemoes authored
      Currently, usermodehelper is enabled right before PID1 starts going
      through the initcalls. However, any call of a usermodehelper from a
      pure_, core_, postcore_, arch_, subsys_ or fs_ initcall is futile, as
      there is no filesystem contents yet.
      
      Up until commit e7cb072e ("init/initramfs.c: do unpacking
      asynchronously"), such calls, whether via some request_module(), a
      legacy uevent "/sbin/hotplug" notification or something else, would
      just fail silently with (presumably) -ENOENT from
      kernel_execve(). However, that commit introduced the
      wait_for_initramfs() synchronization hook which must be called from
      the usermodehelper exec path right before the kernel_execve, in order
      that request_module() et al done from *after* rootfs_initcall()
      time (i.e. device_ and late_ initcalls) would continue to find a
      populated initramfs as they used to.
      
      Any call of wait_for_initramfs() done before the unpacking has been
      scheduled (i.e. before rootfs_initcall time) must just return
      immediately [and let the caller find an empty file system] in order
      not to deadlock the machine. I mistakenly thought, and my limited
      testing confirmed, that there were no such calls, so I added a
      pr_warn_once() in wait_for_initramfs(). It turns out that one can
      indeed hit request_module() as well as kobject_uevent_env() during
      those early init calls, leading to a user-visible warning in the
      kernel log emitted consistently for certain configurations.
      
      We could just remove the pr_warn_once(), but I think it's better to
      postpone enabling the usermodehelper framework until there is at least
      some chance of finding the executable. That is also a little more
      efficient in that a lot of work done in umh.c will be elided. However,
      it does change the error seen by those early callers from -ENOENT to
      -EBUSY, so there is a risk of a regression if any caller cares about
      the exact error value.
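
      The change in error value can be sketched in userspace C. This is a toy
      model, not umh.c: both flags and both functions are hypothetical names,
      and the model only captures the ordering of the two checks as described
      in this message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state flags for the model. */
static bool umh_enabled_sketch;
static bool rootfs_populated_sketch;

static void usermodehelper_enable_sketch(void)
{
    umh_enabled_sketch = true;
}

/*
 * While the helper framework is disabled the call is refused up front
 * with -EBUSY. Once enabled, a missing executable shows up as -ENOENT.
 */
static int call_usermodehelper_sketch(const char *path)
{
    (void)path;                 /* the toy model ignores the path */

    if (!umh_enabled_sketch)
        return -EBUSY;
    if (!rootfs_populated_sketch)
        return -ENOENT;
    return 0;
}
```

      An early caller in this model gets -EBUSY until
      usermodehelper_enable_sketch() runs, which is the visible difference from
      the earlier -ENOENT result.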
      
      Link: https://lkml.kernel.org/r/20210728134638.329060-1-linux@rasmusvillemoes.dk
      
      
      Fixes: e7cb072e ("init/initramfs.c: do unpacking asynchronously")
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
      Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kbuild: Only default to -Werror if COMPILE_TEST · b339ec9c
      Marco Elver authored
      
      The cross-product of the kernel's supported toolchains, architectures,
      and configuration options is large. So large, that it's generally
      accepted to be infeasible to enumerate and build+test them all
      (many compile-testers rely on randomly generated configs).
      
      Without the possibility to enumerate all possible combinations of
      toolchains, architectures, and configuration options, it is inevitable
      that compiler warnings in this space exist.
      
      With -Werror, this means that an innumerable set of kernels are now
      broken, yet had been perfectly usable before (confused compilers, code
      with warnings unused, or luck).
      
      Distributors will necessarily pick a point in the toolchain X arch X
      config space, and if unlucky, will have a broken build. Granted, those
      will likely disable CONFIG_WERROR and move on.
      
      The kernel's default configuration is unlikely to be suitable for all
      users, but it's inappropriate to force many users to set CONFIG_WERROR=n.
      
      This also holds for CI systems which are focused on runtime testing,
      where the odd warning in some subsystem will disrupt testing of the rest
      of the kernel. Many of those runtime-focused CI systems run tests or
      fuzz the kernel using runtime debugging tools. Runtime testing of
      different subsystems can proceed in parallel, and potentially uncover
      serious bugs; halting runtime testing of the entire kernel because of
      the odd warning (now error) in a subsystem or driver is simply
      inappropriate.
      
      Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n
      as well.
      
      The appropriate usecase for -Werror is therefore compile-test focused
      builds (often done by developers or CI systems).
      
      Reflect this in the Kconfig option by making the default value of WERROR
      match COMPILE_TEST.
      
      Signed-off-by: Marco Elver <elver@google.com>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. Sep 05, 2021
    • Enable '-Werror' by default for all kernel builds · 3fe617cc
      Linus Torvalds authored
      
      ... but make it a config option so that broken environments can disable
      it when required.
      
      We really should always have a clean build, and will disable specific
      over-eager warnings as required, if we can't fix them.  But while I
      fairly religiously enforce that in my own tree, it doesn't get enforced
      by various build robots that don't necessarily report warnings.
      
      So this just makes '-Werror' a default compiler flag, but allows people
      to disable it for their configuration if they have some particular
      issues.
      
      Occasionally, new compiler versions end up enabling new warnings, and it
      can take a while before we have them fixed (or the warnings disabled if
      that is what it takes), so the config option allows for that situation.
      
      Hopefully this will mean that I get fewer pull requests that have new
      warnings that were not noticed by various automation we have in place.
      
      Knock wood.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. Aug 24, 2021
  14. Aug 23, 2021
  15. Aug 20, 2021
  16. Aug 12, 2021
  17. Aug 03, 2021
  18. Jul 26, 2021
    • printk: remove NMI tracking · 85e3e7fb
      John Ogness authored
      
      All NMI contexts are handled the same as the safe context: store the
      message and defer printing. There is no need to have special NMI
      context tracking for this. Using in_nmi() is enough.
      
      There are several parts of the kernel that are manually calling into
      the printk NMI context tracking in order to cause general printk
      deferred printing:
      
          arch/arm/kernel/smp.c
          arch/powerpc/kexec/crash.c
          kernel/trace/trace.c
      
      For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new
      function pair printk_deferred_enter/exit that explicitly achieves the
      same objective.
      
      For ftrace, remove the printk context manipulation completely. It was
      added in commit 03fc7f9c ("printk/nmi: Prevent deadlock when
      accessing the main log buffer in NMI"). The purpose was to enforce
      storing messages directly into the ring buffer even in NMI context.
      It really should have only modified the behavior in NMI context.
      There is no need for a special behavior any longer. All messages are
      always stored directly now. The console deferring is handled
      transparently in vprintk().
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      [pmladek@suse.com: Remove special handling in ftrace.c completely.]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
  19. Jul 19, 2021
    • printk: Userspace format indexing support · 33701557
      Chris Down authored
      
      We have a number of systems industry-wide that have a subset of their
      functionality that works as follows:
      
      1. Receive a message from local kmsg, serial console, or netconsole;
      2. Apply a set of rules to classify the message;
      3. Do something based on this classification (like scheduling a
         remediation for the machine), rinse, and repeat.
      
      As a couple of examples of places we have this implemented just inside
      Facebook, although this isn't a Facebook-specific problem, we have this
      inside our netconsole processing (for alarm classification), and as part
      of our machine health checking. We use these messages to determine
      fairly important metrics around production health, and it's important
      that we get them right.
      
      While for some kinds of issues we have counters, tracepoints, or metrics
      with a stable interface which can reliably indicate the issue, in order
      to react to production issues quickly we need to work with the interface
      which most kernel developers naturally use when developing: printk.
      
      Most production issues come from unexpected phenomena, and as such
      usually the code in question doesn't have easily usable tracepoints or
      other counters available for the specific problem being mitigated. We
      have a number of lines of monitoring defence against problems in
      production (host metrics, process metrics, service metrics, etc), and
      where it's not feasible to reliably monitor at another level, this kind
      of pragmatic netconsole monitoring is essential.
      
      As one would expect, monitoring using printk is rather brittle for a
      number of reasons -- most notably that the message might disappear
      entirely in a new version of the kernel, or that the message may change
      in such a way that the regex or other classification methods start to
      silently fail.
      
      One factor that makes this even harder is that, under normal operation,
      many of these messages are never expected to be hit. For example, there
      may be a rare hardware bug which one wants to detect if it was to ever
      happen again, but its recurrence is not likely or anticipated. This
      precludes using something like checking whether the printk in question
      was printed somewhere fleetwide recently to determine whether the
      message in question is still present or not, since we don't anticipate
      that it should be printed anywhere, but still need to monitor for its
      future presence in the long-term.
      
      This class of issue has happened on a number of occasions, causing
      unhealthy machines with hardware issues to remain in production for
      longer than ideal. As a recent example, some monitoring around
      blk_update_request fell out of date and caused semi-broken machines to
      remain in production for longer than would be desirable.
      
      Searching through the codebase to find the message is also extremely
      fragile, because many of the messages are further constructed beyond
      their callsite (eg. btrfs_printk and other module-specific wrappers,
      each with their own functionality). Even if they aren't, guessing the
      format and formulation of the underlying message based on the aesthetics
      of the message emitted is not a recipe for success at scale, and our
      previous issues with fleetwide machine health checking demonstrate as
      much.
      
      This provides a solution to the issue of silently changed or deleted
      printks: we record pointers to all printk format strings known at
      compile time into a new .printk_index section, both in vmlinux and
      modules. At runtime, this can then be iterated by looking at
      <debugfs>/printk/index/<module>, which emits the following format, both
      readable by humans and able to be parsed by machines:
      
          $ head -1 vmlinux; shuf -n 5 vmlinux
          # <level[,flags]> filename:line function "format"
          <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
          <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
          <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
          <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
          <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
      
      This mitigates the majority of cases where we have a highly-specific
      printk which we want to match on, as we can now enumerate and check
      whether the format changed or the printk callsite disappeared entirely
      in userspace. This allows us to catch changes to printks we monitor
      earlier and decide what to do about it before it becomes problematic.
      
      There is no additional runtime cost for printk callers or printk itself,
      and the assembly generated is exactly the same.
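
      A userspace consumer of this format could look roughly like the
      following C sketch. It is not part of the patch: the struct and function
      names are hypothetical, the buffer sizes are arbitrary, and it assumes
      the level is numeric and that any optional flags sit between the level
      and the closing '>'.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of <debugfs>/printk/index/<module> (sketch only). */
struct index_entry_sketch {
    int level;
    char file[256];
    int line;
    char function[128];
    char format[512];
};

/* Returns 0 on success, -1 if the line does not look like an entry. */
static int parse_index_line_sketch(const char *s, struct index_entry_sketch *e)
{
    const char *close, *start, *end;
    int consumed = 0;
    size_t len;

    /* "<level[,flags]>": read the level, then skip to the closing '>'. */
    if (s[0] != '<' || sscanf(s + 1, "%d", &e->level) != 1)
        return -1;
    close = strchr(s, '>');
    if (!close)
        return -1;

    /* "filename:line function" followed by the opening quote. */
    if (sscanf(close + 1, " %255[^:]:%d %127s \"%n",
               e->file, &e->line, e->function, &consumed) != 3 ||
        consumed == 0)
        return -1;

    /* The format runs up to the last quote on the line. */
    start = close + 1 + consumed;
    end = strrchr(start, '"');
    if (!end)
        return -1;
    len = (size_t)(end - start);
    if (len >= sizeof(e->format))
        return -1;
    memcpy(e->format, start, len);
    e->format[len] = '\0';
    return 0;
}
```

      A monitoring tool could run each index line through such a parser and
      compare the recovered format string against the pattern it matches on,
      flagging any monitored callsite whose format changed or disappeared. The
      header line starting with '#' is rejected with -1.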
      
      Signed-off-by: Chris Down <chris@chrisdown.name>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Tested-by: Petr Mladek <pmladek@suse.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
  20. Jul 17, 2021
  21. Jul 08, 2021
  22. Jul 01, 2021
    • init: print out unknown kernel parameters · 86d1919a
      Andrew Halaney authored
      It is easy to foobar setting a kernel parameter on the command line
      without realizing it; by default there's not much output that you can
      use to assess what the kernel did with that parameter.
      
      Make it a little more explicit which parameters on the command line
      _looked_ like a valid parameter for the kernel, but did not match anything
      and ultimately got tossed to init.  This is very similar to the unknown
      parameter message received when loading a module.
      
      This assumes the parameters are processed in a normal fashion. Some
      parameters (dyndbg= for example) don't register their parameter with the
      rest of the kernel's parameters, and therefore always show up in this list
      (and are also given to init - like the rest of this list).
      
      Another example is BOOT_IMAGE=, which is highlighted as an offender.
      Technically it is one, but it is passed by LILO and GRUB, so most
      systems will see that complaint.
      
      An example output where "foobared" and "unrecognized" are intentionally
      invalid parameters:
      
        Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.12-dirty debug log_buf_len=4M foobared unrecognized=foo
        Unknown command line parameters: foobared BOOT_IMAGE=/boot/vmlinuz-5.12-dirty unrecognized=foo
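
      The sample output above can be reproduced with a small userspace C
      sketch. It is not the kernel implementation: the known_params table is a
      hypothetical stand-in for the kernel's registered parameters, and the
      ordering (bare words first, then name=value pairs) is an assumption taken
      from the sample line itself.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's registered parameters. */
static const char *const known_params[] = { "debug", "log_buf_len" };

static int is_known(const char *tok, size_t name_len)
{
    size_t i;

    for (i = 0; i < sizeof(known_params) / sizeof(known_params[0]); i++) {
        if (strlen(known_params[i]) == name_len &&
            strncmp(known_params[i], tok, name_len) == 0)
            return 1;
    }
    return 0;
}

/* Write the unknown tokens of "cmdline" into "out", space separated. */
static void collect_unknown_sketch(const char *cmdline, char *out, size_t out_size)
{
    size_t used = 0;
    int pass;

    if (out_size == 0)
        return;
    out[0] = '\0';

    /* Pass 0 collects bare words, pass 1 collects name=value pairs. */
    for (pass = 0; pass < 2; pass++) {
        const char *p = cmdline;

        while (*p) {
            size_t tok_len, name_len = 0;
            int has_value, n;

            p += strspn(p, " ");
            tok_len = strcspn(p, " ");
            if (tok_len == 0)
                break;
            while (name_len < tok_len && p[name_len] != '=')
                name_len++;
            has_value = name_len < tok_len;

            if (has_value == pass && !is_known(p, name_len)) {
                n = snprintf(out + used, out_size - used, "%s%.*s",
                             used ? " " : "", (int)tok_len, p);
                if (n < 0 || (size_t)n >= out_size - used)
                    return;     /* out of space: stop collecting */
                used += (size_t)n;
            }
            p += tok_len;
        }
    }
}
```

      Feeding it the sample "Kernel command line" text yields the same list as
      the "Unknown command line parameters" line, with "debug" and
      "log_buf_len=4M" filtered out as known.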
      
      Link: https://lkml.kernel.org/r/20210511211009.42259-1-ahalaney@redhat.com
      
      
      Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Suggested-by: Borislav Petkov <bp@suse.de>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. Jun 22, 2021
  24. Jun 18, 2021
  25. Jun 10, 2021