• kallsyms: strip LTO-only suffixes from promoted global functions · 8cc32a9b
      Yonghong Song authored
Commit 6eb4bd92 ("kallsyms: strip LTO suffixes from static functions")
stripped all function/variable suffixes starting with '.', regardless
of whether those suffixes were generated in LTO mode or not. In fact,
as far as I know, in LTO mode, when a static function/variable is
promoted to the global scope, a '.llvm.<...>' suffix is added.
      
The existing mechanism breaks live patching for an LTO kernel even when
no <symbol>.llvm.<...> symbols are involved. For example, consider the
following kernel symbols:
        $ grep bpf_verifier_vlog /proc/kallsyms
        ffffffff81549f60 t bpf_verifier_vlog
        ffffffff8268b430 d bpf_verifier_vlog._entry
        ffffffff8282a958 d bpf_verifier_vlog._entry_ptr
        ffffffff82e12a1f d bpf_verifier_vlog.__already_done
'bpf_verifier_vlog' is a static function. '_entry', '_entry_ptr' and
'__already_done' are static variables used inside 'bpf_verifier_vlog',
so llvm promotes them to file-level statics with the prefix 'bpf_verifier_vlog.'.
Note that this function-level to file-level static promotion also
happens without LTO.
      
Given the symbol name 'bpf_verifier_vlog', on an LTO kernel the current
mechanism returns all four symbols to the live patching subsystem,
which the live patching subsystem cannot handle. On a non-LTO kernel,
only one symbol is returned.
      
In [1] there was a lengthy discussion; the suggestion is to separate two
cases:
        (1). new symbols with suffix which are generated regardless of whether
             LTO is enabled or not, and
        (2). new symbols with suffix generated only when LTO is enabled.
      
      The cleanup_symbol_name() should only remove suffixes for case (2).
      Case (1) should not be changed so it can work uniformly with or without LTO.
      
This patch removes only the LTO-generated suffix '.llvm.<...>', so live
patching and tracing work the same way as on a non-LTO kernel.
The cleanup_symbol_name() in scripts/kallsyms.c is changed to use the
same filtering pattern, so the kernel and the kallsyms tool have the
same expectation on the order of symbols.
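The new filtering rule can be sketched as a small userspace function (a sketch with an illustrative name, not the kernel's exact cleanup_symbol_name()): truncate at the LTO-generated '.llvm.' marker and leave every other '.'-suffix intact.

```c
#include <string.h>

/* Sketch: strip only the LTO-generated ".llvm.<hash>" suffix; other
 * '.'-suffixed names (e.g. "foo.__already_done") are left untouched. */
static void strip_llvm_suffix(char *name)
{
	char *res = strstr(name, ".llvm.");

	if (res)
		*res = '\0';
}
```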
      
       [1] https://lore.kernel.org/live-patching/20230615170048.2382735-1-song@kernel.org/T/#u
      
      
      
      Fixes: 6eb4bd92 ("kallsyms: strip LTO suffixes from static functions")
Reported-by: Song Liu <song@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230628181926.4102448-1-yhs@fb.com
      
      
Signed-off-by: Kees Cook <keescook@chromium.org>
  7. Nov 15, 2022
• kallsyms: Add self-test facility · 30f3bb09
      Zhen Lei authored
      
Add test cases for the basic functionality and performance of
kallsyms_lookup_name(), kallsyms_on_each_symbol() and
kallsyms_on_each_match_symbol(). Also calculate the compression ratio
of the kallsyms compression algorithm for the current symbol set.
      
The basic functionality test begins with a set of symbols whose
addresses are known. Then it traverses all symbol addresses and looks up
the corresponding symbol name for each address. It is impossible to
determine in isolation whether these addresses are correct, but the
three functions above can be used to cross-check each other. Because the
traversal in kallsyms_on_each_symbol() is slow (only about 60 symbols
can be tested per second), it is exercised on average once every 128
symbols. The other two functions validate all symbols.
      
If the basic functionality test passes, only the performance test
results are printed. If it fails, error information is printed and the
subsequent performance tests are skipped.
      
      Start self-test automatically after system startup if
      CONFIG_KALLSYMS_SELFTEST=y.
      
Example of output content: (prefix 'kallsyms_selftest:' is omitted)
       start
        ---------------------------------------------------------
       | nr_symbols | compressed size | original size | ratio(%) |
       |---------------------------------------------------------|
       |     107543 |       1357912   |      2407433  |  56.40   |
        ---------------------------------------------------------
       kallsyms_lookup_name() looked up 107543 symbols
       The time spent on each symbol is (ns): min=630, max=35295, avg=7353
       kallsyms_on_each_symbol() traverse all: 11782628 ns
       kallsyms_on_each_match_symbol() traverse all: 9261 ns
       finish
      
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
  8. Nov 13, 2022
• kallsyms: Add helper kallsyms_on_each_match_symbol() · 4dc533e0
      Zhen Lei authored
      
Function kallsyms_on_each_symbol() traverses all symbols and submits
each one to the hook 'fn' for judgment and processing. In some cases,
such as livepatch, the hook actually only handles the matched symbols.
      
Because all symbols are now sorted by name, all symbols with the same
name are clustered together. Function kallsyms_lookup_names() gets the
start and end positions of the range corresponding to the specified
name, so all matches can be traversed easily and quickly.
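The idea can be sketched in userspace C (the plain string table and names are illustrative; the kernel works on its compressed kallsyms tables): binary-search the name-sorted table for the first match, then walk forward while names still compare equal.

```c
#include <string.h>

/* Sketch: find the first entry >= name in a name-sorted table. */
static int first_match(const char *const tab[], int n, const char *name)
{
	int lo = 0, hi = n;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;

		if (strcmp(tab[mid], name) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Visit only the entries that compare equal to name. */
static int on_each_match(const char *const tab[], int n, const char *name,
			 int (*fn)(int idx, void *data), void *data)
{
	for (int i = first_match(tab, n, name);
	     i < n && strcmp(tab[i], name) == 0; i++) {
		int ret = fn(i, data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Tiny demo callback: count matches via the data pointer. */
static int count_cb(int idx, void *data)
{
	(void)idx;
	(*(int *)data)++;
	return 0;
}
```

Visiting only the matched cluster instead of every symbol is what produces the speedup measured in this commit.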
      
      The test results are as follows (twice): (x86)
      kallsyms_on_each_match_symbol:     7454,     7984
      kallsyms_on_each_symbol      : 11733809, 11785803
      
kallsyms_on_each_match_symbol() consumes only about 0.066% of
kallsyms_on_each_symbol()'s time; in other words, roughly 1523x better
performance.
      
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
• kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] · 19bd8981
      Zhen Lei authored
      
kallsyms_seqs_of_names[] records the symbol indexes sorted by address,
so the maximum value it holds is bounded by the number of symbols. Since
2^24 = 16777216, three bytes are enough to store each index. This saves
(1 * kallsyms_num_syms) bytes of memory.
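The packing can be sketched as follows (big-endian byte order is an arbitrary choice here; only the 3-byte width matters):

```c
/* Sketch: store a 24-bit sequence number in 3 bytes instead of 4. */
static void seq_put(unsigned char *p, unsigned int seq)
{
	p[0] = (seq >> 16) & 0xff;
	p[1] = (seq >> 8) & 0xff;
	p[2] = seq & 0xff;
}

static unsigned int seq_get(const unsigned char *p)
{
	return ((unsigned int)p[0] << 16) | ((unsigned int)p[1] << 8) | p[2];
}
```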
      
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
• kallsyms: Improve the performance of kallsyms_lookup_name() · 60443c88
      Zhen Lei authored
      
Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one and use each expanded string for
comparison. This is O(n).
      
If we sort the names in ascending order, as the addresses already are,
we can use binary search instead. This is O(log(n)).
      
In order not to change the implementation of /proc/kallsyms, the table
kallsyms_names[] is still stored in one-to-one correspondence with the
addresses, in ascending address order.
      
Add the array kallsyms_seqs_of_names[]: it is indexed by the sequence
number of the name-sorted symbols, and each entry holds the sequence
number of the corresponding address-sorted symbol. For example, if the
index of NameX in kallsyms_seqs_of_names[] is 'i' and the content of
kallsyms_seqs_of_names[i] is 'k', then the address of NameX is
kallsyms_addresses[k] and its offset in kallsyms_names[] is
get_symbol_offset(k).
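In userspace pseudo-C (array contents invented for illustration), the lookup chain described above is:

```c
#include <string.h>

/* Sketch of the indirection: names[] is sorted by name, seqs_of_names[i]
 * maps the i-th name back to its position k in the address-sorted table. */
static unsigned long lookup_addr(const char *const names[],
				 const unsigned int seqs_of_names[],
				 const unsigned long addresses[],
				 int n, const char *name)
{
	int lo = 0, hi = n - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(names[mid], name);

		if (cmp == 0)
			return addresses[seqs_of_names[mid]];
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return 0; /* not found */
}
```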
      
Note that memory usage increases by (4 * kallsyms_num_syms) bytes; the
next two patches reduce this by (1 * kallsyms_num_syms) bytes and
properly handle the CONFIG_LTO_CLANG=y case.
      
      Performance test results: (x86)
      Before:
      min=234, max=10364402, avg=5206926
      min=267, max=11168517, avg=5207587
      After:
      min=1016, max=90894, avg=7272
      min=1014, max=93470, avg=7293
      
      The average lookup performance of kallsyms_lookup_name() improved 715x.
      
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
  17. Jan 07, 2022
• livepatch: Avoid CPU hogging with cond_resched · f5bdb34b
      David Vernet authored
      
      When initializing a 'struct klp_object' in klp_init_object_loaded(), and
      performing relocations in klp_resolve_symbols(), klp_find_object_symbol()
      is invoked to look up the address of a symbol in an already-loaded module
      (or vmlinux). This, in turn, calls kallsyms_on_each_symbol() or
      module_kallsyms_on_each_symbol() to find the address of the symbol that is
      being patched.
      
      It turns out that symbol lookups often take up the most CPU time when
      enabling and disabling a patch, and may hog the CPU and cause other tasks
      on that CPU's runqueue to starve -- even in paths where interrupts are
      enabled.  For example, under certain workloads, enabling a KLP patch with
      many objects or functions may cause ksoftirqd to be starved, and thus for
      interrupts to be backlogged and delayed. This may end up causing TCP
      retransmits on the host where the KLP patch is being applied, and in
      general, may cause any interrupts serviced by softirqd to be delayed while
      the patch is being applied.
      
      So as to ensure that kallsyms_on_each_symbol() does not end up hogging the
      CPU, this patch adds a call to cond_resched() in kallsyms_on_each_symbol()
      and module_kallsyms_on_each_symbol(), which are invoked when doing a symbol
      lookup in vmlinux and a module respectively.  Without this patch, if a
      live-patch is applied on a 36-core Intel host with heavy TCP traffic, a
      ~10x spike is observed in TCP retransmits while the patch is being applied.
      Additionally, collecting sched events with perf indicates that ksoftirqd is
      awakened ~1.3 seconds before it's eventually scheduled.  With the patch, no
      increase in TCP retransmit events is observed, and ksoftirqd is scheduled
      shortly after it's awakened.
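The shape of the change can be sketched in userspace C (cond_resched() is stood in for by a counter; in the kernel it may yield the CPU to other runnable tasks):

```c
/* Sketch: visit every symbol but offer to reschedule after each one,
 * so a long walk cannot monopolize the CPU. */
static int resched_calls;

static void fake_cond_resched(void)
{
	resched_calls++; /* the kernel's cond_resched() may yield here */
}

static int on_each_symbol(int nsyms, int (*fn)(int idx))
{
	for (int i = 0; i < nsyms; i++) {
		int ret = fn(i);

		if (ret)
			return ret;
		fake_cond_resched();
	}
	return 0;
}

/* Demo callback: a lookup that never matches, forcing a full walk. */
static int never_match(int idx)
{
	(void)idx;
	return 0;
}
```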
      
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211229215646.830451-1-void@manifault.com
  19. Jul 08, 2021
• module: add printk formats to add module build ID to stacktraces · 9294523e
      Stephen Boyd authored
Let's make kernel stacktraces easier to identify by including the build
ID[1] of a module if the stacktrace is printing a symbol from a module.
This makes it simpler for developers to locate a kernel module's full
debuginfo for a particular stacktrace. Combined with
scripts/decode_stacktrace.sh, a developer can download the matching
debuginfo from a debuginfod[2] server and find the exact file and line
number for the functions plus offsets in a stacktrace that match the
module. This is especially useful for pstore crash debugging, where the
kernel crashes are recorded in something like console-ramoops and the
recovery kernel/modules are different, or the debuginfo doesn't exist on
the device due to space concerns (the debuginfo can be too large for
space-limited devices).
      
      Originally, I put this on the %pS format, but that was quickly rejected
      given that %pS is used in other places such as ftrace where build IDs
aren't meaningful.  There were some discussions on the list about putting every
      module build ID into the "Modules linked in:" section of the stacktrace
      message but that quickly becomes very hard to read once you have more than
      three or four modules linked in.  It also provides too much information
      when we don't expect each module to be traversed in a stacktrace.  Having
      the build ID for modules that aren't important just makes things messy.
      Splitting it to multiple lines for each module quickly explodes the number
      of lines printed in an oops too, possibly wrapping the warning off the
      console.  And finally, trying to stash away each module used in a
      callstack to provide the ID of each symbol printed is cumbersome and would
      require changes to each architecture to stash away modules and return
      their build IDs once unwinding has completed.
      
      Instead, we opt for the simpler approach of introducing new printk formats
      '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb'
      for "pointer backtrace with module build ID" and then updating the few
      places in the architecture layer where the stacktrace is printed to use
      this new format.
      
      Before:
      
       Call trace:
        lkdtm_WARNING+0x28/0x30 [lkdtm]
        direct_entry+0x16c/0x1b4 [lkdtm]
        full_proxy_write+0x74/0xa4
        vfs_write+0xec/0x2e8
      
      After:
      
       Call trace:
        lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
        direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
        full_proxy_write+0x74/0xa4
        vfs_write+0xec/0x2e8
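What the new format appends can be sketched with a userspace helper (the function name and signature here are illustrative, not the kernel's API):

```c
#include <stdio.h>

/* Sketch: render "symbol+0xoff/0xsize [module buildid]", falling back
 * to the plain form when there is no module. */
static int sprint_sym(char *buf, size_t len, const char *sym,
		      unsigned long off, unsigned long size,
		      const char *mod, const char *build_id)
{
	if (mod && build_id)
		return snprintf(buf, len, "%s+0x%lx/0x%lx [%s %s]",
				sym, off, size, mod, build_id);
	return snprintf(buf, len, "%s+0x%lx/0x%lx", sym, off, size);
}
```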
      
      [akpm@linux-foundation.org: fix build with CONFIG_MODULES=n, tweak code layout]
      [rdunlap@infradead.org: fix build when CONFIG_MODULES is not set]
        Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org
      [akpm@linux-foundation.org: make kallsyms_lookup_buildid() static]
      [cuibixuan@huawei.com: fix build error when CONFIG_SYSFS is disabled]
        Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com
      
      Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org
      Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
Link: https://sourceware.org/elfutils/Debuginfod.html [2]
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Evan Green <evgreen@chromium.org>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. Jul 08, 2020
• kallsyms: Refactor kallsyms_show_value() to take cred · 16025184
      Kees Cook authored
      
      In order to perform future tests against the cred saved during open(),
      switch kallsyms_show_value() to operate on a cred, and have all current
      callers pass current_cred(). This makes it very obvious where callers
      are checking the wrong credential in their "read" contexts. These will
      be fixed in the coming patches.
      
      Additionally switch return value to bool, since it is always used as a
      direct permission check, not a 0-on-success, negative-on-error style
      function return.
      
      Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
  29. Aug 27, 2019
• kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol · 2a1a3fa0
      Marc Zyngier authored
      
      An arm64 kernel configured with
      
        CONFIG_KPROBES=y
        CONFIG_KALLSYMS=y
        # CONFIG_KALLSYMS_ALL is not set
        CONFIG_KALLSYMS_BASE_RELATIVE=y
      
      reports the following kprobe failure:
      
        [    0.032677] kprobes: failed to populate blacklist: -22
        [    0.033376] Please take care of using kprobes.
      
      It appears that kprobe fails to retrieve the symbol at address
      0xffff000010081000, despite this symbol being in System.map:
      
        ffff000010081000 T __exception_text_start
      
      This symbol is part of the first group of aliases in the
      kallsyms_offsets array (symbol names generated using ugly hacks in
      scripts/kallsyms.c):
      
        kallsyms_offsets:
                .long   0x1000 // do_undefinstr
                .long   0x1000 // efi_header_end
                .long   0x1000 // _stext
                .long   0x1000 // __exception_text_start
                .long   0x12b0 // do_cp15instr
      
Looking at the implementation of get_symbol_pos(), it returns the
lowest index for aliasing symbols. In this case, it returns 0.
      
      But kallsyms_lookup_size_offset() considers 0 as a failure, which
      is obviously wrong (there is definitely a valid symbol living there).
      In turn, the kprobe blacklisting stops abruptly, hence the original
      error.
      
A CONFIG_KALLSYMS_ALL kernel wouldn't fail, as there are always some
symbols at the beginning of this array that are never looked up via
kallsyms_lookup_size_offset().
      
      Fix it by considering that get_symbol_pos() is always successful
      (which is consistent with the other uses of this function).
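The pitfall generalizes: a lower-bound search over an address-sorted table legitimately returns 0 for the first symbol, so 0 cannot double as a failure code. A userspace sketch (simplified; the real get_symbol_pos() also handles addresses falling inside a symbol):

```c
/* Sketch: return the lowest index whose address matches (aliases
 * cluster at the same address); index 0 is a perfectly valid answer. */
static int get_pos(const unsigned long addrs[], int n, unsigned long addr)
{
	int lo = 0, hi = n;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;

		if (addrs[mid] < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

A caller written as "if (!pos) goto fail;" would wrongly treat the first symbol as a lookup failure, which is exactly the bug fixed here.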
      
      Fixes: ffc50891 ("[PATCH] Create kallsyms_lookup_size_offset()")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>