  1. Oct 02, 2024
    • move asm/unaligned.h to linux/unaligned.h · 5f60d5f6
      Al Viro authored
      asm/unaligned.h is always an include of asm-generic/unaligned.h;
      might as well move that thing to linux/unaligned.h and include
      that - there's nothing arch-specific in that header.
      
      auto-generated by the following:
      
      for i in `git grep -l -w asm/unaligned.h`; do
      	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
      done
      for i in `git grep -l -w asm-generic/unaligned.h`; do
      	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
      done
      git mv include/asm-generic/unaligned.h include/linux/unaligned.h
      git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
      sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
      sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
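
For readers unfamiliar with the header being moved: it provides helpers for reading and writing values at addresses with no alignment guarantee. A minimal userspace sketch of that idea follows. The names load_le32_unaligned() and store_le32_unaligned() are hypothetical stand-ins chosen for illustration; they are not the kernel's helpers and this is not their implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit value from an address
 * that may be misaligned, one byte at a time, so no aligned load is needed. */
static uint32_t load_le32_unaligned(const void *p)
{
    const uint8_t *b = p;

    assert(b != NULL);
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: write a 32-bit value in little-endian byte order
 * to an address that may be misaligned. */
static void store_le32_unaligned(void *p, uint32_t v)
{
    uint8_t *b = p;

    assert(b != NULL);
    b[0] = (uint8_t)(v & 0xff);
    b[1] = (uint8_t)((v >> 8) & 0xff);
    b[2] = (uint8_t)((v >> 16) & 0xff);
    b[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Byte-wise access like this behaves the same on every architecture, which is consistent with the commit's point that nothing in the header is arch-specific.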
  2. Sep 20, 2024
  3. Sep 13, 2024
    • s390/vdso: Wire up getrandom() vdso implementation · b920aa77
      Heiko Carstens authored and Jason A. Donenfeld committed
      
      Provide the s390 specific vdso getrandom() architecture backend.
      
      The data required by _vdso_rng_data is placed within the _vdso_data vvar
      page, at a hardcoded offset larger than vdso_data.

      As required, the chacha20 implementation does not write to the stack.
      
      The implementation follows more or less the arm64 implementations and
      makes use of vector instructions. It has a fallback to the getrandom()
      system call for machines where the vector facility is not installed.
      
      The check if the vector facility is installed, as well as an
      optimization for machines with the vector-enhancements facility 2, is
      implemented with alternatives, avoiding runtime checks.
      
      Note that __kernel_getrandom() is implemented without the vdso user
      wrapper, which would set up a stack frame for odd cases (aka very old
      glibc variants) where the caller has not done that. All callers of
      __kernel_getrandom() are required to set up a stack frame, as the C ABI
      requires.
      
      The vdso testcases vdso_test_getrandom and vdso_test_chacha pass.
      
      Benchmark on a z16:
      
          $ ./vdso_test_getrandom bench-single
             vdso: 25000000 times in 0.493703559 seconds
          syscall: 25000000 times in 6.584025337 seconds
      
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32 · 53cee505
      Christophe Leroy authored and Jason A. Donenfeld committed
      
      To be consistent with other VDSO functions, the function is called
      __kernel_getrandom().
      
      The __arch_chacha20_blocks_nostack() function is implemented basically
      with 32-bit operations. It performs 4 QUARTERROUND operations in
      parallel. There are enough registers to avoid using the stack:
      
      On input:
      	r3: output bytes
      	r4: 32-byte key input
      	r5: 8-byte counter input/output
      	r6: number of 64-byte blocks to write to output
      
      During operation:
      	stack: pointer to counter (r5) and non-volatile registers (r14-r31)
      	r0: counter of blocks (initialised with r6)
      	r4: Value '4' after key has been read, used for indexing
      	r5-r12: key
      	r14-r15: block counter
      	r16-r31: chacha state
      
      At the end:
      	r0, r6-r12: Zeroised
      	r5, r14-r31: Restored
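
As a reference point for the QUARTERROUND operation mentioned above, here is a plain C sketch of the standard ChaCha20 quarter round and of one column round, which applies it to four independent columns of the 16-word state. This is portable illustration code, not the powerpc assembly from this patch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, unsigned int n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

/* Standard ChaCha quarter round on state words a, b, c, d. */
static void quarterround(uint32_t x[16], int a, int b, int c, int d)
{
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}

/* One column round: four quarter rounds that touch disjoint words,
 * which is why they can be computed side by side. */
static void column_round(uint32_t x[16])
{
    quarterround(x, 0, 4, 8, 12);
    quarterround(x, 1, 5, 9, 13);
    quarterround(x, 2, 6, 10, 14);
    quarterround(x, 3, 7, 11, 15);
}
```

Because the four quarter rounds in a column round share no state words, an implementation with enough registers can interleave them without spilling to the stack.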
      
      Performance on powerpc 885 (using kernel selftest):
      	~# ./vdso_test_getrandom bench-single
      	   vdso: 25000000 times in 62.938002291 seconds
      	   libc: 25000000 times in 535.581916866 seconds
      	syscall: 25000000 times in 531.525042806 seconds
      
      Performance on powerpc 8321 (using kernel selftest):
      	~# ./vdso_test_getrandom bench-single
      	   vdso: 25000000 times in 16.899318858 seconds
      	   libc: 25000000 times in 131.050596522 seconds
      	syscall: 25000000 times in 129.794790389 seconds
      
      This first patch adds support for VDSO32. As selftests cannot easily
      be generated only for VDSO32, and because the following patch brings
      support for VDSO64 anyway, this patch opts out all code in
      __arch_chacha20_blocks_nostack() so that vdso_test_chacha will not
      fail to compile and will not crash on PPC64/PPC64LE, although the
      selftest itself will fail.
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • arm64: vDSO: Wire up getrandom() vDSO implementation · 712676ea
      Adhemerval Zanella authored and Jason A. Donenfeld committed
      
      Hook up the generic vDSO implementation to the aarch64 vDSO data page.
      The data required by _vdso_rng_data is placed within the _vdso_data vvar
      page, at an offset larger than the vdso_data.
      
      The vDSO function requires a ChaCha20 implementation that does not write
      to the stack, and that can do an entire ChaCha20 permutation.  The one
      provided uses NEON on the permute operation, with a fallback to the
      syscall for chips that do not support AdvSIMD.
      
      This also passes the vdso_test_chacha test along with
      vdso_test_getrandom. The vdso_test_getrandom bench-single result on
      Neoverse-N1 shows:
      
         vdso: 25000000 times in 0.783884250 seconds
         libc: 25000000 times in 8.780275399 seconds
      syscall: 25000000 times in 8.786581518 seconds
      
      A small fixup to arch/arm64/include/asm/mman.h was required to avoid
      pulling kernel code into the vDSO, similar to what's already done in
      arch/arm64/include/asm/rwonce.h.
      
      Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Will Deacon <will@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • LoongArch: vDSO: Wire up getrandom() vDSO implementation · 18efd0b1
      Xi Ruoyao authored and Jason A. Donenfeld committed
      
      Hook up the generic vDSO implementation to the LoongArch vDSO data page
      by providing the required __arch_chacha20_blocks_nostack,
      __arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also
      wire up the selftests.
      
      Signed-off-by: Xi Ruoyao <xry111@xry111.site>
      Acked-by: Huacai Chen <chenhuacai@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  4. Aug 30, 2024
  5. Aug 07, 2024
    • tools/include: Sync arm64 headers with the kernel sources · d5b85489
      Namhyung Kim authored
      
      To pick up changes from:
      
        9ef54a38 arm64: cputype: Add Cortex-A725 definitions
        58d245e0 arm64: cputype: Add Cortex-X1C definitions
        fd2ff5f0 arm64: cputype: Add Cortex-X925 definitions
        add332c4 arm64: cputype: Add Cortex-A720 definitions
        be5a6f23 arm64: cputype: Add Cortex-X3 definitions
      
      This should be used to beautify x86 syscall arguments and it addresses
      these tools/perf build warnings:
      
        Warning: Kernel ABI header differences:
        diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
      
      Please see tools/include/uapi/README for details (it's in the first patch
      of this series).
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • tools/include: Sync x86 headers with the kernel sources · f6d9883f
      Namhyung Kim authored
      
      To pick up changes from:
      
        149fd471 perf/x86/intel: Support Perfmon MSRs aliasing
        21b362cc x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
        4f460bff cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h
        7ea81936 x86/cpufeatures: Add HWP highest perf change feature flag
        78ce84b9 x86/cpufeatures: Flip the /proc/cpuinfo appearance logic
        1beb348d x86/sev: Provide SVSM discovery support
      
      This should be used to beautify x86 syscall arguments and it addresses
      these tools/perf build warnings:
      
        Warning: Kernel ABI header differences:
        diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      
      Please see tools/include/uapi/README for details (it's in the first patch
      of this series).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
  6. Aug 06, 2024
    • tools/include: Sync uapi/linux/kvm.h with the kernel sources · a625df39
      Namhyung Kim authored
      And other arch-specific UAPI headers to pick up changes from:
      
        4b23e0c1 KVM: Ensure new code that references immediate_exit gets extra scrutiny
        85542adb KVM: x86: Add KVM_RUN_X86_GUEST_MODE kvm_run flag
        6fef5185 KVM: x86: Add a capability to configure bus frequency for APIC timer
        34ff6590 x86/sev: Use kernel provided SVSM Calling Areas
        5dcc1e76 Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD
        9a0d2f49 KVM: PPC: Book3S HV: Add one-reg interface for HASHPKEYR register
        e9eb790b KVM: PPC: Book3S HV: Add one-reg interface for HASHKEYR register
        1a1e6865 KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register
      
      This should be used to beautify KVM syscall arguments and it addresses
      these tools/perf build warnings:
      
        Warning: Kernel ABI header differences:
        diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
        diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
        diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
        diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h
      
      Please see tools/include/uapi/README for details (it's in the first patch
      of this series).
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
  7. Aug 02, 2024
  8. Jul 10, 2024
    • clone3: drop __ARCH_WANT_SYS_CLONE3 macro · 505d66d1
      Arnd Bergmann authored
      
      When clone3() was introduced, it was not obvious how each architecture
      deals with setting up the stack and keeping the register contents in
      a fork()-like system call, so this was left for the architecture
      maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those
      that already implement it.
      
      Five years later, we still have a few architectures left that are missing
      clone3(), and the macro keeps getting in the way as it's fundamentally
      different from all the other __ARCH_WANT_SYS_* macros that are meant
      to provide backwards-compatibility with applications using older
      syscalls that are no longer provided by default.
      
      Address this by reversing the polarity of the macro, adding an
      __ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't
      already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3
      from all the other ones.
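
The polarity reversal can be pictured with a small preprocessor sketch. Only the macro name __ARCH_BROKEN_SYS_CLONE3 comes from this commit; demo_clone3() is a hypothetical stand-in, and modeling the missing syscall as returning -ENOSYS is an assumption of the sketch rather than something stated in the message above.

```c
#include <assert.h>
#include <errno.h>

/*
 * Under the new scheme an architecture says nothing when clone3() works.
 * Only an architecture that still lacks it would define:
 *
 *     #define __ARCH_BROKEN_SYS_CLONE3
 *
 * The macro is deliberately left undefined in this sketch.
 */

/* Hypothetical stand-in for the syscall entry point. */
static long demo_clone3(void)
{
#ifdef __ARCH_BROKEN_SYS_CLONE3
    return -ENOSYS;   /* assumed behavior on an opted-out architecture */
#else
    return 0;         /* default: the syscall is provided */
#endif
}
```

With the old __ARCH_WANT_SYS_CLONE3 macro the default was the opposite: an architecture got nothing unless it opted in.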
      
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  9. Jun 12, 2024
  10. Jun 04, 2024
    • tools headers arm64: Sync arm64's cputype.h with the kernel sources · dc6abbbd
      Arnaldo Carvalho de Melo authored
      
      To get the changes in:
      
        0ce85db6 ("arm64: cputype: Add Neoverse-V3 definitions")
        02a0a046 ("arm64: cputype: Add Cortex-X4 definitions")
        f4d9d9dc ("arm64: Add Neoverse-V2 part")
      
      That causes this perf source file to be rebuilt:
      
        CC      /tmp/build/perf-tools/util/arm-spe.o
      
      The changes in the above patch add MIDR_NEOVERSE_V[23] and
      MIDR_NEOVERSE_V1 is used in arm-spe.c, so probably we need to add those
      and perhaps MIDR_CORTEX_X4 to that array? Or maybe we need to leave this
      for later when this is all tested on those machines?
      
        static const struct midr_range neoverse_spe[] = {
                MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
                MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
                MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
                {},
        };
      
      Mark Rutland recommended about arm-spe.c:
      
      "I would not touch this for now -- someone would have to go audit the
      TRMs to check that those other cores have the same encoding, and I think
      it'd be better to do that as a follow-up."
      
      That addresses this perf build warning:
      
        Warning: Kernel ABI header differences:
          diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
      
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Besar Wicaksono <bwicaksono@nvidia.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/lkml/Zl8cYk0Tai2fs7aM@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. May 28, 2024
    • tools headers UAPI: Sync kvm headers with the kernel sources · 88e52051
      Arnaldo Carvalho de Melo authored
      To pick the changes in:
      
        4af663c2 ("KVM: SEV: Allow per-guest configuration of GHCB protocol version")
        4f5defae ("KVM: SEV: introduce KVM_SEV_INIT2 operation")
        26c44aa9 ("KVM: SEV: define VM types for SEV and SEV-ES")
        ac5c4802 ("KVM: SEV: publish supported VMSA features")
        651d61bc ("KVM: PPC: Fix documentation for ppc mmu caps")
      
      That doesn't change functionality in tools/perf, as no new ioctl is added
      for the 'perf trace' scripts to harvest.
      
      This addresses these perf build warnings:
      
        Warning: Kernel ABI header differences:
          diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
          diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michael Roth <michael.roth@amd.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/lkml/ZlYxAdHjyAkvGtMW@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · ac4b0690
      Arnaldo Carvalho de Melo authored
      To pick up the changes from these csets:
      
        53bc516a ("x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful place")
      
      That patch just moves definitions around, so this just silences this perf
      build warning:
      
        Warning: Kernel ABI header differences:
          diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Link: https://lore.kernel.org/lkml/ZlYe8jOzd1_DyA7X@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  12. May 14, 2024
  13. May 02, 2024
    • x86/insn: Add support for APX EVEX instructions to the opcode map · 690ca3a3
      Adrian Hunter authored and Ingo Molnar committed
      
      To support APX functionality, the EVEX prefix is used to:
      
       - promote legacy instructions
       - promote VEX instructions
       - add new instructions
      
      Promoted VEX instructions require no extra annotation because the opcodes
      do not change and the permissive nature of the instruction decoder already
      allows them to have an EVEX prefix.
      
      Promoted legacy instructions and new instructions are placed in map 4 which
      has not been used before.
      
      Create a new table for map 4 and add APX instructions.
      
      Annotate SCALABLE instructions with "(es)" - refer to patch "x86/insn: Add
      support for APX EVEX to the instruction decoder logic". SCALABLE
      instructions must be represented in both no-prefix (NP) and 66 prefix
      forms.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20240502105853.5338-9-adrian.hunter@intel.com
    • x86/insn: Add support for APX EVEX to the instruction decoder logic · 87bbaf1a
      Adrian Hunter authored and Ingo Molnar committed
      
      Intel Advanced Performance Extensions (APX) extends the EVEX prefix to
      support:
      
       - extended general purpose registers (EGPRs) i.e. r16 to r31
       - Push-Pop Acceleration (PPX) hints
       - new data destination (NDD) register
       - suppress status flags writes (NF) of common instructions
       - new instructions
      
      Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
      Specification for details.
      
      The extended EVEX prefix does not need amended instruction decoder logic,
      except in one area. Some instructions are defined as SCALABLE which means
      the EVEX.W bit and EVEX.pp bits are used to determine operand size.
      Specifically, if an instruction is SCALABLE and EVEX.W is zero, then
      EVEX.pp value 0 (representing no prefix NP) means default operand size,
      whereas EVEX.pp value 1 (representing 66 prefix) means operand size
      override i.e. 16 bits.
      
      Add an attribute (INAT_EVEX_SCALABLE) to identify such instructions, and
      amend the logic appropriately.
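
The SCALABLE rule described above can be sketched as a small C function. This is an illustration of the rule as stated in this message, not the decoder's actual code: the function name and parameters are hypothetical, and treating EVEX.W = 1 as a 64-bit operand size is an assumption that the message does not spell out.

```c
#include <assert.h>

/*
 * Hypothetical helper: operand size in bytes for a SCALABLE instruction.
 *   evex_w        - EVEX.W bit (0 or 1)
 *   evex_pp       - EVEX.pp field (0 = no prefix, 1 = 66 prefix)
 *   default_bytes - default operand size for the current mode
 */
static int scalable_operand_bytes(int evex_w, int evex_pp, int default_bytes)
{
    assert(evex_w == 0 || evex_w == 1);
    assert(evex_pp >= 0 && evex_pp <= 3);

    if (evex_w)
        return 8;              /* assumption: W=1 selects 64-bit operands */
    if (evex_pp == 1)
        return 2;              /* 66 prefix: operand size override, 16 bits */
    return default_bytes;      /* pp == 0 (NP): default operand size */
}
```

For example, with a 4-byte default, W = 0 and pp = 1 yields 2 bytes, while W = 0 and pp = 0 leaves the default unchanged.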
      
      Amend the awk script that generates the attribute tables from the opcode
      map, to recognise "(es)" as attribute INAT_EVEX_SCALABLE.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20240502105853.5338-8-adrian.hunter@intel.com
    • x86/insn: Add support for REX2 prefix to the instruction decoder opcode map · 159039af
      Adrian Hunter authored and Ingo Molnar committed
      
      Support for REX2 has been added to the instruction decoder logic and the
      awk script that generates the attribute tables from the opcode map.
      
      Add REX2 prefix byte (0xD5) to the opcode map.
      
      Add annotation (!REX2) for map 0/1 opcodes that are reserved under REX2.
      
      Add JMPABS to the opcode map and add annotation (REX2) to identify that it
      has a mandatory REX2 prefix. A separate opcode attribute table is not
      needed at this time because JMPABS has the same attribute encoding as the
      MOV instruction that it shares an opcode with i.e. INAT_MOFFSET.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20240502105853.5338-7-adrian.hunter@intel.com
    • x86/insn: Add support for REX2 prefix to the instruction decoder logic · eada38d5
      Adrian Hunter authored and Ingo Molnar committed
      
      Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
      REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.
      
      The REX2 prefix is effectively an extended version of the REX prefix.
      
      REX2 and EVEX are also used with PUSH/POP instructions to provide a
      Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
      fast-forward register data between matching PUSH and POP instructions.
      
      REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
      other maps is provided by the EVEX prefix, covered in a separate patch.
      
      Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
      for a new 64-bit absolute direct jump instruction JMPABS.
      
      Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
      Specification for details.
      
      Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
      flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
      opcodes (only JMPABS) that require a mandatory REX2 prefix
      (INAT_REX2_VARIANT).
      
      Amend logic to read the REX2 prefix and get the opcode attribute for the
      map number (0 or 1) encoded in the REX2 prefix.
      
      Amend the awk script that generates the attribute tables from the opcode
      map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
      as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.
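
To make the 2-byte prefix concrete, here is a hedged C sketch that splits a REX2 prefix into its fields. The prefix byte 0xD5 appears in this series; the payload bit layout used below (M0, R4, X4, B4, W, R3, X3, B3 from bit 7 down to bit 0) is my reading of the APX specification and should be checked against it. The struct and function names are hypothetical and unrelated to the kernel's decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REX2_PREFIX_BYTE 0xD5

/* Hypothetical container for the decoded payload byte. */
struct rex2_fields {
    unsigned int map;    /* opcode map selected by M0: 0 or 1 */
    bool w;              /* 64-bit operand size bit */
    unsigned int r_ext;  /* (R4 << 1) | R3: upper two bits of the reg field */
    unsigned int x_ext;  /* (X4 << 1) | X3: upper two bits of the index */
    unsigned int b_ext;  /* (B4 << 1) | B3: upper two bits of the base/rm */
};

/* Returns true and fills *out if buf starts with a REX2 prefix. */
static bool parse_rex2(const uint8_t *buf, size_t len, struct rex2_fields *out)
{
    uint8_t p;

    assert(buf != NULL && out != NULL);
    if (len < 2 || buf[0] != REX2_PREFIX_BYTE)
        return false;

    p = buf[1];
    out->map   = (p >> 7) & 1;
    out->w     = ((p >> 3) & 1) != 0;
    out->r_ext = (((p >> 6) & 1) << 1) | ((p >> 2) & 1);
    out->x_ext = (((p >> 5) & 1) << 1) | ((p >> 1) & 1);
    out->b_ext = (((p >> 4) & 1) << 1) | (p & 1);
    return true;
}
```

Two extension bits per register field, combined with the 3-bit field in ModRM or SIB, give the 5-bit register numbers needed to reach r16 to r31.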
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20240502105853.5338-6-adrian.hunter@intel.com
    • x86/insn: Add misc new Intel instructions · 9dd36128
      Adrian Hunter authored and Ingo Molnar committed
      
      The x86 instruction decoder is used not only for decoding kernel
      instructions. It is also used by perf uprobes (user space probes) and by
      perf tools Intel Processor Trace decoding. Consequently, it needs to
      support instructions executed by user space also.
      
      Add instructions documented in Intel Architecture Instruction Set
      Extensions and Future Features Programming Reference March 2024
      319433-052, that have not been added yet:
      
      	AADD
      	AAND
      	AOR
      	AXOR
      	CMPccXADD
      	PBNDKB
      	RDMSRLIST
      	URDMSR
      	UWRMSR
      	VBCSTNEBF162PS
      	VBCSTNESH2PS
      	VCVTNEEBF162PS
      	VCVTNEEPH2PS
      	VCVTNEOBF162PS
      	VCVTNEOPH2PS
      	VCVTNEPS2BF16
      	VPDPB[SU,UU,SS]D[,S]
      	VPDPW[SU,US,UU]D[,S]
      	VPMADD52HUQ
      	VPMADD52LUQ
      	VSHA512MSG1
      	VSHA512MSG2
      	VSHA512RNDS2
      	VSM3MSG1
      	VSM3MSG2
      	VSM3RNDS2
      	VSM4KEY4
      	VSM4RNDS4
      	WRMSRLIST
      	TCMMIMFP16PS
      	TCMMRLFP16PS
      	TDPFP16PS
      	PREFETCHIT1
      	PREFETCHIT0
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20240502105853.5338-5-adrian.hunter@intel.com
    • x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS · b8000264
      Adrian Hunter authored and Ingo Molnar committed
      
      The x86 instruction decoder is used not only for decoding kernel
      instructions. It is also used by perf uprobes (user space probes) and by
      perf tools Intel Processor Trace decoding. Consequently, it needs to
      support instructions executed by user space also.
      
      Intel Architecture Instruction Set Extensions and Future Features manual
      number 319433-044 of May 2021, documented VEX versions of instructions
      VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, but the opcode map has them
      listed as EVEX only.
      
      Remove EVEX-only (ev) annotation from instructions VPDPBUSD, VPDPBUSDS,
      VPDPWSSD and VPDPWSSDS, which allows them to be decoded with either a VEX
      or EVEX prefix.
      
      Fixes: 0153d98f ("x86/insn: Add misc instructions to x86 instruction decoder")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20240502105853.5338-4-adrian.hunter@intel.com
    • x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map · 59162e0c
      Adrian Hunter authored and Ingo Molnar committed
      
      The x86 instruction decoder is used not only for decoding kernel
      instructions. It is also used by perf uprobes (user space probes) and by
      perf tools Intel Processor Trace decoding. Consequently, it needs to
      support instructions executed by user space also.
      
      Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size
      only i.e. (d64). That was based on Intel SDM Opcode Map. However that is
      contradicted by the Instruction Set Reference section for PUSH in the
      same manual.
      
      Remove 64-bit operand size only annotation from opcode 0x68 PUSH
      instruction.
      
      Example:
      
        $ cat pushw.s
        .global  _start
        .text
        _start:
                pushw   $0x1234
                mov     $0x1,%eax   # system call number (sys_exit)
                int     $0x80
        $ as -o pushw.o pushw.s
        $ ld -s -o pushw pushw.o
        $ objdump -d pushw | tail -4
        0000000000401000 <.text>:
          401000:       66 68 34 12             pushw  $0x1234
          401004:       b8 01 00 00 00          mov    $0x1,%eax
          401009:       cd 80                   int    $0x80
        $ perf record -e intel_pt//u ./pushw
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.014 MB perf.data ]
      
       Before:
      
        $ perf script --insn-trace=disasm
        Warning:
        1 instruction trace errors
                 pushw   10349 [000] 10586.869237014:            401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           pushw $0x1234
                 pushw   10349 [000] 10586.869237014:            401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           addb %al, (%rax)
                 pushw   10349 [000] 10586.869237014:            401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           addb %cl, %ch
                 pushw   10349 [000] 10586.869237014:            40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           addb $0x2e, (%rax)
         instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction
      
       After:
      
        $ perf script --insn-trace=disasm
                   pushw   10349 [000] 10586.869237014:            401000 [unknown] (./pushw)           pushw $0x1234
                   pushw   10349 [000] 10586.869237014:            401004 [unknown] (./pushw)           movl $1, %eax
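
The address jump in the "Before" trace (401000 to 401006) versus the "After" trace (401000 to 401004) follows from how the immediate width is chosen. A minimal C sketch of that length calculation is below. It handles only an optional 66 prefix followed by opcode 0x68, and the function name and the treat_as_d64 flag are hypothetical, introduced to contrast the old and new behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Length in bytes of "[66] 68 imm" starting at insn, or 0 if the bytes are
 * something else or are truncated.
 *   treat_as_d64 != 0 models the old opcode-map annotation: the 66 prefix
 *                     is ignored and a 4-byte immediate is always assumed.
 *   treat_as_d64 == 0 models the fixed map: a 66 prefix selects a 2-byte
 *                     immediate.
 */
static size_t push_imm_length(const uint8_t *insn, size_t avail, int treat_as_d64)
{
    size_t pos = 0;
    int has_66 = 0;

    assert(insn != NULL);
    if (pos < avail && insn[pos] == 0x66) {
        has_66 = 1;
        pos++;
    }
    if (pos >= avail || insn[pos] != 0x68)
        return 0;
    pos++;

    pos += (has_66 && !treat_as_d64) ? 2 : 4;
    return pos <= avail ? pos : 0;
}
```

Fed the bytes from the objdump listing above (66 68 34 12 ...), the d64 model yields 6, which puts the next instruction at 401006 as in the "Before" trace, and the fixed model yields 4, matching 401004 in the "After" trace.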
      
      Fixes: eb13296c ("x86: Instruction decoder API")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20240502105853.5338-3-adrian.hunter@intel.com
    • x86/insn: Add Key Locker instructions to the opcode map · a5dd673a
      Chang S. Bae authored and Ingo Molnar committed
      
      The x86 instruction decoder needs to know these new instructions that
      are going to be used in the crypto library as well as the x86 core
      code. Add the following:
      
      LOADIWKEY:
      	Load a CPU-internal wrapping key.
      
      ENCODEKEY128:
      	Wrap a 128-bit AES key to a key handle.
      
      ENCODEKEY256:
      	Wrap a 256-bit AES key to a key handle.
      
      AESENC128KL:
      	Encrypt a 128-bit block of data using a 128-bit AES key
      	indicated by a key handle.
      
      AESENC256KL:
      	Encrypt a 128-bit block of data using a 256-bit AES key
      	indicated by a key handle.
      
      AESDEC128KL:
      	Decrypt a 128-bit block of data using a 128-bit AES key
      	indicated by a key handle.
      
      AESDEC256KL:
      	Decrypt a 128-bit block of data using a 256-bit AES key
      	indicated by a key handle.
      
      AESENCWIDE128KL:
      	Encrypt 8 128-bit blocks of data using a 128-bit AES key
      	indicated by a key handle.
      
      AESENCWIDE256KL:
      	Encrypt 8 128-bit blocks of data using a 256-bit AES key
      	indicated by a key handle.
      
      AESDECWIDE128KL:
      	Decrypt 8 128-bit blocks of data using a 128-bit AES key
      	indicated by a key handle.
      
      AESDECWIDE256KL:
      	Decrypt 8 128-bit blocks of data using a 256-bit AES key
      	indicated by a key handle.
      
      Details can be found in the Intel Software Developer's Manual.
      
      Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Link: https://lore.kernel.org/r/20240502105853.5338-2-adrian.hunter@intel.com
  14. Apr 29, 2024
  15. Apr 28, 2024
  16. Apr 27, 2024
    • tools headers x86 cpufeatures: Sync with the kernel sources to pick BHI mitigation changes · 8f211643
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        95a6ccbd ("x86/bhi: Mitigate KVM by default")
        ec9404e4 ("x86/bhi: Add BHI mitigation knob")
        be482ff9 ("x86/bhi: Enumerate Branch History Injection (BHI) bug")
        0f4a8376 ("x86/bhi: Define SPEC_CTRL_BHI_DIS_S")
        7390db8a ("x86/bhi: Add support for clearing branch history at syscall entry")
      
      This causes these perf files to be rebuilt and brings some X86_FEATURE
      that will be used when updating the copies of
      tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:
      
            CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
            CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o
      
      And addresses this perf build warning:
      
        Warning: Kernel ABI header differences:
          diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/lkml/ZirIx4kPtJwGFZS0@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  17. Apr 23, 2024
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · b29781af
      Arnaldo Carvalho de Melo authored
      To pick up the changes from these csets:
      
        be482ff9 ("x86/bhi: Enumerate Branch History Injection (BHI) bug")
        0f4a8376 ("x86/bhi: Define SPEC_CTRL_BHI_DIS_S")
      
      That causes no changes to tooling:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > x86_msr.before
        $ objdump -dS /tmp/build/perf-tools-next/util/amd-sample-raw.o > amd-sample-raw.o.before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ make -C tools/perf O=/tmp/build/perf-tools-next
        <SNIP>
        CC      /tmp/build/perf-tools-next/trace/beauty/tracepoints/x86_msr.o
        <SNIP>
        CC      /tmp/build/perf-tools-next/util/amd-sample-raw.o
        <SNIP>
        $ objdump -dS /tmp/build/perf-tools-next/util/amd-sample-raw.o > amd-sample-raw.o.after
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > x86_msr.after
        $ diff -u x86_msr.before x86_msr.after
        $ diff -u amd-sample-raw.o.before amd-sample-raw.o.after
      
      Just silences this perf build warning:
      
        Warning: Kernel ABI header differences:
          diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/lkml/ZifCnEZFx5MZQuIW@x1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>