- Oct 02, 2024
-
-
Al Viro authored
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
-
- Sep 20, 2024
-
-
Charlie Jenkins authored
Many of the other architectures use their custom barrier implementations. Use the barrier code from the kernel sources to optimize barriers in tools. Signed-off-by:
Charlie Jenkins <charlie@rivosinc.com> Reviewed-by:
Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20240806-optimize_ring_buffer_read_riscv-v2-1-ca7e193ae198@rivosinc.com Signed-off-by:
Palmer Dabbelt <palmer@rivosinc.com>
-
- Sep 13, 2024
-
-
Provide the s390 specific vdso getrandom() architecture backend. _vdso_rng_data required data is placed within the _vdso_data vvar page, by using a hardcoded offset larger than vdso_data. As required the chacha20 implementation does not write to the stack. The implementation follows more or less the arm64 implementations and makes use of vector instructions. It has a fallback to the getrandom() system call for machines where the vector facility is not installed. The check if the vector facility is installed, as well as an optimization for machines with the vector-enhancements facility 2, is implemented with alternatives, avoiding runtime checks. Note that __kernel_getrandom() is implemented without the vdso user wrapper which would setup a stack frame for odd cases (aka very old glibc variants) where the caller has not done that. All callers of __kernel_getrandom() are required to setup a stack frame, like the C ABI requires it. The vdso testcases vdso_test_getrandom and vdso_test_chacha pass. Benchmark on a z16: $ ./vdso_test_getrandom bench-single vdso: 25000000 times in 0.493703559 seconds syscall: 25000000 times in 6.584025337 seconds Signed-off-by:
Heiko Carstens <hca@linux.ibm.com> Reviewed-by:
Harald Freudenberger <freude@linux.ibm.com> Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com>
-
To be consistent with other VDSO functions, the function is called __kernel_getrandom() __arch_chacha20_blocks_nostack() fonction is implemented basically with 32 bits operations. It performs 4 QUARTERROUND operations in parallele. There are enough registers to avoid using the stack: On input: r3: output bytes r4: 32-byte key input r5: 8-byte counter input/output r6: number of 64-byte blocks to write to output During operation: stack: pointer to counter (r5) and non-volatile registers (r14-131) r0: counter of blocks (initialised with r6) r4: Value '4' after key has been read, used for indexing r5-r12: key r14-r15: block counter r16-r31: chacha state At the end: r0, r6-r12: Zeroised r5, r14-r31: Restored Performance on powerpc 885 (using kernel selftest): ~# ./vdso_test_getrandom bench-single vdso: 25000000 times in 62.938002291 seconds libc: 25000000 times in 535.581916866 seconds syscall: 25000000 times in 531.525042806 seconds Performance on powerpc 8321 (using kernel selftest): ~# ./vdso_test_getrandom bench-single vdso: 25000000 times in 16.899318858 seconds libc: 25000000 times in 131.050596522 seconds syscall: 25000000 times in 129.794790389 seconds This first patch adds support for VDSO32. As selftests cannot easily be generated only for VDSO32, and because the following patch brings support for VDSO64 anyway, this patch opts out all code in __arch_chacha20_blocks_nostack() so that vdso_test_chacha will not fail to compile and will not crash on PPC64/PPC64LE, allthough the selftest itself will fail. Signed-off-by:
Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com>
-
Hook up the generic vDSO implementation to the aarch64 vDSO data page. The _vdso_rng_data required data is placed within the _vdso_data vvar page, by using a offset larger than the vdso_data. The vDSO function requires a ChaCha20 implementation that does not write to the stack, and that can do an entire ChaCha20 permutation. The one provided uses NEON on the permute operation, with a fallback to the syscall for chips that do not support AdvSIMD. This also passes the vdso_test_chacha test along with vdso_test_getrandom. The vdso_test_getrandom bench-single result on Neoverse-N1 shows: vdso: 25000000 times in 0.783884250 seconds libc: 25000000 times in 8.780275399 seconds syscall: 25000000 times in 8.786581518 seconds A small fixup to arch/arm64/include/asm/mman.h was required to avoid pulling kernel code into the vDSO, similar to what's already done in arch/arm64/include/asm/rwonce.h. Signed-off-by:
Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by:
Ard Biesheuvel <ardb@kernel.org> Acked-by:
Will Deacon <will@kernel.org> Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com>
-
Hook up the generic vDSO implementation to the LoongArch vDSO data page by providing the required __arch_chacha20_blocks_nostack, __arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also wire up the selftests. Signed-off-by:
Xi Ruoyao <xry111@xry111.site> Acked-by:
Huacai Chen <chenhuacai@kernel.org> Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com>
-
- Aug 30, 2024
-
-
Architectures use different location for vDSO sources: arch/mips/vdso arch/sparc/vdso arch/arm64/kernel/vdso arch/riscv/kernel/vdso arch/csky/kernel/vdso arch/x86/um/vdso arch/x86/entry/vdso arch/powerpc/kernel/vdso arch/arm/vdso arch/loongarch/vdso Don't hard-code vdso sources location in selftest Makefile. Instead create a vdso/ symbolic link in tools/arch/$arch/ and update Makefile accordingly. Signed-off-by:
Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com>
-
- Aug 07, 2024
-
-
Namhyung Kim authored
To pick up changes from: 9ef54a38 arm64: cputype: Add Cortex-A725 definitions 58d245e0 arm64: cputype: Add Cortex-X1C definitions fd2ff5f0 arm64: cputype: Add Cortex-X925 definitions add332c4 arm64: cputype: Add Cortex-A720 definitions be5a6f23 arm64: cputype: Add Cortex-X3 definitions This should be used to beautify x86 syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by:
Namhyung Kim <namhyung@kernel.org>
-
Namhyung Kim authored
To pick up changes from: 149fd471 perf/x86/intel: Support Perfmon MSRs aliasing 21b362cc x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems 4f460bff cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h 7ea81936 x86/cpufeatures: Add HWP highest perf change feature flag 78ce84b9 x86/cpufeatures: Flip the /proc/cpuinfo appearance logic 1beb348d x86/sev: Provide SVSM discovery support This should be used to beautify x86 syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by:
Namhyung Kim <namhyung@kernel.org>
-
- Aug 06, 2024
-
-
Namhyung Kim authored
And other arch-specific UAPI headers to pick up changes from: 4b23e0c1 KVM: Ensure new code that references immediate_exit gets extra scrutiny 85542adb KVM: x86: Add KVM_RUN_X86_GUEST_MODE kvm_run flag 6fef5185 KVM: x86: Add a capability to configure bus frequency for APIC timer 34ff6590 x86/sev: Use kernel provided SVSM Calling Areas 5dcc1e76 Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEAD 9a0d2f49 KVM: PPC: Book3S HV: Add one-reg interface for HASHPKEYR register e9eb790b KVM: PPC: Book3S HV: Add one-reg interface for HASHKEYR register 1a1e6865 KVM: PPC: Book3S HV: Add one-reg interface for DEXCR register This should be used to beautify KVM syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by:
Namhyung Kim <namhyung@kernel.org>
-
- Aug 02, 2024
-
-
Ahmed S. Darwish authored
For parsing the cpuid bitfields, kcpuid uses an incomplete CSV file with 300+ bitfields. Use an auto-generated CSV file from the x86-cpuid.org project instead. It provides complete bitfields coverage: 830+ bitfields, all with proper descriptions. The auto-generated file has the following blurb automatically added: # SPDX-License-Identifier: CC0-1.0 # Generator: x86-cpuid-db v1.0 The generator tag includes the project's workspace "git describe" version string. It is intended for projects like KernelCI, to aid in verifying that the auto-generated files have not been tampered with. The file also has the blurb: # Auto-generated file. # Please submit all updates and bugfixes to https://x86-cpuid.org It's thus kindly requested that the Linux kernel's x86 tree maintainers enforce sending all updates to x86-cpuid.org's upstream database first, thus benefiting the whole ecosystem. Signed-off-by:
Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://gitlab.com/x86-cpuid.org/x86-cpuid-db/-/blob/v1.0/LICENSE.rst Link: https://gitlab.com/x86-cpuid.org/x86-cpuid-db Link: https://lore.kernel.org/all/20240718134755.378115-9-darwi@linutronix.de
-
Ahmed S. Darwish authored
It's a common pattern in cpuid leaves to have the same bitfields format repeated across a number of subleaves. Typically, this is used for enumerating hierarchial structures like cache and TLB levels, CPU topology levels, etc. Modify kcpuid.c to handle subleaf ranges in the CSV file subleaves column. For example, make it able to parse lines in the form: # LEAF, SUBLEAVES, reg, bits, short_name , ... 0xb, 1:0, eax, 4:0, x2apic_id_shift , ... 0xb, 1:0, ebx, 15:0, domain_lcpus_count , ... 0xb, 1:0, ecx, 7:0, domain_nr , ... This way, full output can be printed to the user. Signed-off-by:
Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240718134755.378115-8-darwi@linutronix.de
-
Ahmed S. Darwish authored
cpuid.csv will be extended in further commits with all-publicly-known CPUID leaves and bitfields. Thus, modify has_subleafs() to identify all known leaves with subleaves. Remove the redundant "is_amd" check since all x86 vendors already report the maxium supported extended leaf at leaf 0x80000000 EAX register. The extra mentioned leaves are: - Leaf 0x12, Intel Software Guard Extensions (SGX) enumeration - Leaf 0x14, Intel process trace (PT) enumeration - Leaf 0x17, Intel SoC vendor attributes enumeration - Leaf 0x1b, Intel PCONFIG (Platform configuration) enumeration - Leaf 0x1d, Intel AMX (Advanced Matrix Extensions) tile information - Leaf 0x1f, Intel v2 extended topology enumeration - Leaf 0x23, Intel ArchPerfmonExt (Architectural PMU ext) enumeration - Leaf 0x80000020, AMD Platform QoS extended features enumeration - Leaf 0x80000026, AMD v2 extended topology enumeration Set the 'max_subleaf' variable for all the newly marked leaves with extra subleaves. Ideally, this should be fetched from the CSV file instead, but the current kcpuid code architecture has two runs: one run to serially invoke the cpuid instructions and save all the output in-memory, and one run to parse this in-memory output through the CSV specification. Signed-off-by:
Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240718134755.378115-7-darwi@linutronix.de
-
Ahmed S. Darwish authored
While parsing and saving bitfield names from the CSV file, an extra leading space is copied verbatim. That extra space is not a big issue now, but further commits will add a new CSV file with much more padding for the bitfield's name column. Strip leading/trailing whitespaces while saving bitfield names. Signed-off-by:
Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240718134755.378115-6-darwi@linutronix.de
-
Ahmed S. Darwish authored
Protect against the kcpuid code parsing faulty max subleaf numbers through a min() expression. Thus, ensuring that max_subleaf will always be ≤ MAX_SUBLEAF_NUM. Use "u32" for the subleaf numbers since kcpuid is compiled with -Wextra, which includes signed/unsigned comparisons warnings. Signed-off-by:
Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240718134755.378115-5-darwi@linutronix.de
-
Ahmed S. Darwish authored
cpuid.csv will be extended in further commits with all-publicly-known CPUID leaves and bitfields. One of the new leaves is 0xd for extended CPU state enumeration. Depending on XCR0 dword bits, it can export up to 64 subleaves. Set kcpuid.c MAX_SUBLEAF_NUM to 64. Signed-off-by:
Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240718134755.378115-4-darwi@linutronix.de
-
Ahmed S. Darwish authored
When kcpuid is invoked with "--all --details", the detailed description column is not properly aligned for all bitfield rows: CPUID_0x4_ECX[0x0]: cache_level : 0x1 - Cache Level ... cache_self_init - Cache Self Initialization This is due to differences in output handling between boolean single-bit "bitflags" and multi-bit bitfields. For the former, the bitfield's value is not outputted as it is implied to be true by just outputting the bitflag's name in its respective line. If long descriptions were requested through the --all parameter, properly align the bitflag's description columns through extra tabs. With that, the sample output above becomes: CPUID_0x4_ECX[0x0]: cache_level : 0x1 - Cache Level ... cache_self_init - Cache Self Initialization Signed-off-by:
Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240718134755.378115-3-darwi@linutronix.de
-
Ahmed S. Darwish authored
Global variable "num_leafs" is set in multiple places but is never read anywhere. Remove it. Signed-off-by:
Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240718134755.378115-2-darwi@linutronix.de
-
- Jul 10, 2024
-
-
Arnd Bergmann authored
When clone3() was introduced, it was not obvious how each architecture deals with setting up the stack and keeping the register contents in a fork()-like system call, so this was left for the architecture maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those that already implement it. Five years later, we still have a few architectures left that are missing clone3(), and the macro keeps getting in the way as it's fundamentally different from all the other __ARCH_WANT_SYS_* macros that are meant to provide backwards-compatibility with applications using older syscalls that are no longer provided by default. Address this by reversing the polarity of the macro, adding an __ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3 from all the other ones. Acked-by:
Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
- Jun 12, 2024
-
-
Christian Heusel authored
So far the Makefile just installed the csv into $(HWDATADIR)/cpuid.csv, which made it unaware about $DESTDIR. Add $DESTDIR to the install command and while at it also create the directory, should it not exist already. This eases the packaging of kcpuid and allows i.e. for the install on Arch to look like this: $ make BINDIR=/usr/bin DESTDIR="$pkgdir" -C tools/arch/x86/kcpuid install Some background on DESTDIR: DESTDIR is commonly used in packaging for staged installs (regardless of the used package manager): https://www.gnu.org/prep/standards/html_node/DESTDIR.html So the package is built and installed into a directory which the package manager later picks up and creates some archive from it. What is specific to Arch Linux here is only the usage of $pkgdir in the example, DESTDIR itself is widely used. [ bp: Extend the commit message with Christian's info on DESTDIR as a GNU coding standards thing. ] Signed-off-by:
Christian Heusel <christian@heusel.eu> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240531111757.719528-2-christian@heusel.eu
-
- Jun 04, 2024
-
-
Arnaldo Carvalho de Melo authored
To get the changes in: 0ce85db6 ("arm64: cputype: Add Neoverse-V3 definitions") 02a0a046 ("arm64: cputype: Add Cortex-X4 definitions") f4d9d9dc ("arm64: Add Neoverse-V2 part") That makes this perf source code to be rebuilt: CC /tmp/build/perf-tools/util/arm-spe.o The changes in the above patch add MIDR_NEOVERSE_V[23] and MIDR_NEOVERSE_V1 is used in arm-spe.c, so probably we need to add those and perhaps MIDR_CORTEX_X4 to that array? Or maybe we need to leave this for later when this is all tested on those machines? static const struct midr_range neoverse_spe[] = { MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1), MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2), MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1), {}, }; Mark Rutland recommended about arm-spe.c: "I would not touch this for now -- someone would have to go audit the TRMs to check that those other cores have the same encoding, and I think it'd be better to do that as a follow-up." That addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h Acked-by:
Mark Rutland <mark.rutland@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/lkml/Zl8cYk0Tai2fs7aM@x1 Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- May 28, 2024
-
-
Arnaldo Carvalho de Melo authored
To pick the changes in: 4af663c2 ("KVM: SEV: Allow per-guest configuration of GHCB protocol version") 4f5defae ("KVM: SEV: introduce KVM_SEV_INIT2 operation") 26c44aa9 ("KVM: SEV: define VM types for SEV and SEV-ES") ac5c4802 ("KVM: SEV: publish supported VMSA features") 651d61bc ("KVM: PPC: Fix documentation for ppc mmu caps") That don't change functionality in tools/perf, as no new ioctl is added for the 'perf trace' scripts to harvest. This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joel Stanley <joel@jms.id.au> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Roth <michael.roth@amd.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/lkml/ZlYxAdHjyAkvGtMW@x1 Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To pick up the changes from these csets: 53bc516a ("x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful place") That patch just move definitions around, so this just silences this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://lore.kernel.org/lkml/ZlYe8jOzd1_DyA7X@x1 Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- May 14, 2024
-
-
Hans de Goede authored
Dell All In One (AIO) models released after 2017 use a backlight controller board connected to an UART. Add a small emulator to allow development and testing of the drivers/platform/x86/dell/dell-uart-backlight.c driver for this board, without requiring access to an actual Dell All In One. Signed-off-by:
Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240513144603.93874-3-hdegoede@redhat.com
-
- May 02, 2024
-
-
To support APX functionality, the EVEX prefix is used to: - promote legacy instructions - promote VEX instructions - add new instructions Promoted VEX instructions require no extra annotation because the opcodes do not change and the permissive nature of the instruction decoder already allows them to have an EVEX prefix. Promoted legacy instructions and new instructions are placed in map 4 which has not been used before. Create a new table for map 4 and add APX instructions. Annotate SCALABLE instructions with "(es)" - refer to patch "x86/insn: Add support for APX EVEX to the instruction decoder logic". SCALABLE instructions must be represented in both no-prefix (NP) and 66 prefix forms. Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-9-adrian.hunter@intel.com
-
Intel Advanced Performance Extensions (APX) extends the EVEX prefix to support: - extended general purpose registers (EGPRs) i.e. r16 to r31 - Push-Pop Acceleration (PPX) hints - new data destination (NDD) register - suppress status flags writes (NF) of common instructions - new instructions Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture Specification for details. The extended EVEX prefix does not need amended instruction decoder logic, except in one area. Some instructions are defined as SCALABLE which means the EVEX.W bit and EVEX.pp bits are used to determine operand size. Specifically, if an instruction is SCALABLE and EVEX.W is zero, then EVEX.pp value 0 (representing no prefix NP) means default operand size, whereas EVEX.pp value 1 (representing 66 prefix) means operand size override i.e. 16 bits Add an attribute (INAT_EVEX_SCALABLE) to identify such instructions, and amend the logic appropriately. Amend the awk script that generates the attribute tables from the opcode map, to recognise "(es)" as attribute INAT_EVEX_SCALABLE. Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-8-adrian.hunter@intel.com
-
Support for REX2 has been added to the instruction decoder logic and the awk script that generates the attribute tables from the opcode map. Add REX2 prefix byte (0xD5) to the opcode map. Add annotation (!REX2) for map 0/1 opcodes that are reserved under REX2. Add JMPABS to the opcode map and add annotation (REX2) to identify that it has a mandatory REX2 prefix. A separate opcode attribute table is not needed at this time because JMPABS has the same attribute encoding as the MOV instruction that it shares an opcode with i.e. INAT_MOFFSET. Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-7-adrian.hunter@intel.com
-
Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31. The REX2 prefix is effectively an extended version of the REX prefix. REX2 and EVEX are also used with PUSH/POP instructions to provide a Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to fast-forward register data between matching PUSH and POP instructions. REX2 is valid only with opcodes in maps 0 and 1. Similar extension for other maps is provided by the EVEX prefix, covered in a separate patch. Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used for a new 64-bit absolute direct jump instruction JMPABS. Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture Specification for details. Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify opcodes (only JMPABS) that require a mandatory REX2 prefix (INAT_REX2_VARIANT). Amend logic to read the REX2 prefix and get the opcode attribute for the map number (0 or 1) encoded in the REX2 prefix. Amend the awk script that generates the attribute tables from the opcode map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)" as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT. Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-6-adrian.hunter@intel.com
-
The x86 instruction decoder is used not only for decoding kernel instructions. It is also used by perf uprobes (user space probes) and by perf tools Intel Processor Trace decoding. Consequently, it needs to support instructions executed by user space also. Add instructions documented in Intel Architecture Instruction Set Extensions and Future Features Programming Reference March 2024 319433-052, that have not been added yet: AADD AAND AOR AXOR CMPccXADD PBNDKB RDMSRLIST URDMSR UWRMSR VBCSTNEBF162PS VBCSTNESH2PS VCVTNEEBF162PS VCVTNEEPH2PS VCVTNEOBF162PS VCVTNEOPH2PS VCVTNEPS2BF16 VPDPB[SU,UU,SS]D[,S] VPDPW[SU,US,UU]D[,S] VPMADD52HUQ VPMADD52LUQ VSHA512MSG1 VSHA512MSG2 VSHA512RNDS2 VSM3MSG1 VSM3MSG2 VSM3RNDS2 VSM4KEY4 VSM4RNDS4 WRMSRLIST TCMMIMFP16PS TCMMRLFP16PS TDPFP16PS PREFETCHIT1 PREFETCHIT0 Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-5-adrian.hunter@intel.com
-
The x86 instruction decoder is used not only for decoding kernel instructions. It is also used by perf uprobes (user space probes) and by perf tools Intel Processor Trace decoding. Consequently, it needs to support instructions executed by user space also. Intel Architecture Instruction Set Extensions and Future Features manual number 319433-044 of May 2021, documented VEX versions of instructions VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, but the opcode map has them listed as EVEX only. Remove EVEX-only (ev) annotation from instructions VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, which allows them to be decoded with either a VEX or EVEX prefix. Fixes: 0153d98f ("x86/insn: Add misc instructions to x86 instruction decoder") Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-4-adrian.hunter@intel.com
-
The x86 instruction decoder is used not only for decoding kernel instructions. It is also used by perf uprobes (user space probes) and by perf tools Intel Processor Trace decoding. Consequently, it needs to support instructions executed by user space also. Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size only i.e. (d64). That was based on Intel SDM Opcode Map. However that is contradicted by the Instruction Set Reference section for PUSH in the same manual. Remove 64-bit operand size only annotation from opcode 0x68 PUSH instruction. Example: $ cat pushw.s .global _start .text _start: pushw $0x1234 mov $0x1,%eax # system call number (sys_exit) int $0x80 $ as -o pushw.o pushw.s $ ld -s -o pushw pushw.o $ objdump -d pushw | tail -4 0000000000401000 <.text>: 401000: 66 68 34 12 pushw $0x1234 401004: b8 01 00 00 00 mov $0x1,%eax 401009: cd 80 int $0x80 $ perf record -e intel_pt//u ./pushw [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB perf.data ] Before: $ perf script --insn-trace=disasm Warning: 1 instruction trace errors pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234 pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax) pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax) instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction After: $ perf script --insn-trace=disasm pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234 pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax Fixes: eb13296c ("x86: Instruction decoder API") Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240502105853.5338-3-adrian.hunter@intel.com
-
The x86 instruction decoder needs to know these new instructions that are going to be used in the crypto library as well as the x86 core code. Add the following: LOADIWKEY: Load a CPU-internal wrapping key. ENCODEKEY128: Wrap a 128-bit AES key to a key handle. ENCODEKEY256: Wrap a 256-bit AES key to a key handle. AESENC128KL: Encrypt a 128-bit block of data using a 128-bit AES key indicated by a key handle. AESENC256KL: Encrypt a 128-bit block of data using a 256-bit AES key indicated by a key handle. AESDEC128KL: Decrypt a 128-bit block of data using a 128-bit AES key indicated by a key handle. AESDEC256KL: Decrypt a 128-bit block of data using a 256-bit AES key indicated by a key handle. AESENCWIDE128KL: Encrypt 8 128-bit blocks of data using a 128-bit AES key indicated by a key handle. AESENCWIDE256KL: Encrypt 8 128-bit blocks of data using a 256-bit AES key indicated by a key handle. AESDECWIDE128KL: Decrypt 8 128-bit blocks of data using a 128-bit AES key indicated by a key handle. AESDECWIDE256KL: Decrypt 8 128-bit blocks of data using a 256-bit AES key indicated by a key handle. The detail can be found in Intel Software Developer Manual. Signed-off-by:
Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by:
Adrian Hunter <adrian.hunter@intel.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240502105853.5338-2-adrian.hunter@intel.com
-
- Apr 29, 2024
-
-
David E. Box authored
Add support to read the 'meter_current' file. The display is the same as the 'meter_certificate', but will show the current snapshot of the counters. Signed-off-by:
David E. Box <david.e.box@linux.intel.com> Reviewed-by:
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240411025856.2782476-10-david.e.box@linux.intel.com Signed-off-by:
Hans de Goede <hdegoede@redhat.com>
-
David E. Box authored
Add #define for feature length and move NUL assignment from callers to get_feature(). Signed-off-by:
David E. Box <david.e.box@linux.intel.com> Reviewed-by:
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240411025856.2782476-9-david.e.box@linux.intel.com Signed-off-by:
Hans de Goede <hdegoede@redhat.com>
-
David E. Box authored
Fix errors in the calculation of the start position of the counters and in the display loop. While here, use a #define for the bundle count and size. Fixes: 7fdc03a7 ("tools/arch/x86: intel_sdsi: Add support for reading meter certificates") Signed-off-by:
David E. Box <david.e.box@linux.intel.com> Reviewed-by:
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240411025856.2782476-8-david.e.box@linux.intel.com Signed-off-by:
Hans de Goede <hdegoede@redhat.com>
-
David E. Box authored
Fixes sdsi_meter_cert_show() to correctly decode and display the meter certificate output. Adds and displays a missing version field, displays the ASCII name of the signature, and fixes the print alignment. Fixes: 7fdc03a7 ("tools/arch/x86: intel_sdsi: Add support for reading meter certificates") Signed-off-by:
David E. Box <david.e.box@linux.intel.com> Reviewed-by:
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240411025856.2782476-7-david.e.box@linux.intel.com Signed-off-by:
Hans de Goede <hdegoede@redhat.com>
-
David E. Box authored
The maximum number of bundles in the meter certificate was set to 8 which is much less than the maximum. Instead, since the bundles appear at the end of the file, set it based on the remaining file size from the bundle start position. Fixes: 7fdc03a7 ("tools/arch/x86: intel_sdsi: Add support for reading meter certificates") Signed-off-by:
David E. Box <david.e.box@linux.intel.com> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240411025856.2782476-6-david.e.box@linux.intel.com Signed-off-by:
Hans de Goede <hdegoede@redhat.com>
-
- Apr 28, 2024
-
-
Shiqi Liu authored
Fix left shift overflow issue when the parameter idx is greater than or equal to 8 in the calculation of perm in PIRx_ELx_PERM macro. Fix this by modifying the encoding to use a long integer type. Signed-off-by:
Shiqi Liu <shiqiliu@hust.edu.cn> Acked-by:
Marc Zyngier <maz@kernel.org> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240421063328.29710-1-shiqiliu@hust.edu.cn Signed-off-by:
Will Deacon <will@kernel.org>
-
- Apr 27, 2024
-
-
Arnaldo Carvalho de Melo authored
To pick the changes from: 95a6ccbd ("x86/bhi: Mitigate KVM by default") ec9404e4 ("x86/bhi: Add BHI mitigation knob") be482ff9 ("x86/bhi: Enumerate Branch History Injection (BHI) bug") 0f4a8376 ("x86/bhi: Define SPEC_CTRL_BHI_DIS_S") 7390db8a ("x86/bhi: Add support for clearing branch history at syscall entry") This causes these perf files to be rebuilt and brings some X86_FEATURE that will be used when updating the copies of tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/ZirIx4kPtJwGFZS0@x1 Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-
- Apr 23, 2024
-
-
Arnaldo Carvalho de Melo authored
To pick up the changes from these csets: be482ff9 ("x86/bhi: Enumerate Branch History Injection (BHI) bug") 0f4a8376 ("x86/bhi: Define SPEC_CTRL_BHI_DIS_S") That cause no changes to tooling: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > x86_msr.before $ objdump -dS /tmp/build/perf-tools-next/util/amd-sample-raw.o > amd-sample-raw.o.before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ make -C tools/perf O=/tmp/build/perf-tools-next <SNIP> CC /tmp/build/perf-tools-next/trace/beauty/tracepoints/x86_msr.o <SNIP> CC /tmp/build/perf-tools-next/util/amd-sample-raw.o <SNIP> $ objdump -dS /tmp/build/perf-tools-next/util/amd-sample-raw.o > amd-sample-raw.o.after $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > x86_msr.after $ diff -u x86_msr.before x86_msr.after $ diff -u amd-sample-raw.o.before amd-sample-raw.o.after Just silences this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/ZifCnEZFx5MZQuIW@x1 Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com>
-