- Nov 24, 2022
-
-
Steven Rostedt (Google) authored
In order to make sure that a timer cannot be re-armed after it has been stopped and before the object holding it is freed, a new shutdown state is added to the timer code. The new APIs timer_shutdown_sync() and timer_shutdown() must be called before the object that holds the timer can be freed. Update the documentation to reflect this new workflow. [ tglx: Updated to the new semantics and updated the zh_CN version ] Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Guenter Roeck <linux@roeck-us.net> Reviewed-by:
Jacob Keller <jacob.e.keller@intel.com> Reviewed-by:
Anna-Maria Behnsen <anna-maria@linutronix.de> Link: https://lore.kernel.org/r/20221110064147.712934793@goodmis.org Link: https://lore.kernel.org/r/20221123201625.375284489@linutronix.de
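A minimal sketch of the teardown ordering this commit documents, with a hypothetical `my_obj` structure and callback; timer_shutdown_sync() waits for a running callback, while timer_shutdown() is the variant that does not:

```c
#include <linux/timer.h>
#include <linux/jiffies.h>
#include <linux/slab.h>

struct my_obj {
	struct timer_list timer;
	/* ... other state ... */
};

static void my_timer_fn(struct timer_list *t)
{
	struct my_obj *obj = from_timer(obj, t, timer);

	/* A live object may legitimately re-arm its own timer. */
	mod_timer(&obj->timer, jiffies + HZ);
}

static void my_obj_free(struct my_obj *obj)
{
	/*
	 * Move the timer into the shutdown state and wait for any
	 * running callback to finish.  From here on, attempts to
	 * re-arm the timer are silently ignored, so freeing is safe.
	 */
	timer_shutdown_sync(&obj->timer);
	kfree(obj);
}
```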
-
Thomas Gleixner authored
Adjust to the new preferred function names. Suggested-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Jacob Keller <jacob.e.keller@intel.com> Reviewed-by:
Anna-Maria Behnsen <anna-maria@linutronix.de> Link: https://lore.kernel.org/r/20221123201625.075320635@linutronix.de
-
- Nov 21, 2022
-
-
Liam Beguin authored
Fix the kernel-doc markings for the div64 functions to point to the header file instead of the lib/ directory. This avoids having implementation-specific comments in generic documentation. Furthermore, given that some kernel-doc comments are identical, drop them from lib/math64 and keep only the comments that add implementation details there. Signed-off-by:
Liam Beguin <liambeguin@gmail.com> Acked-by:
Randy Dunlap <rdunlap@infradead.org> Tested-by:
Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20221118182309.3824530-1-liambeguin@gmail.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
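For orientation, a small sketch of the helpers whose kernel-doc moved; the div_u64()/div64_u64() names are the math64.h API, the wrappers around them are illustrative:

```c
#include <linux/math64.h>

/* 64-by-32 division: on 32-bit, a plain '/' on a u64 would need libgcc. */
static u64 avg_bytes(u64 total_bytes, u32 nr_samples)
{
	return div_u64(total_bytes, nr_samples);
}

/* 64-by-64 division, for when the divisor can exceed 32 bits. */
static u64 scale(u64 numerator, u64 denominator)
{
	return div64_u64(numerator, denominator);
}
```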
-
- Oct 28, 2022
-
-
Kees Cook authored
While there were varying degrees of kern-doc for the various str*()-family functions, many needed updating and clarification, or simply needed to be written in the first place. Update (and relocate) the existing kern-doc and add the missing functions, sadly shaking my head at how many times I have written "Do not use this function". Include the results in the core kernel API doc. Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Andy Shevchenko <andy@kernel.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-hardening@vger.kernel.org Tested-by:
Akira Yokosawa <akiyks@gmail.com> Link: https://lore.kernel.org/lkml/9b0cf584-01b3-3013-b800-1ef59fe82476@gmail.com Signed-off-by:
Kees Cook <keescook@chromium.org>
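A sketch of the kind of guidance the new kern-doc gives: prefer strscpy(), which always NUL-terminates and reports truncation, over strcpy(). The wrapper below is hypothetical:

```c
#include <linux/string.h>
#include <linux/errno.h>

static int copy_name(char *dst, size_t dst_size, const char *src)
{
	/* Returns the copied length, or -E2BIG if src had to be truncated. */
	ssize_t len = strscpy(dst, src, dst_size);

	if (len == -E2BIG)
		return -EINVAL;	/* dst still holds a NUL-terminated prefix */
	return 0;
}
```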
-
- Oct 25, 2022
-
-
Kees Cook authored
Fix the kern-doc markings for several of the overflow helpers and move their location into the core kernel API documentation, where it belongs (it's not driver-specific). Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: linux-hardening@vger.kernel.org Reviewed-by:
Akira Yokosawa <akiyks@gmail.com> Signed-off-by:
Kees Cook <keescook@chromium.org>
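A minimal example of the relocated helpers; check_mul_overflow() is the real API, the allocation wrapper is hypothetical:

```c
#include <linux/overflow.h>
#include <linux/slab.h>

static void *alloc_elems(size_t nr, size_t elem_size)
{
	size_t bytes;

	/* Evaluates to true if nr * elem_size wrapped around. */
	if (check_mul_overflow(nr, elem_size, &bytes))
		return NULL;

	return kmalloc(bytes, GFP_KERNEL);
}
```

The related array_size() and struct_size() helpers build on the same primitives for the common allocation-sizing cases.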
-
- Oct 03, 2022
-
-
Miaohe Lin authored
Since commit dacb5d88 ("tcp: fix page frag corruption on page fault"), there's no caller of gfpflags_normal_context(). Remove it, as this helper is strictly tied to the sk page frag usage and there won't be other users in the future. [linmiaohe@huawei.com: fix htmldocs] Link: https://lkml.kernel.org/r/1bc55727-9b66-0e9e-c306-f10c4716ea89@huawei.com Link: https://lkml.kernel.org/r/20220916072257.9639-16-linmiaohe@huawei.com Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Sep 29, 2022
-
-
Jonathan Corbet authored
These files describe part of the core API, but have never been converted to RST due to ... let's say local opposition. So, create a set of special-purpose wrappers to ..include these files into a separate page so that they can be a part of the htmldocs build. Then link them into the core-api manual and remove them from the "staging" dumping ground. Acked-by:
Jani Nikula <jani.nikula@intel.com> Signed-off-by:
Jonathan Corbet <corbet@lwn.net> Reviewed-by:
David Vernet <void@manifault.com> Acked-by:
Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20220927160559.97154-7-corbet@lwn.net Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
Jonathan Corbet authored
This one file should not really be in the top-level documentation directory. core-api/ may not be a perfect fit but seems to be best, so move it there. Adjust a couple of internal document references to make them location-independent, and point checkpatch.pl at the new location. Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Joe Perches <joe@perches.com> Reviewed-by:
David Vernet <void@manifault.com> Acked-by:
Jani Nikula <jani.nikula@intel.com> Signed-off-by:
Jonathan Corbet <corbet@lwn.net> Acked-by:
Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20220927160559.97154-6-corbet@lwn.net Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- Sep 28, 2022
-
-
Gary Guo authored
This patch adds a format specifier `%pA` to `vsprintf` which formats a pointer as `core::fmt::Arguments`. Doing so allows us to directly format to the internal buffer of `printk`, so we do not have to use a temporary buffer on the stack to pre-assemble the message on the Rust side. This specifier is intended only to be used from Rust and not from C, so `checkpatch.pl` is intentionally unchanged, to catch any misuse. Reviewed-by:
Kees Cook <keescook@chromium.org> Acked-by:
Petr Mladek <pmladek@suse.com> Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Co-developed-by:
Alex Gaynor <alex.gaynor@gmail.com> Signed-off-by:
Alex Gaynor <alex.gaynor@gmail.com> Co-developed-by:
Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by:
Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by:
Gary Guo <gary@garyguo.net> Co-developed-by:
Miguel Ojeda <ojeda@kernel.org> Signed-off-by:
Miguel Ojeda <ojeda@kernel.org>
-
- Sep 27, 2022
-
-
Akhil Raj authored
I have removed a repeated `the` inside the documentation. Signed-off-by:
Akhil Raj <lf32.dev@gmail.com> Link: https://lore.kernel.org/r/20220827145359.32599-1-lf32.dev@gmail.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
Liam R. Howlett authored
Patch series "Introducing the Maple Tree" The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. Davidlor said : Yes I like the maple tree, and at this stage I don't think we can ask for : more from this series wrt the MM - albeit there seems to still be some : folks reporting breakage. Fundamentally I see Liam's work to (re)move : complexity out of the MM (not to say that the actual maple tree is not : complex) by consolidating the three complimentary data structures very : much worth it considering performance does not take a hit. This was very : much a turn off with the range locking approach, which worst case scenario : incurred in prohibitive overhead. Also as Liam and Matthew have : mentioned, RCU opens up a lot of nice performance opportunities, and in : addition academia[1] has shown outstanding scalability of address spaces : with the foundation of replacing the locked rbtree with RCU aware trees. A similar work has been discovered in the academic press https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf Sheer coincidence. We designed our tree with the intention of solving the hardest problem first. Upon settling on a b-tree variant and a rough outline, we researched ranged based b-trees and RCU b-trees and did find that article. So it was nice to find reassurances that we were on the right path, but our design choice of using ranges made that paper unusable for us. This patch (of 70): The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. 
The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. There is additional BUG_ON() calls added within the tree, most of which are in debug code. These will be replaced with a WARN_ON() call in the future. There is also additional BUG_ON() calls within the code which will also be reduced in number at a later date. These exist to catch things such as out-of-range accesses which would crash anyways. Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by:
David Howells <dhowells@redhat.com> Tested-by:
Sven Schnelle <svens@linux.ibm.com> Tested-by:
Yu Zhao <yuzhao@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
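A hedged sketch of basic maple tree usage as described above; the stored object is hypothetical, the mtree_* calls are the tree's external API:

```c
#include <linux/maple_tree.h>
#include <linux/gfp.h>
#include <linux/errno.h>

static DEFINE_MTREE(mt);

struct range_owner {
	int id;
};

static struct range_owner owner = { .id = 1 };

static int maple_demo(void)
{
	void *entry;
	int ret;

	/* One entry covers the whole range [10, 19], inclusive. */
	ret = mtree_store_range(&mt, 10, 19, &owner, GFP_KERNEL);
	if (ret)
		return ret;

	/* Any index inside the range resolves to the same entry. */
	entry = mtree_load(&mt, 15);

	mtree_destroy(&mt);
	return entry == &owner ? 0 : -ENOENT;
}
```

The mtree_* wrappers handle the tree's internal locking; heavier users such as the VMA code use the lower-level ma_state interface instead.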
-
- Aug 18, 2022
-
-
Eric Lin authored
Signed-off-by:
Eric Lin <dslin1010@gmail.com> Link: https://lore.kernel.org/r/20220811091516.2107908-1-dslin1010@gmail.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- Jul 19, 2022
-
-
John Garry authored
Streaming DMA mappings involving an IOMMU may be much slower for larger total mapping sizes. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for device drivers to know this "optimal" limit, such that they may try to produce mappings which don't exceed it. Signed-off-by:
John Garry <john.garry@huawei.com> Reviewed-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Christoph Hellwig <hch@lst.de>
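A sketch of how a driver might consume the new API; dma_opt_mapping_size() is the call this commit adds, the capping helper is hypothetical:

```c
#include <linux/dma-mapping.h>
#include <linux/minmax.h>

/* Clamp the driver's request size so IOVAs stay within the cached range. */
static size_t my_max_request_size(struct device *dev, size_t hw_max)
{
	return min_t(size_t, hw_max, dma_opt_mapping_size(dev));
}
```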
-
- Jul 15, 2022
-
-
Yury Norov authored
After moving gfp types out of gfp.h, we have to align MAINTAINERS and Docs, to avoid warnings like this: >> include/linux/gfp.h:1: warning: 'Page mobility and placement hints' not found >> include/linux/gfp.h:1: warning: 'Watermark modifiers' not found >> include/linux/gfp.h:1: warning: 'Reclaim modifiers' not found >> include/linux/gfp.h:1: warning: 'Useful GFP flag combinations' not found Signed-off-by:
Yury Norov <yury.norov@gmail.com>
-
- Jul 11, 2022
-
-
Matthew Wilcox (Oracle) authored
Some people read the documentation, perhaps. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org>
-
- Jul 01, 2022
-
-
Masahiro Yamada authored
Adjust documents to the file moves made by commit cfc1d277 ("module: Move all into module/"). Thanks to Yanteng Si for helping me to update Documentation/translations/zh_CN/core-api/kernel-api.rst Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org> Acked-by:
Yanteng Si <siyanteng@loongson.cn> Signed-off-by:
Luis Chamberlain <mcgrof@kernel.org>
-
- Jun 28, 2022
-
-
Arnd Bergmann authored
All architecture-independent users of virt_to_bus() and bus_to_virt() have been fixed to use the dma mapping interfaces or have been removed now. This means the definitions on most architectures, and the CONFIG_VIRT_TO_BUS symbol, are now obsolete and can be removed. The only exceptions to this are a few network and scsi drivers for m68k Amiga and VME machines and ppc32 Macintosh. These drivers work correctly with the old interfaces and are probably not worth changing. On alpha and parisc, virt_to_bus() was still used in asm/floppy.h. alpha can use isa_virt_to_bus() like x86 does, and parisc can just open-code the virt_to_phys() here, as this is architecture-specific code. I tried updating the bus-virt-phys-mapping.rst documentation, which started as an email from Linus to explain some details of the Linux-2.0 driver interfaces. The bits about virt_to_bus() were declared obsolete back in 2000, and the rest is not all that relevant any more, so in the end I just decided to remove the file completely. Reviewed-by:
Geert Uytterhoeven <geert@linux-m68k.org> Acked-by:
Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
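For context, a hedged sketch of the replacement pattern: rather than handing the hardware virt_to_bus(buf), a driver maps the buffer through the DMA API and programs the returned handle. The transfer function below is hypothetical:

```c
#include <linux/dma-mapping.h>
#include <linux/errno.h>

static int start_transfer(struct device *dev, void *buf, size_t len)
{
	dma_addr_t handle;

	handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, handle))
		return -ENOMEM;

	/* ... program 'handle' into the device instead of a bus address ... */

	dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
	return 0;
}
```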
-
- Jun 27, 2022
-
-
Mike Rapoport authored
Rename Documentation/vm to Documentation/mm so it will be consistent with the mm code directory and with Documentation/admin-guide/mm, and won't be confused with virtual machines. Signed-off-by:
Mike Rapoport <rppt@linux.ibm.com> Suggested-by:
Matthew Wilcox <willy@infradead.org> Tested-by:
Ira Weiny <ira.weiny@intel.com> Acked-by:
Jonathan Corbet <corbet@lwn.net> Acked-by:
Wu XiangCheng <bobwxc@email.cn>
-
- Jun 07, 2022
-
-
Ira Weiny authored
The documentation for user space pkeys was a bit dated including things such as Amazon and distribution testing information which is irrelevant now. Update the documentation. This also streamlines adding the Supervisor pkey documentation later on. Signed-off-by:
Ira Weiny <ira.weiny@intel.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220419170649.1022246-2-ira.weiny@intel.com
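A minimal user-space sketch of the flow the updated document describes; the glibc wrappers are real, the function below is hypothetical:

```c
#define _GNU_SOURCE
#include <sys/mman.h>

/* Allocate a key with all access disabled and bind it to a mapping. */
int protect_secret(void *addr, size_t len)
{
	int pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);

	if (pkey < 0)
		return -1;
	if (pkey_mprotect(addr, len, PROT_READ | PROT_WRITE, pkey) < 0) {
		pkey_free(pkey);
		return -1;
	}
	return pkey;	/* rights can later be re-enabled per thread */
}
```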
-
- Apr 22, 2022
-
-
Randy Dunlap authored
Move watch_queue documentation to the core-api index and subdirectory. Fixes: c73be61c ("pipe: Add general notification queue support") Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- Apr 14, 2022
-
-
Kurt Kanzenbach authored
Introduce a fast/NMI-safe accessor to clock TAI for tracing. The Linux kernel tracing infrastructure has support for using different clocks to generate timestamps for trace events. Especially in TSN networks it's useful to have TAI as a trace clock, because the application scheduling is done in accordance with the network time, which is based on TAI. With a tai trace_clock in place, it becomes very convenient to correlate network activity with Linux kernel application traces. Use the same implementation as ktime_get_boot_fast_ns() does, by reading the monotonic time and adding the TAI offset. The same limitations as for the fast boot implementation apply. The TAI offset may change at run time, e.g. by setting the time or using adjtimex() with an offset. However, these kinds of offset changes are rare events. Nevertheless, the user has to be aware and deal with it in post processing. An alternative approach would be to use the same implementation as ktime_get_real_fast_ns() does. However, this would require adding an additional u64 member to the tk_read_base struct. This struct, together with a seqcount, is designed to fit into a single cache line on 64 bit architectures. Adding a new member would violate this constraint. Signed-off-by:
Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20220414091805.89667-2-kurt@linutronix.de
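A sketch of the accessor this adds; note, per the text above, the value is only as good as the current TAI offset:

```c
#include <linux/timekeeping.h>

/* NMI-safe TAI timestamp, usable where ktime_get_clocktai() is not. */
static u64 my_trace_timestamp(void)
{
	return ktime_get_tai_fast_ns();
}
```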
-
- Apr 13, 2022
-
-
Petr Mladek authored
Document the printk index feature. The primary motivation is to explain that it is not creating KABI from particular printk() calls. Acked-by:
Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by:
Chris Down <chris@chrisdown.name> Signed-off-by:
Petr Mladek <pmladek@suse.com>
-
- Mar 28, 2022
-
-
Linus Torvalds authored
Halil Pasic points out [1] that, rather than the full revert of that commit (the revert in bddac7c1), a partial revert that only reverts the problematic case but still keeps some of the cleanups is probably better.  And that partial revert [2] had already been verified by Oleksandr Natalenko [3] to also fix the issue; I had just missed that in the long discussion. So let's reinstate the cleanups from commit aa6f8dcb ("swiotlb: rework "fix info leak with DMA_FROM_DEVICE""), and effectively only revert the part that caused problems. Link: https://lore.kernel.org/all/20220328013731.017ae3e3.pasic@linux.ibm.com/ [1] Link: https://lore.kernel.org/all/20220324055732.GB12078@lst.de/ [2] Link: https://lore.kernel.org/all/4386660.LvFx2qVVIh@natalenko.name/ [3] Suggested-by:
Halil Pasic <pasic@linux.ibm.com> Tested-by:
Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 26, 2022
-
-
Linus Torvalds authored
This reverts commit aa6f8dcb. It turns out this breaks at least the ath9k wireless driver, and possibly others.

What the ath9k driver does on packet receive is to set up the DMA transfer with:

  int ath_rx_init(..)
  ..
        bf->bf_buf_addr = dma_map_single(sc->dev, skb->data,
                                         common->rx_bufsize,
                                         DMA_FROM_DEVICE);

and then the receive logic (through ath_rx_tasklet()) will fetch incoming packets:

  static bool ath_edma_get_buffers(..)
  ..
        dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr,
                                common->rx_bufsize, DMA_FROM_DEVICE);

        ret = ath9k_hw_process_rxdesc_edma(ah, rs, skb->data);
        if (ret == -EINPROGRESS) {
                /*let device gain the buffer again*/
                dma_sync_single_for_device(sc->dev, bf->bf_buf_addr,
                                           common->rx_bufsize,
                                           DMA_FROM_DEVICE);
                return false;
        }

and it's worth noting how that first DMA sync:

  dma_sync_single_for_cpu(..DMA_FROM_DEVICE);

is there to make sure the CPU can read the DMA buffer (possibly by copying it from the bounce buffer area, or by doing some cache flush). The iommu correctly turns that into a "copy from bounce buffer" so that the driver can look at the state of the packets. In the meantime, the device may continue to write to the DMA buffer, but we at least have a snapshot of the state due to that first DMA sync.

But that _second_ DMA sync:

  dma_sync_single_for_device(..DMA_FROM_DEVICE);

is telling the DMA mapping that the CPU wasn't interested in the area because the packet wasn't there. In the case of a DMA bounce buffer, that is a no-op. Note how it's not a sync for the CPU (the "for_device()" part), and it's not a sync for data written by the CPU (the "DMA_FROM_DEVICE" part). Or rather, it _should_ be a no-op. That's what commit aa6f8dcb broke: it made the code bounce the buffer unconditionally, and changed the DMA_FROM_DEVICE to just unconditionally and illogically be DMA_TO_DEVICE.

[ Side note: purely within the confines of the swiotlb driver it wasn't entirely illogical: the reason it did that odd DMA_FROM_DEVICE -> DMA_TO_DEVICE conversion thing is because inside the swiotlb driver, it uses just a swiotlb_bounce() helper that doesn't care about the whole distinction of who the sync is for - only which direction to bounce. So it took the "sync for device" to mean that the CPU must have been the one writing, and thought it meant DMA_TO_DEVICE. ]

Also note how the commentary in that commit was wrong, probably due to that whole confusion, claiming that the commit makes the swiotlb code "bounce unconditionally (that is, also when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale data from the swiotlb buffer", which is nonsensical for two reasons:

 - that "also when dir == DMA_TO_DEVICE" is nonsensical, as that was exactly when it always did - and should do - the bounce.

 - since this is a sync for the device (not for the CPU), we're clearly fundamentally not copying back stale data from the bounce buffers at all, because we'd be copying *to* the bounce buffers.

So that commit was just very confused. It confused the direction of the synchronization (to the device, not the cpu) with the direction of the DMA (from the device). Reported-and-bisected-by:
Oleksandr Natalenko <oleksandr@natalenko.name> Reported-by:
Olha Cherevyk <olha.cherevyk@gmail.com> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Kalle Valo <kvalo@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Toke Høiland-Jørgensen <toke@toke.dk> Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 22, 2022
-
-
NeilBrown authored
Add some "big-picture" documentation for read-ahead and polish the code to make it fit this documentation. The meaning of ->async_size is clarified to match its name. i.e. Any request to ->readahead() has a sync part and an async part. The caller will wait for the sync pages to complete, but will not wait for the async pages. The first async page is still marked PG_readahead Note that the current function names page_cache_sync_ra() and page_cache_async_ra() are misleading. All ra request are partly sync and partly async, so either part can be empty. A page_cache_sync_ra() request will usually set ->async_size non-zero, implying it is not all synchronous. When a non-zero req_count is passed to page_cache_async_ra(), the implication is that some prefix of the request is synchronous, though the calculation made there is incorrect - I haven't tried to fix it. Link: https://lkml.kernel.org/r/164549983734.9187.11586890887006601405.stgit@noble.brown Signed-off-by:
NeilBrown <neilb@suse.de> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 21, 2022
-
-
Matthew Wilcox (Oracle) authored
Move compound_pincount from the third page to the second page, which means it's available for all compound pages. That lets us delete hpage_pincount_available(). On 32-bit systems, there isn't enough space for both compound_pincount and compound_nr in the second page (it would collide with page->private, which is in use for pages in the swap cache), so revert the optimisation of storing both compound_order and compound_nr on 32-bit systems. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
John Hubbard <jhubbard@nvidia.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Reviewed-by:
William Kucharski <william.kucharski@oracle.com>
-
- Mar 07, 2022
-
-
Halil Pasic authored
Unfortunately, we ended up merging an old version of the patch "fix info leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph (the swiotlb maintainer) asked me to create an incremental fix (after I pointed out the mix-up and asked him for guidance). So here we go. The main differences between what we got and what was agreed are:

 * swiotlb_sync_single_for_device is also required to do an extra bounce
 * we decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
 * the implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
   must take precedence over DMA_ATTR_SKIP_CPU_SYNC

Thus this patch removes DMA_ATTR_OVERWRITE, and makes swiotlb_sync_single_for_device() bounce unconditionally (that is, also when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale data from the swiotlb buffer. Let me note that if the size used with the dma_sync_* API is less than the size used with dma_[un]map_*, under certain circumstances we may still end up with swiotlb not being transparent. In that sense, this is no perfect fix either. To get this bulletproof, we would have to bounce the entire mapping/bounce buffer. For that we would have to figure out the starting address, and the size of the mapping, in swiotlb_sync_single_for_device(). While this does seem possible, there seems to be no firm consensus on how things are supposed to work. Signed-off-by:
Halil Pasic <pasic@linux.ibm.com> Fixes: ddbd89de ("swiotlb: fix info leak with DMA_FROM_DEVICE") Cc: stable@vger.kernel.org Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 14, 2022
-
-
Halil Pasic authored
The problem I'm addressing was discovered by the LTP test covering cve-2018-1000204. A short description of what happens follows:

1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV and a corresponding dxferp. The peculiar thing about this is that TUR is not reading from the device.

2) In sg_start_req() the invocation of blk_rq_map_user() effectively bounces the user-space buffer. As if the device was to transfer into it. Since commit a45b599a ("scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()") we make sure this first bounce buffer is allocated with GFP_ZERO.

3) For the rest of the story we keep ignoring that we have a TUR, so the device won't touch the buffer we prepare as if we had a DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device and the buffer allocated by SG is mapped by the function virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here scatter-gather and not scsi generics). This mapping involves bouncing via the swiotlb (we need swiotlb to do virtio in protected guests like s390 Secure Execution, or AMD SEV).

4) When the SCSI TUR is done, we first copy back the content of the second (that is swiotlb) bounce buffer (which most likely contains some previous IO data), to the first bounce buffer, which contains all zeros. Then we copy back the content of the first bounce buffer to the user-space buffer.

5) The test case detects that the buffer, which it zero-initialized, ain't all zeros and fails.

One can argue that this is an swiotlb problem, because without swiotlb we leak all zeros, and the swiotlb should be transparent in a sense that it does not affect the outcome (if all other participants are well behaved). Copying the content of the original buffer into the swiotlb buffer is the only way I can think of to make swiotlb transparent in such scenarios. So let's do just that if in doubt, but allow the driver to tell us that the whole mapped buffer is going to be overwritten, in which case we can preserve the old behavior and avoid the performance impact of the extra bounce. Signed-off-by:
Halil Pasic <pasic@linux.ibm.com> Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
- Feb 03, 2022
-
-
Matthew Wilcox (Oracle) authored
It wasn't obvious to all readers that it's unsafe to reuse an xa_state after dropping the xas_lock() or the rcu_read_lock(). Reported-by:
Charan Teja Kalla <charante@codeaurora.org> Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org>
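A hedged sketch of the rule being documented: forget the cached walk state before using an xa_state again after the lock was dropped. The lookup helper is hypothetical:

```c
#include <linux/xarray.h>
#include <linux/rcupdate.h>

static void *lookup_twice(struct xarray *xa, unsigned long index)
{
	XA_STATE(xas, xa, index);
	void *entry;

	rcu_read_lock();
	entry = xas_load(&xas);
	rcu_read_unlock();

	/*
	 * The RCU lock was dropped, so nodes cached in 'xas' may have
	 * been freed.  Reset the state so the next call re-walks.
	 */
	xas_reset(&xas);

	rcu_read_lock();
	entry = xas_load(&xas);
	rcu_read_unlock();

	return entry;
}
```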
-
- Jan 27, 2022
-
-
Nicolas Saenz Julienne authored
The topic of nesting and reentrancy in the context of early entry code hasn't been addressed so far. So do it. Signed-off-by:
Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by:
Frederic Weisbecker <frederic@kernel.org> Reviewed-by:
Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20220110105044.94423-2-nsaenzju@redhat.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
Thomas Gleixner authored
The entry/exit handling for exceptions, interrupts, syscalls and KVM is not really documented except for some comments. Fill the gaps. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Co-developed-by:
Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by:
Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Paul E. McKenney <paulmck@kernel.org> Reviewed-by:
Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20220110105044.94423-1-nsaenzju@redhat.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- Jan 07, 2022
-
-
Greg Kroah-Hartman authored
Since commit aa30f47c ("kobject: Add support for default attribute groups to kobj_type") we have been encouraging the use of default_groups instead of default_attrs, so reflect that information in the documentation as well so that no new users get added while the kernel is converted over to not use this field anymore. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220104105024.1014313-1-gregkh@linuxfoundation.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
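A sketch of the encouraged style; the attribute and ktype names are hypothetical:

```c
#include <linux/kobject.h>
#include <linux/sysfs.h>

static ssize_t foo_show(struct kobject *kobj, struct kobj_attribute *attr,
			char *buf)
{
	return sysfs_emit(buf, "foo\n");
}
static struct kobj_attribute foo_attr = __ATTR_RO(foo);

static struct attribute *my_attrs[] = {
	&foo_attr.attr,
	NULL,
};
ATTRIBUTE_GROUPS(my);	/* generates my_groups from my_attrs */

static const struct kobj_type my_ktype = {
	.sysfs_ops	= &kobj_sysfs_ops,
	.default_groups	= my_groups,	/* not the deprecated default_attrs */
};
```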
-
- Dec 28, 2021
-
-
Greg Kroah-Hartman authored
There is no need to pass the pointer to the kset in the struct kset_uevent_ops callbacks as no one uses it, so just remove that pointer entirely. Reviewed-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by:
Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211227163924.3970661-1-gregkh@linuxfoundation.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
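A sketch of the callbacks after this change; each now receives only the kobject (the names below are hypothetical):

```c
#include <linux/kobject.h>

static int my_uevent_filter(struct kobject *kobj)
{
	return 1;	/* emit uevents for every kobject in the kset */
}

static int my_uevent(struct kobject *kobj, struct kobj_uevent_env *env)
{
	return add_uevent_var(env, "MY_NAME=%s", kobject_name(kobj));
}

static const struct kset_uevent_ops my_uevent_ops = {
	.filter	= my_uevent_filter,
	.uevent	= my_uevent,
};
```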
-
- Dec 27, 2021
-
-
Wedson Almeida Filho authored
This way instances of kobj_type (which contain function pointers) can be stored in .rodata, which means that they cannot be [easily/accidentally] modified at runtime. Signed-off-by:
Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211224231345.777370-1-wedsonaf@google.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Dec 16, 2021
-
-
Matthew Wilcox (Oracle) authored
Allow callers to iterate over each folio instead of each page. The bio need not have been constructed using folios originally. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <djwong@kernel.org>
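A sketch of the new iterator in a hypothetical completion handler:

```c
#include <linux/bio.h>
#include <linux/pagemap.h>

static void my_read_end_io(struct bio *bio)
{
	struct folio_iter fi;

	/* Each step yields one folio with its offset and length. */
	bio_for_each_folio_all(fi, bio)
		folio_mark_uptodate(fi.folio);

	bio_put(bio);
}
```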
-
- Nov 29, 2021
-
-
Christoph Hellwig authored
All this code is tightly coupled to the blk-mq core, so move it there. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-4-hch@lst.de [axboe: remove doc generation for blk-exec.c] Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 06, 2021
-
-
David Hildenbrand authored
We don't support CONFIG_MEMORY_HOTPLUG on 32 bit, and consequently we don't support it with HIGHMEM. Let's remove any leftover code -- including the unused "status_change_nid_high" field that is part of the memory notifier. Link: https://lkml.kernel.org/r/20210929143600.49379-5-david@redhat.com Signed-off-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: Alex Shi <alexs@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 01, 2021
-
-
Petr Mladek authored
Commit 23efd080 ("vsprintf: Make %pGp print the hex value") changed the behavior of the %pGp printk format. Update the documentation accordingly. Fixes: 23efd080 ("vsprintf: Make %pGp print the hex value") Reviewed-by:
Yafang Shao <laoar.shao@gmail.com> Signed-off-by:
Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/YXlKqCPY9suM4mfT@alley
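A small usage sketch; the exact output depends on the kernel configuration, but after this change it leads with the raw hex value:

```c
#include <linux/printk.h>
#include <linux/mm.h>

static void dump_page_flags(struct page *page)
{
	/* e.g. "0x17ffffc0010200(slab|...)" - hex value, then flag names */
	pr_info("flags: %pGp\n", &page->flags);
}
```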
-
- Oct 26, 2021
-
-
Mark Rutland authored
Now that entry code handles IRQ entry (including setting the IRQ regs) before calling irqchip code, irqchip code can safely call generic_handle_domain_irq(), and there's no functional reason for it to call handle_domain_irq(). Let's cement this split of responsibility and remove handle_domain_irq() entirely, updating irqchip drivers to call generic_handle_domain_irq(). For consistency, handle_domain_nmi() is similarly removed and replaced with a generic_handle_domain_nmi() function which also does not perform any entry logic. Previously handle_domain_{irq,nmi}() had a WARN_ON() which would fire when they were called in an inappropriate context. So that we can identify similar issues going forward, similar WARN_ON_ONCE() logic is added to the generic_handle_*() functions, and comments are updated for clarity and consistency. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
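A hedged sketch of a converted chained handler; the my_chip structure and MY_PENDING register are hypothetical, while generic_handle_domain_irq() is the real replacement API:

```c
#include <linux/bitops.h>
#include <linux/io.h>
#include <linux/irq.h>
#include <linux/irqdomain.h>
#include <linux/irqchip/chained_irq.h>

#define MY_PENDING	0x10	/* hypothetical pending register */

struct my_chip {
	void __iomem *regs;
	struct irq_domain *domain;
};

static void my_chip_handler(struct irq_desc *desc)
{
	struct my_chip *chip = irq_desc_get_handler_data(desc);
	struct irq_chip *parent = irq_desc_get_chip(desc);
	unsigned long pending, hwirq;

	chained_irq_enter(parent, desc);
	pending = readl(chip->regs + MY_PENDING);
	/* Translate each pending hwirq through the domain; no entry logic. */
	for_each_set_bit(hwirq, &pending, 32)
		generic_handle_domain_irq(chip->domain, hwirq);
	chained_irq_exit(parent, desc);
}
```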
-
- Oct 25, 2021
-
-
Boqun Feng authored
The current doc of the workqueue API suggests that work items are non-reentrant: any work item is guaranteed to be executed by at most one worker system-wide at any given time. However, this is not true; the following case can cause a work item W to be executed by two workers at the same time:

  queue_work_on(0, WQ1, W);
  // after a worker picks up W and clears the pending bit
  queue_work_on(1, WQ2, W);
  // workers on CPU0 and CPU1 will execute W at the same time

This means the non-reentrance of a work item is conditional, and Lai Jiangshan provided a nice summary[1] of the conditions, therefore use it to describe a work item instance and improve the doc. [1]: https://lore.kernel.org/lkml/CAJhGHyDudet_xyNk=8xnuO2==o-u06s0E0GZVP4Q67nmQ84Ceg@mail.gmail.com/ Suggested-by:
Matthew Wilcox <willy@infradead.org> Suggested-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Boqun Feng <boqun.feng@gmail.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-