- Sep 20, 2022
-
-
Bob Peterson authored
Before this patch, the gfs2 file system was registered prior to creating the three workqueues. In some cases this allowed dlm to send recovery work to a workqueue that did not yet exist because gfs2 was still initializing. This patch changes the order of gfs2's initialization routine so it only registers the file system after the work queues are created. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andrew Price authored
Fuzzers like to scribble over sb_bsize_shift but in reality it's very unlikely that this field would be corrupted on its own. Nevertheless it should be checked to avoid the possibility of messy mount errors due to bad calculations. It's always a fixed value based on the block size so we can just check that it's the expected value. Tested with: mkfs.gfs2 -O -p lock_nolock /dev/vdb for i in 0 -1 64 65 32 33; do gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb mount /dev/vdb /mnt/test && umount /mnt/test done Before this patch we get a withdraw after [ 76.413681] gfs2: fsid=loop0.0: fatal: invalid metadata block [ 76.413681] bh = 19 (type: exp=5, found=4) [ 76.413681] function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 492 and with UBSAN configured we also get complaints like [ 76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19 [ 76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int' After the patch, these complaints don't appear, mount fails immediately and we get an explanation in dmesg. Reported-by:
<syzbot+dcf33a7aae997956fe06@syzkaller.appspotmail.com> Signed-off-by:
Andrew Price <anprice@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- Sep 12, 2022
-
-
Zhang Yi authored
ll_rw_block() is not safe for the sync read path because it cannot guarantee that always submitting read IO if the buffer has been locked, so stop using it. We also switch to new bh_readahead() helper for the readahead path. Link: https://lkml.kernel.org/r/20220901133505.2510834-5-yi.zhang@huawei.com Signed-off-by:
Zhang Yi <yi.zhang@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Aug 26, 2022
-
-
Andreas Gruenbacher authored
Switch from strlcpy to strscpy and make sure that @count is the size of the smaller of the source and destination buffers. This prevents reading beyond the end of the source buffer when the source string isn't null terminated. Found by a modified version of syzkaller. Suggested-by:
Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- Aug 25, 2022
-
-
Bob Peterson authored
There are a couple places in function do_xmote where normal processing is circumvented due to withdraws in progress. However, since we bypass most of do_xmote() we bypass telling dlm to lock the dlm lock, which means dlm will never respond with a completion callback. Since the completion callback ordinarily clears GLF_LOCK, this patch changes function do_xmote to handle those situations more gracefully so the file system may be unmounted after withdraw. A very similar situation happens with the GLF_DEMOTE_IN_PROGRESS flag, which is cleared by function finish_xmote(). Since the withdraw causes us to skip the majority of do_xmote, it therefore also skips the call to finish_xmote() so the DEMOTE_IN_PROGRESS flag needs to be cleared manually. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
When a withdraw occurs, ordinary (not system) glocks may not be granted anymore. Later, when the file system is unmounted, gfs2_gl_hash_clear() tries to clear out all the glocks, but these un-grantable pending waiters prevent some glocks from being freed. So the unmount hangs, at least for its ten-minute timeout period. This patch takes measures to remove any pending waiters from the glocks that will never be granted. This allows the unmount to proceed in a reasonable period of time. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
When a gfs2 file system is withdrawn it does iput on its journal to allow recovery from another cluster node. If it's unable to get a replacement inode for whatever reason, the journal descriptor would still be pointing at the evicted inode. So when unmount clears out the list of journals, it would do a second iput referencing the pointer. To avoid this, set the inode pointer to NULL. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, delete_work_func() would check for the GLF_DEMOTE flag on the iopen glock and if set, it would perform special processing. However, there was a race whereby the GLF_DEMOTE flag could be set by another process after the check. Then when it called gfs2_lookup_by_inum() which calls gfs2_inode_lookup(), it tried to lock the iopen glock in SH mode, but the GLF_DEMOTE flag prevented the request from being granted. But the iopen glock could never be demoted because that happens when the inode is evicted, and the evict was never completed because of the failed lookup. To fix that, change function gfs2_inode_lookup() so that when GFS2_BLKST_UNLINKED inodes are searched, it uses the LM_FLAG_TRY flag for the iopen glock. If the locking request fails, fail gfs2_inode_lookup() with -EAGAIN so that delete_work_func() can retry the operation later. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- Aug 23, 2022
-
-
Alexander Aring authored
The DLM_LSFL_FS flag is set in lockspaces created directly for a kernel user, as opposed to those lockspaces created for user space applications. The user space libdlm allowed this flag to be set for lockspaces created from user space, but then used by a kernel user. No kernel user has ever used this method, so remove the ability to do it. Signed-off-by:
Alexander Aring <aahringo@redhat.com> Signed-off-by:
David Teigland <teigland@redhat.com>
-
- Aug 17, 2022
-
-
Al Viro authored
filldir_t instances (directory iterators callbacks) used to return 0 for "OK, keep going" or -E... for "stop". Note that it's *NOT* how the error values are reported - the rules for those are callback-dependent and ->iterate{,_shared}() instances only care about zero vs. non-zero (look at emit_dir() and friends). So let's just return bool ("should we keep going?") - it's less confusing that way. The choice between "true means keep going" and "true means stop" is bikesheddable; we have two groups of callbacks - do something for everything in directory, until we run into problem and find an entry in directory and do something to it. The former tended to use 0/-E... conventions - -E<something> on failure. The latter tended to use 0/1, 1 being "stop, we are done". The callers treated anything non-zero as "stop", ignoring which non-zero value did they get. "true means stop" would be more natural for the second group; "true means keep going" - for the first one. I tried both variants and the things like if allocation failed something = -ENOMEM; return true; just looked unnatural and asking for trouble. [folded suggestion from Matthew Wilcox <willy@infradead.org>] Acked-by:
Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Aug 09, 2022
-
-
Al Viro authored
Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Aug 02, 2022
-
-
Matthew Wilcox (Oracle) authored
There is nothing iomap-specific about iomap_migratepage(), and it fits a pattern used by several other filesystems, so move it to mm/migrate.c, convert it to be filemap_migrate_folio() and convert the iomap filesystems to use it. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <djwong@kernel.org>
-
Matthew Wilcox (Oracle) authored
Use folio_put_refs() to perform only one atomic operation instead of two. The other changes are straightforward conversions from page APIs to their folio equivalents. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org>
-
- Jul 22, 2022
-
-
Christoph Hellwig authored
->writepage is only used for single page writeback from memory reclaim, and not called at all for cgroup writeback. Follow the lead of XFS and remove ->writepage and rely entirely on ->writepages. Signed-off-by:
Christoph Hellwig <hch@lst.de> Tested-by:
Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by:
Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Darrick J. Wong <djwong@kernel.org>
-
Christoph Hellwig authored
Use filemap_fdatawrite_wbc instead of generic_writepages in gfs2_ail1_start_one so that the functin can also cope with address_space operations that only implement ->writepages and to properly account for cgroup writeback. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Darrick J. Wong <djwong@kernel.org>
-
- Jul 14, 2022
-
-
Bart Van Assche authored
Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for variables that represent request flags. Combine the first two gfs2_submit_bhs() arguments into a single argument. Reviewed-by:
Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-54-bvanassche@acm.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Both submit_bh() and ll_rw_block() accept a request operation type and request flags as their first two arguments. Micro-optimize these two functions by combining these first two arguments into a single argument. This patch does not change the behavior of any of the modified code. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Acked-by: Song Liu <song@kernel.org> (for the md changes) Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 04, 2022
-
-
Roman Gushchin authored
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by:
kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by:
Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Jun 29, 2022
-
-
Andreas Gruenbacher authored
In do_promote(), we're never removing the current entry from the list and so the list traversal is actually safe. Switch back to list_for_each_entry(). Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
In do_promote(), when the glock had no strong holders, we were accidentally calling demote_incompat_holders() with new_gh == NULL, so no weak holders were considered incompatible. Instead, the new holder should have been passed in. For doing that, the HIF_HOLDER flag needs to be set in new_gh to prevent may_grant() from complaining. This means that the new holder will now be recognized as a current holder, so skip over it explicitly in demote_incompat_holders() to prevent it from being dequeued. To further clarify things, we can now rename new_gh to current_gh in demote_incompat_holders(); after all, the HIF_HOLDER flag is already set, which means the new holder is already a current holder. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
In do_promote() and add_to_queue(), use current_gh as the variable name for the first strong holder we could find: this matches the variable name is may_grant(), and more clearly indicates that we're interested in one (any) of the current strong holders. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Make go_instantiate take a glock instead of a glock holder as its argument: this handler is supposed to instantiate the object associated with the glock. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Right now, inode_go_instantiate() contains functionality that relates to how a glock is held rather than the glock itself, like waiting for pending direct I/O to complete and completing interrupted truncates. This code is meant to be run each time a holder is acquired, but go_instantiate is actually only called once, when the glock is instantiated. To fix that, introduce a new go_held glock operation that is called each time a glock holder is acquired. Move the holder specific code in inode_go_instantiate() over to inode_go_held(). Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Now that interrupted truncates are completed in the context of the process taking the glock, there is no need for the glock state engine to delegate that task to gfs2_quotad or for quotad to perform those truncates anymore. Get rid of the obsolete associated infrastructure. Reverts commit 813e0c46 ("GFS2: Fix "truncate in progress" hang"). Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by:
Bob Peterson <rpeterso@redhat.com>
-
Andreas Gruenbacher authored
Instantiate glocks outside of the glock state engine: there is no real reason for instantiating them inside the glock state engine; it only complicates the code. Instead, instantiate them in gfs2_glock_wait() and gfs2_glock_async_wait() using the new gfs2_glock_holder_ready() helper. On top of that, the only other place that acquires a glock without using gfs2_glock_wait() or gfs2_glock_async_wait() is gfs2_upgrade_iopen_glock(), so call gfs2_glock_holder_ready() there as well. If a dinode has a pending truncate, the glock-specific instantiate function for inodes wakes up the truncate function in the quota daemon. Waiting for the completion of the truncate was previously done by the glock state engine, but we now need to wait in inode_go_instantiate(). This also means that gfs2_instantiate() will now no longer return any "special" error codes. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Since commit 1fc05c8d ("gfs2: cancel timed-out glock requests"), a pending locking request can be canceled by calling gfs2_glock_dq() on the pending holder. In gfs2_glock_async_wait(), when we time out, use that to cancel the remaining locking requests and dequeue the locking requests already granted. That's simpler as well as more efficient than waiting for all locking requests to eventually be granted and dequeuing them then. In addition, gfs2_glock_async_wait() promises that by the time the function completes, all glocks are either granted or dequeued, but the implementation doesn't keep that promise if individual locking requests fail. Fix that as well. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Add the GL_NOPID flag for the remaining glock holders which are not associated with the current process. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Add the GL_NOPID flag for flock glock holders. Clean up the flag setting code in do_flock. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Add a GL_NOPID flag to indicate that once a glock holder has been acquired, it won't be associated with the current process anymore. This is useful for iopen and flock glocks which are associated with open files, as well as journal glock holders and similar which are associated with the filesystem. Once GL_NOPID is used for all applicable glocks (see the next patches), processes will no longer be falsely reported as holding glocks which they are not actually holding in the glocks dump file. Unlike before, when a process is reported as having "(ended)", this will indicate an actual bug. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Include flock glocks in the "glockfd" debugfs file. Those are similar to the iopen glocks; while an open file is holding an flock, it is holding the file's flock glock. We cannot take f_fl_mutex in gfs2_glockfd_seq_show_flock() or else dumping the "glockfd" file would block on flock operations. Instead, use the file->f_lock spin lock to protect the f_fl_gh.gh_gl glock pointer. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
When a process has a gfs2 file open, the file is keeping a reference on the underlying gfs2 inode, and the inode is keeping the inode's iopen glock held in shared mode. In other words, the process depends on the iopen glock of each open gfs2 file. Expose those dependencies in a new "glockfd" debugfs file. The new debugfs file contains one line for each gfs2 file descriptor, specifying the tgid, file descriptor number, and glock name, e.g., 1601 6 5/816d This list is compiled by iterating all tasks on the system using find_ge_pid(), and all file descriptors of each task using task_lookup_next_fd_rcu(). To make that work from gfs2, export those two functions. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- Jun 28, 2022
-
-
Andreas Gruenbacher authored
Add state and flags arguments to gfs2_rlist_alloc() to make it somewhat more obvious which state and flags an rlist uses. With that, stop knocking off flags in gfs2_glock_nq_m() and its nq_m_sync() helper that are never set in the first place. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- Jun 23, 2022
-
-
Zhang Jiaming authored
Change 'accomodate' to 'accommodate'. Signed-off-by:
Zhang Jiaming <jiaming@nfschina.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- Jun 09, 2022
-
-
Bob Peterson authored
Rewrap the comment to keep the line length below 80 characters. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- Jun 03, 2022
-
-
Minghao Chi authored
kfree on NULL pointer is a no-op. Reported-by:
Zeal Robot <zealci@zte.com.cn> Signed-off-by:
Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
- May 24, 2022
-
-
Bob Peterson authored
Before this patch, function bh_get used block_map to figure out the block it needed to read in from the quota_change file. This patch changes it to use iomap directly to make it more efficient. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, functions gfs2_qa_get and _put used the i_rw_mutex to prevent simultaneous access to its i_qadata. But i_rw_mutex is now used for many other things, including iomap_begin and end, which causes a conflict according to lockdep. We cannot just remove the lock since simultaneous opens (gfs2_open -> gfs2_open_common -> gfs2_qa_get) can then stomp on each others values for i_qadata. This patch solves the conflict by using the i_lock spin_lock in the inode to prevent simultaneous access. Signed-off-by:
Bob Peterson <rpeterso@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andrew Price authored
The bug that 27ca8273 ("gfs2: Make sure FITRIM minlen is rounded up to fs block size") fixes was a little confusing as the user saw "Input/output error" which masked the -EINVAL that sb_issue_discard() returned. sb_issue_discard() can fail for various reasons, so we should return its return value from gfs2_rgrp_send_discards() to avoid all errors being reported as IO errors. This improves error reporting for FITRIM and makes no difference to the -o discard code path because the return value from gfs2_rgrp_send_discards() gets thrown away in that case (and the option switches off). Presumably that's why it was ok to just return -EIO in the past, before FITRIM was implemented. Tested with xfstests. Signed-off-by:
Andrew Price <anprice@redhat.com> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Kees Cook authored
Clang's structure layout randomization feature gets upset when it sees struct address_space (which is randomized) cast to struct gfs2_glock. This is due to seeing the mapping pointer as being treated as an array of gfs2_glock, rather than "something else, before struct address_space": In file included from fs/gfs2/acl.c:23: fs/gfs2/meta_io.h:44:12: error: casting from randomized structure pointer type 'struct address_space *' to 'struct gfs2_glock *' return (((struct gfs2_glock *)mapping) - 1)->gl_name.ln_sbd; ^ Replace the instances of open-coded pointer math with container_of() usage, and update the allocator to match. Some cleanups and conversion of gfs2_glock_get() and gfs2_glock_dealloc() by Andreas. Reported-by:
kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202205041550.naKxwCBj-lkp@intel.com Cc: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bill Wendling <morbo@google.com> Cc: cluster-devel@redhat.com Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Add some comments explaining the oddities of partial direct I/O reads and writes. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com>
-