  1. Sep 20, 2022
    • gfs2: Register fs after creating workqueues · 74b1b10e
      Bob Peterson authored
      
      Before this patch, the gfs2 file system was registered prior to creating
      the three workqueues. In some cases this allowed dlm to send recovery
      work to a workqueue that did not yet exist because gfs2 was still
      initializing.
      
      This patch changes the order of gfs2's initialization routine so it only
      registers the file system after the work queues are created.
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Check sb_bsize_shift after reading superblock · 670f8ce5
      Andrew Price authored
      
      Fuzzers like to scribble over sb_bsize_shift but in reality it's very
      unlikely that this field would be corrupted on its own. Nevertheless it
      should be checked to avoid the possibility of messy mount errors due to
      bad calculations. It's always a fixed value based on the block size so
      we can just check that it's the expected value.
      
      Tested with:
      
          mkfs.gfs2 -O -p lock_nolock /dev/vdb
          for i in 0 -1 64 65 32 33; do
              gfs2_edit -p sb field sb_bsize_shift $i /dev/vdb
              mount /dev/vdb /mnt/test && umount /mnt/test
          done
      
      Before this patch we get a withdraw after
      
      [   76.413681] gfs2: fsid=loop0.0: fatal: invalid metadata block
      [   76.413681]   bh = 19 (type: exp=5, found=4)
      [   76.413681]   function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 492
      
      and with UBSAN configured we also get complaints like
      
      [   76.373395] UBSAN: shift-out-of-bounds in fs/gfs2/ops_fstype.c:295:19
      [   76.373815] shift exponent 4294967287 is too large for 64-bit type 'long unsigned int'
      
      After the patch, these complaints don't appear, mount fails immediately
      and we get an explanation in dmesg.
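
      As a rough userspace sketch of the kind of check described above (not
      the kernel code itself), the shift can be compared against the value
      implied by the block size. The struct sb_fields type and the
      sb_bsize_shift_ok() and log2_pow2() helpers are hypothetical names
      used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two superblock fields involved. */
struct sb_fields {
    uint32_t bsize;       /* block size in bytes */
    uint32_t bsize_shift; /* expected to equal log2(bsize) */
};

/* Return log2(v) for a value that is a non-zero power of two. */
static unsigned int log2_pow2(uint32_t v)
{
    unsigned int shift = 0;

    assert(v != 0 && (v & (v - 1)) == 0);
    while (v > 1) {
        v >>= 1;
        shift++;
    }
    return shift;
}

/* True only if the shift is the fixed value implied by the block size. */
static bool sb_bsize_shift_ok(const struct sb_fields *sb)
{
    /* Reject block sizes that are zero or not a power of two. */
    if (sb->bsize == 0 || (sb->bsize & (sb->bsize - 1)) != 0)
        return false;
    return sb->bsize_shift == log2_pow2(sb->bsize);
}
```

      With a 4096-byte block size, only a shift of 12 passes; the scribbled
      values from the test loop above (0, -1, 64, 65, 32, 33) are all
      rejected before any shift is performed with them.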
      
      Reported-by: syzbot+dcf33a7aae997956fe06@syzkaller.appspotmail.com
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  2. Sep 12, 2022
  3. Aug 26, 2022
  4. Aug 25, 2022
    • gfs2: Clear flags when withdraw prevents xmote · 86934198
      Bob Peterson authored
      
      There are a couple of places in function do_xmote where normal processing
      is circumvented due to withdraws in progress. However, since we bypass
      most of do_xmote() we bypass telling dlm to lock the dlm lock, which
      means dlm will never respond with a completion callback. Since the
      completion callback ordinarily clears GLF_LOCK, this patch changes
      function do_xmote to handle those situations more gracefully so the
      file system may be unmounted after withdraw.
      
      A very similar situation happens with the GLF_DEMOTE_IN_PROGRESS flag,
      which is cleared by function finish_xmote(). Since the withdraw causes
      us to skip the majority of do_xmote, it therefore also skips the call
      to finish_xmote() so the DEMOTE_IN_PROGRESS flag needs to be cleared
      manually.
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Dequeue waiters when withdrawn · 053640a7
      Bob Peterson authored
      
      When a withdraw occurs, ordinary (not system) glocks may not be granted
      anymore. Later, when the file system is unmounted, gfs2_gl_hash_clear()
      tries to clear out all the glocks, but these un-grantable pending
      waiters prevent some glocks from being freed. So the unmount hangs, at
      least for its ten-minute timeout period.
      
      This patch takes measures to remove any pending waiters from
      the glocks that will never be granted. This allows the unmount to
      proceed in a reasonable period of time.
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Prevent double iput for journal on error · 04133b60
      Bob Peterson authored
      
      When a gfs2 file system is withdrawn it does iput on its journal to
      allow recovery from another cluster node. If it's unable to get a
      replacement inode for whatever reason, the journal descriptor would
      still be pointing at the evicted inode. So when unmount clears out the
      list of journals, it would do a second iput referencing the pointer.
      To avoid this, set the inode pointer to NULL.
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes · c412a97c
      Bob Peterson authored
      
      Before this patch, delete_work_func() would check for the GLF_DEMOTE
      flag on the iopen glock and if set, it would perform special processing.
      However, there was a race whereby the GLF_DEMOTE flag could be set by
      another process after the check. Then when it called
      gfs2_lookup_by_inum() which calls gfs2_inode_lookup(), it tried to lock
      the iopen glock in SH mode, but the GLF_DEMOTE flag prevented the
      request from being granted. But the iopen glock could never be demoted
      because that happens when the inode is evicted, and the evict was never
      completed because of the failed lookup.
      
      To fix that, change function gfs2_inode_lookup() so that when
      GFS2_BLKST_UNLINKED inodes are searched, it uses the LM_FLAG_TRY flag
      for the iopen glock.  If the locking request fails, fail
      gfs2_inode_lookup() with -EAGAIN so that delete_work_func() can retry
      the operation later.
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  5. Aug 23, 2022
  6. Aug 17, 2022
    • Change calling conventions for filldir_t · 25885a35
      Al Viro authored
      
      filldir_t instances (directory iterator callbacks) used to return 0 for
      "OK, keep going" or -E... for "stop".  Note that it's *NOT* how the
      error values are reported - the rules for those are callback-dependent
      and ->iterate{,_shared}() instances only care about zero vs. non-zero
      (look at emit_dir() and friends).
      
      So let's just return bool ("should we keep going?") - it's less confusing
      that way.  The choice between "true means keep going" and "true means
      stop" is bikesheddable; we have two groups of callbacks -
      	do something for everything in directory, until we run into problem
      and
      	find an entry in directory and do something to it.
      
      The former tended to use 0/-E... conventions - -E<something> on failure.
      The latter tended to use 0/1, 1 being "stop, we are done".
      The callers treated anything non-zero as "stop", ignoring which
      non-zero value they got.
      
      "true means stop" would be more natural for the second group; "true
      means keep going" - for the first one.  I tried both variants and
      the things like
      	if allocation failed
      		something = -ENOMEM;
      		return true;
      just looked unnatural and asking for trouble.
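
      A userspace sketch of the "true means keep going" convention, covering
      both groups of callbacks, might look like the following. The emit_fn
      type, the context structs and iterate_names() are hypothetical names
      for illustration; they are not the kernel's filldir_t or dir_context.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: true means "keep going", false means "stop". */
typedef bool (*emit_fn)(void *ctx, const char *name);

/* Second group: find an entry in the directory and do something to it. */
struct find_ctx {
    const char *wanted;
    bool found;
};

static bool find_entry(void *ctx, const char *name)
{
    struct find_ctx *fc = ctx;

    if (strcmp(name, fc->wanted) != 0)
        return true;  /* not ours, keep going */
    fc->found = true;
    return false;     /* done, stop */
}

/* First group: handle every entry until we run into a problem. */
struct collect_ctx {
    size_t count;
    size_t capacity;
    int err;
};

static bool collect_entry(void *ctx, const char *name)
{
    struct collect_ctx *cc = ctx;

    (void)name;
    if (cc->count == cc->capacity) {
        cc->err = -ENOMEM; /* the error travels in the context ... */
        return false;      /* ... the return value only says "stop" */
    }
    cc->count++;
    return true;
}

/* Offer each name to the callback; return how many names were offered. */
static size_t iterate_names(const char *const *names, size_t n,
                            emit_fn emit, void *ctx)
{
    size_t i;

    assert(emit != NULL);
    for (i = 0; i < n; i++)
        if (!emit(ctx, names[i]))
            return i + 1;
    return n;
}
```

      In both callbacks the failure or completion path ends in "return
      false", which reads naturally next to setting an error code.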
      
      [folded suggestion from Matthew Wilcox <willy@infradead.org>]
      Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  7. Aug 09, 2022
    • new iov_iter flavour - ITER_UBUF · fcb14cb1
      Al Viro authored
      
      Equivalent of single-segment iovec.  Initialized by iov_iter_ubuf(),
      checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
      ones.
      
      We are going to expose the things like ->write_iter() et al. to those
      in subsequent commits.
      
      New predicate (user_backed_iter()) that is true for ITER_IOVEC and
      ITER_UBUF; places like direct-IO handling should use that for
      checking that pages we modify after getting them from iov_iter_get_pages()
      would need to be dirtied.
      
      DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
      will solve all problems - there's code that uses iter_is_iovec() to
      decide how to poke around in iov_iter guts and for that the predicate
      replacement obviously won't suffice.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. Aug 02, 2022
  9. Jul 22, 2022
  10. Jul 14, 2022
  11. Jul 04, 2022
    • mm: shrinkers: provide shrinkers with names · e33c267a
      Roman Gushchin authored
      
      Currently shrinkers are anonymous objects.  For debugging purposes they
      can be identified by count/scan function names, but it's not always
      useful: e.g.  for superblock's shrinkers it's nice to have at least an
      idea of to which superblock the shrinker belongs.
      
      This commit adds names to shrinkers.  register_shrinker() and
      prealloc_shrinker() functions are extended to take a format and arguments
      to construct a name.
      
      In some cases it's not possible to determine a good name at the time when
      a shrinker is allocated.  For such cases shrinker_debugfs_rename() is
      provided.
      
      The expected format is:
          <subsystem>-<shrinker_type>[:<instance>]-<id>
      For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
      
      After this change the shrinker debugfs directory looks like:
        $ cd /sys/kernel/debug/shrinker/
        $ ls
          dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
          mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
          mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
          rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
          sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
          sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
          sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
          sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
          sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
          sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
          sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
          sb-dax-11           sb-proc-45       sb-tmpfs-35
          sb-debugfs-7        sb-proc-46       sb-tmpfs-40
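
      For illustration, names in the format shown above can be produced with
      an ordinary printf-style format string. The format_shrinker_name()
      helper below is a hypothetical userspace sketch, not one of the
      functions added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<subsystem>-<shrinker_type>[:<instance>]-<id>" into buf.
 * The instance part is left out when instance is NULL or empty.
 * Returns what snprintf() returns.
 */
static int format_shrinker_name(char *buf, size_t len,
                                const char *subsystem, const char *type,
                                const char *instance, int id)
{
    assert(buf != NULL && subsystem != NULL && type != NULL);
    if (instance != NULL && strlen(instance) > 0)
        return snprintf(buf, len, "%s-%s:%s-%d",
                        subsystem, type, instance, id);
    return snprintf(buf, len, "%s-%s-%d", subsystem, type, id);
}
```

      Called with ("sb", "xfs", "vda1", 36) this yields "sb-xfs:vda1-36", and
      with ("rcu", "kfree", NULL, 0) it yields "rcu-kfree-0", matching
      entries in the listing above.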
      
      [roman.gushchin@linux.dev: fix build warnings]
        Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
      
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
      Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  12. Jun 29, 2022
    • gfs2: List traversal in do_promote is safe · 6feaec81
      Andreas Gruenbacher authored
      
      In do_promote(), we're never removing the current entry from the list
      and so the list traversal is actually safe.  Switch back to
      list_for_each_entry().
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: do_promote glock holder stealing fix · 0befb851
      Bob Peterson authored
      
      In do_promote(), when the glock had no strong holders, we were
      accidentally calling demote_incompat_holders() with new_gh == NULL, so
      no weak holders were considered incompatible.  Instead, the new holder
      should have been passed in.
      
      For doing that, the HIF_HOLDER flag needs to be set in new_gh to prevent
      may_grant() from complaining.  This means that the new holder will now
      be recognized as a current holder, so skip over it explicitly in
      demote_incompat_holders() to prevent it from being dequeued.
      
      To further clarify things, we can now rename new_gh to current_gh in
      demote_incompat_holders(); after all, the HIF_HOLDER flag is already set,
      which means the new holder is already a current holder.
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Use better variable name · 8f0028fc
      Andreas Gruenbacher authored
      
      In do_promote() and add_to_queue(), use current_gh as the variable name
      for the first strong holder we could find: this matches the variable
      name in may_grant(), and more clearly indicates that we're interested in
      one (any) of the current strong holders.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Make go_instantiate take a glock · 5f38a4d3
      Andreas Gruenbacher authored
      
      Make go_instantiate take a glock instead of a glock holder as its argument:
      this handler is supposed to instantiate the object associated with the glock.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Add new go_held glock operation · 86c30a01
      Andreas Gruenbacher authored
      
      Right now, inode_go_instantiate() contains functionality that relates to
      how a glock is held rather than the glock itself, like waiting for
      pending direct I/O to complete and completing interrupted truncates.
      This code is meant to be run each time a holder is acquired, but
      go_instantiate is actually only called once, when the glock is
      instantiated.
      
      To fix that, introduce a new go_held glock operation that is called each
      time a glock holder is acquired.  Move the holder specific code in
      inode_go_instantiate() over to inode_go_held().
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Revert 'Fix "truncate in progress" hang' · de3f906f
      Andreas Gruenbacher authored
      
      Now that interrupted truncates are completed in the context of the
      process taking the glock, there is no need for the glock state engine to
      delegate that task to gfs2_quotad or for quotad to perform those
      truncates anymore.  Get rid of the obsolete associated infrastructure.
      
      Reverts commit 813e0c46 ("GFS2: Fix "truncate in progress" hang").
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Instantiate glocks outside of glock state engine · 53d69132
      Andreas Gruenbacher authored
      
      Instantiate glocks outside of the glock state engine: there is no real
      reason for instantiating them inside the glock state engine; it only
      complicates the code.
      
      Instead, instantiate them in gfs2_glock_wait() and gfs2_glock_async_wait()
      using the new gfs2_glock_holder_ready() helper.  On top of that, the only
      other place that acquires a glock without using gfs2_glock_wait() or
      gfs2_glock_async_wait() is gfs2_upgrade_iopen_glock(), so call
      gfs2_glock_holder_ready() there as well.
      
      If a dinode has a pending truncate, the glock-specific instantiate function
      for inodes wakes up the truncate function in the quota daemon.  Waiting for
      the completion of the truncate was previously done by the glock state
      engine, but we now need to wait in inode_go_instantiate().
      
      This also means that gfs2_instantiate() will now no longer return any
      "special" error codes.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Fix up gfs2_glock_async_wait · bdff777c
      Andreas Gruenbacher authored
      
      Since commit 1fc05c8d ("gfs2: cancel timed-out glock requests"), a
      pending locking request can be canceled by calling gfs2_glock_dq() on
      the pending holder.  In gfs2_glock_async_wait(), when we time out, use
      that to cancel the remaining locking requests and dequeue the locking
      requests already granted.  That's simpler as well as more efficient than
      waiting for all locking requests to eventually be granted and dequeuing
      them then.
      
      In addition, gfs2_glock_async_wait() promises that by the time the
      function completes, all glocks are either granted or dequeued, but the
      implementation doesn't keep that promise if individual locking requests
      fail.  Fix that as well.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Mark the remaining process-independent glock holders as GL_NOPID · ebdc416c
      Andreas Gruenbacher authored
      
      Add the GL_NOPID flag for the remaining glock holders which are not
      associated with the current process.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Mark flock glock holders as GL_NOPID · b582d5f0
      Andreas Gruenbacher authored
      
      Add the GL_NOPID flag for flock glock holders.  Clean up the flag
      setting code in do_flock.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Add GL_NOPID flag for process-independent glock holders · cbe6d257
      Andreas Gruenbacher authored
      
      Add a GL_NOPID flag to indicate that once a glock holder has been acquired, it
      won't be associated with the current process anymore.  This is useful for iopen
      and flock glocks which are associated with open files, as well as journal glock
      holders and similar which are associated with the filesystem.
      
      Once GL_NOPID is used for all applicable glocks (see the next patches),
      processes will no longer be falsely reported as holding glocks which they are
      not actually holding in the glocks dump file.  Unlike before, when a process is
      reported as having "(ended)", this will indicate an actual bug.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Add flocks to glockfd debugfs file · 56535dc6
      Andreas Gruenbacher authored
      
      Include flock glocks in the "glockfd" debugfs file.  Those are similar to the
      iopen glocks; while an open file is holding an flock, it is holding the file's
      flock glock.
      
      We cannot take f_fl_mutex in gfs2_glockfd_seq_show_flock() or else dumping the
      "glockfd" file would block on flock operations.  Instead, use the file->f_lock
      spin lock to protect the f_fl_gh.gh_gl glock pointer.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Add glockfd debugfs file · 4480c27c
      Andreas Gruenbacher authored
      
      When a process has a gfs2 file open, the file is keeping a reference on the
      underlying gfs2 inode, and the inode is keeping the inode's iopen glock held in
      shared mode.  In other words, the process depends on the iopen glock of each
      open gfs2 file.  Expose those dependencies in a new "glockfd" debugfs file.
      
      The new debugfs file contains one line for each gfs2 file descriptor,
      specifying the tgid, file descriptor number, and glock name, e.g.,
      
        1601 6 5/816d
      
      This list is compiled by iterating all tasks on the system using find_ge_pid(),
      and all file descriptors of each task using task_lookup_next_fd_rcu().  To make
      that work from gfs2, export those two functions.
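
      A userspace tool could parse the one-line-per-descriptor format
      described above roughly as follows. This sketch assumes the glock name
      has the form <type>/<number> with the number in hexadecimal, as the
      example line suggests; struct glockfd_entry and parse_glockfd_line()
      are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One parsed line of the "glockfd" file (hypothetical layout). */
struct glockfd_entry {
    int tgid;                     /* thread group id of the process */
    int fd;                       /* file descriptor number */
    unsigned int gl_type;         /* glock type, assumed decimal */
    unsigned long long gl_number; /* glock number, assumed hexadecimal */
};

/* Parse "<tgid> <fd> <type>/<number>"; true if all four fields were read. */
static bool parse_glockfd_line(const char *line, struct glockfd_entry *e)
{
    assert(line != NULL && e != NULL);
    return sscanf(line, "%d %d %u/%llx",
                  &e->tgid, &e->fd, &e->gl_type, &e->gl_number) == 4;
}
```

      For the example line "1601 6 5/816d" this gives tgid 1601, file
      descriptor 6, glock type 5 and glock number 0x816d.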
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  13. Jun 28, 2022
  14. Jun 23, 2022
  15. Jun 09, 2022
  16. Jun 03, 2022
  17. May 24, 2022