  1. Sep 09, 2020
    • f2fs: Return EOF on unaligned end of file DIO read · 20d0a107
      Gabriel Krisman authored
      
      Reading past end of file returns EOF for aligned reads but -EINVAL for
      unaligned reads on f2fs.  While documentation is not strict about this
      corner case, most filesystems return EOF in this case, as iomap
      filesystems do.  This patch consolidates the behavior for f2fs by making
      it return EOF (0).
      
      It can be verified by a read loop on a file that does a partial read
      before EOF (a file that doesn't end at an aligned address).  The
      following code fails on an unaligned file on f2fs, but not on
      btrfs, ext4, and xfs.
      
        while (done < total) {
          ssize_t delta = pread(fd, buf + done, total - done, off + done);
          if (!delta)
            break;
          ...
        }
      
      It is arguable whether filesystems should actually return EOF or
      -EINVAL, but since iomap filesystems support it, and so does the
      original DIO code, it seems reasonable to consolidate on that.
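
A fuller version of the read loop above might look like the following. This is a minimal user-space sketch, not code from the patch: `read_until_eof` is a hypothetical helper name, and the sketch does not open the file with O_DIRECT, so buffer and offset alignment are left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to `total` bytes starting at offset `off`, stopping early at EOF.
 * Returns the number of bytes read, or -1 with errno set on error.
 * read_until_eof is a hypothetical helper name, not part of the patch.
 */
ssize_t read_until_eof(int fd, char *buf, size_t total, off_t off)
{
	size_t done = 0;

	while (done < total) {
		ssize_t delta = pread(fd, buf + done, total - done,
				      off + (off_t)done);

		if (delta == 0)		/* EOF: stop, a short result is fine */
			break;
		if (delta < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;	/* e.g. EINVAL from an unaligned DIO read */
		}
		done += (size_t)delta;
	}
	return (ssize_t)done;
}
```

With a loop like this, a filesystem that returns 0 at end of file yields a short count, while one that returns -EINVAL makes the whole read fail.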
      
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix indefinite loop scanning for free nid · e2cab031
      Sahitya Tummala authored
      
      If the sbi->ckpt->next_free_nid is not NAT block aligned and if there
      are free nids in that NAT block between the start of the block and
      next_free_nid, then those free nids will not be scanned in scan_nat_page().
      This results in a mismatch between nm_i->available_nids and the sum of
      nm_i->free_nid_count of all NAT blocks scanned. And nm_i->available_nids
      will always be greater than the sum of free nids in all the blocks.
      Under this condition, if we use all the currently scanned free nids,
      then it will loop forever in f2fs_alloc_nid() as nm_i->available_nids
      is still not zero but nm_i->free_nid_count of that partially scanned
      NAT block is zero.
      
      Fix this by aligning nm_i->next_scan_nid to the first nid of the
      corresponding NAT block.
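
The alignment itself is a round-down to a multiple of the entries per NAT block. The sketch below is illustrative only: `nat_block_first_nid` is a hypothetical helper, and the constant 455 is an assumption (4 KiB blocks holding 9-byte NAT entries).

```c
#include <stdint.h>

/*
 * Assumed value: with 4 KiB blocks and 9-byte NAT entries, one NAT block
 * holds 455 entries. Treat this constant as illustrative.
 */
#define NAT_ENTRY_PER_BLOCK 455u

/* Round a nid down to the first nid of the NAT block that contains it. */
uint32_t nat_block_first_nid(uint32_t nid)
{
	return nid - (nid % NAT_ENTRY_PER_BLOCK);
}
```

Starting the scan from the rounded-down nid means free nids that sit before next_free_nid in the same block are counted too.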
      
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Fix type of section block count variables · 123aaf77
      Shin'ichiro Kawasaki authored
      
      Commit da52f8ad ("f2fs: get the right gc victim section when section
      has several segments") added code to count blocks of each section using
      variables with type 'unsigned short', which is 2 bytes in size on many
      systems. However, the counts can exceed the 2-byte range, and the
      type conversion then results in wrong values. In particular, when f2fs
      sections have as many as USHRT_MAX + 1 blocks, the count is handled as 0.
      This triggers an infinite loop in init_dirty_segmap() during the mount
      system call. Fix this by changing the type of the variables to block_t.
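
The truncation can be reproduced in user space. In this sketch the function names are hypothetical, and `block_t` is assumed to be a 32-bit unsigned type.

```c
#include <limits.h>
#include <stdint.h>

typedef uint32_t block_t;	/* assumption: a 32-bit block count type */

/* Buggy variant: the count is narrowed to unsigned short on return. */
unsigned short section_blocks_narrow(block_t blocks)
{
	return (unsigned short)blocks;	/* wraps modulo USHRT_MAX + 1 */
}

/* Fixed variant: the count keeps the full block_t range. */
block_t section_blocks_wide(block_t blocks)
{
	return blocks;
}
```

Passing `(block_t)USHRT_MAX + 1` to the narrow variant yields 0, which is the value that keeps the loop from making progress.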
      
      Fixes: da52f8ad ("f2fs: get the right gc victim section when section has several segments")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. Sep 07, 2020
    • btrfs: fix NULL pointer dereference after failure to create snapshot · 2d892ccd
      Filipe Manana authored
      
      When trying to get a new fs root for a snapshot during the transaction
      at transaction.c:create_pending_snapshot(), if btrfs_get_new_fs_root()
      fails we leave "pending->snap" pointing to an error pointer, and then
      later at ioctl.c:create_snapshot() we dereference that pointer, resulting
      in a crash:
      
        [12264.614689] BUG: kernel NULL pointer dereference, address: 00000000000007c4
        [12264.615650] #PF: supervisor write access in kernel mode
        [12264.616487] #PF: error_code(0x0002) - not-present page
        [12264.617436] PGD 0 P4D 0
        [12264.618328] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        [12264.619150] CPU: 0 PID: 2310635 Comm: fsstress Tainted: G        W         5.9.0-rc3-btrfs-next-67 #1
        [12264.619960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        [12264.621769] RIP: 0010:btrfs_mksubvol+0x438/0x4a0 [btrfs]
        [12264.622528] Code: bc ef ff ff (...)
        [12264.624092] RSP: 0018:ffffaa6fc7277cd8 EFLAGS: 00010282
        [12264.624669] RAX: 00000000fffffff4 RBX: ffff9d3e8f151a60 RCX: 0000000000000000
        [12264.625249] RDX: 0000000000000001 RSI: ffffffff9d56c9be RDI: fffffffffffffff4
        [12264.625830] RBP: ffff9d3e8f151b48 R08: 0000000000000000 R09: 0000000000000000
        [12264.626413] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffff4
        [12264.626994] R13: ffff9d3ede380538 R14: ffff9d3ede380500 R15: ffff9d3f61b2eeb8
        [12264.627582] FS:  00007f140d5d8200(0000) GS:ffff9d3fb5e00000(0000) knlGS:0000000000000000
        [12264.628176] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [12264.628773] CR2: 00000000000007c4 CR3: 000000020f8e8004 CR4: 00000000003706f0
        [12264.629379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [12264.629994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [12264.630594] Call Trace:
        [12264.631227]  btrfs_mksnapshot+0x7b/0xb0 [btrfs]
        [12264.631840]  __btrfs_ioctl_snap_create+0x16f/0x1a0 [btrfs]
        [12264.632458]  btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs]
        [12264.633078]  btrfs_ioctl+0x1864/0x3130 [btrfs]
        [12264.633689]  ? do_sys_openat2+0x1a7/0x2d0
        [12264.634295]  ? kmem_cache_free+0x147/0x3a0
        [12264.634899]  ? __x64_sys_ioctl+0x83/0xb0
        [12264.635488]  __x64_sys_ioctl+0x83/0xb0
        [12264.636058]  do_syscall_64+0x33/0x80
        [12264.636616]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        (gdb) list *(btrfs_mksubvol+0x438)
        0x7c7b8 is in btrfs_mksubvol (fs/btrfs/ioctl.c:858).
        853		ret = 0;
        854		pending_snapshot->anon_dev = 0;
        855	fail:
        856		/* Prevent double freeing of anon_dev */
        857		if (ret && pending_snapshot->snap)
        858			pending_snapshot->snap->anon_dev = 0;
        859		btrfs_put_root(pending_snapshot->snap);
        860		btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv);
        861	free_pending:
        862		if (pending_snapshot->anon_dev)
      
      So fix this by setting "pending->snap" to NULL if we get an error from the
      call to btrfs_get_new_fs_root() at transaction.c:create_pending_snapshot().
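
The pattern behind the fix can be modelled in user space. The helpers below are stand-ins for the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(), and `record_snapshot_root` is a hypothetical function, not the actual btrfs code. Note that RDI in the oops above is fffffffffffffff4, which is -12 (ENOMEM) viewed as a pointer, consistent with an error pointer being dereferenced.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* same bound the kernel uses for error pointers */

struct root { int anon_dev; };
struct pending_snapshot { struct root *snap; };

/* User-space stand-ins for the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(). */
void *err_ptr(long err) { return (void *)(intptr_t)err; }
int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
long ptr_err(const void *p) { return (long)(intptr_t)p; }

/*
 * Store the lookup result in pending->snap. On failure, reset the field to
 * NULL so later cleanup code that tests `pending->snap` skips it.
 */
int record_snapshot_root(struct pending_snapshot *pending, struct root *result)
{
	pending->snap = result;
	if (is_err(pending->snap)) {
		int ret = (int)ptr_err(pending->snap);

		pending->snap = NULL;
		return ret;
	}
	return 0;
}
```

A check such as `if (ret && pending_snapshot->snap)` is only safe when the field holds either NULL or a valid pointer, which is what resetting it guarantees.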
      
      Fixes: 2dfb1e43 ("btrfs: preallocate anon block device at first phase of snapshot creation")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: free data reloc tree on failed mount · 9e3aa805
      Josef Bacik authored
      
      While testing a weird problem with -o degraded, I noticed I was getting
      leaked root errors:
      
        BTRFS warning (device loop0): writable mount is not allowed due to too many missing devices
        BTRFS error (device loop0): open_ctree failed
        BTRFS error (device loop0): leaked root -9-0 refcount 1
      
      This is the DATA_RELOC root, which gets read before the other fs roots,
      but is included in the fs roots radix tree.  Handle this by adding a
      btrfs_drop_and_free_fs_root() on the data reloc root if it exists.  This
      is ok to do here if we fail further up because we will only drop the ref
      if we delete the root from the radix tree, and all other cleanup won't
      be duplicated.
      
      CC: stable@vger.kernel.org # 5.8+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: require only sector size alignment for parent eb bytenr · ea57788e
      Qu Wenruo authored
      [BUG]
      A completely sane converted fs will cause a kernel warning at balance
      time:
      
        [ 1557.188633] BTRFS info (device sda7): relocating block group 8162107392 flags data
        [ 1563.358078] BTRFS info (device sda7): found 11722 extents
        [ 1563.358277] BTRFS info (device sda7): leaf 7989321728 gen 95 total ptrs 213 free space 3458 owner 2
        [ 1563.358280] 	item 0 key (7984947200 169 0) itemoff 16250 itemsize 33
        [ 1563.358281] 		extent refs 1 gen 90 flags 2
        [ 1563.358282] 		ref#0: tree block backref root 4
        [ 1563.358285] 	item 1 key (7985602560 169 0) itemoff 16217 itemsize 33
        [ 1563.358286] 		extent refs 1 gen 93 flags 258
        [ 1563.358287] 		ref#0: shared block backref parent 7985602560
        [ 1563.358288] 			(parent 7985602560 is NOT ALIGNED to nodesize 16384)
        [ 1563.358290] 	item 2 key (7985635328 169 0) itemoff 16184 itemsize 33
        ...
        [ 1563.358995] BTRFS error (device sda7): eb 7989321728 invalid extent inline ref type 182
        [ 1563.358996] ------------[ cut here ]------------
        [ 1563.359005] WARNING: CPU: 14 PID: 2930 at 0xffffffff9f231766
      
      The transaction is then aborted, and the balance of the fs fails.
      
      [CAUSE]
      The mentioned inline ref type 182 is completely sane: it is
      BTRFS_SHARED_BLOCK_REF_KEY. An extra check makes the kernel believe
      it is invalid.
      
      Commit 64ecdb64 ("Btrfs: add one more sanity check for shared ref
      type") introduced extra checks for backref type.
      
      One of the requirements is that the parent bytenr must be aligned to
      the node size, which is not correct.
      
      One example is like this:
      
      0	1G  1G+4K		2G 2G+4K
      	|   |///////////////////|//|  <- A chunk starts at 1G+4K
                  |   |	<- A tree block get reserved at bytenr 1G+4K
      
      Then we have a valid tree block at bytenr 1G+4K, but not aligned to
      nodesize (16K).
      
      Such a chunk is not ideal, but the current kernel can handle it pretty
      well. We may warn about such tree blocks in the future, but should not
      reject them.
      
      [FIX]
      Change the alignment requirement from node size alignment to sector size
      alignment.
      
      Also, to make our lives a little easier, output @iref when
      btrfs_get_extent_inline_ref_type() fails, so we can locate the item
      more easily.
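
The difference between the two alignment requirements can be checked with a few lines of C. This is a sketch with a hypothetical `is_aligned` helper. The node size of 16384 comes from the log above, while the 4096-byte sector size is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `bytenr` is a multiple of `align`; `align` must be a power of two. */
bool is_aligned(uint64_t bytenr, uint64_t align)
{
	return (bytenr & (align - 1)) == 0;
}
```

For the parent bytenr 7985602560 from the log, `is_aligned(7985602560ULL, 16384)` is false but `is_aligned(7985602560ULL, 4096)` is true, so the block passes a sector size check while failing the node size one.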
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205475
      
      
      Fixes: 64ecdb64 ("Btrfs: add one more sanity check for shared ref type")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      [ update comments and messages ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix lockdep splat in add_missing_dev · fccc0007
      Josef Bacik authored
      
      Nikolay reported a lockdep splat in generic/476 that I could reproduce
      with btrfs/187.
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.9.0-rc2+ #1 Tainted: G        W
        ------------------------------------------------------
        kswapd0/100 is trying to acquire lock:
        ffff9e8ef38b6268 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330
      
        but task is already holding lock:
        ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (fs_reclaim){+.+.}-{0:0}:
      	 fs_reclaim_acquire+0x65/0x80
      	 slab_pre_alloc_hook.constprop.0+0x20/0x200
      	 kmem_cache_alloc_trace+0x3a/0x1a0
      	 btrfs_alloc_device+0x43/0x210
      	 add_missing_dev+0x20/0x90
      	 read_one_chunk+0x301/0x430
      	 btrfs_read_sys_array+0x17b/0x1b0
      	 open_ctree+0xa62/0x1896
      	 btrfs_mount_root.cold+0x12/0xea
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 vfs_kern_mount.part.0+0x71/0xb0
      	 btrfs_mount+0x10d/0x379
      	 legacy_get_tree+0x30/0x50
      	 vfs_get_tree+0x28/0xc0
      	 path_mount+0x434/0xc00
      	 __x64_sys_mount+0xe3/0x120
      	 do_syscall_64+0x33/0x40
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x7e/0x7e0
      	 btrfs_chunk_alloc+0x125/0x3a0
      	 find_free_extent+0xdf6/0x1210
      	 btrfs_reserve_extent+0xb3/0x1b0
      	 btrfs_alloc_tree_block+0xb0/0x310
      	 alloc_tree_block_no_bg_flush+0x4a/0x60
      	 __btrfs_cow_block+0x11a/0x530
      	 btrfs_cow_block+0x104/0x220
      	 btrfs_search_slot+0x52e/0x9d0
      	 btrfs_lookup_inode+0x2a/0x8f
      	 __btrfs_update_delayed_inode+0x80/0x240
      	 btrfs_commit_inode_delayed_inode+0x119/0x120
      	 btrfs_evict_inode+0x357/0x500
      	 evict+0xcf/0x1f0
      	 vfs_rmdir.part.0+0x149/0x160
      	 do_rmdir+0x136/0x1a0
      	 do_syscall_64+0x33/0x40
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
      	 __lock_acquire+0x1184/0x1fa0
      	 lock_acquire+0xa4/0x3d0
      	 __mutex_lock+0x7e/0x7e0
      	 __btrfs_release_delayed_node.part.0+0x3f/0x330
      	 btrfs_evict_inode+0x24c/0x500
      	 evict+0xcf/0x1f0
      	 dispose_list+0x48/0x70
      	 prune_icache_sb+0x44/0x50
      	 super_cache_scan+0x161/0x1e0
      	 do_shrink_slab+0x178/0x3c0
      	 shrink_slab+0x17c/0x290
      	 shrink_node+0x2b2/0x6d0
      	 balance_pgdat+0x30a/0x670
      	 kswapd+0x213/0x4c0
      	 kthread+0x138/0x160
      	 ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
        Chain exists of:
          &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(fs_reclaim);
      				 lock(&fs_info->chunk_mutex);
      				 lock(fs_reclaim);
          lock(&delayed_node->mutex);
      
         *** DEADLOCK ***
      
        3 locks held by kswapd0/100:
         #0: ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
         #1: ffffffffa9d65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
         #2: ffff9e8e9da260e0 (&type->s_umount_key#48){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0
      
        stack backtrace:
        CPU: 1 PID: 100 Comm: kswapd0 Tainted: G        W         5.9.0-rc2+ #1
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        Call Trace:
         dump_stack+0x92/0xc8
         check_noncircular+0x12d/0x150
         __lock_acquire+0x1184/0x1fa0
         lock_acquire+0xa4/0x3d0
         ? __btrfs_release_delayed_node.part.0+0x3f/0x330
         __mutex_lock+0x7e/0x7e0
         ? __btrfs_release_delayed_node.part.0+0x3f/0x330
         ? __btrfs_release_delayed_node.part.0+0x3f/0x330
         ? lock_acquire+0xa4/0x3d0
         ? btrfs_evict_inode+0x11e/0x500
         ? find_held_lock+0x2b/0x80
         __btrfs_release_delayed_node.part.0+0x3f/0x330
         btrfs_evict_inode+0x24c/0x500
         evict+0xcf/0x1f0
         dispose_list+0x48/0x70
         prune_icache_sb+0x44/0x50
         super_cache_scan+0x161/0x1e0
         do_shrink_slab+0x178/0x3c0
         shrink_slab+0x17c/0x290
         shrink_node+0x2b2/0x6d0
         balance_pgdat+0x30a/0x670
         kswapd+0x213/0x4c0
         ? _raw_spin_unlock_irqrestore+0x46/0x60
         ? add_wait_queue_exclusive+0x70/0x70
         ? balance_pgdat+0x670/0x670
         kthread+0x138/0x160
         ? kthread_create_worker_on_cpu+0x40/0x40
         ret_from_fork+0x1f/0x30
      
      This is because we are holding the chunk_mutex when we call
      btrfs_alloc_device, which does a GFP_KERNEL allocation.  We don't want
      to switch that to a GFP_NOFS allocation because this is the only place
      where it matters.  So instead use memalloc_nofs_save() around the
      allocation in order to avoid the lockdep splat.
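
The scoped save/restore idea can be modelled outside the kernel. The sketch below is a user-space model only: every name in it is hypothetical, and a plain static variable stands in for the per-task flags that the real memalloc_nofs_save() and memalloc_nofs_restore() operate on.

```c
#define MODEL_NOFS 0x1u	/* hypothetical flag bit for this model */

static unsigned int task_flags;	/* stands in for per-task allocation flags */

/* Enter a no-FS-reclaim scope; returns the previous state of the flag. */
unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_NOFS;

	task_flags |= MODEL_NOFS;
	return old;
}

/* Leave the scope, restoring whatever state the matching save observed. */
void model_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_NOFS) | old;
}

/* An allocation in this model may enter FS reclaim only outside a scope. */
int model_may_enter_fs_reclaim(void)
{
	return !(task_flags & MODEL_NOFS);
}
```

Because the save call returns the previous state, scopes nest correctly: restoring an inner scope leaves the outer one in force, and the callee's allocation flags do not need to change.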
      
      Reported-by: Nikolay Borisov <nborisov@suse.com>
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • cifs: fix DFS mount with cifsacl/modefromsid · 01ec372c
      Ronnie Sahlberg authored
      
      RHBZ: 1871246
      
      If during cifs_lookup()/get_inode_info() we encounter a DFS link
      and we use the cifsacl or modefromsid mount options, we must suppress
      any -EREMOTE errors that are triggered, or else we will not be able to
      follow the DFS link and automount the target.
      
      This fixes an issue with modefromsid/cifsacl where these mount options
      would break DFS and we would no longer be able to access the share.
      
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
  3. Sep 05, 2020
  4. Sep 04, 2020
  5. Sep 03, 2020
  6. Sep 02, 2020
  7. Sep 01, 2020
  8. Aug 31, 2020
    • affs: fix basic permission bits to actually work · d3a84a8d
      Max Staudt authored
      
      The basic permission bits (protection bits in AmigaOS) have been broken
      in Linux's AFFS: it would only set bits, but never clear them.
      Also, contrary to the documentation, the Archived bit was not handled.
      
      Let's fix this for good, and set the bits such that Linux and classic
      AmigaOS can coexist in the most peaceful manner.
      
      Also, update the documentation to represent the current state of things.
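
The "set but never clear" failure mode is a general bitmask pitfall. The sketch below uses hypothetical bit names and helper functions. It does not reproduce the real AFFS protection bit layout or the actual patch.

```c
#include <stdint.h>

/* Hypothetical bit names for illustration; not the on-disk AFFS layout. */
#define PROT_A 0x1u
#define PROT_B 0x2u
#define PROT_MASK (PROT_A | PROT_B)

/* Buggy pattern: bits are only ever ORed in, so stale bits survive. */
uint32_t update_bits_set_only(uint32_t old, uint32_t wanted)
{
	return old | (wanted & PROT_MASK);
}

/* Fixed pattern: clear the managed bits first, then set the wanted ones. */
uint32_t update_bits_replace(uint32_t old, uint32_t wanted)
{
	return (old & ~PROT_MASK) | (wanted & PROT_MASK);
}
```

Bits outside the managed mask are preserved by both variants. Only the second one lets a managed bit go from set back to clear.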
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: Max Staudt <max@enpas.org>
      Signed-off-by: David Sterba <dsterba@suse.com>
  9. Aug 28, 2020
  10. Aug 27, 2020
    • io_uring: don't bounce block based -EAGAIN retry off task_work · fdee946d
      Jens Axboe authored
      
      These events happen inline from submission, so there's no need to
      bounce them through the original task. Just set them up for retry
      and issue retry directly instead of going over task_work.
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix IOPOLL -EAGAIN retries · eefdf30f
      Jens Axboe authored
      
      This normally isn't hit, as polling is mostly done on NVMe with deep
      queue depths. But if we do run into request starvation, we need to
      ensure that retries are properly serialized.
      
      Reported-by: Andres Freund <andres@anarazel.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • afs: Remove erroneous fallthough annotation · 210e799e
      Dan Carpenter authored
      
      The fall through annotation comes after a return statement so it's not
      reachable.
      
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    • xfs: initialize the shortform attr header padding entry · 125eac24
      Darrick J. Wong authored
      
      Don't leak kernel memory contents into the shortform attr fork.
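
The general technique is to zero a header before filling it in, so padding bytes never carry stale memory to disk. The struct below is an illustrative layout with an explicit padding byte and `sf_hdr_init` is a hypothetical helper. Neither is the actual XFS definition or fix.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative header layout with an explicit padding byte (an assumption). */
struct sf_hdr {
	uint16_t totsize;
	uint8_t count;
	uint8_t padding;
};

/* Initialize a header in a reused buffer without leaking stale bytes. */
void sf_hdr_init(struct sf_hdr *hdr, uint16_t totsize)
{
	memset(hdr, 0, sizeof(*hdr));	/* also zeroes the padding byte */
	hdr->totsize = totsize;
}
```

Assigning only `totsize` and `count` would leave `padding` holding whatever the buffer contained before.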
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • btrfs: tree-checker: fix the error message for transid error · f96d6960
      Qu Wenruo authored
      
      The error message for inode transid is the same as for inode generation,
      which makes us unable to detect the real problem.
      
      Reported-by: Tyler Richmond <t.d.richmond@gmail.com>
      Fixes: 496245ca ("btrfs: tree-checker: Verify inode item")
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: set the lockdep class for log tree extent buffers · d3beaa25
      Josef Bacik authored
      
      These are special extent buffers that get rewound in order to look up
      the state of the tree at a specific point in time.  As such they do not
      go through the normal initialization paths that set their lockdep class,
      so handle them appropriately when they are created and before they are
      locked.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: set the correct lockdep class for new nodes · ad244665
      Josef Bacik authored
      
      When flipping over to the rw_semaphore I noticed I'd get a lockdep splat
      in replace_path(), which is weird because we're swapping the reloc root
      with the actual target root.  Turns out this is because we're using the
      root->root_key.objectid as the root id for the newly allocated tree
      block when setting the lockdep class, however we need to be using the
      actual owner of this new block, which is saved in owner.
      
      The affected path is through btrfs_copy_root as all other callers of
      btrfs_alloc_tree_block (which calls init_new_buffer) have root_objectid
      == root->root_key.objectid.
      
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: allocate scrub workqueues outside of locks · e89c4a9c
      Josef Bacik authored
      
      I got the following lockdep splat while testing:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.8.0-rc7-00172-g021118712e59 #932 Not tainted
        ------------------------------------------------------
        btrfs/229626 is trying to acquire lock:
        ffffffff828513f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0x378/0x450
      
        but task is already holding lock:
        ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #7 (&fs_info->scrub_lock){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_scrub_dev+0x11c/0x630
      	 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
      	 btrfs_ioctl+0x2799/0x30a0
      	 ksys_ioctl+0x83/0xc0
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #6 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_run_dev_stats+0x49/0x480
      	 commit_cowonly_roots+0xb5/0x2a0
      	 btrfs_commit_transaction+0x516/0xa60
      	 sync_filesystem+0x6b/0x90
      	 generic_shutdown_super+0x22/0x100
      	 kill_anon_super+0xe/0x30
      	 btrfs_kill_super+0x12/0x20
      	 deactivate_locked_super+0x29/0x60
      	 cleanup_mnt+0xb8/0x140
      	 task_work_run+0x6d/0xb0
      	 __prepare_exit_to_usermode+0x1cc/0x1e0
      	 do_syscall_64+0x5c/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #5 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_commit_transaction+0x4bb/0xa60
      	 sync_filesystem+0x6b/0x90
      	 generic_shutdown_super+0x22/0x100
      	 kill_anon_super+0xe/0x30
      	 btrfs_kill_super+0x12/0x20
      	 deactivate_locked_super+0x29/0x60
      	 cleanup_mnt+0xb8/0x140
      	 task_work_run+0x6d/0xb0
      	 __prepare_exit_to_usermode+0x1cc/0x1e0
      	 do_syscall_64+0x5c/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #4 (&fs_info->reloc_mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_record_root_in_trans+0x43/0x70
      	 start_transaction+0xd1/0x5d0
      	 btrfs_dirty_inode+0x42/0xd0
      	 touch_atime+0xa1/0xd0
      	 btrfs_file_mmap+0x3f/0x60
      	 mmap_region+0x3a4/0x640
      	 do_mmap+0x376/0x580
      	 vm_mmap_pgoff+0xd5/0x120
      	 ksys_mmap_pgoff+0x193/0x230
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #3 (&mm->mmap_lock#2){++++}-{3:3}:
      	 __might_fault+0x68/0x90
      	 _copy_to_user+0x1e/0x80
      	 perf_read+0x141/0x2c0
      	 vfs_read+0xad/0x1b0
      	 ksys_read+0x5f/0xe0
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #2 (&cpuctx_mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 perf_event_init_cpu+0x88/0x150
      	 perf_event_init+0x1db/0x20b
      	 start_kernel+0x3ae/0x53c
      	 secondary_startup_64+0xa4/0xb0
      
        -> #1 (pmus_lock){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 perf_event_init_cpu+0x4f/0x150
      	 cpuhp_invoke_callback+0xb1/0x900
      	 _cpu_up.constprop.26+0x9f/0x130
      	 cpu_up+0x7b/0xc0
      	 bringup_nonboot_cpus+0x4f/0x60
      	 smp_init+0x26/0x71
      	 kernel_init_freeable+0x110/0x258
      	 kernel_init+0xa/0x103
      	 ret_from_fork+0x1f/0x30
      
        -> #0 (cpu_hotplug_lock){++++}-{0:0}:
      	 __lock_acquire+0x1272/0x2310
      	 lock_acquire+0x9e/0x360
      	 cpus_read_lock+0x39/0xb0
      	 alloc_workqueue+0x378/0x450
      	 __btrfs_alloc_workqueue+0x15d/0x200
      	 btrfs_alloc_workqueue+0x51/0x160
      	 scrub_workers_get+0x5a/0x170
      	 btrfs_scrub_dev+0x18c/0x630
      	 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
      	 btrfs_ioctl+0x2799/0x30a0
      	 ksys_ioctl+0x83/0xc0
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
        Chain exists of:
          cpu_hotplug_lock --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&fs_info->scrub_lock);
      				 lock(&fs_devs->device_list_mutex);
      				 lock(&fs_info->scrub_lock);
          lock(cpu_hotplug_lock);
      
         *** DEADLOCK ***
      
        2 locks held by btrfs/229626:
         #0: ffff88bfe8bb86e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_scrub_dev+0xbd/0x630
         #1: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630
      
        stack backtrace:
        CPU: 15 PID: 229626 Comm: btrfs Kdump: loaded Not tainted 5.8.0-rc7-00172-g021118712e59 #932
        Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
        Call Trace:
         dump_stack+0x78/0xa0
         check_noncircular+0x165/0x180
         __lock_acquire+0x1272/0x2310
         lock_acquire+0x9e/0x360
         ? alloc_workqueue+0x378/0x450
         cpus_read_lock+0x39/0xb0
         ? alloc_workqueue+0x378/0x450
         alloc_workqueue+0x378/0x450
         ? rcu_read_lock_sched_held+0x52/0x80
         __btrfs_alloc_workqueue+0x15d/0x200
         btrfs_alloc_workqueue+0x51/0x160
         scrub_workers_get+0x5a/0x170
         btrfs_scrub_dev+0x18c/0x630
         ? start_transaction+0xd1/0x5d0
         btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
         btrfs_ioctl+0x2799/0x30a0
         ? do_sigaction+0x102/0x250
         ? lockdep_hardirqs_on_prepare+0xca/0x160
         ? _raw_spin_unlock_irq+0x24/0x30
         ? trace_hardirqs_on+0x1c/0xe0
         ? _raw_spin_unlock_irq+0x24/0x30
         ? do_sigaction+0x102/0x250
         ? ksys_ioctl+0x83/0xc0
         ksys_ioctl+0x83/0xc0
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x50/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This happens because we're allocating the scrub workqueues under the
      scrub and device list mutex, which brings in a whole host of other
      dependencies.
      
      Because the workqueue allocation is done with GFP_KERNEL, it can
      trigger reclaim, which can lead to a transaction commit, which in turn
      needs the device_list_mutex, so it can lead to a deadlock. That is a
      different problem that this fix also solves.
      
      Fix this by moving the actual allocation outside of the
      scrub lock, and then only take the lock once we're ready to actually
      assign them to the fs_info.  We'll now have to clean up the workqueues in
      a few more places, so I've added a helper to do the refcount dance to
      safely free the workqueues.
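
The "allocate outside the lock, publish under the lock" pattern can be sketched in user space with pthreads. This is a model under stated assumptions, not the btrfs implementation: the names are hypothetical, `malloc()` stands in for the workqueue allocation, and a plain counter stands in for the kernel refcount.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t scrub_lock = PTHREAD_MUTEX_INITIALIZER;
static void *workers;		/* shared resource, protected by scrub_lock */
static int workers_refs;	/* users of `workers`, protected by scrub_lock */

/*
 * Take a reference, creating the resource on first use. Returns the new
 * reference count, or -1 if the allocation failed.
 */
int workers_get(void)
{
	void *fresh;
	int refs;

	/* Fast path: the resource already exists, just take a reference. */
	pthread_mutex_lock(&scrub_lock);
	if (workers_refs > 0) {
		refs = ++workers_refs;
		pthread_mutex_unlock(&scrub_lock);
		return refs;
	}
	pthread_mutex_unlock(&scrub_lock);

	/* Allocate with no lock held, so the allocator cannot nest under it. */
	fresh = malloc(64);
	if (!fresh)
		return -1;

	pthread_mutex_lock(&scrub_lock);
	if (workers_refs == 0) {
		workers = fresh;	/* we won the race: publish our allocation */
		fresh = NULL;
	}
	refs = ++workers_refs;
	pthread_mutex_unlock(&scrub_lock);

	free(fresh);	/* no-op if published, frees the spare copy otherwise */
	return refs;
}

/* Drop a reference and free the resource when the last user is gone. */
int workers_put(void)
{
	void *dead = NULL;
	int refs;

	pthread_mutex_lock(&scrub_lock);
	refs = --workers_refs;
	if (refs == 0) {
		dead = workers;
		workers = NULL;
	}
	pthread_mutex_unlock(&scrub_lock);

	free(dead);	/* free outside the lock as well */
	return refs;
}
```

The cost of this design is that a racing caller may allocate a copy it then has to throw away, which is why extra cleanup paths are needed.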
      
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix potential deadlock in the search ioctl · a48b73ec
      Josef Bacik authored
      
      With the conversion of the tree locks to rwsem I got the following
      lockdep splat:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.8.0-rc7-00165-g04ec4da5f45f-dirty #922 Not tainted
        ------------------------------------------------------
        compsize/11122 is trying to acquire lock:
        ffff889fabca8768 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x3e/0x90
      
        but task is already holding lock:
        ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (btrfs-fs-00){++++}-{3:3}:
      	 down_write_nested+0x3b/0x70
      	 __btrfs_tree_lock+0x24/0x120
      	 btrfs_search_slot+0x756/0x990
      	 btrfs_lookup_inode+0x3a/0xb4
      	 __btrfs_update_delayed_inode+0x93/0x270
      	 btrfs_async_run_delayed_root+0x168/0x230
      	 btrfs_work_helper+0xd4/0x570
      	 process_one_work+0x2ad/0x5f0
      	 worker_thread+0x3a/0x3d0
      	 kthread+0x133/0x150
      	 ret_from_fork+0x1f/0x30
      
        -> #1 (&delayed_node->mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x9f/0x930
      	 btrfs_delayed_update_inode+0x50/0x440
      	 btrfs_update_inode+0x8a/0xf0
      	 btrfs_dirty_inode+0x5b/0xd0
      	 touch_atime+0xa1/0xd0
      	 btrfs_file_mmap+0x3f/0x60
      	 mmap_region+0x3a4/0x640
      	 do_mmap+0x376/0x580
      	 vm_mmap_pgoff+0xd5/0x120
      	 ksys_mmap_pgoff+0x193/0x230
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (&mm->mmap_lock#2){++++}-{3:3}:
      	 __lock_acquire+0x1272/0x2310
      	 lock_acquire+0x9e/0x360
      	 __might_fault+0x68/0x90
      	 _copy_to_user+0x1e/0x80
      	 copy_to_sk.isra.32+0x121/0x300
      	 search_ioctl+0x106/0x200
      	 btrfs_ioctl_tree_search_v2+0x7b/0xf0
      	 btrfs_ioctl+0x106f/0x30a0
      	 ksys_ioctl+0x83/0xc0
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x50/0x90
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
        Chain exists of:
          &mm->mmap_lock#2 --> &delayed_node->mutex --> btrfs-fs-00
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-fs-00);
      				 lock(&delayed_node->mutex);
      				 lock(btrfs-fs-00);
          lock(&mm->mmap_lock#2);
      
         *** DEADLOCK ***
      
        1 lock held by compsize/11122:
         #0: ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        stack backtrace:
        CPU: 17 PID: 11122 Comm: compsize Kdump: loaded Not tainted 5.8.0-rc7-00165-g04ec4da5f45f-dirty #922
        Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
        Call Trace:
         dump_stack+0x78/0xa0
         check_noncircular+0x165/0x180
         __lock_acquire+0x1272/0x2310
         lock_acquire+0x9e/0x360
         ? __might_fault+0x3e/0x90
         ? find_held_lock+0x72/0x90
         __might_fault+0x68/0x90
         ? __might_fault+0x3e/0x90
         _copy_to_user+0x1e/0x80
         copy_to_sk.isra.32+0x121/0x300
         ? btrfs_search_forward+0x2a6/0x360
         search_ioctl+0x106/0x200
         btrfs_ioctl_tree_search_v2+0x7b/0xf0
         btrfs_ioctl+0x106f/0x30a0
         ? __do_sys_newfstat+0x5a/0x70
         ? ksys_ioctl+0x83/0xc0
         ksys_ioctl+0x83/0xc0
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x50/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The problem is we're doing a copy_to_user() while holding tree locks,
      which can deadlock if we have to do a page fault for the copy_to_user().
      This exists even without my locking changes, so it needs to be fixed.
      Rework the search ioctl to pre-fault the user buffer and then use
      copy_to_user_nofault() for the copying.
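      
      The pre-fault and retry idea can be modelled in plain C.  This is a
      toy simulation under stated assumptions, not the btrfs code: struct
      user_buf, fault_in(), copy_nofault() and search_and_copy() are
      hypothetical, a counter stands in for the tree lock, and the
      evict_next flag exists only to simulate the page being reclaimed
      between the pre-fault and the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a user buffer whose backing page may not be resident. */
struct user_buf {
	char data[64];
	bool resident;   /* is the page currently mapped in? */
	bool evict_next; /* simulation hook: drop the page before next copy */
};

static int tree_lock_depth; /* > 0 while the simulated tree lock is held */

/* May "sleep" in a page fault, so it must never run under the tree lock. */
void fault_in(struct user_buf *ub)
{
	assert(tree_lock_depth == 0);
	ub->resident = true;
}

/* Never faults: fails instead of sleeping when the page is missing. */
int copy_nofault(struct user_buf *ub, const char *src, size_t len)
{
	if (ub->evict_next) {
		ub->evict_next = false;
		ub->resident = false;
	}
	if (!ub->resident || len > sizeof(ub->data))
		return -1;
	memcpy(ub->data, src, len);
	return 0;
}

/* Returns the number of attempts needed, or -1 if len cannot fit. */
int search_and_copy(struct user_buf *ub, const char *item, size_t len)
{
	int attempts = 0;

	if (len > sizeof(ub->data))
		return -1;

	for (;;) {
		int ret;

		attempts++;
		fault_in(ub);       /* pre-fault with no locks held */
		tree_lock_depth++;  /* "lock" the tree and find the item */
		ret = copy_nofault(ub, item, len);
		tree_lock_depth--;  /* "unlock" before any retry */
		if (ret == 0)
			return attempts;
		/* The page went away in the meantime: fault it in again. */
	}
}
```

      The point of the sketch is that the only step that may block,
      fault_in(), always runs with the lock dropped, while the copy under
      the lock can only fail fast.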
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      a48b73ec
    • Josef Bacik's avatar
      btrfs: drop path before adding new uuid tree entry · 9771a5cf
      Josef Bacik authored
      
      With the conversion of the tree locks to rwsem I got the following
      lockdep splat:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.8.0-rc7-00167-g0d7ba0c5b375-dirty #925 Not tainted
        ------------------------------------------------------
        btrfs-uuid/7955 is trying to acquire lock:
        ffff88bfbafec0f8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        but task is already holding lock:
        ffff88bfbafef2a8 (btrfs-uuid-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (btrfs-uuid-00){++++}-{3:3}:
      	 down_read_nested+0x3e/0x140
      	 __btrfs_tree_read_lock+0x39/0x180
      	 __btrfs_read_lock_root_node+0x3a/0x50
      	 btrfs_search_slot+0x4bd/0x990
      	 btrfs_uuid_tree_add+0x89/0x2d0
      	 btrfs_uuid_scan_kthread+0x330/0x390
      	 kthread+0x133/0x150
      	 ret_from_fork+0x1f/0x30
      
        -> #0 (btrfs-root-00){++++}-{3:3}:
      	 __lock_acquire+0x1272/0x2310
      	 lock_acquire+0x9e/0x360
      	 down_read_nested+0x3e/0x140
      	 __btrfs_tree_read_lock+0x39/0x180
      	 __btrfs_read_lock_root_node+0x3a/0x50
      	 btrfs_search_slot+0x4bd/0x990
      	 btrfs_find_root+0x45/0x1b0
      	 btrfs_read_tree_root+0x61/0x100
      	 btrfs_get_root_ref.part.50+0x143/0x630
      	 btrfs_uuid_tree_iterate+0x207/0x314
      	 btrfs_uuid_rescan_kthread+0x12/0x50
      	 kthread+0x133/0x150
      	 ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-uuid-00);
      				 lock(btrfs-root-00);
      				 lock(btrfs-uuid-00);
          lock(btrfs-root-00);
      
         *** DEADLOCK ***
      
        1 lock held by btrfs-uuid/7955:
         #0: ffff88bfbafef2a8 (btrfs-uuid-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
      
        stack backtrace:
        CPU: 73 PID: 7955 Comm: btrfs-uuid Kdump: loaded Not tainted 5.8.0-rc7-00167-g0d7ba0c5b375-dirty #925
        Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
        Call Trace:
         dump_stack+0x78/0xa0
         check_noncircular+0x165/0x180
         __lock_acquire+0x1272/0x2310
         lock_acquire+0x9e/0x360
         ? __btrfs_tree_read_lock+0x39/0x180
         ? btrfs_root_node+0x1c/0x1d0
         down_read_nested+0x3e/0x140
         ? __btrfs_tree_read_lock+0x39/0x180
         __btrfs_tree_read_lock+0x39/0x180
         __btrfs_read_lock_root_node+0x3a/0x50
         btrfs_search_slot+0x4bd/0x990
         btrfs_find_root+0x45/0x1b0
         btrfs_read_tree_root+0x61/0x100
         btrfs_get_root_ref.part.50+0x143/0x630
         btrfs_uuid_tree_iterate+0x207/0x314
         ? btree_readpage+0x20/0x20
         btrfs_uuid_rescan_kthread+0x12/0x50
         kthread+0x133/0x150
         ? kthread_create_on_node+0x60/0x60
         ret_from_fork+0x1f/0x30
      
      This problem exists because we have two different rescan threads,
      btrfs_uuid_scan_kthread which creates the uuid tree, and
      btrfs_uuid_tree_iterate that goes through and updates or deletes any out
      of date roots.  The problem is that they do things in a different order.
      btrfs_uuid_scan_kthread() reads the tree_root, and then inserts entries
      into the uuid_root.  btrfs_uuid_tree_iterate() scans the uuid_root, but
      then does a btrfs_get_fs_root() which can read from the tree_root.
      
      It's actually easy enough to not be holding the path in
      btrfs_uuid_scan_kthread() when we add a uuid entry, as we already drop
      it further down and re-start the search when we loop.  So simply move
      the path release before we add our entry to the uuid tree.
      
      This also fixes a problem where we're holding a path open after we do
      btrfs_end_transaction(), which has its own problems.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      9771a5cf
    • Marcos Paulo de Souza's avatar
      btrfs: block-group: fix free-space bitmap threshold · e3e39c72
      Marcos Paulo de Souza authored
      
      [BUG]
      After commit 9afc6649 ("btrfs: block-group: refactor how we read one
      block group item"), cache->length is being assigned after calling
      btrfs_create_block_group_cache. This causes a problem since
      set_free_space_tree_thresholds calculates the free-space threshold to
      decide if the free-space tree should convert from extents to bitmaps.
      
      The current code calls set_free_space_tree_thresholds with cache->length
      being 0, which then makes cache->bitmap_high_thresh zero. This implies
      the system will always use bitmap instead of extents, which is not
      desired if the block group is not fragmented.
      
      This behavior can be seen by a test that expects to repair systems
      with FREE_SPACE_EXTENT and FREE_SPACE_BITMAP, but the current code only
      created FREE_SPACE_BITMAP.
      
      [FIX]
      Call set_free_space_tree_thresholds after setting cache->length. There
      is now a WARN_ON in set_free_space_tree_thresholds to help prevent
      the same mistake from happening again in the future.
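      
      Why the call order matters can be shown with a small model.  This is
      a simplified sketch and not the kernel's calculation: struct
      block_group, BYTES_PER_THRESH_UNIT and the helper names are
      hypothetical, and the threshold formula is reduced to a single
      division so that only the dependency on length remains.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified block group; not the kernel's struct. */
struct block_group {
	uint64_t length;             /* bytes covered by the block group */
	uint32_t bitmap_high_thresh; /* extent count above which bitmaps win */
};

/* Illustrative constant only: bytes of length per unit of threshold. */
#define BYTES_PER_THRESH_UNIT (1024u * 1024u)

/* The threshold is derived from length, so length must be final here. */
void set_thresholds(struct block_group *bg)
{
	bg->bitmap_high_thresh =
		(uint32_t)(bg->length / BYTES_PER_THRESH_UNIT);
}

/* Bitmaps are chosen once the extent count exceeds the threshold. */
int should_use_bitmaps(const struct block_group *bg, uint32_t extent_count)
{
	return extent_count > bg->bitmap_high_thresh;
}

/* Buggy order: thresholds are computed while length is still zero. */
void init_buggy(struct block_group *bg, uint64_t length)
{
	bg->length = 0;
	set_thresholds(bg);
	bg->length = length;
}

/* Fixed order: set length first, then derive the thresholds from it. */
void init_fixed(struct block_group *bg, uint64_t length)
{
	bg->length = length;
	set_thresholds(bg);
}
```

      With the buggy order the threshold stays at zero, so even a block
      group with a single free extent is treated as a bitmap candidate.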
      
      Link: https://github.com/kdave/btrfs-progs/issues/251
      
      
      Fixes: 9afc6649 ("btrfs: block-group: refactor how we read one block group item")
      CC: stable@vger.kernel.org # 5.8+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      e3e39c72
    • Jens Axboe's avatar
      io_uring: clear req->result on IOPOLL re-issue · 56450c20
      Jens Axboe authored
      
      Make sure we clear req->result, which was set to -EAGAIN for retry
      purposes, when moving it to the reissue list. Otherwise we can end up
      retrying a request more than once, which leads to weird results in
      the io-wq handling (and other spots).
      
      Cc: stable@vger.kernel.org
      Reported-by: Andres Freund <andres@anarazel.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      56450c20
    • Olga Kornievskaia's avatar
      NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall · 3d7a9520
      Olga Kornievskaia authored
      
      A client should be able to handle getting an ERR_DELAY error
      while doing a LOCK call to reclaim state due to a delegation being
      recalled. This is a transient error that can happen when the server
      moves its volumes and invalidates its file location cache; a later
      reference to that cache during the LOCK call then needs an
      expensive lookup (leading to an ERR_DELAY error on a PUTFH).
      
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      3d7a9520
  11. Aug 26, 2020
    • Eric Sandeen's avatar
      xfs: fix boundary test in xfs_attr_shortform_verify · f4020438
      Eric Sandeen authored
      
      The boundary test for the fixed-offset parts of xfs_attr_sf_entry in
      xfs_attr_shortform_verify is off by one, because the variable array
      at the end is defined as nameval[1] not nameval[].
      Hence we need to subtract 1 from the calculation.
      
      This can be shown by:
      
      # touch file
      # setfattr -n root.a file
      
      and verifications will fail when it's written to disk.
      
      This only matters for a last attribute which has a single-byte name
      and no value, otherwise the combination of namelen & valuelen will
      push endp further out and this test won't fail.
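      
      The off-by-one can be reproduced with a stand-in structure.  This is
      a sketch under assumptions: struct sf_entry only mimics the layout
      described above (three one-byte fields followed by nameval[1]), and
      the two check functions are hypothetical reductions of the boundary
      test, not the verifier itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a shortform attr entry with a one-byte tail. */
struct sf_entry {
	uint8_t namelen;
	uint8_t valuelen;
	uint8_t flags;
	uint8_t nameval[1]; /* really namelen + valuelen bytes */
};

/* sizeof() counts nameval[0], so it is one byte more than the header. */
_Static_assert(offsetof(struct sf_entry, nameval) ==
	       sizeof(struct sf_entry) - 1,
	       "fixed part ends where nameval begins");

/* Old test: uses sizeof() as the fixed part and overshoots by one. */
int fixed_part_fits_old(const uint8_t *entry, const uint8_t *endp)
{
	return !(entry + sizeof(struct sf_entry) >= endp);
}

/* Corrected test: subtract the nameval[1] byte from the fixed part. */
int fixed_part_fits_new(const uint8_t *entry, const uint8_t *endp)
{
	return !(entry + sizeof(struct sf_entry) - 1 >= endp);
}
```

      For a final entry with a one-byte name and no value, the entry ends
      exactly at endp, which is the only case where the two tests disagree.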
      
      Fixes: 1e1bbd8e ("xfs: create structure verifier function for shortform xattrs")
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      f4020438
    • Brian Foster's avatar
      xfs: fix off-by-one in inode alloc block reservation calculation · 657f1019
      Brian Foster authored
      
      The inode chunk allocation transaction reserves inobt_maxlevels-1
      blocks to accommodate a full split of the inode btree. A full split
      requires an allocation for every existing level and a new root
      block, which means inobt_maxlevels is the worst case block
      requirement for a transaction that inserts to the inobt. This can
      lead to a transaction block reservation overrun when tmpfile
      creation allocates an inode chunk and expands the inobt to its
      maximum depth. This problem has been observed in conjunction with
      overlayfs, which makes frequent use of tmpfiles internally.
      
      The existing reservation code goes back as far as the Linux git repo
      history (v2.6.12). It was likely never observed as a problem because
      the traditional file/directory creation transactions also include
      worst case block reservation for directory modifications, which most
      likely is able to make up for a single block deficiency in the inode
      allocation portion of the calculation. tmpfile support is relatively
      more recent (v3.15), less heavily used, and only includes the inode
      allocation block reservation as tmpfiles aren't linked into the
      directory tree on creation.
      
      Fix up the inode alloc block reservation macro and a couple of the
      block allocator minleft parameters that enforce an allocation to
      leave enough free blocks in the AG for a full inobt split.
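      
      The arithmetic behind the off-by-one is small enough to write out.
      This is a worked example only, with hypothetical function names; it
      restates the counting argument above and is not the reservation
      macro itself.

```c
#include <assert.h>

/*
 * Blocks allocated by a full split of a btree that currently has
 * `height` levels: one new block per existing level plus a new root.
 */
unsigned int full_split_blocks(unsigned int height)
{
	return height + 1;
}

/*
 * Worst case for a tree limited to `maxlevels`: it sits at
 * maxlevels - 1 levels and the split grows it to its maximum depth.
 */
unsigned int worst_case_split_blocks(unsigned int maxlevels)
{
	assert(maxlevels >= 1);
	return full_split_blocks(maxlevels - 1);
}

/* What the old reservation set aside for the same operation. */
unsigned int old_reservation_blocks(unsigned int maxlevels)
{
	assert(maxlevels >= 1);
	return maxlevels - 1;
}
```

      For any maxlevels the worst case comes out one block above the old
      reservation, which is the single-block deficiency discussed above.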
      
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      657f1019
    • Brian Foster's avatar
      xfs: finish dfops on every insert range shift iteration · 9c516e0e
      Brian Foster authored
      
      The recent change to make insert range an atomic operation used the
      incorrect transaction rolling mechanism. The explicit transaction
      roll does not finish deferred operations. This means that intents
      for rmapbt updates caused by extent shifts are not logged until the
      final transaction commits. Thus if a crash occurs during an insert
      range, log recovery might leave the rmapbt in an inconsistent state.
      This was discovered by repeated runs of generic/455.
      
      Update insert range to finish dfops on every shift iteration. This
      is similar to collapse range and ensures that intents are logged
      with the transactions that make associated changes.
      
      Fixes: dd87f87d ("xfs: rework insert range into an atomic operation")
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      9c516e0e
    • Jens Axboe's avatar
      io_uring: make offset == -1 consistent with preadv2/pwritev2 · 0fef9483
      Jens Axboe authored
      
      The man page for io_uring generally claims we're consistent with what
      preadv2 and pwritev2 accept, but turns out there's a slight discrepancy
      in how offset == -1 is handled for pipes/streams. preadv doesn't allow
      it, but preadv2 does. This currently causes io_uring to return -EINVAL
      if that is attempted, but we should allow that as documented.
      
      This change makes us consistent with preadv2/pwritev2 for just passing
      in a NULL ppos for streams if the offset is -1.
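      
      The preadv2 behaviour that io_uring is being aligned with can be
      seen from userspace.  The sketch below assumes Linux with a glibc
      that provides the preadv2() wrapper; pipe_roundtrip() is a
      hypothetical helper that writes a message into a pipe and reads it
      back with offset == -1.  It does not use io_uring itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write msg into a fresh pipe and read it back with preadv2() using
 * offset == -1.  Returns the number of bytes read, or -1 on failure.
 */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
	struct iovec iov = { .iov_base = out, .iov_len = outlen };
	size_t len = strlen(msg);
	int fds[2];
	ssize_t n;

	if (pipe(fds) != 0)
		return -1;

	if (write(fds[1], msg, len) != (ssize_t)len) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* A pipe has no file position, so -1 is the offset that works. */
	n = preadv2(fds[0], &iov, 1, -1, 0);

	close(fds[0]);
	close(fds[1]);
	return n;
}
```

      A caller would pass a buffer at least as large as the message and
      compare the returned count with the message length.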
      
      Cc: stable@vger.kernel.org # v5.7+
      Reported-by: Benedikt Ames <wisp3rwind@posteo.eu>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0fef9483
  12. Aug 25, 2020