  1. Mar 22, 2023
  2. Mar 17, 2023
    • filelocks: use mount idmapping for setlease permission check · c20fb060
      Seth Forshee authored
      
      commit 42d0c4bd upstream.
      
      A user should be allowed to take out a lease via an idmapped mount if
      the fsuid matches the mapped uid of the inode. generic_setlease() is
      checking the unmapped inode uid, causing these operations to be denied.
      
      Fix this by comparing against the mapped inode uid instead of the
      unmapped uid.
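
The comparison described above can be sketched in userspace C. This is only a model, assuming a hypothetical single-range ID mapping: `struct sketch_idmap` and every `sketch_*` function are illustrative names, not kernel APIs, and the capability checks that generic_setlease() also performs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_INVALID_UID UINT32_MAX

/* Hypothetical mapping: on-disk uids [disk_first, disk_first + count)
 * appear through the mount as [mount_first, mount_first + count). */
struct sketch_idmap {
	uint32_t disk_first;
	uint32_t mount_first;
	uint32_t count;
};

/* Translate an on-disk uid into the uid seen through the idmapped mount. */
static uint32_t sketch_map_uid(const struct sketch_idmap *map, uint32_t disk_uid)
{
	assert(map);
	if (disk_uid >= map->disk_first && disk_uid - map->disk_first < map->count)
		return map->mount_first + (disk_uid - map->disk_first);
	return SKETCH_INVALID_UID;	/* no mapping for this uid */
}

/* Buggy shape: the caller's fsuid is compared with the raw on-disk uid. */
static bool sketch_owner_check_unmapped(uint32_t fsuid, uint32_t disk_uid)
{
	return fsuid == disk_uid;
}

/* Fixed shape: the caller's fsuid is compared with the mapped uid. */
static bool sketch_owner_check_mapped(const struct sketch_idmap *map,
				      uint32_t fsuid, uint32_t disk_uid)
{
	uint32_t mapped = sketch_map_uid(map, disk_uid);

	return mapped != SKETCH_INVALID_UID && fsuid == mapped;
}
```

With a mapping of on-disk uid 0 to uid 1000, a caller whose fsuid is 1000 fails the unmapped check and passes the mapped one.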
      
      Fixes: 9caccd41 ("fs: introduce MOUNT_ATTR_IDMAP")
      Cc: stable@vger.kernel.org
      Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: Fix deadlock during directory rename · 352c7286
      Jan Kara authored
      
      [ Upstream commit 3c92792d ]
      
      As lockdep properly warns, we should not be locking i_rwsem while having
      transactions started as the proper lock ordering used by all directory
      handling operations is i_rwsem -> transaction start. Fix the lock
      ordering by moving the locking of the directory earlier in
      ext4_rename().
      
      Reported-by: <syzbot+9d16c39efb5fade84574@syzkaller.appspotmail.com>
      Fixes: 0813299c ("ext4: Fix possible corruption when moving a directory")
      Link: https://syzkaller.appspot.com/bug?extid=9d16c39efb5fade84574
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230301141004.15087-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • erofs: Revert "erofs: fix kvcalloc() misuse with __GFP_NOFAIL" · cd3e080b
      Gao Xiang authored
      [ Upstream commit 647dd2c3 ]
      
      Let's revert commit 12724ba3 ("erofs: fix kvcalloc() misuse with
      __GFP_NOFAIL"), since kvmalloc() has supported __GFP_NOFAIL since commit
      a421ef30 ("mm: allow !GFP_KERNEL allocations for kvmalloc").  So
      the original fix was wrong.
      
      There is still an open issue, discussed in [1], so until that mm fix
      lands the warning can still trigger, but applying this revert first
      makes it less frequent.
      
      [1] https://lore.kernel.org/r/20230305053035.1911-1-hsiangkao@linux.alibaba.com
      
      Fixes: 12724ba3 ("erofs: fix kvcalloc() misuse with __GFP_NOFAIL")
      Reviewed-by: Chao Yu <chao@kernel.org>
      Link: https://lore.kernel.org/r/20230309053148.9223-1-hsiangkao@linux.alibaba.com
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • NFSD: Protect against filesystem freezing · 81d3e32c
      Chuck Lever authored
      
      [ Upstream commit fd9a2e1d ]
      
      Flole observes this WARNING on occasion:
      
      [1210423.486503] WARNING: CPU: 8 PID: 1524732 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0
      
      Reported-by: <flole@flole.de>
      Suggested-by: Jan Kara <jack@suse.cz>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217123
      Fixes: 73da852e ("nfsd: use vfs_iter_read/write")
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • btrfs: fix extent map logging bit not cleared for split maps after dropping range · 895c7c45
      Filipe Manana authored
      
      [ Upstream commit e4cc1483 ]
      
      At btrfs_drop_extent_map_range() we are clearing the EXTENT_FLAG_LOGGING
      bit on a 'flags' variable that was not initialized. This makes static
      checkers complain about it, so initialize the 'flags' variable before
      clearing the bit.
      
      In practice this has no consequences, because EXTENT_FLAG_LOGGING should
      not be set when btrfs_drop_extent_map_range() is called, as an fsync locks
      the inode in exclusive mode, locks the inode's mmap semaphore in exclusive
      mode too and it always flushes all delalloc.
      
      Also add a comment about why we clear EXTENT_FLAG_LOGGING on a copy of the
      flags of the split extent map.
      
      Reported-by: Dan Carpenter <error27@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/Y%2FyipSVozUDEZKow@kili/
      Fixes: db21370b ("btrfs: drop extent map range more efficiently")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ext4: Fix possible corruption when moving a directory · 291cd19d
      Jan Kara authored
      
      [ Upstream commit 0813299c ]
      
      When we are renaming a directory to a different directory, we need to
      update the '..' entry in the moved directory. However, nothing prevents
      the moved directory from being modified and even converted from the
      inline format to the normal format. When such a race happens, the rename
      code gets confused and we crash. Fix the problem by locking the moved
      directory.
      
      CC: stable@vger.kernel.org
      Fixes: 32f7f22c ("ext4: let ext4_rename handle inline dir")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230126112221.11866-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • udf: Fix off-by-one error when discarding preallocation · 70b91843
      Jan Kara authored
      
      [ Upstream commit f54aa97f ]
      
      The condition determining whether the preallocation can be used had
      an off-by-one error so we didn't discard preallocation when new
      allocation was just following it. This can then confuse code in
      inode_getblk().
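
The commit message does not quote the condition itself, so the following is only a hypothetical sketch of what an off-by-one of this kind looks like. It assumes a preallocation covering file blocks [first, first + count); the function names and parameters are made up for illustration and are not the UDF code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy shape: "<=" also accepts the block just past the preallocation,
 * so the preallocation is kept when it should have been discarded. */
static bool sketch_can_use_prealloc_buggy(uint64_t first, uint64_t count,
					  uint64_t block)
{
	assert(count > 0);
	return block >= first && block <= first + count;
}

/* Fixed shape: only blocks strictly inside the preallocation may use it. */
static bool sketch_can_use_prealloc_fixed(uint64_t first, uint64_t count,
					  uint64_t block)
{
	assert(count > 0);
	return block >= first && block < first + count;
}
```

For a preallocation of blocks 10 to 13, the two versions agree everywhere except at block 14, the block "just following it".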
      
      CC: stable@vger.kernel.org
      Fixes: 16d05565 ("udf: Discard preallocation before extending file with a hole")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ext4: zero i_disksize when initializing the bootloader inode · 9e9a4cc5
      Zhihao Cheng authored
      commit f5361da1 upstream.
      
      If the boot loader inode has never been used before, the
      EXT4_IOC_SWAP_BOOT ioctl will initialize it, including setting the
      i_size to 0.  However, if the "never before used" boot loader inode has a
      non-zero i_size, then i_disksize will be non-zero, and the
      inconsistency between i_size and i_disksize can trigger a kernel
      warning:
      
       WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
       CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
       RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
       Call Trace:
        vfs_write+0x3b1/0x5c0
        ksys_write+0x77/0x160
        __x64_sys_write+0x22/0x30
        do_syscall_64+0x39/0x80
      
      Reproducer:
       1. create corrupted image and mount it:
             mke2fs -t ext4 /tmp/foo.img 200
             debugfs -wR "sif <5> size 25700" /tmp/foo.img
             mount -t ext4 /tmp/foo.img /mnt
             cd /mnt
             echo 123 > file
       2. Run the reproducer program:
             posix_memalign(&buf, 1024, 1024);
             fd = open("file", O_RDWR | O_DIRECT);
             ioctl(fd, EXT4_IOC_SWAP_BOOT);
             write(fd, buf, 1024);
      
      Fix this by setting i_disksize as well as i_size to zero when
      initializing the boot loader inode.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
      Cc: stable@kernel.org
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix WARNING in ext4_update_inline_data · 92eee6a8
      Ye Bin authored
      
      commit 2b96b4a5 upstream.
      
      Syzbot found the following issue:
      EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 without journal. Quota mode: none.
      fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-aesni"
      fscrypt: AES-256-XTS using implementation "xts-aes-aesni"
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 5071 at mm/page_alloc.c:5525 __alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
      Modules linked in:
      CPU: 1 PID: 5071 Comm: syz-executor263 Not tainted 6.2.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      RIP: 0010:__alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
      RSP: 0018:ffffc90003c2f1c0 EFLAGS: 00010246
      RAX: ffffc90003c2f220 RBX: 0000000000000014 RCX: 0000000000000000
      RDX: 0000000000000028 RSI: 0000000000000000 RDI: ffffc90003c2f248
      RBP: ffffc90003c2f2d8 R08: dffffc0000000000 R09: ffffc90003c2f220
      R10: fffff52000785e49 R11: 1ffff92000785e44 R12: 0000000000040d40
      R13: 1ffff92000785e40 R14: dffffc0000000000 R15: 1ffff92000785e3c
      FS:  0000555556c0d300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f95d5e04138 CR3: 00000000793aa000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __alloc_pages_node include/linux/gfp.h:237 [inline]
       alloc_pages_node include/linux/gfp.h:260 [inline]
       __kmalloc_large_node+0x95/0x1e0 mm/slab_common.c:1113
       __do_kmalloc_node mm/slab_common.c:956 [inline]
       __kmalloc+0xfe/0x190 mm/slab_common.c:981
       kmalloc include/linux/slab.h:584 [inline]
       kzalloc include/linux/slab.h:720 [inline]
       ext4_update_inline_data+0x236/0x6b0 fs/ext4/inline.c:346
       ext4_update_inline_dir fs/ext4/inline.c:1115 [inline]
       ext4_try_add_inline_entry+0x328/0x990 fs/ext4/inline.c:1307
       ext4_add_entry+0x5a4/0xeb0 fs/ext4/namei.c:2385
       ext4_add_nondir+0x96/0x260 fs/ext4/namei.c:2772
       ext4_create+0x36c/0x560 fs/ext4/namei.c:2817
       lookup_open fs/namei.c:3413 [inline]
       open_last_lookups fs/namei.c:3481 [inline]
       path_openat+0x12ac/0x2dd0 fs/namei.c:3711
       do_filp_open+0x264/0x4f0 fs/namei.c:3741
       do_sys_openat2+0x124/0x4e0 fs/open.c:1310
       do_sys_open fs/open.c:1326 [inline]
       __do_sys_openat fs/open.c:1342 [inline]
       __se_sys_openat fs/open.c:1337 [inline]
       __x64_sys_openat+0x243/0x290 fs/open.c:1337
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Above issue happens as follows:
      ext4_iget
         ext4_find_inline_data_nolock ->i_inline_off=164 i_inline_size=60
      ext4_try_add_inline_entry
         __ext4_mark_inode_dirty
            ext4_expand_extra_isize_ea ->i_extra_isize=32 s_want_extra_isize=44
               ext4_xattr_shift_entries
      	 ->after the shift i_inline_off is stale; the real offset is now 176
      ext4_try_add_inline_entry
        ext4_update_inline_dir
          get_max_inline_xattr_value_size
            if (EXT4_I(inode)->i_inline_off)
      	entry = (struct ext4_xattr_entry *)((void *)raw_inode +
      			EXT4_I(inode)->i_inline_off);
              free += EXT4_XATTR_SIZE(le32_to_cpu(entry->e_value_size));
      	->As entry is incorrect, then 'free' may be negative
         ext4_update_inline_data
            value = kzalloc(len, GFP_NOFS);
            -> len is unsigned int, maybe very large, then trigger warning when
               'kzalloc()'
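
The last step of the flow above, a negative 'free' turning into a huge unsigned length, is ordinary C integer conversion. A minimal sketch follows; the `sketch_*` names are hypothetical, and the sanity helper is only an illustration of the hazard, not the fix this commit applies.

```c
#include <assert.h>

/* A negative signed value converted to unsigned int wraps to a very large
 * number, which is what ends up being passed as the allocation length. */
static unsigned int sketch_len_from_free(int free_bytes)
{
	return (unsigned int)free_bytes;
}

/* Illustrative guard: accept only non-negative values within a bound. */
static int sketch_len_is_sane(int free_bytes, unsigned int max_len)
{
	return free_bytes >= 0 && (unsigned int)free_bytes <= max_len;
}
```

For example, a 'free' of -12 becomes a length just below the maximum value of unsigned int, far beyond anything an inline area could hold.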
      
      To resolve the above issue we need to update 'i_inline_off' after
      'ext4_xattr_shift_entries()'.  We do not need to set
      EXT4_STATE_MAY_INLINE_DATA flag here, since ext4_mark_inode_dirty()
      already sets this flag if needed.  Setting EXT4_STATE_MAY_INLINE_DATA
      when it is not needed may trigger a BUG_ON in ext4_writepages().
      
      Reported-by: <syzbot+d30838395804afc2fa6f@syzkaller.appspotmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230307015253.2232062-3-yebin@huaweicloud.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: move where set the MAY_INLINE_DATA flag is set · 8aa3cb00
      Ye Bin authored
      
      commit 1dcdce59 upstream.
      
      The only caller of ext4_find_inline_data_nolock() that needs setting of
      EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode().  In
      ext4_write_inline_data_end() we just need to update inode->i_inline_off.
      Since we are going to add one more caller that does not need to set
      EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA
      out to ext4_iget_extra_inode().
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Cc: stable@kernel.org
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230307015253.2232062-2-yebin@huaweicloud.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix another off-by-one fsmap error on 1k block filesystems · 15ebade3
      Darrick J. Wong authored
      
      commit c993799b upstream.
      
      Apparently syzbot figured out that issuing this FSMAP call:
      
      struct fsmap_head cmd = {
      	.fmh_count	= ...,
      	.fmh_keys	= {
      		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
      		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
      	},
      ...
      };
      ret = ioctl(fd, FS_IOC_GETFSMAP, &cmd);
      
      Produces this crash if the underlying filesystem is a 1k-block ext4
      filesystem:
      
      kernel BUG at fs/ext4/ext4.h:3331!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 3 PID: 3227965 Comm: xfs_io Tainted: G        W  O       6.2.0-rc8-achx
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:ext4_mb_load_buddy_gfp+0x47c/0x570 [ext4]
      RSP: 0018:ffffc90007c03998 EFLAGS: 00010246
      RAX: ffff888004978000 RBX: ffffc90007c03a20 RCX: ffff888041618000
      RDX: 0000000000000000 RSI: 00000000000005a4 RDI: ffffffffa0c99b11
      RBP: ffff888012330000 R08: ffffffffa0c2b7d0 R09: 0000000000000400
      R10: ffffc90007c03950 R11: 0000000000000000 R12: 0000000000000001
      R13: 00000000ffffffff R14: 0000000000000c40 R15: ffff88802678c398
      FS:  00007fdf2020c880(0000) GS:ffff88807e100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd318a5fe8 CR3: 000000007f80f001 CR4: 00000000001706e0
      Call Trace:
       <TASK>
       ext4_mballoc_query_range+0x4b/0x210 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       ext4_getfsmap_datadev+0x713/0x890 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       ext4_getfsmap+0x2b7/0x330 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       ext4_ioc_getfsmap+0x153/0x2b0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       __ext4_ioctl+0x2a7/0x17e0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
       __x64_sys_ioctl+0x82/0xa0
       do_syscall_64+0x2b/0x80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fdf20558aff
      RSP: 002b:00007ffd318a9e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000000200c0 RCX: 00007fdf20558aff
      RDX: 00007fdf1feb2010 RSI: 00000000c0c0583b RDI: 0000000000000003
      RBP: 00005625c0634be0 R08: 00005625c0634c40 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdf1feb2010
      R13: 00005625be70d994 R14: 0000000000000800 R15: 0000000000000000
      
      For GETFSMAP calls, the caller selects a physical block device by
      writing its device number into fsmap_head.fmh_keys[01].fmr_device.
      To query mappings for a subrange of the device, the starting byte of the
      range is written to fsmap_head.fmh_keys[0].fmr_physical and the last
      byte of the range goes in fsmap_head.fmh_keys[1].fmr_physical.
      
      IOWs, to query what mappings overlap with bytes 3-14 of /dev/sda, you'd
      set the inputs as follows:
      
      	fmh_keys[0] = { .fmr_device = makedev(8, 0), .fmr_physical = 3},
      	fmh_keys[1] = { .fmr_device = makedev(8, 0), .fmr_physical = 14},
      
      Which would return you whatever is mapped in the 12 bytes starting at
      physical offset 3.
      
      The crash is due to insufficient range validation of keys[1] in
      ext4_getfsmap_datadev.  On 1k-block filesystems, block 0 is not part of
      the filesystem, which means that s_first_data_block is nonzero.
      ext4_get_group_no_and_offset subtracts this quantity from the blocknr
      argument before cracking it into a group number and a block number
      within a group.  IOWs, block group 0 spans blocks 1-8192 (1-based)
      instead of 0-8191 (0-based) like what happens with larger blocksizes.
      
      The net result of this encoding is that blocknr < s_first_data_block is
      not a valid input to this function.  The end_fsb variable is set from
      the keys that are copied from userspace, which means that in the above
      example, its value is zero.  That leads to an underflow here:
      
      	blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
      
      The division then operates on -1:
      
      	offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
      		EXT4_SB(sb)->s_cluster_bits;
      
      Leaving an impossibly large group number (2^32-1) in blocknr.
      ext4_getfsmap_check_keys checked that keys[0].fmr_physical and
      keys[1].fmr_physical are in increasing order, but
      ext4_getfsmap_datadev adjusts keys[0].fmr_physical to be at least
      s_first_data_block.  This implies that we have to check it again after
      the adjustment, which is the piece that I forgot.
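
The arithmetic above can be reproduced in userspace. This is a sketch under stated assumptions: `struct sketch_sb`, `struct sketch_pos` and the `sketch_*` functions are hypothetical stand-ins, the cluster shift is taken to be zero, and the last helper only models the re-check that the message says was missing, not the exact kernel change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the superblock fields named in the text. */
struct sketch_sb {
	uint32_t first_data_block;	/* 1 on a 1k-block filesystem */
	uint32_t blocks_per_group;	/* 8192 on a 1k-block filesystem */
};

/* A block number cracked into a group number and an in-group offset. */
struct sketch_pos {
	uint32_t group;
	uint32_t offset;
};

/* Same arithmetic as quoted above, with no range check: the subtraction is
 * unsigned, so blocknr < first_data_block wraps instead of going negative. */
static struct sketch_pos sketch_crack_blocknr(const struct sketch_sb *sb,
					      uint64_t blocknr)
{
	struct sketch_pos pos;

	assert(sb->blocks_per_group != 0);
	blocknr -= sb->first_data_block;
	pos.offset = (uint32_t)(blocknr % sb->blocks_per_group);
	pos.group = (uint32_t)(blocknr / sb->blocks_per_group);	/* truncated to 32 bits */
	return pos;
}

/* Model of the missing re-check: clamp the low key up to the first data
 * block, then make sure the high key is still not below it. */
static int sketch_keys_valid_after_clamp(const struct sketch_sb *sb,
					 uint64_t start, uint64_t end)
{
	if (start < sb->first_data_block)
		start = sb->first_data_block;
	return end >= start;
}
```

With first_data_block = 1 and 8192 blocks per group, cracking block 0 yields group 2^32-1, which matches the impossible group number described above, while the re-check rejects the key pair (0, 0) before that arithmetic is reached.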
      
      Reported-by: <syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com>
      Fixes: 4a495624 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
      Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
      Cc: stable@vger.kernel.org
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Link: https://lore.kernel.org/r/Y+58NPTH7VNGgzdd@magnolia
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix RENAME_WHITEOUT handling for inline directories · 24acd859
      Eric Whitney authored
      
      commit c9f62c8b upstream.
      
      A significant number of xfstests can cause ext4 to log one or more
      warning messages when they are run on a test file system where the
      inline_data feature has been enabled.  An example:
      
      "EXT4-fs warning (device vdc): ext4_dirblock_csum_set:425: inode
       #16385: comm fsstress: No space for directory leaf checksum. Please
      run e2fsck -D."
      
      The xfstests include: ext4/057, 058, and 307; generic/013, 051, 068,
      070, 076, 078, 083, 232, 269, 270, 390, 461, 475, 476, 482, 579, 585,
      589, 626, 631, and 650.
      
      In this situation, the warning message indicates a bug in the code that
      performs the RENAME_WHITEOUT operation on a directory entry that has
      been stored inline.  It doesn't detect that the directory is stored
      inline, and incorrectly attempts to compute a dirent block checksum on
      the whiteout inode when creating it.  This attempt fails as a result
      of the integrity checking in get_dirent_tail (usually due to a failure
      to match the EXT4_FT_DIR_CSUM magic cookie), and the warning message
      is then emitted.
      
      Fix this by simply collecting the inlined data state at the time the
      search for the source directory entry is performed.  Existing code
      handles the rest, and this is sufficient to eliminate all spurious
      warning messages produced by the tests above.  Go one step further
      and do the same in the code that resets the source directory entry in
      the event of failure.  The inlined state should be present in the
      "old" struct, but given the possibility of a race there's no harm
      in taking a conservative approach and getting that information again
      since the directory entry is being reread anyway.
      
      Fixes: b7ff91fd ("ext4: find old entry again if failed to rename whiteout")
      Cc: stable@kernel.org
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20230210173244.679890-1-enwlinux@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix cgroup writeback accounting with fs-layer encryption · a9e0ecc0
      Eric Biggers authored
      
      commit ffec85d5 upstream.
      
      When writing a page from an encrypted file that is using
      filesystem-layer encryption (not inline encryption), ext4 encrypts the
      pagecache page into a bounce page, then writes the bounce page.
      
      It also passes the bounce page to wbc_account_cgroup_owner().  That's
      incorrect, because the bounce page is a newly allocated temporary page
      that doesn't have the memory cgroup of the original pagecache page.
      This makes wbc_account_cgroup_owner() not account the I/O to the owner
      of the pagecache page as it should.
      
      Fix this by always passing the pagecache page to
      wbc_account_cgroup_owner().
      
      Fixes: 001e4a87 ("ext4: implement cgroup writeback support")
      Cc: stable@vger.kernel.org
      Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20230203005503.141557-1-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • erofs: fix wrong kunmap when using LZMA on HIGHMEM platforms · 28aea8ae
      Gao Xiang authored
      
      commit 8f121dfb upstream.
      
      As the call trace shows, the root cause is that kunmap is called on the wrong pages:
      
       BUG: kernel NULL pointer dereference, address: 00000000
       CPU: 1 PID: 40 Comm: kworker/u5:0 Not tainted 6.2.0-rc5 #4
       Workqueue: erofs_worker z_erofs_decompressqueue_work
       EIP: z_erofs_lzma_decompress+0x34b/0x8ac
        z_erofs_decompress+0x12/0x14
        z_erofs_decompress_queue+0x7e7/0xb1c
        z_erofs_decompressqueue_work+0x32/0x60
        process_one_work+0x24b/0x4d8
        ? process_one_work+0x1a4/0x4d8
        worker_thread+0x14c/0x3fc
        kthread+0xe6/0x10c
        ? rescuer_thread+0x358/0x358
        ? kthread_complete_and_exit+0x18/0x18
        ret_from_fork+0x1c/0x28
       ---[ end trace 0000000000000000 ]---
      
      The bug is trivial and should be fixed now.  It has no impact on
      !HIGHMEM platforms.
      
      Fixes: 622ceadd ("erofs: lzma compression support")
      Cc: <stable@vger.kernel.org> # 5.16+
      Reviewed-by: Yue Hu <huyue2@coolpad.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230305134455.88236-1-hsiangkao@linux.alibaba.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix block group item corruption after inserting new block group · ce5f68fb
      Filipe Manana authored
      
      commit 675dfe12 upstream.
      
      We can often end up inserting a block group item, for a new block group,
      with a wrong value for the used bytes field.
      
      This happens if, for the newly allocated block group, in the same
      transaction that created the block group, we have tasks allocating
      extents from it as well as tasks removing extents from it.
      
      For example:
      
      1) Task A creates a metadata block group X;
      
      2) Two extents are allocated from block group X, so its "used" field is
         updated to 32K, and its "commit_used" field remains as 0;
      
      3) Transaction commit starts, by some task B, and it enters
         btrfs_start_dirty_block_groups(). There it tries to update the block
         group item for block group X, which currently has its "used" field with
         a value of 32K. But that fails since the block group item was not yet
         inserted, and so on failure update_block_group_item() sets the
         "commit_used" field of the block group back to 0;
      
      4) The block group item is inserted by task A, when for example
         btrfs_create_pending_block_groups() is called when releasing its
         transaction handle. This results in insert_block_group_item() inserting
         the block group item in the extent tree (or block group tree), with a
         "used" field having a value of 32K, but without updating the
         "commit_used" field in the block group, which remains with value of 0;
      
      5) The two extents are freed from block group X, so its "used" field
         changes from 32K to 0;
      
      6) The transaction commit by task B continues, it enters
         btrfs_write_dirty_block_groups() which calls update_block_group_item()
         for block group X, and there it decides to skip the block group item
         update, because "used" has a value of 0 and "commit_used" has a value
         of 0 too.
      
         As a result, we end up with a block group item having a 32K "used"
         field but no extents allocated from it.
      
      When this issue happens, a btrfs check reports an error like this:
      
         [1/7] checking root items
         [2/7] checking extents
         block group [1104150528 1073741824] used 39796736 but extent items used 0
         ERROR: errors found in extent allocation tree or chunk allocation
         (...)
      
      Fix this by making insert_block_group_item() update the block group's
      "commit_used" field.
      
      Fixes: 7248e0ce ("btrfs: skip update of block group item if used bytes are the same")
      CC: stable@vger.kernel.org # 6.2+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix percent calculation for bg reclaim message · f0097196
      Johannes Thumshirn authored
      
      commit 95cd356c upstream.
      
      We have a report that the info message for block-group reclaim shows
      more than 100% used.
      
      This is happening as we were truncating the divisor for the division
      (the block_group->length) to a 32bit value.
      
      Fix this by using div64_u64() to not truncate the divisor.
      
      In the worst case, the truncation can lead to a division by zero, which
      should be possible to trigger on a 4-disk RAID0 when each device is
      large enough:
      
        $ mkfs.btrfs  -f /dev/test/scratch[1234] -m raid1 -d raid0
        btrfs-progs v6.1
        [...]
        Filesystem size:    40.00GiB
        Block group profiles:
          Data:             RAID0             4.00GiB <<<
          Metadata:         RAID1           256.00MiB
          System:           RAID1             8.00MiB
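
The effect of truncating the divisor can be shown with plain 64-bit arithmetic. This is a minimal sketch with hypothetical helper names, not the btrfs code; the truncating variant returns -1 where a real division by zero would fault.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy shape: the 64-bit length is narrowed to 32 bits before dividing,
 * as happens when a division helper takes a 32-bit divisor. */
static int64_t sketch_percent_truncated(uint64_t used, uint64_t length)
{
	uint32_t divisor = (uint32_t)length;	/* drops the high 32 bits */

	if (divisor == 0)
		return -1;	/* real code would divide by zero here */
	return (int64_t)(used * 100 / divisor);
}

/* Fixed shape: the divisor keeps its full 64-bit width. */
static int64_t sketch_percent_full(uint64_t used, uint64_t length)
{
	assert(length != 0);
	return (int64_t)(used * 100 / length);
}
```

A 5GiB block group with 3GiB accounted reports 300% in the truncated version and 60% in the full-width one. A length of exactly 4GiB, as in the RAID0 data profile above, truncates to zero.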
      
      Reported-by: Forza <forza@tnonline.net>
      Link: https://lore.kernel.org/linux-btrfs/e99483.c11a58d.1863591ca52@tnonline.net/
      Fixes: 5f93e776 ("btrfs: zoned: print unusable percentage when reclaiming block groups")
      CC: stable@vger.kernel.org # 5.15+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add Qu's note ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix unnecessary increment of read error stat on write error · ee637a45
      Naohiro Aota authored
      
      commit 98e8d36a upstream.
      
      Currently, btrfs_log_dev_io_error() increases the read error count even if
      the failed IO is a WRITE request. This is because it forgets to use "else
      if", so every failed WRITE request also counts as a READ error, as there is
      (of course) no REQ_RAHEAD bit set.
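
The control-flow mistake can be shown in isolation. In this sketch the request and counter types are hypothetical simplifications, not the btrfs structures; only the "if" versus "else if" shape is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified request and per-device error counters. */
enum sketch_op { SKETCH_READ, SKETCH_WRITE };

struct sketch_req {
	enum sketch_op op;
	bool readahead;
};

struct sketch_stats {
	unsigned int write_errs;
	unsigned int read_errs;
};

/* Buggy shape: two independent "if" statements, so a failed WRITE, which
 * never carries the readahead flag, is counted in both counters. */
static void sketch_log_error_buggy(struct sketch_stats *st,
				   const struct sketch_req *rq)
{
	assert(st && rq);
	if (rq->op == SKETCH_WRITE)
		st->write_errs++;
	if (!rq->readahead)
		st->read_errs++;
}

/* Fixed shape: "else if" makes the two branches mutually exclusive. */
static void sketch_log_error_fixed(struct sketch_stats *st,
				   const struct sketch_req *rq)
{
	assert(st && rq);
	if (rq->op == SKETCH_WRITE)
		st->write_errs++;
	else if (!rq->readahead)
		st->read_errs++;
}
```

Logging one failed write leaves both counters at 1 in the buggy version and only the write counter at 1 in the fixed version.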
      
      Fixes: c3a62baf ("btrfs: use chained bios when cloning")
      CC: stable@vger.kernel.org # 6.1+
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs: prevent out-of-bounds array speculation when closing a file descriptor · eea8e4e0
      Theodore Ts'o authored
      
      commit 609d5444 upstream.
      
      Google-Bug-Id: 114199369
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. Mar 11, 2023