  1. Aug 02, 2022
  2. Jul 22, 2022
  3. Jul 14, 2022
  4. Jul 06, 2022
  5. Jun 26, 2022
    • attr: port attribute changes to new types · b27c82e1
      Christian Brauner authored
      Now that we introduced new infrastructure to increase the type safety
      for filesystems supporting idmapped mounts port the first part of the
      vfs over to them.
      
      This ports the attribute changes codepaths to rely on the new better
      helpers using a dedicated type.
      
      Before this change we used to take a shortcut and place the actual
      values that would be written to inode->i_{g,u}id into struct iattr. This
      had the advantage that we moved idmappings mostly out of the picture
      early on but it made reasoning about changes more difficult than it
      should be.
      
      The filesystem was never explicitly told that it dealt with an idmapped
      mount. The transition to the value that needed to be stored in
      inode->i_{g,u}id appeared way too early and increased the probability of
      bugs in various codepaths.
      
      We now place the same value in struct iattr no matter if this is an
      idmapped mount or not. The vfs will only deal with type safe
      vfs{g,u}id_t. This makes it massively safer to perform permission checks
      as the type will tell us what checks we need to perform and what helpers
      we need to use.
      
      Filesystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to
      inode->i_{g,u}id since they are different types. Instead they need to
      use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the
      vfs{g,u}id into the filesystem.
      
      The other nice effect is that filesystems like overlayfs don't need to
      care about idmappings explicitly anymore and can simply set up struct
      iattr accordingly directly.
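The type-safety argument above can be illustrated with a minimal userspace sketch. It is not the kernel implementation: the `_sketch` names, the single-range idmapping, and `fs_set_owner()` are assumptions made for illustration. Only the idea that the mount-side id and the inode id are distinct types that must go through a mapping helper comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Distinct wrapper types: the compiler rejects mixing them up. */
typedef struct { uint32_t val; } kuid_t;   /* value stored in the inode */
typedef struct { uint32_t val; } vfsuid_t; /* value as seen through the mount */

#define INVALID_ID UINT32_MAX

/* Hypothetical, simplified idmapping: one contiguous range of ids. */
struct idmap_sketch {
	uint32_t fs_first;  /* first id on the filesystem side */
	uint32_t mnt_first; /* first id on the mount side */
	uint32_t count;     /* number of ids in the range */
};

/* Map a mount-side id into the filesystem; INVALID_ID if unmapped. */
static kuid_t vfsuid_to_kuid_sketch(const struct idmap_sketch *map,
				    vfsuid_t vfsuid)
{
	kuid_t kuid = { INVALID_ID };

	if (vfsuid.val >= map->mnt_first &&
	    vfsuid.val - map->mnt_first < map->count)
		kuid.val = map->fs_first + (vfsuid.val - map->mnt_first);
	return kuid;
}

struct inode_sketch { kuid_t i_uid; };
struct iattr_sketch { vfsuid_t ia_vfsuid; };

/* Roughly what a filesystem would do when it finally updates the owner. */
static bool fs_set_owner(struct inode_sketch *inode,
			 const struct idmap_sketch *map,
			 const struct iattr_sketch *attr)
{
	kuid_t kuid = vfsuid_to_kuid_sketch(map, attr->ia_vfsuid);

	if (kuid.val == INVALID_ID)
		return false;
	/* "inode->i_uid = attr->ia_vfsuid;" would not compile: different types. */
	inode->i_uid = kuid;
	return true;
}
```

With a mapping of mount ids 100000..165535 onto filesystem ids 0..65535, a requested owner of 101000 is stored as 1000, while an id outside the range is refused and the inode is left unchanged.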
      
      Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1]
      Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org
      
      
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      CC: linux-fsdevel@vger.kernel.org
      Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • quota: port quota helpers mount ids · 71e7b535
      Christian Brauner authored
      Port the is_quota_modification() and dquot_transfer() helpers to
      type-safe vfs{g,u}id_t. Since these helpers are only called by a few
      filesystems, don't introduce a new helper but simply extend the existing
      helpers to pass down the mount's idmapping.
      
      Note that this is a non-functional change, i.e. nothing changes here
      or at the end of this series in how quotas are handled! This change is
      necessary because at the end of this series we will make ownership
      changes easier to reason about by keeping the original value in struct
      iattr for both non-idmapped and idmapped mounts.
      
      For now we always pass the initial idmapping which makes the idmapping
      functions these helpers call nops.
      
      This is done because we currently always pass the actual value to be
      written to i_{g,u}id via struct iattr. While this allowed us to treat
      the {g,u}id values in struct iattr as values that can be directly
      written to inode->i_{g,u}id it also increases the potential for
      confusion for filesystems.
      
      Now that we have dedicated types to prevent this confusion we will
      ultimately only map the value from the idmapped mount into a filesystem
      value that can be written to inode->i_{g,u}id when the filesystem
      actually updates the inode. So pass down the initial idmapping until we
      have finished that conversion, at which point we pass down the mount's
      idmapping.
      
      Since struct iattr uses an anonymous union with overlapping types as
      supported by the C standard, filesystems that haven't converted to
      ia_vfs{g,u}id won't see any difference and things will continue to work
      as before. In other words, no functional changes intended with this
      change.
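The anonymous-union argument can be sketched in a few lines of standalone C. The struct below is a cut-down stand-in, not the kernel's struct iattr, and `legacy_fs_read_uid()` is a hypothetical unconverted filesystem. The point is only that both member names refer to the same storage with the same layout.

```c
#include <stdint.h>

typedef struct { uint32_t val; } kuid_t;
typedef struct { uint32_t val; } vfsuid_t;

/* Cut-down stand-in for struct iattr: two names for one slot (C11). */
struct iattr_sketch {
	unsigned int ia_valid;
	union {
		kuid_t   ia_uid;    /* name used by unconverted filesystems */
		vfsuid_t ia_vfsuid; /* name used by converted code */
	};
};

/* An unconverted filesystem keeps reading the old member name. */
static uint32_t legacy_fs_read_uid(const struct iattr_sketch *attr)
{
	return attr->ia_uid.val;
}
```

Code that stores through `ia_vfsuid` and code that reads through `ia_uid` therefore see the same value, which is why unconverted filesystems keep working as long as the initial idmapping is passed.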
      
      Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org
      
      
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      CC: linux-fsdevel@vger.kernel.org
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
  6. Jun 10, 2022
  7. Jun 08, 2022
    • zonefs: fix zonefs_iomap_begin() for reads · c1c1204c
      Damien Le Moal authored
      
      If a readahead is issued to a sequential zone file with an offset
      exactly equal to the current file size, the iomap type is set to
      IOMAP_UNWRITTEN, which will prevent an IO, but the iomap length is
      calculated as 0. This causes a WARN_ON() in iomap_iter():
      
      [17309.548939] WARNING: CPU: 3 PID: 2137 at fs/iomap/iter.c:34 iomap_iter+0x9cf/0xe80
      [...]
      [17309.650907] RIP: 0010:iomap_iter+0x9cf/0xe80
      [...]
      [17309.754560] Call Trace:
      [17309.757078]  <TASK>
      [17309.759240]  ? lock_is_held_type+0xd8/0x130
      [17309.763531]  iomap_readahead+0x1a8/0x870
      [17309.767550]  ? iomap_read_folio+0x4c0/0x4c0
      [17309.771817]  ? lockdep_hardirqs_on_prepare+0x400/0x400
      [17309.778848]  ? lock_release+0x370/0x750
      [17309.784462]  ? folio_add_lru+0x217/0x3f0
      [17309.790220]  ? reacquire_held_locks+0x4e0/0x4e0
      [17309.796543]  read_pages+0x17d/0xb60
      [17309.801854]  ? folio_add_lru+0x238/0x3f0
      [17309.807573]  ? readahead_expand+0x5f0/0x5f0
      [17309.813554]  ? policy_node+0xb5/0x140
      [17309.819018]  page_cache_ra_unbounded+0x27d/0x450
      [17309.825439]  filemap_get_pages+0x500/0x1450
      [17309.831444]  ? filemap_add_folio+0x140/0x140
      [17309.837519]  ? lock_is_held_type+0xd8/0x130
      [17309.843509]  filemap_read+0x28c/0x9f0
      [17309.848953]  ? zonefs_file_read_iter+0x1ea/0x4d0 [zonefs]
      [17309.856162]  ? trace_contention_end+0xd6/0x130
      [17309.862416]  ? __mutex_lock+0x221/0x1480
      [17309.868151]  ? zonefs_file_read_iter+0x166/0x4d0 [zonefs]
      [17309.875364]  ? filemap_get_pages+0x1450/0x1450
      [17309.881647]  ? __mutex_unlock_slowpath+0x15e/0x620
      [17309.888248]  ? wait_for_completion_io_timeout+0x20/0x20
      [17309.895231]  ? lock_is_held_type+0xd8/0x130
      [17309.901115]  ? lock_is_held_type+0xd8/0x130
      [17309.906934]  zonefs_file_read_iter+0x356/0x4d0 [zonefs]
      [17309.913750]  new_sync_read+0x2d8/0x520
      [17309.919035]  ? __x64_sys_lseek+0x1d0/0x1d0
      
      Furthermore, this causes iomap_readahead() to loop forever as
      iomap_readahead_iter() always returns 0, making no progress.
      
      Fix this by treating reads after the file size as access to holes,
      setting the iomap type to IOMAP_HOLE, the iomap addr to IOMAP_NULL_ADDR
      and using the length argument as is for the iomap length. To simplify
      the code with this change, zonefs_iomap_begin() is split into the read
      variant, zonefs_read_iomap_begin() and zonefs_read_iomap_ops, and the
      write variant, zonefs_write_iomap_begin() and zonefs_write_iomap_ops.
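The read-side behaviour described above can be sketched as follows. This is a simplified userspace model, not the zonefs code: the `_sketch` types and constants are stand-ins, block-size alignment is left out, and the length used for the mapped case (the bytes remaining up to the file size) is an assumption.

```c
#include <stdint.h>

/* Stand-ins for the iomap type and null-address constants. */
enum iomap_type_sketch { SKETCH_HOLE, SKETCH_MAPPED };
#define SKETCH_NULL_ADDR UINT64_MAX

struct iomap_sketch {
	enum iomap_type_sketch type;
	uint64_t addr;   /* device address of the mapping */
	uint64_t offset; /* file offset of the mapping */
	uint64_t length; /* must never be 0 for a valid mapping */
};

/* Model of the read variant: reads at or past the file size are holes. */
static void read_iomap_begin_sketch(uint64_t zone_start, uint64_t isize,
				    uint64_t offset, uint64_t length,
				    struct iomap_sketch *iomap)
{
	iomap->offset = offset;
	if (offset >= isize) {
		iomap->type = SKETCH_HOLE;
		iomap->addr = SKETCH_NULL_ADDR;
		iomap->length = length; /* use the requested length as is */
	} else {
		iomap->type = SKETCH_MAPPED;
		iomap->addr = zone_start + offset;
		iomap->length = isize - offset; /* > 0 because offset < isize */
	}
}
```

For a nonzero request, neither branch can produce a zero-length mapping, which is the condition that triggered the warning and the endless readahead loop.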
      
      Reported-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
      Fixes: 8dcc1a9d ("fs: New zonefs file system")
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
    • zonefs: Do not ignore explicit_open with active zone limit · 96eca145
      Damien Le Moal authored
      
      A zoned device may have no limit on the number of open zones but may
      have a limit on the number of active zones it can support. In such
      case, the explicit_open mount option should not be ignored to ensure
      that the open() system call activates the zone with an explicit zone
      open command, thus guaranteeing that the zone can be written.
      
      Enforce this by ignoring the explicit_open mount option only for
      devices that have both the open and active zone limits equal to 0.
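The resulting rule is small enough to express as a predicate. The function below is a hypothetical helper written for illustration, not code from zonefs; it assumes, as the message states, that a limit of 0 means "no limit".

```c
#include <stdbool.h>

/*
 * Decide whether a requested explicit_open option stays in effect.
 * A limit of 0 means the device imposes no limit of that kind.
 */
static bool keep_explicit_open(bool requested, unsigned int max_open_zones,
			       unsigned int max_active_zones)
{
	if (!requested)
		return false;
	/* Ignore the option only when neither limit exists. */
	return max_open_zones != 0 || max_active_zones != 0;
}
```

A device with no open-zone limit but an active-zone limit of, say, 14 therefore keeps the option, which is the case the fix addresses.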
      
      Fixes: 87c9ce3f ("zonefs: Add active seq file accounting")
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • zonefs: fix handling of explicit_open option on mount · a2a513be
      Damien Le Moal authored
      
      Ignoring the explicit_open mount option on mount for devices that do not
      have a limit on the number of open zones must be done after the mount
      options are parsed and set in s_mount_opts. Move the check to ignore
      the explicit_open option after the call to zonefs_parse_options() in
      zonefs_fill_super().
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  8. May 24, 2022
  9. May 16, 2022
  10. May 10, 2022
  11. May 09, 2022
  12. Apr 20, 2022
    • zonefs: Fix management of open zones · 1da18a29
      Damien Le Moal authored
      
      The mount option "explicit_open" manages the device open zone
      resources to ensure that if an application opens a sequential file for
      writing, the file zone can always be written by explicitly opening
      the zone and accounting for that state with the s_open_zones counter.
      
      However, if some zones are already open when mounting, the device open
      zone resource usage status will be larger than the initial s_open_zones
      value of 0. Ensure that this inconsistency does not happen by closing
      any sequential zone that is open when mounting.
      
      Furthermore, with ZNS drives, closing an explicitly open zone that has
      not been written will change the zone state to "closed", that is, the
      zone will remain in an active state. Since this can then cause failures
      of explicit open operations on other zones if the drive active zone
      resources are exceeded, we need to make sure that the zone is not
      active anymore by resetting it instead of closing it. To address this,
      zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
      into a REQ_OP_ZONE_RESET for sequential zones that have not been
      written.
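The close-to-reset substitution can be modelled as a pure function. The enum and function names below are illustrative assumptions rather than the kernel's REQ_OP_* values or the actual zonefs_zone_mgmt() code. Only the rule itself comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the zone management operations. */
enum zone_op_sketch { ZONE_OP_OPEN, ZONE_OP_CLOSE, ZONE_OP_RESET };

/*
 * Pick the operation actually sent to the device: closing a sequential
 * zone that was never written would leave it active, so reset it instead.
 */
static enum zone_op_sketch effective_zone_op(enum zone_op_sketch op,
					     bool is_seq_zone,
					     uint64_t written_bytes)
{
	if (op == ZONE_OP_CLOSE && is_seq_zone && written_bytes == 0)
		return ZONE_OP_RESET;
	return op;
}
```

Zones that already hold data are still closed normally, so the substitution never discards written data.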
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • zonefs: Clear inode information flags on inode creation · 694852ea
      Damien Le Moal authored
      
      Ensure that the i_flags field of struct zonefs_inode_info is cleared to
      0 when initializing a zone file inode, avoiding seeing the flag
      ZONEFS_ZONE_OPEN being incorrectly set.
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • zonefs: Add active seq file accounting · 87c9ce3f
      Damien Le Moal authored
      
      Modify struct zonefs_sb_info to add the s_active_seq_files atomic to
      count the number of seq files representing a zone that is partially
      written or explicitly open, that is, to count sequential files with
      a zone that is in an active state on the device.
      
      The helper function zonefs_account_active() is introduced to update
      this counter whenever a file is written or truncated. This helper is
      also used in the zonefs_seq_file_write_open() and
      zonefs_seq_file_write_close() functions when the explicit_open mount
      option is used.
      
      The s_active_seq_files counter is exported through sysfs using the
      read-only attribute nr_active_seq_files. The device maximum number of
      active zones is also exported through sysfs with the read-only attribute
      max_active_seq_files.
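The accounting rule can be sketched as a small state update. The structures and the `counted_active` flag are assumptions for illustration, and a plain int stands in for the atomic counter; the real helper may track the state differently.

```c
#include <stdbool.h>
#include <stdint.h>

struct sb_sketch {
	int active_seq_files; /* stand-in for the atomic s_active_seq_files */
};

struct file_sketch {
	uint64_t wp_offset;   /* bytes written so far */
	uint64_t max_size;    /* zone capacity */
	bool explicitly_open; /* zone explicitly opened via explicit_open */
	bool counted_active;  /* hypothetical: already counted as active? */
};

/* Re-evaluate one file after a write, truncate, open or close. */
static void account_active_sketch(struct sb_sketch *sb, struct file_sketch *f)
{
	/* Active: explicitly open, or partially written (not empty, not full). */
	bool active = f->explicitly_open ||
		      (f->wp_offset > 0 && f->wp_offset < f->max_size);

	if (active && !f->counted_active) {
		sb->active_seq_files++;
		f->counted_active = true;
	} else if (!active && f->counted_active) {
		sb->active_seq_files--;
		f->counted_active = false;
	}
}
```

Calling the helper repeatedly without a state change leaves the counter untouched, so it can be invoked after every write or truncate.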
      
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • zonefs: Export open zone resource information through sysfs · 9277a6d4
      Damien Le Moal authored
      
      To allow applications to easily check the current usage status of the
      open zone resources of the mounted device, export through sysfs the
      counter of write open sequential files s_wro_seq_files field of
      struct zonefs_sb_info. The attribute is named nr_wro_seq_files and is
      read only.
      
      The maximum number of write open sequential files (zones) indicated by
      the s_max_wro_seq_files field of struct zonefs_sb_info is also exported
      as the read only attribute max_wro_seq_files.
      
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • zonefs: Always do seq file write open accounting · 7d6dfbe0
      Damien Le Moal authored
      
      The explicit_open mount option forces an explicit open of the zone of
      sequential files that are open for writing to ensure that the open file
      can be written without the device failing write operations due to the
      open zone resources limit being exceeded. To implement this, zonefs
      accounts for all write-open seq files when this mount option is used.
      
      This accounting however can be easily performed even when the
      explicit_open mount option is not used, thus allowing applications to
      control zone resources on their own, without relying on open() system
      call failures from zonefs.
      
      To implement this, the helper zonefs_file_use_exp_open() is removed and
      replaced with the helper zonefs_seq_file_need_wro(), which tests if a
      file is a sequential file being opened with write access. zonefs_open_zone()
      and zonefs_close_zone() are renamed respectively to
      zonefs_seq_file_write_open() and zonefs_seq_file_write_close() and
      modified to update the s_wro_seq_files counter regardless of the
      explicit_open mount option use.
      
      If the explicit_open mount option is used, zonefs_seq_file_write_open()
      executes an explicit zone open operation for a sequential file opened
      for writing for the first time, as before.
      
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • zonefs: Rename super block information fields · 2b95a23c
      Damien Le Moal authored
      
      The s_open_zones field of struct zonefs_sb_info is used to count the
      number of files that are open for writing and may not necessarily
      correspond to the number of open zones on the device. For instance, an
      application may open for writing a sequential zone file, fully write it
      and keep the file open. In such case, the zone of the file is not open
      anymore (it is in the full state).
      
      Avoid confusion about this counter's meaning by renaming it to
      s_wro_seq_files. To keep things consistent, the field s_max_open_zones
      is renamed to s_max_wro_seq_files.
      
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • zonefs: Fix management of open zones · 19139539
      Damien Le Moal authored
      
      The mount option "explicit_open" manages the device open zone
      resources to ensure that if an application opens a sequential file for
      writing, the file zone can always be written by explicitly opening
      the zone and accounting for that state with the s_open_zones counter.
      
      However, if some zones are already open when mounting, the device open
      zone resource usage status will be larger than the initial s_open_zones
      value of 0. Ensure that this inconsistency does not happen by closing
      any sequential zone that is open when mounting.
      
      Furthermore, with ZNS drives, closing an explicitly open zone that has
      not been written will change the zone state to "closed", that is, the
      zone will remain in an active state. Since this can then cause failures
      of explicit open operations on other zones if the drive active zone
      resources are exceeded, we need to make sure that the zone is not
      active anymore by resetting it instead of closing it. To address this,
      zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
      into a REQ_OP_ZONE_RESET for sequential zones that have not been
      written.
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • zonefs: Clear inode information flags on inode creation · b954ebba
      Damien Le Moal authored
      
      Ensure that the i_flags field of struct zonefs_inode_info is cleared to
      0 when initializing a zone file inode, avoiding seeing the flag
      ZONEFS_ZONE_OPEN being incorrectly set.
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
  13. Apr 18, 2022
  14. Mar 22, 2022
  15. Mar 15, 2022
  16. Mar 07, 2022
  17. Feb 02, 2022
  18. Dec 17, 2021
  19. Oct 24, 2021
    • iomap: Add done_before argument to iomap_dio_rw · 4fdccaa0
      Andreas Gruenbacher authored
      
      Add a done_before argument to iomap_dio_rw that indicates how much of
      the request has already been transferred.  When the request succeeds, we
      report that done_before additional bytes were transferred.  This is
      useful for finishing a request asynchronously when part of the request
      has already been completed synchronously.
      
      We'll use that to allow iomap_dio_rw to be used with page faults
      disabled: when a page fault occurs while submitting a request, we
      synchronously complete the part of the request that has already been
      submitted.  The caller can then take care of the page fault and call
      iomap_dio_rw again for the rest of the request, passing in the number of
      bytes already transferred.
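The retry pattern this enables can be sketched in userspace. `dio_rw_sketch()` is a hypothetical stand-in that transfers at most `chunk` bytes before simulating a page fault; it is not iomap_dio_rw and does not share its signature. The sketch only shows how a caller would thread the running total through a done_before-style argument.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a direct I/O call: transfers at most `chunk`
 * bytes (chunk must be nonzero), then reports a simulated page fault if
 * data remains. On success it returns done_before plus the bytes moved.
 */
static long dio_rw_sketch(size_t remaining, size_t chunk, size_t done_before,
			  int *faulted)
{
	size_t n = remaining < chunk ? remaining : chunk;

	*faulted = n < remaining;
	return (long)(done_before + n);
}

/* Caller loop: resume after each fault, passing in what is already done. */
static long read_with_retries(size_t total, size_t chunk)
{
	size_t done = 0;
	int faulted;

	do {
		long ret = dio_rw_sketch(total - done, chunk, done, &faulted);

		if (ret < 0)
			return ret;
		done = (size_t)ret;
		/* A real caller would fault in the user pages here. */
	} while (faulted);

	return (long)done;
}
```

Because each call already includes the earlier bytes in its result, the final return value is the total for the whole request without any extra bookkeeping in the caller.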
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
  20. Oct 18, 2021
  21. Jul 16, 2021
  22. Jul 13, 2021
  23. Jun 29, 2021
  24. Apr 19, 2021
  25. Mar 16, 2021