1. 02 Nov, 2018 1 commit
  2. 25 Oct, 2018 3 commits
    • block: Introduce blk_revalidate_disk_zones() · bf505456
      Damien Le Moal authored
      Drivers exposing zoned block devices have to initialize and maintain
      correctness (i.e. revalidate) of the device zone bitmaps attached to
      the device request queue (seq_zones_bitmap and seq_zones_wlock).
      
      To simplify coding this, introduce a generic helper function
      blk_revalidate_disk_zones() suitable for most (and likely all) cases.
      This new function always updates the seq_zones_bitmap and seq_zones_wlock
      bitmaps, as well as the queue nr_zones field, when called for a disk
      using a request-based queue. For a disk using a BIO-based queue, only
      the number of zones is updated, since these queues do not have
      schedulers and so do not need the zone bitmaps.
      
      With this change, the zone bitmap initialization code in sd_zbc.c can be
      replaced with a call to this function in sd_zbc_read_zones(), which is
      called from the disk revalidate block operation method.
      
      A call to blk_revalidate_disk_zones() is also added to the null_blk
      driver for devices created with the zoned mode enabled.
      
      Finally, to ensure that zoned devices created with dm-linear or
      dm-flakey expose the correct number of zones through sysfs, a call to
      blk_revalidate_disk_zones() is added to dm_table_set_restrictions().
      
      The zone bitmaps allocated and initialized with
      blk_revalidate_disk_zones() are freed automatically from
      __blk_release_queue() using the block internal function
      blk_queue_free_zone_bitmaps().
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
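      The bitmap update that the commit describes can be illustrated with a
      minimal userspace sketch: walk the reported zones and set one bit per
      sequential-write-required zone. The zone struct and type values below
      are simplified stand-ins, not the kernel's struct blk_zone.

      ```c
      #include <stdlib.h>

      enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQ_WRITE_REQ };

      struct zone { enum zone_type type; };

      #define BITS_PER_LONG (8 * sizeof(unsigned long))

      /* Allocate a bitmap with one bit per zone and set the bits of the
       * sequential zones; returns NULL on allocation failure. */
      static unsigned long *build_seq_zones_bitmap(const struct zone *zones,
                                                   unsigned int nr_zones)
      {
          unsigned int longs = (nr_zones + BITS_PER_LONG - 1) / BITS_PER_LONG;
          unsigned long *bitmap = calloc(longs, sizeof(unsigned long));
          if (!bitmap)
              return NULL;
          for (unsigned int i = 0; i < nr_zones; i++)
              if (zones[i].type == ZONE_SEQ_WRITE_REQ)
                  bitmap[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
          return bitmap;
      }
      ```

      In the real helper, this rebuild happens on every revalidation so the
      bitmaps track the device's current zone configuration.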
    • block: add a report_zones method · e76239a3
      Christoph Hellwig authored
      Dispatching a report zones command through the request queue is a major
      pain due to the rewriting of the command reply payload that it makes
      necessary. Given that blkdev_report_zones() executes everything
      synchronously, implement report zones as a block device file operation
      instead, allowing major simplification of the code in many places.
      
      sd, null_blk, dm-linear and dm-flakey are the only block device
      drivers that expose zoned block devices, so these drivers are
      modified to provide the device-side implementation of the
      report_zones() block device file operation.
      
      For device mapper, a new report_zones() target type operation is
      defined so that upper block layer calls to blkdev_report_zones() can
      be propagated down to the underlying devices of the dm targets.
      Implementation for this new operation is added to the dm-linear and
      dm-flakey targets.
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [Damien]
      * Changed method block_device argument to gendisk
      * Various bug fixes and improvements
      * Added support for null_blk, dm-linear and dm-flakey.
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
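      The design change can be sketched as a direct method call rather than a
      queued command: the disk exposes a report_zones callback that a generic
      helper invokes synchronously. All names here (struct disk_ops,
      toy_report_zones) are illustrative, not the real kernel API.

      ```c
      struct zone_info { unsigned long long start, len; };

      struct disk_ops {
          /* fill up to *nr_zones entries starting at sector; return 0 on
           * success and update *nr_zones to the number reported */
          int (*report_zones)(unsigned long long sector,
                              struct zone_info *zones, unsigned int *nr_zones);
      };

      struct disk { const struct disk_ops *ops; };

      /* generic helper: call straight into the driver's method */
      static int disk_report_zones(struct disk *d, unsigned long long sector,
                                   struct zone_info *zones, unsigned int *nr_zones)
      {
          if (!d->ops || !d->ops->report_zones)
              return -1;                 /* not a zoned device */
          return d->ops->report_zones(sector, zones, nr_zones);
      }

      /* toy driver implementation: 4 zones of 256 sectors each */
      static int toy_report_zones(unsigned long long sector,
                                  struct zone_info *zones, unsigned int *nr_zones)
      {
          unsigned int i, first = (unsigned int)(sector / 256), n = 0;
          for (i = first; i < 4 && n < *nr_zones; i++, n++) {
              zones[n].start = (unsigned long long)i * 256;
              zones[n].len = 256;
          }
          *nr_zones = n;
          return 0;
      }

      static const struct disk_ops toy_ops = { .report_zones = toy_report_zones };
      ```

      With a synchronous callback there is no reply payload to rewrite, which
      is the simplification the commit is after.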
    • block: Introduce blkdev_nr_zones() helper · a91e1380
      Damien Le Moal authored
      Introduce the blkdev_nr_zones() helper function to get the total
      number of zones of a zoned block device. This number is always 0 for a
      regular block device (q->limits.zoned == BLK_ZONED_NONE case).
      
      Replace hard-coded number of zones calculation in dmz_get_zoned_device()
      with a call to this helper.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
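      A minimal sketch of what a blkdev_nr_zones()-style helper computes:
      zero for a non-zoned device, otherwise the capacity divided by the zone
      size, rounded up to account for a possibly smaller last zone. The types
      and the zoned-model flag are simplified stand-ins for the kernel's.

      ```c
      enum zoned_model { ZONED_NONE, ZONED_HOST_MANAGED };

      /* number of zones of a device, 0 if the device is not zoned */
      static unsigned int nr_zones(enum zoned_model model,
                                   unsigned long long capacity_sectors,
                                   unsigned long long zone_sectors)
      {
          if (model == ZONED_NONE || zone_sectors == 0)
              return 0;
          /* round up: the last zone may be smaller than zone_sectors */
          return (unsigned int)((capacity_sectors + zone_sectors - 1) /
                                zone_sectors);
      }
      ```

      Centralizing this calculation is what lets dmz_get_zoned_device() drop
      its hard-coded version of the same arithmetic.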
  3. 22 Oct, 2018 3 commits
  4. 18 Oct, 2018 14 commits
    • dm zoned: fix various dmz_get_mblock() issues · 3d4e7383
      Damien Le Moal authored
      dmz_fetch_mblock(), called from dmz_get_mblock(), has a race: the
      allocation of a new metadata block descriptor and its insertion into
      the cache rbtree with the READING state are not atomic. Two different
      contexts requesting the same block may each end up adding a different
      descriptor of the same block to the cache.
      
      Another problem for this function is that the BIO for processing the
      block read is allocated after the metadata block descriptor is inserted
      in the cache rbtree. If the BIO allocation fails, the metadata block
      descriptor is freed without first being removed from the rbtree.
      
      Fix the first problem by checking again whether the requested block is
      already in the cache right before inserting the newly allocated
      descriptor, atomically under the mblk_lock spinlock. Fix the second
      problem by simply allocating the BIO before inserting the new block in
      the cache.
      
      Finally, since dmz_fetch_mblock() also increments a block reference
      counter, rename the function to dmz_get_mblock_slow(). To be symmetric
      and clear, also rename dmz_lookup_mblock() to dmz_get_mblock_fast() and
      increment the block reference counter directly in that function rather
      than in dmz_get_mblock().
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
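      The fix follows the classic allocate-then-recheck-under-lock pattern.
      This userspace sketch (a toy array instead of dm-zoned's rbtree, a
      pthread mutex instead of the mblk_lock spinlock) shows the shape of
      dmz_get_mblock_fast() and dmz_get_mblock_slow(); it is an assumption-
      laden illustration, not the driver's actual code.

      ```c
      #include <pthread.h>
      #include <stdlib.h>

      #define CACHE_SLOTS 16

      struct mblk { unsigned int no; unsigned int ref; };

      static pthread_mutex_t mblk_lock = PTHREAD_MUTEX_INITIALIZER;
      static struct mblk *cache[CACHE_SLOTS];

      /* fast path: look up under the lock and take a reference */
      static struct mblk *get_mblock_fast(unsigned int no)
      {
          struct mblk *m;
          pthread_mutex_lock(&mblk_lock);
          m = cache[no % CACHE_SLOTS];
          if (m && m->no == no)
              m->ref++;          /* reference taken here, not in the caller */
          else
              m = NULL;
          pthread_mutex_unlock(&mblk_lock);
          return m;
      }

      /* slow path: allocate outside the lock, then re-check before insert */
      static struct mblk *get_mblock_slow(unsigned int no)
      {
          struct mblk *m, *found;

          m = malloc(sizeof(*m));    /* allocate resources (BIO) up front */
          if (!m)
              return NULL;
          m->no = no;
          m->ref = 1;

          pthread_mutex_lock(&mblk_lock);
          found = cache[no % CACHE_SLOTS];
          if (found && found->no == no) {
              found->ref++;          /* lost the race: reuse the winner */
              pthread_mutex_unlock(&mblk_lock);
              free(m);               /* duplicate never entered the cache */
              return found;
          }
          cache[no % CACHE_SLOTS] = m;   /* check+insert atomic w.r.t. lookup */
          pthread_mutex_unlock(&mblk_lock);
          return m;
      }
      ```

      Because the BIO (here, the whole descriptor) is allocated before the
      insert, an allocation failure never leaves a stale entry in the cache.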
    • dm zoned: fix metadata block ref counting · 33c2865f
      Damien Le Moal authored
      Since the ref field of struct dmz_mblock is always used with the
      spinlock of struct dmz_metadata held, there is no need to use an
      atomic_t type. Change the type of the ref field to an unsigned
      integer.
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: avoid bitmap with raid4/5/6 journal device · d857ad75
      Heinz Mauelshagen authored
      With raid4/5/6, a journal device and a write intent bitmap are mutually exclusive.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • md-cluster: remove suspend_info · ea89238c
      Guoqing Jiang authored
      Previously, multiple nodes were allowed to resync a device at the
      same time. This was changed so that only one node can resync at a
      time, but suspend_info is still used.
      
      Now remove the structure and use suspend_lo/hi to record the range.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: send BITMAP_NEEDS_SYNC message if reshaping is interrupted · cb9ee154
      Guoqing Jiang authored
      We need to continue reshaping if it was interrupted on the
      original node, so the original node should call resync_bitmap
      in case reshaping is aborted.
      
      The BITMAP_NEEDS_SYNC message is then broadcast to the other nodes;
      the node that continues the reshaping should restart it from
      mddev->reshape_position instead of from the very beginning.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster/bitmap: don't call md_bitmap_sync_with_cluster during reshaping stage · cbce6863
      Guoqing Jiang authored
      When a reshape is happening on one node, other nodes can receive
      lots of RESYNCING messages, causing md_bitmap_sync_with_cluster to
      be called.
      
      Since the resyncing window is typically small in these RESYNCING
      messages, the WARN is always triggered, so we should not call the
      function while a reshape is happening.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster/raid10: don't call remove_and_add_spares during reshaping stage · ca1e98e0
      Guoqing Jiang authored
      remove_and_add_spares is not needed if a reshape is happening on
      another node, because raid10_add_disk, called inside
      raid10_start_reshape, handles the disk role changes. Moreover,
      remove_and_add_spares cannot deal with the role changes caused by
      a reshape.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster/raid10: call update_size in md_reap_sync_thread · aefb2e5f
      Guoqing Jiang authored
      We need to change the capacity on all nodes after one node
      finishes a reshape. And, as before, we can't change the capacity
      directly in md_do_sync; instead, the capacity should only be changed
      in update_size or upon receiving a CHANGE_CAPACITY message.
      
      So the master node calls update_size after it completes the reshape
      in md_reap_sync_thread, but we need to skip ops->update_size if
      MD_CLOSING is set, since the reshape may not have finished.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: introduce resync_info_get interface for sanity check · 5ebaf80b
      Guoqing Jiang authored
      Since a resync region from suspend_info means one node is
      reshaping that area, the position of reshape_progress should be
      included in the area.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster/raid10: support add disk under grow mode · 7564beda
      Guoqing Jiang authored
      For the clustered raid10 scenario, we need to let all nodes
      know that a new disk has been added to the array. The reshape
      caused by adding a new member only needs to happen on one node,
      but the other nodes should know about the change.
      
      A reshape means reading data from somewhere (a region already
      used by the array) and writing it to an unused region. Obviously,
      it would be bad if one node were reading data from an address while
      another node was writing to the same address. Since we have already
      implemented suspended writes in the resyncing area, we can simply
      broadcast the read address to the other nodes to avoid this problem.
      
      The master node calls reshape_request and then updates the sb
      during the reshape period. To avoid the above problem, we call
      resync_info_update to send a RESYNCING message in reshape_request.
      
      From a slave node's view, it receives two types of message:
      1. RESYNCING message
      The slave node adds the address (where the master node is reading
      data from) to the suspend list.
      
      2. METADATA_UPDATED message
      Once a slave node knows the reshape has started on the master node,
      it is time to update the reshape position and call start_reshape to
      follow the master node's steps. After the reshape is done, only the
      reshape position needs to be updated, so the majority of the reshape
      work happens on the master node.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster/raid10: resize all the bitmaps before start reshape · afd75628
      Guoqing Jiang authored
      To support adding a disk under grow mode, we need to resize
      the bitmaps of each node before the reshape, so that we
      can ensure all nodes have the same view of the bitmap of
      the clustered raid.
      
      So after the master node has resized its bitmap, it broadcasts
      a message to the other slave nodes, and it checks whether the
      size of each bitmap is the same by comparing pages. We can only
      continue the reshape after all nodes have updated their bitmap
      to the same size (as checked via the pages); otherwise the
      bitmap size is reverted to its previous value.
      
      The resize_bitmaps interface and BITMAP_RESIZE message are
      introduced in md-cluster.c for this purpose.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • dm crypt: make workqueue names device-specific · ed0302e8
      Michał Mirosław authored
      Make cpu-usage debugging easier by naming workqueues per device.
      
      Example ps output:
      
      root       413  0.0  0.0      0     0 ?        I<   paź02   0:00  [kcryptd_io/253:0]
      root       414  0.0  0.0      0     0 ?        I<   paź02   0:00  [kcryptd/253:0]
      root       415  0.0  0.0      0     0 ?        S    paź02   1:10  [dmcrypt_write/253:0]
      root       465  0.0  0.0      0     0 ?        I<   paź02   0:00  [kcryptd_io/253:2]
      root       466  0.0  0.0      0     0 ?        I<   paź02   0:00  [kcryptd/253:2]
      root       467  0.0  0.0      0     0 ?        S    paź02   2:06  [dmcrypt_write/253:2]
      root     15359  0.2  0.0      0     0 ?        I<   19:43   0:25  [kworker/u17:8-kcryptd/253:0]
      root     16563  0.2  0.0      0     0 ?        I<   20:10   0:18  [kworker/u17:0-kcryptd/253:2]
      root     23205  0.1  0.0      0     0 ?        I<   21:21   0:04  [kworker/u17:4-kcryptd/253:0]
      root     13383  0.1  0.0      0     0 ?        I<   21:32   0:02  [kworker/u17:2-kcryptd/253:2]
      root      2610  0.1  0.0      0     0 ?        I<   21:42   0:01  [kworker/u17:12-kcryptd/253:2]
      root     20124  0.1  0.0      0     0 ?        I<   21:56   0:01  [kworker/u17:1-kcryptd/253:2]
      Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: add dm_table_device_name() · f349b0a3
      Michał Mirosław authored
      Add a shortcut for dm_device_name(dm_table_get_md(t)).
      Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm ioctl: harden copy_params()'s copy_from_user() from malicious users · 800a7340
      Wenwen Wang authored
      In copy_params(), the struct 'dm_ioctl' is first copied from the user
      space buffer 'user' to 'param_kernel' and the field 'data_size' is
      checked against 'minimum_data_size' (size of 'struct dm_ioctl' payload
      up to its 'data' member).  If the check fails, an error code EINVAL will be
      returned.  Otherwise, param_kernel->data_size is used to do a second copy,
      which copies from the same user-space buffer to 'dmi'.  After the second
      copy, only 'dmi->data_size' is checked against 'param_kernel->data_size'.
      Given that the buffer 'user' resides in the user space, a malicious
      user-space process can race to change the content in the buffer between
      the two copies.  This way, the attacker can inject inconsistent data
      into 'dmi' (versus previously validated 'param_kernel').
      
      Fix redundant copying of 'minimum_data_size' from user-space buffer by
      using the first copy stored in 'param_kernel'.  Also remove the
      'data_size' check after the second copy because it is now unnecessary.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Wenwen Wang <wang6495@umn.edu>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
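      The hardening pattern (fetch once, validate, then overwrite the
      second fetch's header with the validated first fetch) can be sketched
      in userspace. The struct layout and simulated_copy_from_user are
      simplified stand-ins for struct dm_ioctl and the real primitive.

      ```c
      #include <stdint.h>
      #include <stdlib.h>
      #include <string.h>

      struct dm_hdr { uint32_t data_size; uint32_t flags; };

      /* stand-in for copy_from_user(): the source may change between calls */
      static void simulated_copy_from_user(void *dst, const void *src, size_t n)
      {
          memcpy(dst, src, n);
      }

      /* returns a kernel copy whose header is guaranteed to match the first,
       * validated fetch, even if *user changed between the two copies;
       * NULL on invalid size (the kernel would return -EINVAL) */
      static struct dm_hdr *copy_params(const void *user, size_t user_len)
      {
          struct dm_hdr first;
          struct dm_hdr *dmi;

          simulated_copy_from_user(&first, user, sizeof(first));
          if (first.data_size < sizeof(first) || first.data_size > user_len)
              return NULL;

          dmi = malloc(first.data_size);
          if (!dmi)
              return NULL;
          simulated_copy_from_user(dmi, user, first.data_size);

          /* harden: reuse the already-validated first fetch instead of
           * trusting (or re-checking) what the second fetch read */
          *dmi = first;
          return dmi;
      }
      ```

      Overwriting the header makes the second user-space fetch of those
      fields irrelevant, which removes both the race window and the
      now-redundant post-copy size check.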
  5. 16 Oct, 2018 3 commits
  6. 15 Oct, 2018 1 commit
  7. 11 Oct, 2018 4 commits
  8. 10 Oct, 2018 3 commits
  9. 09 Oct, 2018 2 commits
    • dm: fix report zone remapping to account for partition offset · 9864cd5d
      Damien Le Moal authored
      If dm-linear or dm-flakey are layered on top of a partition of a zoned
      block device, remapping of the start sector and write pointer position
      of the zones reported by a report zones BIO must be modified to account
      for the target table entry mapping (start offset within the device and
      entry mapping with the dm device).  If the target's backing device is a
      partition of a whole disk, the start sector on the physical device of
      the partition must also be accounted for when modifying the zone
      information.  However, dm_remap_zone_report() was not considering this
      last case, resulting in incorrect zone information remapping with
      targets using disk partitions.
      
      Fix this by calculating the target backing device start sector using
      the position of the completed report zones BIO and the unchanged
      position and size of the original report zone BIO. With this value
      calculated, the start sector and write pointer position of the target
      zones can be correctly remapped.
      
      Fixes: 10999307 ("dm: introduce dm_remap_zone_report()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
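      The remapping arithmetic can be sketched as follows: a sector reported
      relative to the whole disk is shifted by the backing region's start
      (which, for a partition, must include the partition's start sector on
      the physical device) and then by the target's offset in the dm device.
      Names and the exact parameterization here are illustrative, not
      dm_remap_zone_report()'s actual variables.

      ```c
      typedef unsigned long long sector_t;

      /* translate a whole-disk sector (zone start or write pointer) into
       * the dm device's sector space: backing_start is where the mapped
       * region begins on the physical disk (partition start included),
       * target_offset is where the target begins in the dm device */
      static sector_t remap_zone_sector(sector_t disk_sector,
                                        sector_t backing_start,
                                        sector_t target_offset)
      {
          return disk_sector - backing_start + target_offset;
      }
      ```

      The bug was equivalent to omitting the partition's contribution to
      backing_start, so zones on partition-backed targets came back shifted.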
    • dm cache: destroy migration_cache if cache target registration failed · c7cd5550
      Shenghui Wang authored
      Commit 7e6358d2 ("dm: fix various targets to dm_register_target
      after module __init resources created") inadvertently introduced this
      bug when it moved dm_register_target() after the call to KMEM_CACHE().
      
      Fixes: 7e6358d2 ("dm: fix various targets to dm_register_target after module __init resources created")
      Cc: stable@vger.kernel.org
      Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  10. 08 Oct, 2018 6 commits