1. 30 May, 2018 1 commit
  2. 18 Feb, 2018 1 commit
  3. 02 Nov, 2017 2 commits
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      How this work was done:
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      The analysis to determine which SPDX License Identifier should be applied
      to a file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      should be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      All documentation files were explicitly excluded.
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
         For non */uapi/* files that summary was:
         SPDX license identifier                            # files
         GPL-2.0                                              11139
         and resulted in the first patch in this series.
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note", otherwise it was "GPL-2.0".  The result of that was:
         SPDX license identifier                            # files
         GPL-2.0 WITH Linux-syscall-note                        930
         and resulted in the second patch in this series.
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
         SPDX license identifier                            # files
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
         and that resulted in the third patch in this series.
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
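The reconciliation heuristics listed above can be sketched as a small decision function. This is an illustrative sketch only, not the tooling used for the patch series: the function name `conclude_license`, the status strings, and the use of `None` for "no license detected" are all assumptions made for the example.

```python
from typing import Optional, Tuple

DEFAULT_LICENSE = "GPL-2.0"  # license of the top-level COPYING file
UAPI_DEFAULT = "GPL-2.0 WITH Linux-syscall-note"


def conclude_license(path: str,
                     scan_a: Optional[str],
                     scan_b: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (status, license) for one file given two scanner results.

    A result of None means that scanner found no license traces.
    """
    if scan_a is None and scan_b is None:
        # No license information at all: the kernel default applies,
        # with the syscall note for */uapi/* paths.
        if "/uapi/" in path:
            return ("default", UAPI_DEFAULT)
        return ("default", DEFAULT_LICENSE)
    if scan_a == scan_b:
        # Both scanners agree: that becomes the concluded license.
        return ("agreed", scan_a)
    # One scanner found nothing, or they found different licenses:
    # a person has to look at the file.
    return ("manual-review", None)
```

Files that come back as `manual-review` correspond to the cases that were inspected by hand and, where still unclear, confirmed with lawyers or flagged for later.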
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
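As a rough illustration of why header and source files need different comment types, the sketch below picks the SPDX line for a path by extension. It is not the script Thomas and Greg used; the function name `spdx_line` is hypothetical, and the two comment styles shown follow the kernel's documented convention (`//` for .c files, `/* */` for .h files).

```python
import os


def spdx_line(path: str, license_id: str = "GPL-2.0") -> str:
    """Return the SPDX tag line in the comment style the file type expects."""
    tag = "SPDX-License-Identifier: " + license_id
    ext = os.path.splitext(path)[1]
    if ext == ".c":
        return "// " + tag          # C source files use a C++-style comment
    if ext == ".h":
        return "/* " + tag + " */"  # headers use a C-style block comment
    # Other file types (Makefiles, scripts, assembly) are out of scope here.
    raise ValueError("unhandled file type: " + path)
```

For example, `spdx_line("drivers/md/raid10.h")` yields the block-comment form, while the same call on `raid10.c` yields the `//` form.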
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md-cluster: Use a small window for raid10 resync · 8db87912
      Guoqing Jiang authored
      Suspending the entire device for resync could take
      too long. Resync in small chunks.
      The cluster's resync window is maintained in r10conf as
      cluster_sync_low and cluster_sync_high, and processed
      in raid10's sync_request(). If the current resync is
      outside the cluster resync window:
      1. Set the cluster_sync_low to curr_resync_completed.
      2. Set cluster_sync_high to cluster_sync_low + stripe
      3. Send a message to all nodes so they may add it in
         their suspension list.
      We only support "near" raid10 so far; resyncing a far or
      offset raid10 array could cause trouble. So raid10_run
      checks the layout of a clustered raid10 and refuses
      to run if the layout is not correct.
      With the "near" layout we process one stripe at a time
      progressing monotonically through the address space.
      So we can have a sliding window of whole-stripes which
      moves through the array suspending IO on other nodes,
      and both resync which uses array addresses and recovery
      which uses device addresses can stay within this window.
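The three window-update steps above can be sketched as follows. This is a simplified model, not the code in raid10's sync_request(): the class and function names, the half-open interval test used for "outside the window", and the `notify` callback standing in for the cluster message are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClusterWindow:
    low: int = 0   # models cluster_sync_low
    high: int = 0  # models cluster_sync_high


def advance_window(win: ClusterWindow,
                   curr_resync: int,
                   curr_resync_completed: int,
                   window_sectors: int,
                   notify: Callable[[int, int], None]) -> bool:
    """Slide the resync window if curr_resync lies outside [low, high)."""
    if win.low <= curr_resync < win.high:
        return False  # still inside the window, nothing to do
    # 1. The window starts at the last completed position.
    win.low = curr_resync_completed
    # 2. It extends a fixed amount (one stripe in the text above).
    win.high = win.low + window_sectors
    # 3. Tell the other nodes so they can suspend I/O in this range.
    notify(win.low, win.high)
    return True
```

Because the "near" layout progresses monotonically, each call either leaves the window alone or moves it forward, which is what makes a small sliding window sufficient.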
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  4. 11 Apr, 2017 1 commit
    • md/raid10: simplify the splitting of requests. · fc9977dd
      NeilBrown authored
      raid10 splits requests in two different ways for two different reasons.
      First, bio_split() is used to ensure the bio fits within a chunk.
      Second, multiple r10bio structures are allocated to represent the
      different sections that need to go to different devices, to avoid
      known bad blocks.
      This can be simplified to just use bio_split() once, and not to use
      multiple r10bios.
      We delay the split until we know a maximum bio size that can
      be handled with a single r10bio, and then split the bio and queue
      the remainder for later handling.
      As with raid1, we allocate a new bio_set to help with the splitting.
      It is not correct to use fs_bio_set in a device driver.
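The "split once, queue the remainder" idea can be modelled in a few lines. This is a toy model rather than kernel code: requests are `(start, length)` tuples in sectors, and `max_sectors` is a hypothetical callback returning the largest piece starting at a given sector that a single r10bio could handle.

```python
from collections import deque
from typing import Callable, List, Tuple


def split_request(start: int, length: int,
                  max_sectors: Callable[[int], int]) -> List[Tuple[int, int]]:
    """Handle a request in pieces, queueing the remainder after each split."""
    pending = deque([(start, length)])
    handled = []
    while pending:
        s, n = pending.popleft()
        limit = max_sectors(s)  # must be > 0
        if n > limit:
            # Split once: keep the front, queue the rest for later handling.
            pending.append((s + limit, n - limit))
            n = limit
        handled.append((s, n))
    return handled
```

With a 128-sector chunk and a request of 300 sectors starting at sector 100, `split_request(100, 300, lambda s: 128 - s % 128)` produces four pieces that each stay within one chunk.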
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  5. 22 Nov, 2016 1 commit
    • md/raid10: add failfast handling for reads. · 8d3ca83d
      NeilBrown authored
      If a device is marked FailFast, and it is not the only
      device we can read from, we mark the bio as MD_FAILFAST.
      If this does fail-fast, we don't try read repair but just
      allow failure.
      If it was the last device, it doesn't get marked Faulty so
      the retry happens on the same device - this time without
      FAILFAST.  A subsequent failure will not retry but will just
      pass up the error.
      During resync we may use FAILFAST requests, and on a failure
      we will simply use the other device(s).
      During recovery we will only use FAILFAST in the unusual
      case where there are multiple places to read from - i.e. if
      there are > 2 devices.  If we get a failure we will fail the
      device and complete the resync/recovery with remaining devices.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  6. 19 Jul, 2016 1 commit
    • raid10: improve random reads performance · 0e5313e2
      Tomasz Majchrzak authored
      RAID10 random read performance is lower than expected due to excessive spinlock
      utilisation which is required mostly for rebuild/resync. Simplify allow_barrier
      as it's in the IO path and encounters a lot of unnecessary congestion.
      As lower_barrier just takes a lock in order to decrement a counter, convert
      the counter (nr_pending) into an atomic variable and remove the spin lock. There
      is also contention on wake_up (it uses a lock internally) so call it only when
      it's really needed. As wake_up is no longer called constantly, ensure the process
      waiting to raise a barrier is notified when there are no more waiting IOs.
      Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  7. 31 Aug, 2015 1 commit
    • md/raid10: ensure device failure recorded before write request returns. · 95af587e
      NeilBrown authored
      When a write to one of the legs of a RAID10 fails, the failure is
      recorded in the metadata of the other legs so that after a restart
      the data on the failed drive won't be trusted even if that drive seems
      to be working again (maybe a cable was unplugged).
      Currently there is no interlock between the write request completing
      and the metadata update.  So it is possible that the write will
      complete, the app will confirm success in some way, and then the
      machine will crash before the metadata update completes.
      This is an extremely small hole for a race to fit in, but it is
      theoretically possible and so should be closed.
       - set MD_CHANGE_PENDING when requesting a metadata update for a
         failed device, so we can know with certainty when it completes
       - queue requests that experienced an error on a new queue which
         is only processed after the metadata update completes
       - call raid_end_bio_io() on bios in that queue when the time comes.
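The interlock described in the three points above can be modelled as a small state machine. This is only a sketch of the ordering guarantee, not the md implementation: `WriteTracker`, its method names, and the `completed` list (standing in for calling raid_end_bio_io()) are hypothetical.

```python
from collections import deque


class WriteTracker:
    """Hold back failed writes until the metadata update has completed."""

    def __init__(self):
        self.change_pending = False  # models MD_CHANGE_PENDING
        self.held = deque()          # writes that saw a device error
        self.completed = []          # writes reported back to the caller

    def write_finished(self, request, had_error):
        if had_error:
            # A device failure must reach the metadata before the
            # caller is told the write is done.
            self.change_pending = True
            self.held.append(request)
        else:
            self.completed.append(request)

    def metadata_update_done(self):
        # The failure is now recorded, so held writes may complete.
        self.change_pending = False
        while self.held:
            self.completed.append(self.held.popleft())
```

The point of the design is that a crash between `write_finished` and `metadata_update_done` can no longer leave the application believing a write succeeded while the failed leg is still trusted.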
      Signed-off-by: NeilBrown <neilb@suse.com>
  8. 03 Feb, 2015 1 commit
    • md: make ->congested robust against personality changes. · 5c675f83
      NeilBrown authored
      There is currently no locking around calls to the 'congested'
      bdi function.  If called at an awkward time while an array is
      being converted from one level (or personality) to another, there
      is a tiny chance of running code in an unreferenced module etc.
      So add a 'congested' function to the md_personality operations
      structure, and call it with appropriate locking from a central
      mddev_congested function.
      When the array personality is changing the array will be 'suspended'
      so no IO is processed.
      If mddev_congested detects this, it simply reports that the
      array is congested, which is a safe guess.
      As mddev_suspend calls synchronize_rcu(), mddev_congested can
      avoid races by including the whole call inside an rcu_read_lock() region.
      This requires that the congested functions for all subordinate devices
      can be run under rcu_lock.  Fortunately this is the case.
      Signed-off-by: NeilBrown <neilb@suse.de>
  9. 26 Feb, 2013 1 commit
    • MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) · 475901af
      Jonathan Brassow authored
      The MD RAID10 'far' and 'offset' algorithms make copies of entire stripe
      widths - copying them to a different location on the same devices after
      shifting the stripe.  An example layout of each follows below:
      	        "far" algorithm
      	dev1 dev2 dev3 dev4 dev5 dev6
      	==== ==== ==== ==== ==== ====
      	 A    B    C    D    E    F
      	 G    H    I    J    K    L
      	 F    A    B    C    D    E  --> Copy of stripe0, but shifted by 1
      	 L    G    H    I    J    K
      		"offset" algorithm
      	dev1 dev2 dev3 dev4 dev5 dev6
      	==== ==== ==== ==== ==== ====
      	 A    B    C    D    E    F
      	 F    A    B    C    D    E  --> Copy of stripe0, but shifted by 1
      	 G    H    I    J    K    L
      	 L    G    H    I    J    K
      Redundancy for these algorithms is gained by shifting the copied stripes
      one device to the right.  This patch proposes that the array be divided into
      sets of adjacent devices and when the stripe copies are shifted, they wrap
      on set boundaries rather than the array size boundary.  That is, for the
      purposes of shifting, the copies are confined to their sets within the
      array.  The sets are 'near_copies * far_copies' in size.
      The above "far" algorithm example would change to:
      	        "far" algorithm
      	dev1 dev2 dev3 dev4 dev5 dev6
      	==== ==== ==== ==== ==== ====
      	 A    B    C    D    E    F
      	 G    H    I    J    K    L
      	 B    A    D    C    F    E  --> Copy of stripe0, shifted 1, 2-dev sets
      	 H    G    J    I    L    K      Dev sets are 1-2, 3-4, 5-6
      This has the effect of improving the redundancy of the array.  We can
      always sustain at least one failure, but sometimes more than one can
      be handled.  In the first examples, the pairs of devices that CANNOT fail
      together are:
      	(1,2) (2,3) (3,4) (4,5) (5,6) (1,6)  [40% of possible pairs]
      In the example where the copies are confined to sets, the pairs of
      devices that cannot fail together are:
      	(1,2) (3,4) (5,6)                    [20% of possible pairs]
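The percentages quoted above can be checked with a short calculation. The sketch below models only the six-device, two-copy examples from this message; the function names are hypothetical, and devices are numbered from 0 here rather than from 1 as in the diagrams.

```python
from itertools import combinations


def copy_device(dev, set_size):
    """Device holding the shifted copy of the chunk stored on `dev`.

    The shift by one wraps at the boundary of the set containing `dev`.
    Using set_size == number of devices models the old behaviour.
    """
    set_start = (dev // set_size) * set_size
    return set_start + (dev - set_start + 1) % set_size


def fatal_pairs(ndevs, set_size):
    """Pairs of devices that hold both copies of some chunk."""
    pairs = set()
    for dev in range(ndevs):
        pairs.add(tuple(sorted((dev, copy_device(dev, set_size)))))
    return sorted(pairs)


def total_pairs(ndevs):
    return len(list(combinations(range(ndevs), 2)))
```

For six devices there are 15 possible pairs. `fatal_pairs(6, 6)` returns six of them (40%), while `fatal_pairs(6, 2)` returns three (20%), matching the two lists above.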
      We cannot simply replace the old algorithms, so the 17th bit of the 'layout'
      variable is used to indicate whether we use the old or new method of computing
      the shift.  (This is similar to the way the 16th bit indicates whether the
      "far" algorithm or the "offset" algorithm is being used.)
      This patch only handles the cases where the number of total raid disks is
      a multiple of 'far_copies'.  A follow-on patch addresses the condition where
      this is not true.
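A hedged sketch of how such a 'layout' word might be decoded is shown below. It assumes the bits named in the message are counted from zero (so the "16th bit" is `1 << 16` and the "17th bit" is `1 << 17`) and that the two low bytes carry the near and far copy counts; both points are assumptions for illustration rather than statements taken from this commit, and `decode_layout` is a hypothetical helper.

```python
def decode_layout(layout):
    """Split a RAID10 layout word into its fields (assumed encoding)."""
    return {
        "near_copies": layout & 0xFF,              # assumed: lowest byte
        "far_copies": (layout >> 8) & 0xFF,        # assumed: second byte
        "far_offset": bool(layout & (1 << 16)),    # 'offset' instead of 'far'
        "use_far_sets": bool(layout & (1 << 17)),  # new set-confined shifting
    }
```

Under these assumptions, `decode_layout(0x102)` describes a plain two-near-copies array, and setting bit 17 on a far layout opts in to the new shift computation while leaving existing arrays untouched.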
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  10. 17 Aug, 2012 1 commit
    • md/raid10: fix problem with on-stack allocation of r10bio structure. · e0ee7785
      NeilBrown authored
      A 'struct r10bio' has an array of per-copy information at the end.
      This array is declared with size [0] and r10bio_pool_alloc allocates
      enough extra space to store the per-copy information depending on the
      number of copies needed.
      So declaring a 'struct r10bio' on the stack isn't going to work.  It
      won't allocate enough space, and memory corruption will ensue.
      So in the two places where this is done, declare a sufficiently large
      structure and use that instead.
      The two call-sites of this bug were introduced in 3.4 and 3.5
      so this is suitable for both those kernels.  The patch will have to
      be modified for 3.4 as it only has one bug.
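The size mismatch behind this bug can be demonstrated with Python's ctypes, which lays out structures the way a C compiler would. The field lists below are simplified stand-ins, not the real 'struct r10bio' definition, and `alloc_size` is a hypothetical helper mirroring what a pool allocator has to compute.

```python
import ctypes


class PerCopy(ctypes.Structure):
    """Simplified stand-in for the per-copy information."""
    _fields_ = [("bio", ctypes.c_void_p),
                ("addr", ctypes.c_uint64),
                ("devnum", ctypes.c_int)]


class R10Bio(ctypes.Structure):
    """Simplified stand-in with a trailing array declared with size [0]."""
    _fields_ = [("sectors", ctypes.c_int),
                ("devs", PerCopy * 0)]


def alloc_size(copies):
    """Bytes needed for the header plus `copies` per-copy entries."""
    return ctypes.sizeof(R10Bio) + copies * ctypes.sizeof(PerCopy)
```

`ctypes.sizeof(R10Bio)` covers only the fixed header, which is all an on-stack declaration would reserve; `alloc_size(2)` is larger by two `PerCopy` entries, and writing those entries into a header-sized object is the memory corruption described above.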
      Cc: stable@vger.kernel.org
      Reported-by: Ivan Vasilyev <ivan.vasilyev@gmail.com>
      Tested-by: Ivan Vasilyev <ivan.vasilyev@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  11. 31 Jul, 2012 3 commits
    • MD RAID10: Export md_raid10_congested · cc4d1efd
      Jonathan Brassow authored
      md/raid10: Export is_congested test.
      In similar fashion to commits
      we export the RAID10 congestion checking function so that dm-raid.c can
      make use of it and make use of the personality.  The 'queue' and 'gendisk'
      structures will not be available to the MD code when device-mapper sets
      up the device, so we conditionalize access to these fields also.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • MD: Move macros from raid1*.h to raid1*.c · 473e87ce
      Jonathan Brassow authored
      MD RAID1/RAID10: Move some macros from .h file to .c file
      There are three macros (IO_BLOCKED,IO_MADE_GOOD,BIO_SPECIAL) which are defined
      in both raid1.h and raid10.h.  They are only used in their respective .c files.
      However, if we wish to make RAID10 accessible to the device-mapper RAID
      target (dm-raid.c), then we need to move these macros into the .c files where
      they are used so that they do not conflict with each other.
      The macros from the two files are identical and could be moved into md.h, but
      I chose to leave the duplication and have them remain in the personality
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • MD RAID10: rename mirror_info structure · dc280d98
      Jonathan Brassow authored
      MD RAID10: Rename the structure 'mirror_info' to 'raid10_info'
      The same structure name ('mirror_info') is used by raid1.  Each of these
      structures is defined in its respective header file.  If dm-raid is
      to support both RAID1 and RAID10, the header files will be included and
      the structure names must not collide.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  12. 22 May, 2012 1 commit
    • md/raid10: add reshape support · 3ea7daa5
      NeilBrown authored
      A 'near' or 'offset' layout RAID10 array can be reshaped to a different
      'near' or 'offset' layout, a different chunk size, and a different
      number of devices.
      However the number of copies cannot change.
      Unlike RAID5/6, we do not support having user-space backup data that
      is being relocated during a 'critical section'.  Rather, the
      data_offset of each device must change so that when writing any block
      to a new location, it will not over-write any data that is still
      This means that RAID10 reshape is not supportable on v0.90 metadata.
      The difference between the old data_offset and the new_offset must be
      at least the larger of the chunksize multiplied by offset copies of
      each of the old and new layout. (for 'near' mode, offset_copies == 1).
      A larger difference of around 64M seems useful for in-place reshapes
      as more data can be moved between metadata updates.
      Very large differences (e.g. 512M) seem to slow the process down due
      to lots of long seeks (on oldish consumer-grade devices at least).
      Metadata needs to be updated whenever the place we are about to write
      to is considered - by the current metadata - to still contain data in
      the old layout.
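The minimum data_offset difference stated above reduces to a one-line formula. The helper below is a sketch under the assumption that chunk sizes and offsets share one unit (for example sectors); `min_offset_delta` is a hypothetical name, not a kernel function.

```python
def min_offset_delta(old_chunk, old_offset_copies, new_chunk, new_offset_copies):
    """Smallest allowed gap between the old and new data_offset.

    It is the larger of chunk size times offset copies, evaluated for the
    old layout and for the new layout.  For 'near' layouts the number of
    offset copies is 1.
    """
    return max(old_chunk * old_offset_copies, new_chunk * new_offset_copies)
```

For instance, going from a 'near' layout with 1024-sector chunks to an 'offset' layout with two copies and 128-sector chunks needs a gap of at least 1024 sectors, since 1024 x 1 exceeds 128 x 2. The message suggests a much larger gap of around 64M is useful in practice.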
      [unbalanced locking fix from Dan Carpenter <dan.carpenter@oracle.com>]
      Signed-off-by: NeilBrown <neilb@suse.de>
  13. 20 May, 2012 2 commits
    • md/raid10: Introduce 'prev' geometry to support reshape. · f8c9e74f
      NeilBrown authored
      When RAID10 supports reshape it will need a 'previous' and a 'current'
      geometry, so introduce that here.
      Use the 'prev' geometry when before the reshape_position, and the
      current 'geo' when beyond it.  At other times, use both as appropriate.
      For now, both are identical (and reshape_position is never set).
      When we use the 'prev' geometry, we must use the old data_offset.
      When we use the current (and a reshape is happening) we must use
      the new_data_offset.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: collect some geometry fields into a dedicated structure. · 5cf00fcd
      NeilBrown authored
      We will shortly be adding reshape support for RAID10 which will
      require it having 2 concurrent geometries (before and after).
      To make that easier, collect most geometry fields into 'struct geom'
      and access them from there.  Then we will more easily be able to add
      a second set of fields.
      Note that 'copies' is not in this struct and so cannot be changed.
      There is little need to change this number and doing so is a lot
      more difficult as it requires reallocating more things.
      So leave it out for now.
      Signed-off-by: NeilBrown <neilb@suse.de>
  14. 22 Dec, 2011 1 commit
  15. 11 Oct, 2011 7 commits
  16. 28 Jul, 2011 3 commits
    • md/raid10: Handle write errors by updating badblock log. · bd870a16
      NeilBrown authored
      When we get a write error (in the data area, not in metadata),
      update the badblock log rather than failing the whole device.
      As the write may well be many blocks, we try writing each
      block individually and only log the ones which fail.
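The retry strategy above, writing block by block and logging only the failures, can be sketched like this. It is a simplified model: `write_block` is a hypothetical callback that returns True on success, and the returned list stands in for the badblock log.

```python
from typing import Callable, List


def retry_write(write_block: Callable[[int], bool],
                first_block: int, count: int) -> List[int]:
    """Rewrite a failed multi-block write one block at a time.

    Returns the blocks that still fail; only these would be recorded
    as bad, instead of failing the whole device.
    """
    bad_blocks = []
    for block in range(first_block, first_block + count):
        if not write_block(block):
            bad_blocks.append(block)
    return bad_blocks
```

If a ten-block write fails because of two unwritable blocks, only those two end up in the log and the other eight are written normally.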
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: clear bad-block record when write succeeds. · 749c55e9
      NeilBrown authored
      If we succeed in writing to a block that was recorded as
      being bad, we clear the bad-block record.
      This requires some delayed handling as the bad-block-list update has
      to happen in process-context.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: avoid reading from known bad blocks - part 1 · 856e08e2
      NeilBrown authored
      This patch just covers the basic read path:
       1/ read_balance needs to check for badblocks, and return not only
          the chosen slot, but also how many good blocks are available
       2/ read submission must be ready to issue multiple reads to
          different devices as different bad blocks on different devices
          could mean that a single large read cannot be served by any one
          device, but can still be served by the array.
          This requires keeping count of the number of outstanding requests
          per bio.  This count is stored in 'bi_phys_segments'
      On read error we currently just fail the request if another target
      cannot handle the whole request.  Next patch refines that a bit.
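The two read-path changes can be illustrated with a toy model. This is not the kernel's read_balance, which also weighs things like head position; here the choice is simplified to "the device with the longest readable run", bad blocks are `(start, length)` tuples per device, and all names are hypothetical.

```python
from typing import List, Tuple

BadRanges = List[Tuple[int, int]]  # (first bad sector, number of bad sectors)


def good_run(start: int, length: int, bad: BadRanges) -> int:
    """Sectors readable from `start` before hitting a known bad block."""
    run = length
    for bad_start, bad_len in bad:
        if bad_start <= start < bad_start + bad_len:
            return 0  # the very first sector is bad on this device
        if start < bad_start < start + run:
            run = bad_start - start
    return run


def read_balance(start: int, length: int,
                 devices: List[BadRanges]) -> Tuple[int, int]:
    """Return (slot, good_sectors); slot is -1 if no device can serve `start`."""
    best_slot, best_run = -1, 0
    for slot, bad in enumerate(devices):
        run = good_run(start, length, bad)
        if run > best_run:
            best_slot, best_run = slot, run
    return best_slot, best_run


def plan_reads(start: int, length: int,
               devices: List[BadRanges]) -> List[Tuple[int, int, int]]:
    """Cover one large read with several (slot, start, length) reads."""
    reads = []
    while length > 0:
        slot, run = read_balance(start, length, devices)
        if slot < 0:
            raise IOError("no device can serve sector %d" % start)
        reads.append((slot, start, run))
        start += run
        length -= run
    return reads
```

The length of the list returned by `plan_reads` corresponds to the count of outstanding requests that has to be tracked per bio.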
      Signed-off-by: NeilBrown <neilb@suse.de>
  17. 27 Jul, 2011 1 commit
    • md/raid10: Make use of new recovery_disabled handling · 2bb77736
      NeilBrown authored
      When we get a read error during recovery, RAID10 previously
      arranged for the recovering device to appear to fail so that
      the recovery stops and doesn't restart.  This is misleading and wrong.
      Instead, make use of the new recovery_disabled handling and mark
      the target device as having recovery disabled.
      Add appropriate checks in add_disk and remove_disk so that devices
      are removed and not re-added when recovery is disabled.
      Signed-off-by: NeilBrown <neilb@suse.de>
  18. 31 Mar, 2011 1 commit
  19. 24 Jun, 2010 1 commit
    • md: fix handling of array level takeover that re-arranges devices. · e93f68a1
      NeilBrown authored
      Most array level changes leave the list of devices largely unchanged,
      possibly causing one at the end to become redundant.
      However conversions between RAID0 and RAID10 need to renumber
      all devices (except 0).
      This renumbering is currently being done in the ->run method when the
      new personality takes over.  However this is too late as the common
      code in md.c might already have invalidated some of the devices if
      they had a ->raid_disk number that appeared too high.
      Moving it into the ->takeover method is too early as the array is
      still active at that time and wrong ->raid_disk numbers could cause
      So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate
      the new raid_disk number.
      Now the common code knows exactly which devices need to be renumbered,
      and which can be invalidated, and can do it all at a convenient time
      when the array is suspended.
      It can also update some symlinks in sysfs which previously were not being
      updated correctly.
      Reported-by: Maciej Trela <maciej.trela@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  20. 18 May, 2010 1 commit
  21. 16 Jun, 2009 1 commit
    • md: remove mddev_to_conf "helper" macro · 070ec55d
      NeilBrown authored
      Having a macro just to cast a void* isn't really helpful.
      I would much rather see that we are simply de-referencing ->private,
      than have to know what the macro does.
      So open code the macro everywhere and remove the pointless cast.
      Signed-off-by: NeilBrown <neilb@suse.de>
  22. 31 Mar, 2009 2 commits
  23. 03 Oct, 2006 1 commit
  24. 26 Jun, 2006 1 commit
  25. 06 Jan, 2006 3 commits