Commit dd00a99e authored by NeilBrown, committed by Linus Torvalds

md: avoid a possibility that a read error can wrongly propagate through md/raid1 to a filesystem.

When a raid1 has only one working drive, we want read errors to propagate up
to the filesystem, as there is no point in failing the last drive in an array.

Currently the code that performs this check is racy.  If a write and a read
are both submitted to a device on a 2-drive raid1, and the write fails
followed by the read failing, the read will see that there is only one
working drive and will pass the failure up, even though the one working
drive is actually the *other* one.
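
To make the race concrete, here is a minimal userspace sketch of the two checks. The `array_model` and `mirror_model` types and the `model_fail_device`, `old_pass_error_up` and `new_pass_error_up` helpers are hypothetical stand-ins invented for illustration; they are not the kernel's structures or functions, and only the two boolean conditions are taken from the diff.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a 2-drive raid1 (not kernel types). */
struct mirror_model {
	bool faulty;              /* stands in for the Faulty bit in rdev->flags */
};

struct array_model {
	int raid_disks;           /* configured number of mirrors */
	int degraded;             /* number of mirrors that have failed */
	struct mirror_model mirrors[2];
};

/* Record a device failure: count it and mark that device faulty. */
void model_fail_device(struct array_model *a, int mirror)
{
	if (!a->mirrors[mirror].faulty) {
		a->mirrors[mirror].faulty = true;
		a->degraded++;
	}
}

/* Old condition: looks only at how many drives still work,
 * not at whether the drive that served this read is one of them. */
bool old_pass_error_up(const struct array_model *a, int mirror)
{
	(void)mirror;             /* the old check ignores the mirror */
	return (a->raid_disks - a->degraded) <= 1;
}

/* New condition: pass the error up only if no drive is left, or if
 * exactly one drive is left and it is the one this read was sent to. */
bool new_pass_error_up(const struct array_model *a, int mirror)
{
	return a->degraded == a->raid_disks ||
	       (a->degraded == a->raid_disks - 1 &&
		!a->mirrors[mirror].faulty);
}
```

In this model, after a write failure marks mirror 0 faulty on a 2-drive array, a read that then fails on mirror 0 is passed up by the old condition but not by the new one, so it can still be retried on mirror 1.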

So, tighten up the locking.
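
The locking idea can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: a `pthread_mutex_t` plays the role of `conf->device_lock` (the kernel uses `spin_lock_irqsave`), and `guarded_state`, `guarded_mark_faulty` and `guarded_is_last_working` are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state protected by conf->device_lock. */
struct guarded_state {
	pthread_mutex_t lock;     /* plays the role of device_lock */
	int raid_disks;
	int degraded;
	bool faulty[2];
};

/* Writer side: bump the counter and set the flag in one critical
 * section, so a reader holding the lock never sees one without the other. */
void guarded_mark_faulty(struct guarded_state *s, int mirror)
{
	pthread_mutex_lock(&s->lock);
	if (!s->faulty[mirror]) {
		s->degraded++;
		s->faulty[mirror] = true;
	}
	pthread_mutex_unlock(&s->lock);
}

/* Reader side: evaluate the "last working drive" test under the same
 * lock, so the counter and the flag are read as a consistent pair. */
bool guarded_is_last_working(struct guarded_state *s, int mirror)
{
	bool last;

	pthread_mutex_lock(&s->lock);
	last = s->degraded == s->raid_disks ||
	       (s->degraded == s->raid_disks - 1 && !s->faulty[mirror]);
	pthread_mutex_unlock(&s->lock);
	return last;
}
```

Updating the flag and the counter together matters because the read-side test combines both values; if they were updated in separate steps, a reader could see the new count with the old flag.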

Signed-off-by: Neil Brown <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
parent c5ddb547
@@ -271,21 +271,25 @@ static int raid1_end_read_request(struct bio *bio, unsigned int bytes_done, int
 	 */
 	update_head_pos(mirror, r1_bio);
 
-	if (uptodate || (conf->raid_disks - conf->mddev->degraded) <= 1) {
-		/*
-		 * Set R1BIO_Uptodate in our master bio, so that
-		 * we will return a good error code for to the higher
-		 * levels even if IO on some other mirrored buffer fails.
-		 *
-		 * The 'master' represents the composite IO operation to
-		 * user-side. So if something waits for IO, then it will
-		 * wait for the 'master' bio.
+	if (uptodate)
+		set_bit(R1BIO_Uptodate, &r1_bio->state);
+	else {
+		/* If all other devices have failed, we want to return
+		 * the error upwards rather than fail the last device.
+		 * Here we redefine "uptodate" to mean "Don't want to retry"
 		 */
-		if (uptodate)
-			set_bit(R1BIO_Uptodate, &r1_bio->state);
+		unsigned long flags;
+		spin_lock_irqsave(&conf->device_lock, flags);
+		if (r1_bio->mddev->degraded == conf->raid_disks ||
+		    (r1_bio->mddev->degraded == conf->raid_disks-1 &&
+		     !test_bit(Faulty, &conf->mirrors[mirror].rdev->flags)))
+			uptodate = 1;
+		spin_unlock_irqrestore(&conf->device_lock, flags);
+	}
 
+	if (uptodate)
 		raid_end_bio_io(r1_bio);
-	} else {
+	else {
 		/*
 		 * oops, read error:
 		 */
@@ -992,13 +996,14 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
 		unsigned long flags;
 		spin_lock_irqsave(&conf->device_lock, flags);
 		mddev->degraded++;
+		set_bit(Faulty, &rdev->flags);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
 		/*
 		 * if recovery is running, make sure it aborts.
 		 */
 		set_bit(MD_RECOVERY_ERR, &mddev->recovery);
-	}
-	set_bit(Faulty, &rdev->flags);
+	} else
+		set_bit(Faulty, &rdev->flags);
 	set_bit(MD_CHANGE_DEVS, &mddev->flags);
 	printk(KERN_ALERT "raid1: Disk failure on %s, disabling device. \n"
 		" Operation continuing on %d devices\n",