[Xen-changelog] [xen-unstable] Fix a number of flaws in the blktap userspace daemon when dealing with I/O errors
# HG changeset patch
# User Andrew Warfield <andy@xxxxxxxxxxxxx>
# Node ID 69efe6730fb1e2972f1413a877d88a6c09ca4451
# Parent  c4225c95dbcd8956ff88d088d04c30009c887cc8
Fix a number of flaws in the blktap userspace daemon when dealing with I/O errors.

There are a number of flaws in the blktap userspace daemon when dealing
with I/O errors:

 - The backends which use AIO check the io_events.res member to determine
   whether an I/O error occurred, which is correct. But when calling the
   callback to signal completion of the I/O, they pass the io_events.res2
   member. This seems fine at first glance[1]:

      "res is the usual result of an I/O operation: the number of bytes
       transferred, or a negative error code. res2 is a second status
       value which will be returned to the user"

   Except that "currently (2.6.0-test9), callers of aio_complete() within
   the kernel always set res2 to zero", and this has not changed at any
   point since 2.6.0. So by passing through the status from 'res2', the
   callback thinks the I/O operation succeeded even when it failed :-(

   The fix is simple: instead of passing 'res2', just pass

      ep->res == io->u.c.nbytes ? 0 : 1

   (a short illustrative sketch of this completion pattern follows after
   the log excerpt below). This would solve the error reporting to the
   guest, except that there is a second flaw...

 - The tapdisk I/O completion callback checks the status parameter passed
   in, syslogs it and then returns. It never bothers to send the I/O
   completion response back to the blktap kernel driver when a failure
   occurs.

   Fortunately the fix for this is also simple. Instead of returning from
   the callback when dealing with an error, we simply toggle the status
   field for the pending response to BLKIF_RSP_ERROR and then continue
   with the normal codepath, so the error eventually gets back to the
   guest.

The scenario I used to discover the problem and test the patch is this:

 - In Dom0, create a filesystem with only 200 MB of free space.
 - Create a 1 GB sparse file on this volume.
 - Configure the guest so this sparse file appears as /dev/xvdb.
 - In the DomU, create a single partition on /dev/xvdb and format it
   with ext3.
 - In the DomU, mount /dev/xvdb1 on /mnt and then run

      dd if=/dev/zero of=/mnt/data.bin bs=1GB count=1

Without this patch, the 'dd' command would succeed in writing 1 GB of
data even though the underlying disk in Dom0 was only 200 MB in size.
More complex tests of copying a whole directory hierarchy across resulted
in catastrophic data corruption of the filesystem itself. A manual fsck
was needed to fix up the filesystem, and there were many very bad errors
needing fixing.

With this patch applied, the DomU sees the I/O failures and the kernel
logs messages such as:

Dec 1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 722127
Dec 1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 730327
Dec 1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 738527
Dec 1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 746727
Dec 1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 754927
Dec 1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 763127
Dec 1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 771327
Dec 1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 779527
Dec 1 11:02:53 dhcp-5-203 kernel: end_request: I/O error, dev xvdc, sector 792399

The guest will retry the I/O operation until it runs out of sectors to
try, and then fail the operation.
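For reference, a minimal sketch of the completion-side pattern the patch
adopts: success is judged by comparing io_event.res against the number of
bytes submitted, and res2 is ignored because the kernel always sets it to
zero. This is not the tapdisk code itself; the function name and the
batch size are illustrative only.

    /* Sketch: drain AIO completions and derive a pass/fail status from
     * io_event.res rather than io_event.res2. */
    #include <libaio.h>
    #include <stdio.h>

    /* Returns 0 if every completed I/O transferred all of its bytes, 1 otherwise. */
    static int drain_completions(io_context_t ctx)
    {
        struct io_event events[64];
        int n, i, errors = 0;

        n = io_getevents(ctx, 1, 64, events, NULL);
        for (i = 0; i < n; i++) {
            struct io_event *ep = &events[i];
            struct iocb *io = ep->obj;

            /* ep->res holds the bytes transferred (or a negative errno);
             * ep->res2 is always zero, so it cannot carry the status. */
            int status = (ep->res == io->u.c.nbytes) ? 0 : 1;
            if (status)
                fprintf(stderr, "I/O error or short transfer: res=%ld\n",
                        (long)ep->res);
            errors |= status;
        }
        return errors;
    }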
The filesystem is not seriously damaged - ext3 journal recovery will
trivially clean up if the guest is rebooted after the disk in Dom0 is
enlarged.

Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>

[1] http://lwn.net/Articles/24366/

---
 tools/blktap/drivers/block-aio.c  |    8 ++------
 tools/blktap/drivers/block-qcow.c |    9 +--------
 tools/blktap/drivers/tapdisk.c    |    3 +--
 3 files changed, 4 insertions(+), 16 deletions(-)

diff -r c4225c95dbcd -r 69efe6730fb1 tools/blktap/drivers/block-aio.c
--- a/tools/blktap/drivers/block-aio.c  Fri Dec 01 16:31:36 2006 +0000
+++ b/tools/blktap/drivers/block-aio.c  Fri Dec 01 09:01:04 2006 -0800
@@ -311,12 +311,8 @@ int tdaio_do_callbacks(struct td_state *
 		struct pending_aio *pio;
 
 		pio = &prv->pending_aio[(long)io->data];
-
-		if (ep->res != io->u.c.nbytes) {
-			/* TODO: handle this case better. */
-			DPRINTF("AIO did less than I asked it to. \n");
-		}
-		rsp += pio->cb(s, ep->res2, pio->id, pio->private);
+		rsp += pio->cb(s, ep->res == io->u.c.nbytes ? 0 : 1,
+			       pio->id, pio->private);
 
 		prv->iocb_free[prv->iocb_free_count++] = io;
 	}
diff -r c4225c95dbcd -r 69efe6730fb1 tools/blktap/drivers/block-qcow.c
--- a/tools/blktap/drivers/block-qcow.c Fri Dec 01 16:31:36 2006 +0000
+++ b/tools/blktap/drivers/block-qcow.c Fri Dec 01 09:01:04 2006 -0800
@@ -1145,13 +1145,6 @@ int tdqcow_do_callbacks(struct td_state
 		pio = &prv->pending_aio[(long)io->data];
 
-		if (ep->res != io->u.c.nbytes) {
-			/* TODO: handle this case better. */
-			ptr = (int *)&ep->res;
-			DPRINTF("AIO did less than I asked it to "
-				"[%lu,%lu,%d]\n",
-				ep->res, io->u.c.nbytes, *ptr);
-		}
 		aio_unlock(prv, pio->sector);
 
 		if (pio->id >= 0) {
 			if (prv->crypt_method)
@@ -1162,7 +1155,7 @@ int tdqcow_do_callbacks(struct td_state
 					&prv->aes_decrypt_key);
 			prv->nr_reqs[pio->qcow_idx]--;
 			if (prv->nr_reqs[pio->qcow_idx] == 0)
-				rsp += pio->cb(s, ep->res2, pio->id,
+				rsp += pio->cb(s, ep->res == io->u.c.nbytes ? 0 : 1, pio->id,
 					       pio->private);
 		} else if (pio->id == -2)
 			free(pio->buf);
diff -r c4225c95dbcd -r 69efe6730fb1 tools/blktap/drivers/tapdisk.c
--- a/tools/blktap/drivers/tapdisk.c    Fri Dec 01 16:31:36 2006 +0000
+++ b/tools/blktap/drivers/tapdisk.c    Fri Dec 01 09:01:04 2006 -0800
@@ -424,8 +424,7 @@ int send_responses(struct td_state *s, i
 	}
 
 	if (res != 0) {
-		DPRINTF("*** request error %d! \n", res);
-		return 0;
+		blkif->pending_list[idx].status = BLKIF_RSP_ERROR;
 	}
 
 	blkif->pending_list[idx].count--;
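The tapdisk.c hunk above is the whole of the second fix. As a standalone
illustration of the pattern (record the failure in the pending response
and fall through to the normal completion path rather than returning
early), a simplified sketch follows; the structure below is a stand-in
for the real blktap bookkeeping, not the actual definition.

    /* Sketch of the send_responses() error handling after the patch.
     * 'struct pending_resp' is a simplified stand-in, not the real type. */
    #define BLKIF_RSP_OKAY    0
    #define BLKIF_RSP_ERROR  -1

    struct pending_resp {
        int status;   /* starts as BLKIF_RSP_OKAY */
        int count;    /* outstanding sub-requests for this blkif request */
    };

    /* Called once per completed sub-request; res != 0 means that I/O failed. */
    static void complete_subrequest(struct pending_resp *resp, int res)
    {
        if (res != 0) {
            /* The old code logged the error and returned here, so no
             * response ever reached the blktap kernel driver.  Now the
             * failure is recorded and processing continues as normal. */
            resp->status = BLKIF_RSP_ERROR;
        }

        resp->count--;
        if (resp->count == 0) {
            /* ...push resp->status back to the blktap kernel driver,
             * which forwards it to the guest as the request's result... */
        }
    }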