
Re: [Xen-devel] driver domain crash and reconnect handling

On 24/01/13 11:45, George Shuklin wrote:

I expect the outage due to the proto-suspend is dwarfed by the outage
caused by a backend going away for however long it takes to notice,
rebuild, reset the hardware, etc etc.
Indeed, the backend restoration would probably take at least 5
seconds. Compared to that, the suspend-resume and the frontend device
reinit are much shorter.
For storage driver domains it is probably better to suspend the guest
immediately when the backend is gone, as the guest can easily crash if
the block device is inaccessible for a long time. For network
access, this isn't such a big problem.

Some notes about guest suspend during IO.

I tested this approach for a storage reboot (pause all domains, reboot the
iSCSI storage, and resume every domain). If the pause is short (less than 2
minutes), the guest survives. If the pause is longer than 2 minutes, guests
that were waiting for IO completion detect an IO timeout after
resuming and report IO errors on their virtual block devices (PV).

Good point! I hadn't considered that even if the guest is paused, it will still notice on resume that its timers have expired. I think the original idea came from Paul; CCing him to raise awareness of this problem.
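
To make the timer issue concrete, here is a rough sketch of the arithmetic, not a model of any real kernel code. It assumes the guest measures IO age against a clock that keeps advancing while the domain is paused, and it takes the roughly 2 minute threshold from the test above as the timeout. The function name, parameters, and the 1 second latencies are all hypothetical.

```python
# Assumed timeout, taken from the ~2 minute figure observed in the test above.
# It is not a confirmed constant of any particular guest kernel.
IO_TIMEOUT_S = 120.0


def guest_sees_timeout(pause_s,
                       in_flight_before_pause_s=1.0,
                       completion_after_resume_s=1.0,
                       timeout_s=IO_TIMEOUT_S,
                       clock_advances_during_pause=True):
    """Return True if a request in flight across a pause looks timed out.

    The request spends some time in flight before the pause and completes
    some time after the resume. If the guest's clock advances across the
    pause (the assumption under discussion), the pause length is added to
    the request's apparent age.
    """
    apparent_age = in_flight_before_pause_s + completion_after_resume_s
    if clock_advances_during_pause:
        apparent_age += pause_s
    return apparent_age > timeout_s


if __name__ == "__main__":
    for pause in (60.0, 180.0):
        print(pause, guest_sees_timeout(pause))
```

Under these assumptions a 60 second pause stays below the threshold and a 180 second pause exceeds it. Passing clock_advances_during_pause=False shows the contrast: if the guest's notion of elapsed time stopped while paused, the same 180 second pause would not look like a timeout.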

