
Re: [Xen-devel] driver domain crash and reconnect handling



On 24/01/13 09:59, Ian Campbell wrote:
Actually I've used the xc_domain_resume_any() function from libxc to
resume the guests. It worked with PV guests, but only with some hacks in
the hypervisor to silently discard the error conditions instead of
returning from the hypercall with an error. The two guests I've used,
and their problems with the hypercall return values:

- SLES 11 SP1 (2.6.32.12) crashes because the VCPUOP_register_vcpu_info
hypercall returns EINVAL, as ( v->arch.vcpu_info_mfn != INVALID_MFN )
- Debian Squeeze 6.0 (2.6.32-5) crashes because EVTCHNOP_bind_virq
returns EEXIST, as ( v->virq_to_evtchn[virq] != 0 )
- (these hypercalls were made right after the guest came back from the
suspend hypercall)

The toolstack might need to do EVTCHNOP_reset or do some other cleanup?
Yep, that might be another solution: reset these values from the toolstack via hypercall(s). But as far as I have checked, all the current hypercalls that change these things also do a lot of other stuff which we do not necessarily want. So it might be necessary to define a new hypercall specifically for this use case. That is probably easier than making Xen aware that a suspend/resume happened and the guest remained in the same domain.
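
To make the failure mode and the proposed cleanup concrete, here is a minimal, self-contained toy model in C. It is not Xen code: the struct, the toy_* function names, TOY_INVALID_MFN and TOY_NR_VIRQS are all made up for illustration. Only the two checks are taken from the conditions quoted above. The reset function stands in for the hypothetical new hypercall and clears nothing but the two pieces of per-vCPU state that make the guests crash.

```c
#include <errno.h>

/* Illustrative stand-ins, not the real Xen definitions. */
#define TOY_INVALID_MFN (~0UL)
#define TOY_NR_VIRQS    24

/* Only the per-vCPU state relevant to the two failing hypercalls. */
struct toy_vcpu {
    unsigned long vcpu_info_mfn;          /* TOY_INVALID_MFN if unset */
    int virq_to_evtchn[TOY_NR_VIRQS];     /* 0 means "not bound" */
};

/* Put the vCPU into the state a freshly created domain would have. */
static void toy_vcpu_init(struct toy_vcpu *v)
{
    unsigned int i;

    v->vcpu_info_mfn = TOY_INVALID_MFN;
    for (i = 0; i < TOY_NR_VIRQS; i++)
        v->virq_to_evtchn[i] = 0;
}

/* Mirrors the quoted check: a second registration fails with -EINVAL. */
static int toy_register_vcpu_info(struct toy_vcpu *v, unsigned long mfn)
{
    if (v->vcpu_info_mfn != TOY_INVALID_MFN)
        return -EINVAL;
    v->vcpu_info_mfn = mfn;
    return 0;
}

/* Mirrors the quoted check: binding an already bound VIRQ gives -EEXIST. */
static int toy_bind_virq(struct toy_vcpu *v, unsigned int virq, int port)
{
    if (virq >= TOY_NR_VIRQS || port <= 0)
        return -EINVAL;
    if (v->virq_to_evtchn[virq] != 0)
        return -EEXIST;
    v->virq_to_evtchn[virq] = port;
    return 0;
}

/*
 * Hypothetical "reset for resume" operation: forget the vcpu_info
 * registration and the VIRQ bindings, and touch nothing else.
 */
static void toy_reset_for_resume(struct toy_vcpu *v)
{
    toy_vcpu_init(v);
}
```

In this model, calling toy_register_vcpu_info() or toy_bind_virq() a second time on the same vCPU returns -EINVAL or -EEXIST respectively, which corresponds to what the resumed guests run into. After toy_reset_for_resume() the same calls return 0 again. Whether a real hypercall could be this narrow depends on what else in Xen refers to that state, which the model ignores.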

Pausing guests when one of their supporting driver domains goes away
does seem like a good idea.

I suppose the flip side is that a domain which isn't using a disk which
goes away briefly would see a hiccup it wouldn't have otherwise seen.
Well, I think it would be quite complicated to watch the ring buffer for activity while there is no backend connected. I would say this is an acceptable loss.

Zoli

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

