[Xen-devel] How to deal with hypercalls returning -EFAULT
Currently the release of Xen 4.11 is blocked due to a sporadic failure
of the OSSTEST guest-saverestore[.2] test. During that test a hypercall
issued by libxc via the Linux privcmd driver returns -EFAULT even though
all hypercall buffers are locked in memory via mlock() (or via
equivalent flags specified in an mmap() call).
My analysis has revealed that modern Linux kernels might make such
locked user pages inaccessible for very short periods of time. This can
happen e.g. when pages are subject to compaction or migration.
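For illustration, a locked buffer of the kind described above might be
set up roughly as in the sketch below. This is not the libxc code: the
helper names alloc_locked_buffer() and free_locked_buffer() are
hypothetical, and only standard mmap()/mlock()/munmap() calls are used.
The point is that mlock() keeps the pages resident, which by itself does
not rule out the short inaccessibility window mentioned above.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libxc function): map an anonymous,
 * page-aligned region and pin it with mlock().
 * Returns NULL on failure.
 */
static void *alloc_locked_buffer(size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size;
    void *buf;

    if (len == 0)
        return NULL;

    size = (len + page - 1) / page * page;   /* round up to whole pages */

    buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Pin the pages in RAM; this can fail if RLIMIT_MEMLOCK is too low. */
    if (mlock(buf, size) != 0) {
        munmap(buf, size);
        return NULL;
    }
    return buf;
}

/* Hypothetical counterpart: munmap() also drops the memory lock. */
static void free_locked_buffer(void *buf, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (buf != NULL && len != 0)
        munmap(buf, (len + page - 1) / page * page);
}
```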
There are multiple ways to mitigate this problem:
1. Try to switch off page migration and compaction in dom0.
   Pros: - no change in Xen necessary
   Cons: - new cases might come up in the future
         - easy to miss: failures are very sporadic and might
           show up only after a kernel update
2. Add a band-aid to the Xen tools by retrying hypercalls which have
   failed with -EFAULT (either for all or only for some hypercalls).
   Pros: - no interface change necessary
   Cons: - not all hypercalls can simply be repeated
         - the problem isn't solved, just worked around
3. Modify the interface to the privcmd driver to pass information about
   the buffers in use to the kernel so that they can be locked there.
   Either add a new interface for hypercall buffer management or add
   the list of buffers to the privcmd ioctl data structure.
   Pros: - problem is really solved
   Cons: - split solution between kernel and Xen, both must be changed
4. Modify the interface between hypervisor and kernel: instead of just
   returning -EFAULT, let the hypervisor behave more like copy_to_user()
   by raising a page fault which can then be fixed up in the kernel.
   This change must be activated by the kernel, of course.
   Pros: - rather simple change in the kernel "doing the right thing"
         - hypercall bounce buffer handling in libxc/libxencall can be
           switched off for a kernel supporting this change
   Cons: - split solution between kernel and Xen, both must be changed
         - not sure how complex the required hypervisor change will be
It should be noted that we can either select just one of the above
solutions, or one of 3/4 plus one of 1/2 as a fallback for old kernels.
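To make option 2 concrete, a bounded retry wrapper might look roughly
like the sketch below. Everything in it is an assumption for
illustration: hypercall_fn, hypercall_retry_efault() and the limit
EFAULT_MAX_RETRIES are hypothetical names, not part of libxc or
libxencall, and the callback is assumed to return 0 on success or a
negative errno value. fake_hypercall() is only a stand-in that fails a
configurable number of times so the wrapper can be exercised without Xen.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: issue one hypercall, return 0 or -errno. */
typedef int (*hypercall_fn)(void *arg);

/* Assumed upper bound on retries; the value is arbitrary. */
#define EFAULT_MAX_RETRIES 5

/*
 * Call fn(arg) and repeat it while it fails with -EFAULT, at most
 * EFAULT_MAX_RETRIES additional times. Any other result (success or a
 * different error) is returned immediately. Only safe for hypercalls
 * that can be repeated without side effects.
 */
static int hypercall_retry_efault(hypercall_fn fn, void *arg)
{
    int rc;
    int retries = 0;

    do {
        rc = fn(arg);
    } while (rc == -EFAULT && ++retries <= EFAULT_MAX_RETRIES);

    return rc;
}

/* Stand-in for a real hypercall: fails 'failures_left' times, then succeeds. */
struct fake_state {
    int failures_left;
    int calls;
};

static int fake_hypercall(void *arg)
{
    struct fake_state *s = (struct fake_state *)arg;

    s->calls++;
    if (s->failures_left > 0) {
        s->failures_left--;
        return -EFAULT;
    }
    return 0;
}
```

With a stand-in that fails twice, the wrapper issues three calls and
returns 0; if the failure persists, it gives up after
EFAULT_MAX_RETRIES + 1 calls and passes -EFAULT on to the caller. The
bound keeps a genuinely bad buffer address from looping forever, and the
caveat from the list still applies: this only works for hypercalls that
can safely be repeated.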
How to proceed?
I'd like to have an answer as soon as possible to unblock the 4.11 release.
Juergen
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel