
Re: [Xen-devel] [PATCH v5][XSA-97] x86/paging: make log-dirty operations preemptible



On 05/09/14 11:47, Jan Beulich wrote:
> Both the freeing and the inspection of the bitmap get done in (nested)
> loops which - besides having a rather high iteration count in general,
> albeit that would be covered by XSA-77 - have the number of non-trivial
> iterations they need to perform (indirectly) controllable by both the
> guest they are for and any domain controlling the guest (including the
> one running qemu for it).
>
> Note that the tying of the continuations to the invoking domain (which
> previously [wrongly] used the invoking vCPU instead) implies that the
> tools requesting such operations have to make sure they don't issue
> multiple similar operations in parallel.
>
> Note further that this breaks supervisor-mode kernel assumptions in
> hypercall_create_continuation() (where regs->eip gets rewound to the
> current hypercall stub beginning), but otoh
> hypercall_cancel_continuation() doesn't work in that mode either.
> Perhaps time to rip out all the remains of that feature?
>
> This is part of CVE-2014-5146 / XSA-97.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> Reviewed-by: Tim Deegan <tim@xxxxxxx>

Unfortunately, XenRT is reliably finding issues with this version of
the patch.

Taking two builds of XenServer, identical except for this patch
(Xen-4.4.1 based, adjusted for -EAGAIN/-ERESTART), the build without
the patch is fine, but the build with it appears to show page
accounting issues.

The logs below are from a standard vmlifecycle ops test of RHEL 6.2,
with a 32bit and a 64bit PV guest undergoing tests in tandem.

E.g.:

(XEN) [ 4141.838508] mm.c:2352:d0v1 Bad type (saw 7400000000000001 !=
exp 1000000000000000) for mfn 2317f0 (pfn 14436)
(XEN) [ 4141.838512] mm.c:2995:d0v1 Error while pinning mfn 2317f0

Failure to pin a batch of domain 78's pagetables on restore. 

(XEN) [ 7832.953068] mm.c:827:d0v0 pg_owner 100 l1e_owner 100, but
real_pg_owner 99
(XEN) [ 7832.953072] mm.c:898:d0v0 Error getting mfn 854c3 (pfn 2c820)
from L1 entry 00000000854c3025 for l1e_owner=100, pg_owner=100
(XEN) [ 7832.953076] mm.c:1221:d0v0 Failure in alloc_l1_table: entry 488
(XEN) [ 7832.953083] mm.c:2099:d0v0 Error while validating mfn 12406d
(pfn 18fbe) for type 1000000000000000: caf=8000000000000003
taf=1000000000000001
(XEN) [ 7832.953086] mm.c:906:d0v0 Attempt to create linear p.t. with
write perms
(XEN) [ 7832.953089] mm.c:1297:d0v0 Failure in alloc_l2_table: entry 4
(XEN) [ 7832.953100] mm.c:2099:d0v0 Error while validating mfn 23ebe4
(pfn 1db65) for type 2000000000000000: caf=8000000000000003
taf=2000000000000001
(XEN) [ 7832.953104] mm.c:948:d0v0 Attempt to create linear p.t. with
write perms
(XEN) [ 7832.953106] mm.c:1379:d0v0 Failure in alloc_l3_table: entry 0
(XEN) [ 7832.953110] mm.c:2099:d0v0 Error while validating mfn 2019db
(pfn 18eaf) for type 3000000000000000: caf=8000000000000003
taf=3000000000000001
(XEN) [ 7832.953113] mm.c:2995:d0v0 Error while pinning mfn 2019db

Failure to pin a batch of domain 100's pagetables on restore.

In both of these cases, the save side succeeds, which means the
pagetable normalisation found fully complete and correct pagetables
(i.e. the p2m and m2p agreed), and
xc_get_pfn_type_batch()/xc_map_foreign_bulk() didn't fail any domain
ownership tests.
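
For reference, the p2m/m2p agreement check amounts to something like
the sketch below (illustrative names, not the actual xc_domain_save()
normalisation code):

    /* Illustrative sentinel for an unpopulated p2m slot. */
    #define INVALID_MFN (~0UL)

    /* Return 0 iff every populated pfn's mfn maps back through the
     * m2p, i.e. the two tables agree. */
    static int check_p2m_m2p(const unsigned long *p2m,
                             const unsigned long *m2p,
                             unsigned long p2m_size)
    {
        unsigned long pfn;

        for ( pfn = 0; pfn < p2m_size; pfn++ )
        {
            unsigned long mfn = p2m[pfn];

            if ( mfn != INVALID_MFN && m2p[mfn] != pfn )
                return -1;  /* p2m and m2p disagree */
        }

        return 0;
    }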

On inspection of the libxc logs, I am feeling quite glad I left this
debugging message in:

xenguest-75-save[11876]: xc: detail: Bitmap contained more entries than
expected...
xenguest-83-save[32123]: xc: detail: Bitmap contained more entries than
expected...
xenguest-84-save[471]: xc: detail: Bitmap contained more entries than
expected...
xenguest-88-save[3823]: xc: detail: Bitmap contained more entries than
expected...
xenguest-89-save[4656]: xc: detail: Bitmap contained more entries than
expected...
xenguest-95-save[9379]: xc: detail: Bitmap contained more entries than
expected...
xenguest-98-save[11784]: xc: detail: Bitmap contained more entries than
expected...

This means that periodically, a XEN_DOMCTL_SHADOW_OP_{CLEAN,PEEK}
hypercall gives us back a bitmap with more set bits than the
stats.dirty_count it hands back at the same time.
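
The debug message comes from a consistency check along these lines (a
minimal sketch; the helper and variable names are illustrative, not
the actual xenguest code):

    #include <stdint.h>
    #include <stdio.h>

    /* Count the set bits in a dirty bitmap of nr_bits bits (assumed
     * to be a multiple of the word size, for brevity). */
    static unsigned long bitmap_popcount(const unsigned long *bm,
                                         unsigned long nr_bits)
    {
        unsigned long count = 0, i;

        for ( i = 0; i < nr_bits / (8 * sizeof(unsigned long)); i++ )
            count += __builtin_popcountl(bm[i]);

        return count;
    }

    /* The bitmap and the stats are handed back by the same
     * hypercall, so the bitmap's popcount should never exceed
     * dirty_count. */
    static void check_dirty_stats(const unsigned long *bitmap,
                                  unsigned long nr_bits,
                                  uint32_t dirty_count)
    {
        if ( bitmap_popcount(bitmap, nr_bits) > dirty_count )
            fprintf(stderr,
                    "Bitmap contained more entries than expected...\n");
    }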

Domain 75 (the 64bit guest) was the first with the bitmap error; it
migrated to domain 76, then to 78, which suffered a pinning failure.
Beyond this point, the 32bit domain continues testing, and suffers a
similar problem later.

I have found a bug in my accounting code (two set_bit()s need to
become test_and_set_bit()s before blindly incrementing the stat; see
the sketch below), but the precondition which tickles this bug
indicates that something is going awry with the final logdirty bitmap
as used by the migration code.
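
For clarity, this is the shape of the bug and fix (a minimal sketch
with a non-atomic stand-in for the real test_and_set_bit(); all names
here are illustrative, not the actual accounting code):

    #include <limits.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    /* Non-atomic stand-in for the real bitop, for illustration. */
    static inline int test_and_set_bit(unsigned long nr,
                                       unsigned long *addr)
    {
        unsigned long mask = 1UL << (nr % BITS_PER_LONG);
        unsigned long *word = &addr[nr / BITS_PER_LONG];
        int old = !!(*word & mask);

        *word |= mask;
        return old;
    }

    /* Buggy pattern: set_bit() followed by an unconditional
     * increment double-counts a pfn which was already dirty:
     *
     *     set_bit(pfn, bitmap);
     *     dirty_count++;
     */

    /* Fixed pattern: only bump the stat on a 0 -> 1 transition. */
    static void mark_dirty(unsigned long pfn, unsigned long *bitmap,
                           unsigned long *dirty_count)
    {
        if ( !test_and_set_bit(pfn, bitmap) )
            (*dirty_count)++;
    }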

Unfortunately, I am now out of the office for 6 working days (back on
Monday 22nd), but will be sporadically on email during that time.

~Andrew

