Re: [Xen-devel] [Qemu-devel] [RFC/BUG] xen-mapcache: buggy invalidate map cache?
On Mon, 10 Apr 2017 00:36:02 +0800
hrg <hrgstephen@xxxxxxxxx> wrote:

Hi,

> On Sun, Apr 9, 2017 at 11:55 PM, hrg <hrgstephen@xxxxxxxxx> wrote:
> > On Sun, Apr 9, 2017 at 11:52 PM, hrg <hrgstephen@xxxxxxxxx> wrote:
> >> Hi,
> >>
> >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
> >> instead of first level entry (if map to rom other than guest memory
> >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
> >> out memory, qemu did not invalidate cache entries in linked
> >> list(entry->next), so when VM balloon back in memory, gfns probably
> >> mapped to different mfns, thus if guest asks device to DMA to these
> >> GPA, qemu may DMA to stale MFNs.
> >>
> >> So I think in xen_invalidate_map_cache() linked lists should also be
> >> checked and invalidated.
> >>
> >> What’s your opinion? Is this a bug? Is my analyze correct?
> >
> > Added Jun Nakajima and Alexander Graf
> And correct Stefano Stabellini's email address.

There is in fact a real xen-mapcache corruption issue. I encountered it a
few months ago while experimenting with Q35 support on Xen. Q35 emulation
uses an AHCI controller by default, with NCQ mode enabled. The issue can be
reproduced there (somewhat) easily, though it might also be reproducible
with the normal i440 emulation using dedicated test code on the guest side.
With Q35+NCQ the issue can be reproduced "as is".

The issue occurs when a guest domain performs intensive disk I/O, e.g.
while the guest OS is booting. QEMU crashes with a "Bad ram offset 980aa000"
message logged, where the address is different each time. The difficult
part is that the issue has a very low reproducibility rate.

The corruption happens when there are multiple I/O commands in the NCQ queue.
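The invalidation gap described in the quoted report (entries reached through entry->next are not invalidated when memory is ballooned out) can be illustrated with a simplified, hypothetical model. This is a sketch under stated assumptions, not QEMU's code: the Entry type, the mfn_base stand-in for a host mapping, and all function names are invented for illustration, and nothing is actually mapped or unmapped. It contrasts invalidating only the first-level entry of each bucket with walking each bucket's whole chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a mapcache bucket. Field names loosely
 * follow the idea of MapCacheEntry, but nothing is mmap()ed here:
 * mfn_base merely stands in for the host mapping (0 = unmapped). */
typedef struct Entry {
    unsigned long paddr_index; /* guest-physical chunk this entry maps */
    unsigned long mfn_base;    /* stand-in for the mapping; 0 = unmapped */
    unsigned lock;             /* outstanding users, e.g. in-flight DMA */
    struct Entry *next;        /* overflow chain for the same bucket */
} Entry;

/* Drop one entry's mapping, unless it is unmapped already or still locked. */
static void invalidate_entry(Entry *e)
{
    if (e->mfn_base == 0 || e->lock > 0) {
        return;
    }
    e->mfn_base = 0;    /* a real cache would munmap() here */
    e->paddr_index = 0;
}

/* Behaviour described in the report: only the first-level entry of each
 * bucket is invalidated, so mappings reached via ->next survive. */
static void invalidate_first_level_only(Entry *buckets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        invalidate_entry(&buckets[i]);
    }
}

/* Behaviour suggested in the report: walk each bucket's chain as well. */
static void invalidate_all(Entry *buckets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (Entry *e = &buckets[i]; e != NULL; e = e->next) {
            invalidate_entry(e);
        }
    }
}

/* Count entries anywhere in the cache that still hold a mapping. */
static size_t count_mapped(const Entry *buckets, size_t n)
{
    size_t count = 0;
    assert(buckets != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (const Entry *e = &buckets[i]; e != NULL; e = e->next) {
            if (e->mfn_base != 0) {
                count++;
            }
        }
    }
    return count;
}
```

In this sketch locked entries are skipped by both variants, on the assumption that a mapping still used by in-flight DMA must not be torn down. A real fix would also have to decide whether stale chained entries are unlinked and freed or only marked as unmapped.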
In that case there are overlapping emulated DMA operations in flight, and
QEMU performs a sequence of mapcache actions that can be executed in the
"wrong" order, leading to an inconsistent xen-mapcache, so a bad address
from the wrong entry is returned.

The bad thing about this issue is that a QEMU crash with "Bad ram offset" is
actually a relatively good outcome, in the sense that the error is caught.
There might be a much worse (artificial) situation where the returned
address looks valid but points to different mapped memory.

The fix itself is not hard (e.g. an additional checked field in
MapCacheEntry), but a reliable way to test it is needed, given the low
reproducibility rate.

Regards,
Alex

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
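The "additional checked field" idea can be sketched roughly as follows. This is one possible shape under stated assumptions, not the fix discussed in the thread: the CheckedEntry type, the mapped_start/mapped_size fields, the BUCKET_SHIFT value and checked_translate() are all hypothetical. Each entry remembers the guest-physical range it was actually mapped for, and a host address is returned only if that range covers the whole request. Otherwise the caller gets NULL, turning a silently wrong address into a caught error comparable to the "Bad ram offset" crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bucket size: 1 MiB per bucket (assumption for this sketch). */
#define BUCKET_SHIFT 20u

typedef struct CheckedEntry {
    uint64_t paddr_index;   /* bucket index used as the lookup key */
    uint8_t *vaddr_base;    /* host mapping, NULL if unmapped */
    uint64_t mapped_start;  /* hypothetical check field: guest-phys start actually mapped */
    uint64_t mapped_size;   /* hypothetical check field: bytes actually mapped */
    struct CheckedEntry *next;
} CheckedEntry;

/* Translate guest-physical [addr, addr + len) to a host pointer, or return
 * NULL if no entry in the chain provably covers the whole range. */
static uint8_t *checked_translate(CheckedEntry *bucket, uint64_t addr, uint64_t len)
{
    uint64_t index = addr >> BUCKET_SHIFT;

    assert(len > 0);
    for (CheckedEntry *e = bucket; e != NULL; e = e->next) {
        if (e->vaddr_base == NULL || e->paddr_index != index) {
            continue;
        }
        /* The extra check: the entry must really cover the request,
         * written so that the arithmetic cannot overflow. */
        if (addr >= e->mapped_start &&
            len <= e->mapped_size &&
            addr - e->mapped_start <= e->mapped_size - len) {
            return e->vaddr_base + (addr - e->mapped_start);
        }
    }
    return NULL; /* caller reports an error instead of using a stale mapping */
}
```

The key matching alone (paddr_index) would accept an entry whose mapping no longer corresponds to the request; the range comparison is what rejects it. Checking `len <= mapped_size` before computing `mapped_size - len` keeps the subtraction from wrapping around.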