Re: [Xen-devel] [PATCH] xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings
On Tue, 11 Jul 2017, Alexey G wrote:
> Under certain circumstances normal xen-mapcache functioning may be broken
> by a guest's actions. This may lead either to QEMU performing exit() due to
> a caught bad pointer (and with the QEMU process gone the guest domain simply
> appears hung afterwards) or to actual use of the incorrect pointer inside
> QEMU's address space -- a write to unmapped memory is possible. The bug is
> hard to reproduce on an i440 machine as multiple DMA sources are required
> (though it's possible in theory, using multiple emulated devices), but can
> be reproduced somewhat easily on a Q35 machine using an emulated AHCI
> controller -- each NCQ command slot may be used as an independent DMA
> source, e.g. using the READ FPDMA QUEUED command, so a single storage
> device on the AHCI controller port is enough to produce multiple DMAs
> (up to 32). A detailed description of the issue follows.
>
> Xen-mapcache provides the ability to map parts of guest memory into
> QEMU's own address space to work with.
>
> There are two types of cache lookups:
> - translating a guest physical address into a pointer in QEMU's address
> space, mapping a part of guest domain memory if necessary (while trying
> to keep the number of such (re)mappings to a minimum)
> - translating a QEMU pointer back to its physical address in guest RAM
>
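For context, these two directions correspond to the mapcache entry points used by hw/i386/xen/xen-mapcache.c; roughly the following (signatures quoted from memory, so treat them as approximate):

    /* forward lookup: guest physical address -> pointer in QEMU's address
     * space, (re)mapping the required bucket(s) on demand */
    uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
                           uint8_t lock, bool dma);

    /* reverse lookup: a previously returned pointer -> guest RAM address */
    ram_addr_t xen_ram_addr_from_mapcache(void *ptr);
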
> These lookups are managed via two linked lists of structures.
> MapCacheEntry is used for forward cache lookups, while MapCacheRev is
> used for reverse lookups.
>
> Every guest physical address is broken down into 2 parts:
> address_index = phys_addr >> MCACHE_BUCKET_SHIFT;
> address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
>
> MCACHE_BUCKET_SHIFT depends on the host (32/64-bit) and is equal to 20 on
> a 64-bit system (which is assumed for the rest of this description).
> Basically, this means that we deal with 1 MB chunks and offsets within
> those 1 MB chunks. All mappings are created with 1MB granularity, i.e.
> 1MB/2MB/3MB etc. Most DMA transfers are typically less than 1MB; however,
> if the transfer crosses any 1MB border(s), the nearest larger mapping size
> will be used, so e.g. a 512-byte DMA transfer with the start address
> 700FFF80h will actually require a 2MB range.
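To make the rounding concrete, the mapping size is derived roughly as in the sketch below (a simplified standalone version of the computation done in xen_map_cache_unlocked(); mapping_size() is a made-up helper name, not a QEMU function):

    #define MCACHE_BUCKET_SHIFT 20
    #define MCACHE_BUCKET_SIZE  (1UL << MCACHE_BUCKET_SHIFT)   /* 1 MB */

    /* How large a mapping a transfer of 'size' bytes at 'phys_addr' needs. */
    static unsigned long mapping_size(unsigned long phys_addr,
                                      unsigned long size)
    {
        unsigned long address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
        unsigned long cache_size = size + address_offset;

        if (cache_size % MCACHE_BUCKET_SIZE) {
            cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE);
        }
        return cache_size;
    }

    /* Example from the text: a 512-byte transfer at 0x700FFF80 gives
     * address_offset = 0xFFF80, and 0xFFF80 + 0x200 = 0x100180 > 1MB,
     * so the result is rounded up to 0x200000 (2MB). */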
>
> The current implementation assumes that MapCacheEntries are unique for a
> given address_index and size pair and that a single MapCacheEntry may be
> reused by multiple requests -- in this case the 'lock' field will be larger
> than 1. On the other hand, each requested guest physical address (with the
> 'lock' flag set) is described by its own MapCacheRev entry, so there may be
> multiple MapCacheRev entries corresponding to a single MapCacheEntry. The
> xen-mapcache code uses MapCacheRev entries to retrieve the address_index &
> size pair, which in turn is used to find the related MapCacheEntry. The
> 'lock' field within a MapCacheEntry structure is actually a reference
> counter which shows the number of corresponding MapCacheRev entries.
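For readers without the source at hand, the two structures look roughly like this (an abbreviated sketch based on xen-mapcache.c; the exact set and order of fields may differ):

    typedef struct MapCacheEntry {
        hwaddr paddr_index;              /* address_index of this mapping */
        uint8_t *vaddr_base;             /* where it is mapped in QEMU */
        unsigned long *valid_mapping;    /* bitmap of successfully mapped pages */
        uint8_t lock;                    /* reference count of MapCacheRev users */
        hwaddr size;                     /* mapping size, a multiple of 1MB */
        struct MapCacheEntry *next;      /* chain within the hash bucket */
    } MapCacheEntry;

    typedef struct MapCacheRev {
        uint8_t *vaddr_req;              /* pointer handed out to the caller */
        hwaddr paddr_index;              /* used to find the MapCacheEntry back */
        hwaddr size;
        bool dma;
        QTAILQ_ENTRY(MapCacheRev) next;  /* list of currently locked requests */
    } MapCacheRev;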
>
> The bug lies in the guest's ability to indirectly manipulate the
> xen-mapcache MapCacheEntry list via a special sequence of DMA
> operations, typically for storage devices. In order to trigger the bug,
> the guest needs to issue DMA operations in a specific order and with
> specific timing. Although xen-mapcache is protected by a mutex, this
> doesn't help in this case, as the bug is not due to a race condition.
>
> Suppose we have 3 DMA transfers, namely A, B and C, where
> - transfer A crosses 1MB border and thus uses a 2MB mapping
> - transfers B and C are normal transfers within 1MB range
> - and all 3 transfers belong to the same address_index
>
> In this case, if all these transfers are executed one-by-one
> (without overlaps), no special treatment is necessary -- each transfer's
> mapping lock will be set and then cleared on unmap before starting
> the next transfer.
> The situation changes when DMA transfers overlap in time, e.g. like this:
>
> |===== transfer A (2MB) =====|
>
> |===== transfer B (1MB) =====|
>
> |===== transfer C (1MB) =====|
> time --->
>
> In this situation the following sequence of actions happens:
>
> 1. transfer A creates a mapping to a 2MB area (lock=1)
> 2. transfer B (1MB) tries to find an available mapping but cannot find
> one because transfer A is still in progress, and its entry has 2MB size
> + a non-zero lock. So transfer B creates another mapping -- same
> address_index, but 1MB size.
> 3. transfer A completes, making the 1st mapping entry available by
> setting its lock to 0
> 4. transfer C starts and tries to find an available mapping entry and
> sees that the 1st entry has lock=0, so it uses this entry but remaps it
> to a 1MB size
> 5. transfer B completes and by this time
> - there are two locked entries in the MapCacheEntry list with the SAME
> values for both address_index and size
> - the entry for transfer B actually resides farther down the list while
> transfer C's entry is first
> 6. xen_ram_addr_from_mapcache() for transfer B gets the correct
> address_index and size pair from the corresponding MapCacheRev entry, but
> then it starts looking for a MapCacheEntry with these values and finds
> the first one -- which belongs to transfer C.
>
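The lookup that goes wrong in step 6 is the forward search performed by xen_ram_addr_from_mapcache() (the unmap path uses the same pattern). A condensed sketch, simplified from xen-mapcache.c:

    /* First find the MapCacheRev that describes 'ptr', then search the
     * bucket chain for a MapCacheEntry with the same (paddr_index, size).
     * The search stops at the FIRST match, so with two locked entries
     * sharing the same pair it can pick the wrong one. */
    QTAILQ_FOREACH(reventry, &mapcache->locked_entries, next) {
        if (reventry->vaddr_req == ptr) {
            paddr_index = reventry->paddr_index;
            size = reventry->size;
            break;
        }
    }

    entry = &mapcache->entry[paddr_index % mapcache->nr_buckets];
    while (entry &&
           (entry->paddr_index != paddr_index || entry->size != size)) {
        entry = entry->next;
    }
    /* 'entry' may now belong to transfer C even though 'ptr' came from B */
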
> At this point the following (bad) consequences are possible:
>
> 1. xen_ram_addr_from_mapcache() will use a wrong entry->vaddr_base value
> in this statement:
>
> raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
> ((unsigned long) ptr - (unsigned long) entry->vaddr_base);
>
> resulting in an incorrect raddr value returned from the function. The
> (ptr - entry->vaddr_base) expression may produce both positive and
> negative numbers and its actual value may differ greatly as many
> map/unmap operations take place. If the value ends up beyond guest RAM
> limits, a "Bad RAM offset" error will be triggered and logged,
> followed by exit() in QEMU.
>
> 2. If the raddr value does not exceed guest RAM boundaries, the same
> sequence of actions will be performed by xen_invalidate_map_cache_entry()
> on DMA unmap, resulting in the wrong MapCacheEntry being unmapped while
> the DMA operation which uses it is still active. The above example must
> be extended by one more DMA transfer in order to allow unmapping, as the
> first mapping in the list is sort of resident.
Thanks for the well written description of the problem!
> The patch modifies the way in which MapCacheEntries are added to the
> list, avoiding duplicates.
I take it that the idea is to always go through the whole list to check for
duplicate locked entries, right?
> Signed-off-by: Alexey Gerasimenko <x1917x@xxxxxxxxx>
> ---
> hw/i386/xen/xen-mapcache.c | 33 ++++++++++++++++++++++++++++-----
> 1 file changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index e60156c..84f25ef 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -206,6 +206,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
> uint8_t lock, bool dma)
> {
> MapCacheEntry *entry, *pentry = NULL;
> + MapCacheEntry *avl_entry = NULL, *avl_entry_prev = NULL;
> hwaddr address_index;
> hwaddr address_offset;
> hwaddr cache_size = size;
> @@ -251,14 +252,36 @@ tryagain:
>
> entry = &mapcache->entry[address_index % mapcache->nr_buckets];
>
> - while (entry && entry->lock && entry->vaddr_base &&
> - (entry->paddr_index != address_index || entry->size != cache_size ||
> - !test_bits(address_offset >> XC_PAGE_SHIFT,
> - test_bit_size >> XC_PAGE_SHIFT,
> - entry->valid_mapping))) {
> + /* find a remappable entry. An existing locked entry which can be reused
> + * has a priority over all other entries (with lock=0, etc).
> + * Normally there will be just few entries for a given address_index
> + * bucket, typically 1-2 entries only
> + */
> + while (entry) {
> + if (entry->lock &&
> + entry->paddr_index == address_index &&
> + entry->size == cache_size &&
> + test_bits(address_offset >> XC_PAGE_SHIFT,
> + test_bit_size >> XC_PAGE_SHIFT, entry->valid_mapping)) {
> + break;
> + }
> + else if (!entry->lock || !entry->vaddr_base) {
> + avl_entry = entry;
> + avl_entry_prev = pentry;
> + }
> +
> pentry = entry;
> entry = entry->next;
> }
> +
> + /* if the reuseable entry was not found, use any available.
> + * Otherwise, a new entry will be created
> + */
> + if (avl_entry && !entry) {
> + pentry = avl_entry_prev;
> + entry = avl_entry;
> + }
Yes, I think this would work, but we should make sure to scan the whole
list only when lock == true. Something like the following:
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 2a1fbd1..f964143 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -234,7 +234,8 @@ static void xen_remap_bucket(MapCacheEntry *entry,
static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
uint8_t lock, bool dma)
{
- MapCacheEntry *entry, *pentry = NULL;
+ MapCacheEntry *entry, *pentry = NULL,
+ *free_entry = NULL, *free_pentry = NULL;
hwaddr address_index;
hwaddr address_offset;
hwaddr cache_size = size;
@@ -281,14 +282,22 @@ tryagain:
entry = &mapcache->entry[address_index % mapcache->nr_buckets];
- while (entry && entry->lock && entry->vaddr_base &&
+ while (entry && (lock || entry->lock) && entry->vaddr_base &&
(entry->paddr_index != address_index || entry->size != cache_size ||
!test_bits(address_offset >> XC_PAGE_SHIFT,
test_bit_size >> XC_PAGE_SHIFT,
entry->valid_mapping))) {
+ if (!free_entry && !entry->lock) {
+ free_entry = entry;
+ free_pentry = pentry;
+ }
pentry = entry;
entry = entry->next;
}
+ if (!entry && free_entry) {
+ entry = free_entry;
+ pentry = free_pentry;
+ }
if (!entry) {
entry = g_malloc0(sizeof (MapCacheEntry));
pentry->next = entry;
Would this work?
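For readability, here is roughly what the lookup would look like with that change applied (reconstructed from the diff above, not compile-tested against the full file):

    MapCacheEntry *entry, *pentry = NULL,
                  *free_entry = NULL, *free_pentry = NULL;

    entry = &mapcache->entry[address_index % mapcache->nr_buckets];

    /* When lock is requested, keep walking past non-matching locked entries,
     * remembering the first unlocked entry seen along the way. */
    while (entry && (lock || entry->lock) && entry->vaddr_base &&
           (entry->paddr_index != address_index || entry->size != cache_size ||
            !test_bits(address_offset >> XC_PAGE_SHIFT,
                       test_bit_size >> XC_PAGE_SHIFT,
                       entry->valid_mapping))) {
        if (!free_entry && !entry->lock) {
            free_entry = entry;
            free_pentry = pentry;
        }
        pentry = entry;
        entry = entry->next;
    }
    /* No matching entry found: fall back to an unlocked one (which will be
     * remapped); otherwise a brand new entry gets allocated below. */
    if (!entry && free_entry) {
        entry = free_entry;
        pentry = free_pentry;
    }
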
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel