Re: [Xen-devel] frequently ballooning results in qemu exit
We fixed this issue with the patch below, which computes the correct size for test_bits(). qemu_get_ram_ptr() and qemu_safe_ram_ptr() call xen_map_cache() with a size of 0 if the requested address is in RAM. xen_map_cache() then passes that 0 size on to test_bits() to check whether the corresponding pfn was mapped in the cache, but test_bits() always returns 1 when size is 0, without testing any bits. In this case test_bits() should actually check one bit. The patch therefore introduces __test_bit_size, which is always greater than 0 and a multiple of XC_PAGE_SIZE, so that test_bits() works correctly with __test_bit_size >> XC_PAGE_SHIFT as its size argument.

Signed-off-by: Zhenguo Wang <wangzhenguo@xxxxxxxxxx>
Signed-off-by: Weidong Han <hanweidong@xxxxxxxxxx>

diff --git a/xen-mapcache.c b/xen-mapcache.c
index 31c06dc..bd4a97f 100644
--- a/xen-mapcache.c
+++ b/xen-mapcache.c
@@ -202,6 +202,7 @@ uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
     hwaddr address_index;
     hwaddr address_offset;
     hwaddr __size = size;
+    hwaddr __test_bit_size = size;
     bool translated = false;

 tryagain:
@@ -210,7 +211,23 @@ tryagain:

     trace_xen_map_cache(phys_addr);

-    if (address_index == mapcache->last_address_index && !lock && !__size) {
+    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
+
+    /* __test_bit_size is always a multiple of XC_PAGE_SIZE */
+    if (size) {
+        __test_bit_size = size + (phys_addr & (XC_PAGE_SIZE - 1));
+
+        if (__test_bit_size % XC_PAGE_SIZE) {
+            __test_bit_size += XC_PAGE_SIZE - (__test_bit_size % XC_PAGE_SIZE);
+        }
+    } else {
+        __test_bit_size = XC_PAGE_SIZE;
+    }
+
+    if (address_index == mapcache->last_address_index && !lock && !__size &&
+        test_bits(address_offset >> XC_PAGE_SHIFT,
+                  __test_bit_size >> XC_PAGE_SHIFT,
+                  entry->valid_mapping)) {
         trace_xen_map_cache_return(mapcache->last_address_vaddr + address_offset);
         return mapcache->last_address_vaddr + address_offset;
     }
@@ -225,11 +242,10 @@ tryagain:
         __size = MCACHE_BUCKET_SIZE;
     }

-    entry = &mapcache->entry[address_index % mapcache->nr_buckets];
-
     while (entry && entry->lock && entry->vaddr_base &&
             (entry->paddr_index != address_index || entry->size != __size ||
-             !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
+             !test_bits(address_offset >> XC_PAGE_SHIFT,
+                 __test_bit_size >> XC_PAGE_SHIFT,
                  entry->valid_mapping))) {
         pentry = entry;
         entry = entry->next;
@@ -241,13 +257,15 @@ tryagain:
     } else if (!entry->lock) {
         if (!entry->vaddr_base || entry->paddr_index != address_index ||
                 entry->size != __size ||
-                !test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
+                !test_bits(address_offset >> XC_PAGE_SHIFT,
+                    __test_bit_size >> XC_PAGE_SHIFT,
                     entry->valid_mapping)) {
             xen_remap_bucket(entry, __size, address_index);
         }
     }

-    if(!test_bits(address_offset >> XC_PAGE_SHIFT, size >> XC_PAGE_SHIFT,
+    if(!test_bits(address_offset >> XC_PAGE_SHIFT,
+                __test_bit_size >> XC_PAGE_SHIFT,
                 entry->valid_mapping)) {
         mapcache->last_address_index = -1;
         if (!translated && mapcache->phys_offset_to_gaddr) {
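For reference, the rounding introduced above can be shown in isolation. The sketch below is illustrative only (the test_bit_size() helper, the main() driver, the example sizes and the 4 KiB page-size assumption are made up for this illustration, not part of the patch): it rounds the requested byte range up to whole XC_PAGE_SIZE pages and falls back to a single page when size is 0, which is exactly how many page bits test_bits() is then asked to check.

/* Illustrative only: mirrors the rounding done in xen_map_cache() above,
 * assuming 4 KiB Xen pages (XC_PAGE_SHIFT == 12). */
#include <stdio.h>

#define XC_PAGE_SHIFT 12
#define XC_PAGE_SIZE  (1UL << XC_PAGE_SHIFT)

static unsigned long test_bit_size(unsigned long phys_addr, unsigned long size)
{
    unsigned long s;

    if (size) {
        /* cover the page offset of the start address, then round up */
        s = size + (phys_addr & (XC_PAGE_SIZE - 1));
        if (s % XC_PAGE_SIZE) {
            s += XC_PAGE_SIZE - (s % XC_PAGE_SIZE);
        }
    } else {
        /* a zero-size request still touches one page */
        s = XC_PAGE_SIZE;
    }
    return s;
}

int main(void)
{
    /* phys_addr taken from the trace quoted below; the sizes are made-up examples */
    printf("%lu\n", test_bit_size(0x7a3c1ec0, 0) >> XC_PAGE_SHIFT);      /* 1 page  */
    printf("%lu\n", test_bit_size(0x7a3c1ec0, 4) >> XC_PAGE_SHIFT);      /* 1 page  */
    printf("%lu\n", test_bit_size(0x7a3c1ec0, 0x2000) >> XC_PAGE_SHIFT); /* 3 pages */
    return 0;
}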
> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Hanweidong
> Sent: 21 March 2013 21:33
> To: Tim Deegan
> Cc: George Dunlap; Andrew Cooper; Yanqiangjun; xen-devel@xxxxxxxxxxxxx; Gonglei (Arei); Anthony PERARD
> Subject: Re: [Xen-devel] frequently ballooning results in qemu exit
>
> > -----Original Message-----
> > From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Tim Deegan
> > Sent: 21 March 2013 20:16
> > To: Hanweidong
> > Cc: George Dunlap; Andrew Cooper; Yanqiangjun; xen-devel@xxxxxxxxxxxxx; Gonglei (Arei); Anthony PERARD
> > Subject: Re: [Xen-devel] frequently ballooning results in qemu exit
> >
> > At 05:54 +0000 on 15 Mar (1363326854), Hanweidong wrote:
> > > > > I'm also curious about this. There is a window between memory balloon out
> > > > > and QEMU invalidate mapcache.
> > > >
> > > > That by itself is OK; I don't think we need to provide any meaningful
> > > > semantics if the guest is accessing memory that it's ballooned out.
> > > >
> > > > The question is where the SIGBUS comes from: either qemu has a mapping
> > > > of the old memory, in which case it can write to it safely, or it
> > > > doesn't, in which case it shouldn't try.
> > >
> > > The error always happened at memcpy in the if (is_write) branch in
> > > address_space_rw.
> >
> > Sure, but _why_? Why does this access cause SIGBUS? Presumably there's
> > some part of the mapcache code that thinks it has a mapping there when
> > it doesn't.
> >
> > > We found that, after the last xen_invalidate_map_cache, the mapcache
> > > entry related to the failed address was mapped:
> > > ==xen_map_cache== phys_addr=7a3c1ec0 size=0 lock=0
> > > ==xen_remap_bucket== begin size=1048576 ,address_index=7a3
> > > ==xen_remap_bucket== end entry->paddr_index=7a3,entry->vaddr_base=2a2d9000,size=1048576,address_index=7a3
> >
> > OK, so that's 0x2a2d9000 -- 0x2a3d8fff.
> >
> > > ==address_space_rw== ptr=2a39aec0
> > > ==xen_map_cache== phys_addr=7a3c1ec4 size=0 lock=0
> > > ==xen_map_cache==first return 2a2d9000+c1ec4=2a39aec4
> > > ==address_space_rw== ptr=2a39aec4
> > > ==xen_map_cache== phys_addr=7a3c1ec8 size=0 lock=0
> > > ==xen_map_cache==first return 2a2d9000+c1ec8=2a39aec8
> > > ==address_space_rw== ptr=2a39aec8
> > > ==xen_map_cache== phys_addr=7a3c1ecc size=0 lock=0
> > > ==xen_map_cache==first return 2a2d9000+c1ecc=2a39aecc
> > > ==address_space_rw== ptr=2a39aecc
> >
> > These are all to page 0x2a39a___.
> >
> > > ==xen_map_cache== phys_addr=7a16c108 size=0 lock=0
> > > ==xen_map_cache== return 2a407000+6c108=2a473108
> > > ==xen_map_cache== phys_addr=7a16c10c size=0 lock=0
> > > ==xen_map_cache==first return 2a407000+6c10c=2a47310c
> > > ==xen_map_cache== phys_addr=7a16c110 size=0 lock=0
> > > ==xen_map_cache==first return 2a407000+6c110=2a473110
> > > ==xen_map_cache== phys_addr=7a395000 size=0 lock=0
> > > ==xen_map_cache== return 2a2d9000+95000=2a36e000
> > > ==address_space_rw== ptr=2a36e000
> >
> > And this is to page 0x2a36e___, a different page in the same bucket.
> >
> > > here, the SIGBUS error occurred.
> >
> > So that page isn't mapped. Which means:
> >  - it was never mapped (and the mapcache code didn't handle the error
> >    correctly at map time); or
> >  - it was never mapped (and the mapcache hasn't checked its own records
> >    before using the map); or
> >  - it was mapped (and something unmapped it in the meantime).
> >
> > Why not add some tests in xen_remap_bucket to check that all the pages
> > that qemu records as mapped are actually there?
> >
>
> Hi Tim,
>
> We did more debugging these days, and root-caused it just now. We found
> that the pfn which caused the SIGBUS was never mapped. We verified it by
> printing the *err* variable returned from xc_map_foreign_bulk(), and its bit
> was also not set in entry->valid_mapping. But test_bits() didn't return the
> right value for this situation.
> qemu_get_ram_ptr() almost always passes size 0 to xen_map_cache(), and then
> xen_map_cache() calls test_bits() to test valid_mapping, and test_bits()
> always returns 1 when size is 0 because qemu's find_next_zero_bit() has the
> following code:
>     if (offset >= size) {
>         return size;
>     }
>
> One way to fix the test_bits() issue is to change size >> XC_PAGE_SHIFT to
> (size + 1 << XC_PAGE_SHIFT) >> XC_PAGE_SHIFT in xen_map_cache(). But we
> think that still cannot solve the problem completely, because test_bits()
> would only test one bit when size is 0. Why does qemu_get_ram_ptr()
> always call xen_map_cache() with size 0 when block->offset == 0? If
> some operation needs to access a large address range (more than 1 page),
> this way still has a problem. So I think it should pass the actual size to
> xen_map_cache().
>
> --weidong
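To see the failure mode concretely, the following self-contained model reproduces the size == 0 behaviour described above (model_find_next_zero_bit() and model_test_bits() are simplified stand-ins written for this illustration; they are not the real QEMU bitops or xen-mapcache code). With size 0 the search window is empty, so no zero bit can ever be found and the validity check passes even though the page was never mapped; checking at least one bit reports the page as unmapped.

#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Stand-in for find_next_zero_bit(): returns the index of the first zero bit
 * in [offset, size), or size if there is none. The early return is the one
 * quoted in the mail above. */
static unsigned long model_find_next_zero_bit(const unsigned long *addr,
                                              unsigned long size,
                                              unsigned long offset)
{
    unsigned long i;

    if (offset >= size) {
        return size;
    }
    for (i = offset; i < size; i++) {
        if (!(addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))) {
            return i;
        }
    }
    return size;
}

/* Stand-in for a test_bits()-style check: "are bits [start, start + size)
 * all set?"  It succeeds when no zero bit is found inside that range. */
static int model_test_bits(unsigned long start, unsigned long size,
                           const unsigned long *addr)
{
    return model_find_next_zero_bit(addr, start + size, start) >= start + size;
}

int main(void)
{
    unsigned long valid_mapping = 0;   /* pretend no page was mapped at all */

    /* size 0: empty window, the check claims the (unmapped) page is valid */
    printf("size 0 -> %d\n", model_test_bits(5, 0, &valid_mapping));   /* prints 1 */
    /* size 1: one bit is actually inspected and the page is seen as unmapped */
    printf("size 1 -> %d\n", model_test_bits(5, 1, &valid_mapping));   /* prints 0 */
    return 0;
}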
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel