[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] dom0 linux 3.6.0-rc4, crash due to ballooning althoug dom0_mem=X, max:X set
On Wed, Sep 05, 2012 at 04:38:48PM +0200, Sander Eikelenboom wrote: > > Wednesday, September 5, 2012, 4:06:01 PM, you wrote: > > > On Tue, Sep 04, 2012 at 04:27:20PM -0400, Robert Phillips wrote: > >> Ben, > >> > >> You have asked me to provide the rationale behind the gnttab_old_mfn > >> patch, which you emailed to Sander earlier today. > >> Here are my findings. > >> > >> I found that xen_blkbk_map() in drivers/block/xen-blkback/blkback.c has > >> changed from our previous version. It now calls gnttab_map_refs() in > >> drivers/xen/grant-table.c. > >> > >> That function first calls > >> HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, ... ) and then calls > >> m2p_add_override() in p2m.c > > > And HYPERVISOR_grant_table_op .. would populate map_ops[i].bus_addr with > > the machine address.. > > >> which is where I made my change. > >> > >> The unpatched code was saving the pfn's old mfn in kmap_op->dev_bus_addr. > >> > >> kmap_op is of type struct gnttab_map_grant_ref. That data type is used to > >> record grant table mappings so later they can be unmapped correctly. > > > Right, but the blkback makes a distinction by passing NULL as kmap_op, > > which means it should > > use the old mechanism. Meaning that once the hypercall is done, the > > map_ops[i].bus_addr is not > > used anymore.. > > >> > >> The problem with saving the old mfn in kmap_op->dev_bus_addr is that it is > >> later overwritten by __gnttab_map_grant_ref() in xen/common/grant_table.c > > > Uh, so the problem of saving the old mfn in dev_bus_addr has been there for > > a long long time then? > > Even before this patch set? > >> > >> Since the storage holding the old mfn got overwritten, the unmapping was > >> being done incorrectly. The balloon code detected that and bugged at > >> drivers/xen/balloon.c:359 > >> > > > Hmm, I believe the storage for holding the old mfn was/is page->index. > > > >> My patch simply adds another member called old_mfn to struct > >> gnttab_map_grant_ref rather than trying to overload dev_bus_addr. > >> > >> I don't know if Sander's bug is the same or related. The BUG_ON at > >> drivers/xen/balloon.c:359 is quite general. It simply asserts that we are > >> not trying to re-map a valid mapping. > > > Right. Somehow he ends up with valid mappings where there should be none. > > And lots of them. > > It's something between kernel v3.4.1 and v3.5.3, haven't had time to narrow > it down yet. > Any suggestions for specific commits i could try to quickly bisect this one ? These are the ones that went in: ea61fc0 xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back. b9e0d95 xen: mark local pages as FOREIGN in the m2p_override 6878c32 xen/blkfront: Add WARN to deal with misbehaving backends. 5e62625 xen/setup: filter APERFMPERF cpuid feature out 8c9ce60 xen/blkback: Copy id field when doing BLKIF_DISCARD. 58b7b53 xen/balloon: Subtract from xen_released_pages the count that is populated. 780dbcd xen/pci: Check for PCI bridge before using it. 5e152e6 xen/events: Add WARN_ON when quick lookup found invalid type. 5842f57 xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness. a32c88b xen/hvc: Fix error cases around HVM_PARAM_CONSOLE_PFN 2e5ad6b xen/hvc: Collapse error logic. 7664810 xen: do not disable netfront in dom0 68c2c39 xen: do not map the same GSI twice in PVHVM guests. 201a52b hvc_xen: NULL dereference on allocation failure d79d595 xen: Add selfballoning memory reservation tunable. d2fb4c5 xenbus: Add support for xenbus backend in stub domain 2f1bd67 xen/smp: unbind irqworkX when unplugging vCPUs. 87e4baa x86/xen/apic: Add missing #include <xen/xen.h> 323f90a xen-acpi-processor: Add missing #include <xen/xen.h> 8605067 xen-blkfront: module exit handling adjustments e77c78c xen-blkfront: properly name all devices f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls 211063d xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep 1ff2b0c xen: implement IRQ_WORK_VECTOR handler f447d56 xen: implement apic ipi interface 83d51ab xen/setup: update VA mapping when releasing memory during setup 96dc08b xen/setup: Combine the two hypercall functions - since they are quite similar. 2e2fb75 xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM ca11823 xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 9438ef7 x86/apic: Fix UP boot crash ab6ec39 xen/apic: implement io apic read with hypercall 27abd14 Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'" 31b3c9d xen/x86: Implement x86_apic_ops 4a8e2a3 x86/apic: Replace io_apic_ops with x86_io_apic_ops. 977f857 PCI: move mutex locking out of pci_dev_reset function 569ca5b xen/gnttab: add deferred freeing logic 9fe2a70 debugfs: Add support to print u32 array in debugfs 940713b xen/p2m: An early bootup variant of set_phys_to_machine d509685 xen/p2m: Collapse early_alloc_p2m_middle redundant checks. cef4cca xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument 3f3aaea xen/p2m: Move code around to allow for better re-usage. Narrowing this down (so ignore APIC bootup, drivers, etc) these could be it: b9e0d95 xen: mark local pages as FOREIGN in the m2p_override 58b7b53 xen/balloon: Subtract from xen_released_pages the count that is populated. d79d595 xen: Add selfballoning memory reservation tunable. f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls 83d51ab xen/setup: update VA mapping when releasing memory during setup 96dc08b xen/setup: Combine the two hypercall functions - since they are quite similar. 2e2fb75 xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM ca11823 xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 940713b xen/p2m: An early bootup variant of set_phys_to_machine d509685 xen/p2m: Collapse early_alloc_p2m_middle redundant checks. cef4cca xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument 3f3aaea xen/p2m: Move code around to allow for better re-usage. About nine of them deal with dom0_mem=max ballooning up right, so if you ignore those: b9e0d95 xen: mark local pages as FOREIGN in the m2p_override d79d595 xen: Add selfballoning memory reservation tunable. f62805f xen: enter/exit lazy_mmu_mode around m2p_override calls Try reverting any of those. And if nothing works there then we can try to revert the ones that deal with 'dom0_mem=max:XX'.. I also need to be able to reproduce this. You said you can only reproduce this on your Intel box - is this a fast Intel machine? It also looks like you only have 2GB in the machine - and reserve 1GB to the dom0. If you manually (so don't start the guest), balloon down - say to 512MB and then launch a guest do you see this problem? _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |