Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0
On Mon, Jun 08, 2009 at 10:11:41PM +0300, Pasi Kärkkäinen wrote:
> On Mon, Jun 08, 2009 at 08:05:43PM +0300, Pasi Kärkkäinen wrote:
> > On Mon, Jun 08, 2009 at 07:21:46PM +0300, Pasi Kärkkäinen wrote:
> > > On Mon, Jun 08, 2009 at 05:17:45PM +0100, Ian Campbell wrote:
> > > > On Mon, 2009-06-08 at 12:13 -0400, Pasi Kärkkäinen wrote:
> > > > > On Mon, Jun 08, 2009 at 05:00:58PM +0100, Ian Campbell wrote:
> > > > > > On Mon, 2009-06-08 at 11:45 -0400, Ian Campbell wrote:
> > > > > > >
> > > > > > > > L4 at e1822000 is pinned contains L2 at e1977228 which points
> > > > > > > > at an L1 which is unpinned low mem address 0x8bf8000
> > > > > > >
> > > > > > > OK so I think that is interesting. A pinned L4 referencing an
> > > > > > > unpinned L1 isn't supposed to happen, I don't think (Jeremy?).
> > > > > >
> > > > > > Interesting:
> > > > > >
> > > > > > pte_t *page_check_address(struct page *page, struct mm_struct *mm,
> > > > > > [...]
> > > > > >         pte = pte_offset_map(pmd, address); /* A */
> > > > > >         /* Make a quick check before getting the lock */
> > > > > >         if (!sync && !pte_present(*pte)) {
> > > > > >                 pte_unmap(pte);
> > > > > >                 return NULL;
> > > > > >         }
> > > > > >
> > > > > >         ptl = pte_lockptr(mm, pmd);
> > > > > >         spin_lock(ptl);
> > > > > > [...]
> > > > > >
> > > > > > So at point A we make a new mapping of a PTE without yet holding the
> > > > > > corresponding PTE lock and this is precisely the point at which
> > > > > > things start to go wrong for us... (coincidence? I think not ;-))
> > > > > >
> > > > > > I wonder how this interacts with the logic in
> > > > > > arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while
> > > > > > waiting for the (deferred) pin multicall to occur? Hmm, no this is
> > > > > > about the PagePinned flag on the struct page which is out of date
> > > > > > WRT the actual pinned status as Xen sees it -- we update the
> > > > > > PagePinned flag early in xen_pin_page() long before Xen the pin
> > > > > > hypercall so this window is the other way round to what would be
> > > > > > needed to trigger this bug.
> > > > > >
> > > > > > On the other hand xen_unpin_page() looks like it sets up something
> > > > > > roughly like what we need for this issue to trigger.
> > > > > >
> > > > > > Pasi in additional to my other mad hack could you try this:
> > > > > >
> > > > >
> > > > > Ok.. do you want me to try first without this patch? Or should I
> > > > > cancel my kernel compilation and apply this aswell? :)
> > > >
> > > > Can you try the first patch first then add this one please.
> > > >
> > >
> > > Ok. Will do.
> > >
> > > I was already starting to feel like 'maybe my hardware is broken' but
> > > now that code looks like it might be an actual bug :)
> > >
> > > Let's see.
> > >
> >
> > Crash with only the first patch applied:
> > http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-05-with-highpte-no-swap-with-debug3.txt
> >
> > Now I'll try with the second one included aswell..
> >
>
> And here's one with the second patch applied aswell:
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-06-with-highpte-no-swap-with-debug4.txt
>
> Seems to be different.. Xen is not complaining anymore..
>

And here's one with only the second patch applied:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-07-with-highpte-no-swap-with-debug5.txt

Now Xen is complaining again.. does that sound correct?

-- Pasi

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
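
For readers following the thread, the "flag out of date WRT the actual pinned status" window described in the quoted text can be illustrated with a toy model. This is only a sketch and not kernel or Xen code: every name in it (toy_page, toy_request, toy_flush, toy_stale) is hypothetical, and it simply assumes that the guest-side flag is updated immediately while the hypervisor-side change is queued and applied later in a batch. Whether xen_unpin_page() really behaves this way is exactly what the thread is trying to establish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT kernel code: all names here are hypothetical. */
struct toy_page {
    bool flag_pinned; /* what the guest believes (cf. the PagePinned flag) */
    bool hv_pinned;   /* what the hypervisor has actually applied */
};

enum toy_op { TOY_PIN, TOY_UNPIN };

struct toy_deferred {
    struct toy_page *page;
    enum toy_op op;
};

#define TOY_BATCH_MAX 8
static struct toy_deferred toy_batch[TOY_BATCH_MAX];
static size_t toy_batch_len;

/* Update the guest-side flag at once, but only queue the hypervisor change. */
static void toy_request(struct toy_page *page, enum toy_op op)
{
    assert(toy_batch_len < TOY_BATCH_MAX);
    page->flag_pinned = (op == TOY_PIN);
    toy_batch[toy_batch_len].page = page;
    toy_batch[toy_batch_len].op = op;
    toy_batch_len++;
}

/* Apply every queued operation, as a deferred batch flush would. */
static void toy_flush(void)
{
    for (size_t i = 0; i < toy_batch_len; i++)
        toy_batch[i].page->hv_pinned = (toy_batch[i].op == TOY_PIN);
    toy_batch_len = 0;
}

/* True while the guest flag and the hypervisor state disagree. */
static bool toy_stale(const struct toy_page *page)
{
    return page->flag_pinned != page->hv_pinned;
}
```

In this model, between toy_request(&p, TOY_PIN) and toy_flush() the flag says "pinned" while the hypervisor still sees the page as unpinned; between toy_request(&p, TOY_UNPIN) and toy_flush() the disagreement runs the other way, which is the direction the quoted analysis suspects matters here.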