
Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0



On Thu, 2009-06-04 at 16:26 -0400, Pasi Kärkkäinen wrote:
> What do you suggest I try next? 

I'm at a bit of a loss to be honest...

It's interesting that it's always kswapd0 even in the case with no swap
configured. Were you running with CONFIG_SWAP=n or just with the swap
device turned off?

Judging from the backtrace, the sequence of events seems to be roughly:
kswapd0 runs and calls balance_pgdat(), which calls shrink_zone(), which
in turn calls shrink_active_list() if inactive_anon_is_low() (so I think
we are dealing with anon pages). shrink_active_list() then iterates over
a list of pages, calling page_referenced() on each one. page_referenced()
eventually calls down to page_referenced_one() (presumably via
page_referenced_anon()) and from there to page_check_address(), which
walks the page table and attempts to map the PTE page. This goes via
pte_offset_map() to kmap_atomic_pte() and then xen_kmap_atomic_pte(). Here
we check whether the page is pinned and then attempt to map it; since we
_think_ the page is not pinned, the mapping is made writable. However, at
this point Xen reports that the page really is pinned (28000001 => Page
Type 1 == L1 PT) and that we are trying to make a writable mapping of it
(e0000000 => Page Type 7 == Writable), which is disallowed.
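
For reference, those two values decode according to the page-type bits
Xen keeps at the top of the type word. A minimal sketch, using the
historical 32-bit layout from xen/include/asm-x86/mm.h (the mask values
here are my assumption, check them against your hypervisor):

/* Decode the page-type word from Xen's failure message.  The PGT_*
 * values follow the historical 32-bit Xen layout and are an
 * assumption here; check xen/include/asm-x86/mm.h in your tree. */
#include <stdio.h>

#define PGT_type_mask (7U << 29)  /* top 3 bits: frame type */
#define PGT_pinned    (1U << 27)  /* frame is pinned        */

static const char *pgt_name(unsigned int t)
{
        switch ((t & PGT_type_mask) >> 29) {
        case 1: return "L1 page table";
        case 2: return "L2 page table";
        case 3: return "L3 page table";
        case 7: return "writable";
        default: return "other";
        }
}

int main(void)
{
        unsigned int cur = 0x28000001; /* what Xen says the frame is  */
        unsigned int req = 0xe0000000; /* what our mapping would need */

        printf("current: %s%s\n", pgt_name(cur),
               (cur & PGT_pinned) ? " (pinned)" : "");
        printf("wanted:  %s\n", pgt_name(req));
        return 0;
}

This prints "current: L1 page table (pinned)" and "wanted: writable",
i.e. exactly the conflict described above.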

Do you know which line of xen_set_pte() the fault is occurring at? I
assume either "ptep->pte_high =" or "ptep->pte_low =".

So the question is -- how come we have a page which is pinned, but this
fact is not recorded in the struct page information? It might be
interesting to know if the corresponding L3 PT is pinned. If the mm is
active then this should always be the case, and I _think_ it would be a
bug for the L3 to be pinned but not all of the L1s which it contains.
Can you try this patch, which tries to notice that situation and prints
some potentially interesting information? On the fault path it also
dumps a little more info. Since I can't repro, I've only tested that it
doesn't break normal use; I've not actually seen the debug trigger...

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index abe8e4b..483bad7 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -285,7 +285,7 @@ check_v8086_mode(struct pt_regs *regs, unsigned long address,
                tsk->thread.screen_bitmap |= 1 << bit;
 }
 
-static void dump_pagetable(unsigned long address)
+void dump_pagetable(unsigned long address)
 {
        __typeof__(pte_val(__pte(0))) page;
 
@@ -603,6 +603,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
        printk_address(regs->ip, 1);
 
        dump_pagetable(address);
+       printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE0));
+       dump_pagetable(fix_to_virt(KM_PTE0));
+       printk(KERN_CRIT "Fixmap KM_PTE1 @ %#lx\n", fix_to_virt(KM_PTE1));
+       dump_pagetable(fix_to_virt(KM_PTE1));
 }
 
 static noinline void
diff --git a/include/xen/swiotlb.h b/include/xen/swiotlb.h
index f35183b..5db8659 100644
--- a/include/xen/swiotlb.h
+++ b/include/xen/swiotlb.h
@@ -5,6 +5,10 @@ extern void xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs);
 extern phys_addr_t xen_bus_to_phys(dma_addr_t daddr);
 extern dma_addr_t xen_phys_to_bus(phys_addr_t paddr);
 extern int xen_range_needs_mapping(phys_addr_t phys, size_t size);
+#ifdef CONFIG_XEN_PCI
 extern int xen_wants_swiotlb(void);
+#else
+static inline int xen_wants_swiotlb(void) { return 0; }
+#endif
 
 #endif /* _XEN_SWIOTLB_H */
diff --git a/mm/rmap.c b/mm/rmap.c
index 1652166..ae5d5a0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -267,6 +267,7 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 pte_t *page_check_address(struct page *page, struct mm_struct *mm,
                          unsigned long address, spinlock_t **ptlp, int sync)
 {
+       struct page *pgd_page, *pte_page;
        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;
@@ -285,6 +286,22 @@ pte_t *page_check_address(struct page *page, struct mm_struct *mm,
        if (!pmd_present(*pmd))
                return NULL;
 
+       pgd_page = virt_to_page(mm->pgd);
+       pte_page = pmd_page(*pmd);
+
+       if (PagePinned(pgd_page) != PagePinned(pte_page)) {
+               extern void dump_pagetable(unsigned long address);
+               printk(KERN_CRIT "L4 at %p is %s contains L2 at %p which points at an L1 which is %s %s\n",
+                      pgd, PagePinned(pgd_page) ? "pinned" : "unpinned",
+                      pmd, PagePinned(pte_page) ? "pinned" : "unpinned",
+                      PageHighMem(pte_page) ? "highmem" : "lowmem");
+               printk(KERN_CRIT "address %#lx\n", address);
+               dump_pagetable(address);
+               printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE0));
+               dump_pagetable(fix_to_virt(KM_PTE0));
+               printk(KERN_CRIT "Fixmap KM_PTE1 @ %#lx\n", fix_to_virt(KM_PTE1));
+               dump_pagetable(fix_to_virt(KM_PTE1));
+       }
        pte = pte_offset_map(pmd, address);
        /* Make a quick check before getting the lock */
        if (!sync && !pte_present(*pte)) {
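
For context on why the mismatch would be a bug: pinning an mm is
supposed to walk every pagetable page under the pgd and record the
pinned state in struct page as it converts each frame to a read-only
pagetable type. Roughly (a simplified sketch of the xen_pin_page()
callback in arch/x86/xen/mmu.c, not the literal code):

/* Simplified sketch, not the literal code: xen_pgd_pin() walks the
 * pagetable calling a callback like this on every frame, so after a
 * successful pin PagePinned() should be set on the pgd page and on
 * every L1 page beneath it.  The check added above fires when that
 * invariant is violated. */
static int xen_pin_page(struct mm_struct *mm, struct page *page,
                        enum pt_level level)
{
        if (!TestSetPagePinned(page)) {
                /* first time: remap the frame read-only and queue the
                 * pin/type-change operation for Xen here */
        }
        return 0;
}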



I'd guess that the patch below would at least work around the issue; I
doubt it's a proper fix, and I suspect it's going to shaft performance
(not that highpte isn't already doing a pretty good job of that ;-)).

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fefdeee..4c694e4 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1521,7 +1521,7 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
 {
        pgprot_t prot = PAGE_KERNEL;
 
-       if (PagePinned(page))
+       if (1 || PagePinned(page))
                prot = PAGE_KERNEL_RO;
 
        if (0 && PageHighMem(page))


Ian.


