[Xen-devel] [PATCH] Re: SMP dom0 with 8 cpus of i386
Keir, Ian,

With the PCI mmconfig option on, and with a PCI Express enabled BIOS, the dom0 kernel reads the PCI config from the fix-mapped PCI mmconfig space. The PCI mmconfig space is 256MB in size, and access to it is implemented differently on i386 and x86_64. On x86_64 the whole 256MB is mapped into the kernel virtual address space. On i386 that would consume too much of the kernel's virtual address space, so it is implemented with a single fix-mapped page. This page is remapped to the desired physical address for every PCI mmconfig access, as seen in the following code from mmconfig.c:

    static inline void pci_exp_set_dev_base(int bus, int devfn)
    {
        u32 dev_base = pci_mmcfg_base_addr | (bus << 20) | (devfn << 12);
        if (dev_base != mmcfg_last_accessed_device) {
            mmcfg_last_accessed_device = dev_base;
            set_fixmap_nocache(FIX_PCIE_MCFG, dev_base);
        }
    }

    static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
                              unsigned int devfn, int reg, int len, u32 *value)
    {
        unsigned long flags;

        if (!value || (bus > 255) || (devfn > 255) || (reg > 4095))
            return -EINVAL;

        spin_lock_irqsave(&pci_config_lock, flags);

        pci_exp_set_dev_base(bus, devfn);

        switch (len) {
        ...

At boot time the PCI mmconfig space is accessed thousands of times, one access after another; that remaps and unmaps the fixmap entry continuously, very fast, for a long time. Currently the fix-mapped virtual address for the shared_info page of dom0 and the PCI mmconfig page are adjacent in the fixed_addresses enum in fixmap.h:

    #ifdef CONFIG_PCI_MMCONFIG
        FIX_PCIE_MCFG,
    #endif
        FIX_SHARED_INFO,
        FIX_GNTTAB_BEGIN,

I suspect this is causing a race condition because of writable page tables. While accessing the PCI mmconfig space on i386, the dom0 kernel (cpu 0) is continuously rewriting the pte for FIX_PCIE_MCFG at a very fast rate. With writable page tables the updates to the ptes are deferred. In the SMP case the other cpus are taking interrupts (e.g. the timer) at the same time, and the interrupt handlers access the shared_info page to notify dom0 of events such as the timer event. The problem possibly is that, because of the writable page tables, the L1 page is getting evicted during the mmconfig access, and the shared_info translation needed for event notification is in that same L1 page. All the cpus are using the same page tables at this time. While the pte is being written, the L2 page is getting cut off from the page table. This is somehow corrupting the dom0 page tables, and we see the errors.

I believe this issue does not exist on x86_64, because there each mmconfig access does not map/unmap a fixmap entry, so the race condition on accessing the l2 page is not there.
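For reference, the x86_64 path referred to above looks roughly like the sketch below (from memory, not the actual arch/x86_64/pci/mmconfig.c; the *_sketch names are made up here): the whole 256MB window is ioremapped once at init time, so a config read is a plain MMIO load and never rewrites a pte afterwards.

    #include <linux/init.h>
    #include <linux/pci.h>
    #include <asm/io.h>

    /* Sketch only: the real code keeps one mapping per mmconfig segment. */
    static char __iomem *pci_mmcfg_virt;

    static int __init pci_mmcfg_map_sketch(u32 pci_mmcfg_base_addr)
    {
        /* Map the whole 256MB mmconfig window once, at init time. */
        pci_mmcfg_virt = ioremap_nocache(pci_mmcfg_base_addr, 256 * 1024 * 1024);
        return pci_mmcfg_virt ? 0 : -ENOMEM;
    }

    static u32 pci_mmcfg_read32_sketch(unsigned int bus, unsigned int devfn, int reg)
    {
        /* No set_fixmap_nocache() here: the mapping is static, so no pte
         * gets rewritten and writable pagetables never come into play. */
        return readl(pci_mmcfg_virt + ((bus << 20) | (devfn << 12)) + reg);
    }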
The current workaround, which works for me, is to disable PCI_MMCONFIG for i386 in the xen0 kernel config. Sooner or later other people will also notice this corruption on SMP boxes with SMP dom0; I can see it once in a while on a 4-way box. Can we disable PCI_MMCONFIG for i386 in the xen0 config until we solve the race condition issue? Attached is the patch for the config.

As I have a workaround and I am seeing issues with VMX guests, I am trying to fix those issues now.

Thanks & Regards,
Nitin
-----------------------------------------------------------------------------------
Sr Software Engineer
Open Source Technology Center, Intel Corp

-----Original Message-----
From: Kamble, Nitin A
Sent: Tuesday, August 30, 2005 10:06 AM
To: Keir Fraser
Cc: xen-devel
Subject: RE: [Xen-devel] Re: SMP dom0 with 8 cpus of i386

> Default but with smp enabled.

Same here. I am seeing the issue inconsistently on a 4-way box. The 8-way system does not have any issue with maxcpus=1; with 8 cpus it is consistent. A higher number of cpus seems to cause the corruption. It always happens at the time of reading/writing the PCI mmconfig space. I am debugging it here.

Thanks & Regards,
Nitin
Attachment: nopcimmconfig_i386.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel