Re: [Xen-devel] [PATCH] iommu/quirk: disable shared EPT for Sandybridge and earlier processors.
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Thursday, December 03, 2015 7:19 PM
>
> On 03/12/15 08:50, Tian, Kevin wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Thursday, December 03, 2015 4:18 PM
> >>
> >>>>> On 03.12.15 at 03:40, <kevin.tian@xxxxxxxxx> wrote:
> >>> Just confirmed internally with HW team. On SNB 4KB cache is always
> >>> used regardless of 4KB/2MB/1GB mapping. There'd be another reason
> >>> for this 40% drop observation...
> >> So when they stated that the 4k TLB gets always used, did they at
> >> least provide some thoughts on what else might be causing this
> >> severe a performance impact? Without them helping we're left
> >> guessing...
> >>
> > Unfortunately no clear answer...
>
> http://networkbuilders.intel.com/docs/Network_Builders_RA_vBRAS_Final.pdf
>
> Page 42: "The IOTLB on the previous generation Intel Xeon Processor
> E5-2690 does not natively support huge pages (it emulates them using 4K
> pages)."
>
> And Figure 51 on Page 43
>
> The "emulates them using 4K pages" probably means that the IOTLB is
> flushed and filled with 512 adjacent 4k mappings.
>
> Citrix's measurements back up the findings in that paper, and also show
> that performance is better when using plain 4k mappings as opposed to
> emulated 2M mappings.
>

Thanks for the information. I'll forward it to the HW team.

If the above interpretation is correct (which also matches my own thinking),
then of the two options you listed earlier:

---
> This leaves two options
> 1) 2M mappings are entirely uncached
> 2) 2M mappings are shattered to 4K mappings and cached
> The fact there is a 40% performance reduction suggests 1 rather than 2.
---

it looks like 2) is suggested rather than 1).
There are two further options:

2.1) 2M mappings are shattered to 512 adjacent 4k mappings, which are all
     cached;
2.2) Only the 4k mapping for the page being accessed, out of the 2M mapping,
     is cached.

For 2.1), as IOTLB entries are limited, it may cause unnecessary IOTLB entry
flushes and thus incur more page-walk overhead to refill them. For 2.2), I
can't think of a reason for the performance drop.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
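
The difference between options 2.1) and 2.2) above can be illustrated with a
toy model. The sketch below is only a simulation under stated assumptions, not
a description of how the SNB IOTLB actually works: the LRU replacement policy,
the capacity of 64 entries, and the names count_walks, "fill_all" and
"fill_one" are all hypothetical. It counts IOTLB misses (page walks) for a
device that alternates between two buffers in different 2M regions.

```python
from collections import OrderedDict

PAGE_4K = 4096
PAGES_PER_2M = 512  # 2 MiB / 4 KiB


def count_walks(accesses, capacity, policy):
    """Count IOTLB misses (page walks) for a sequence of DMA addresses.

    Hypothetical model: an LRU cache of 4k entries with `capacity` slots.
      "fill_all": option 2.1, a miss installs all 512 4k entries of the region
      "fill_one": option 2.2, a miss installs only the 4k entry being accessed
    """
    tlb = OrderedDict()  # 4k page number -> None, oldest entry first
    walks = 0
    for addr in accesses:
        page = addr // PAGE_4K
        if page in tlb:
            tlb.move_to_end(page)  # hit: mark as most recently used
            continue
        walks += 1
        if policy == "fill_all":
            base = page - page % PAGES_PER_2M
            new_pages = [p for p in range(base, base + PAGES_PER_2M) if p != page]
            new_pages.append(page)  # accessed page is installed last
        elif policy == "fill_one":
            new_pages = [page]
        else:
            raise ValueError(f"unknown policy: {policy}")
        for p in new_pages:
            tlb[p] = None
            tlb.move_to_end(p)
            if len(tlb) > capacity:
                tlb.popitem(last=False)  # evict the least recently used entry
    return walks


# Two buffers in different 2M regions, accessed alternately 100 times each.
buf_a, buf_b = 0, 2 * 1024 * 1024
accesses = [buf_a, buf_b] * 100

print("2.1 fill_all:", count_walks(accesses, 64, "fill_all"))
print("2.2 fill_one:", count_walks(accesses, 64, "fill_one"))
```

In this model, each fill under 2.1) pushes out every entry belonging to the
other region, so the two buffers keep evicting each other and every access
misses. Under 2.2) only the two pages actually in use are cached, which behaves
the same as plain 4k mappings. Whether real hardware behaves like either case
is exactly the open question in this thread.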