
Re: [Xen-devel] [PATCH] iommu/quirk: disable shared EPT for Sandybridge and earlier processors.



> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Thursday, December 03, 2015 7:19 PM
> 
> On 03/12/15 08:50, Tian, Kevin wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Thursday, December 03, 2015 4:18 PM
> >>
> >>>>> On 03.12.15 at 03:40, <kevin.tian@xxxxxxxxx> wrote:
> >>> Just confirmed internally with HW team. On SNB 4KB cache is always
> >>> used regardless of 4KB/2MB/1GB mapping. There'd be another reason
> >>> for this 40% drop observation...
> >> So when they stated that the 4k TLB gets always used, did they at
> >> least provide some thoughts on what else might be causing this
> >> severe a performance impact? Without them helping we're left
> >> guessing...
> >>
> > Unfortunately no clear answer...
> 
> http://networkbuilders.intel.com/docs/Network_Builders_RA_vBRAS_Final.pdf
> 
> Page 42: "The IOTLB on the previous generation Intel Xeon Processor
> E5-2690 does not natively support huge pages (it emulates them using 4K
> pages)."
> 
> And Figure 51 on Page 43
> 
> The "emulates them using 4K pages" probably means that the IOTLB is
> flushed and filled with 512 adjacent 4k mappings.
> 
> Citrix's measurements back up the findings in that paper, and also show
> that performance is better when using plain 4k mappings as opposed to
> emulated 2M mappings.
> 

Thanks for the information. I'll forward it to the HW team.

If the above interpretation is correct (it also matches my own thinking),
then for the two options you listed earlier:

---
> This leaves two options
> 1) 2M mappings are entirely uncached
> 2) 2M mappings are shattered to 4K mappings and cached

> The fact there is a 40% performance reduction suggests 1 rather than 2.
---

it looks like 2) is suggested rather than 1). There are two further
possibilities under 2):

2.1) 2M mappings are shattered into 512 adjacent 4k mappings, all of which
are cached;
2.2) Only the 4k mapping for the page being accessed is cached, out of the
whole 2M mapping;

For 2.1), since IOTLB entries are limited, it may cause unnecessary IOTLB
entry flushes and thus incur more page-walk overhead to refill them.

For 2.2), I can't think of a reason why it would cause a performance drop.
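
To make the difference between 2.1) and 2.2) concrete, here is a minimal
sketch of a toy model. It is not a description of what the hardware does:
the IOTLB capacity (64 entries), the LRU replacement policy, and the
workload (a device cycling over 8 buffers, each in a different 2M region)
are all assumptions chosen for illustration, and the names Iotlb and
simulate are hypothetical.

```python
from collections import OrderedDict

PAGE_4K = 4096
PAGES_PER_2M = 512  # a 2M mapping covers 512 adjacent 4K pages


class Iotlb:
    """Toy LRU cache of 4K translations. Capacity is an assumed number."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # 4K page frame number -> present
        self.misses = 0

    def _insert(self, pfn):
        self.entries[pfn] = True
        self.entries.move_to_end(pfn)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def access(self, addr, fill_whole_2m):
        pfn = addr // PAGE_4K
        if pfn in self.entries:
            self.entries.move_to_end(pfn)  # hit: refresh LRU position
            return
        self.misses += 1  # miss: a page walk is needed
        if fill_whole_2m:
            # Option 2.1: cache all 512 4K pages of the 2M mapping.
            base = pfn - pfn % PAGES_PER_2M
            for p in range(base, base + PAGES_PER_2M):
                if p != pfn:
                    self._insert(p)
        # Both options cache the accessed page (inserted last for 2.1).
        self._insert(pfn)


def simulate(fill_whole_2m, buffers=8, rounds=100, capacity=64):
    """Cycle over `buffers` DMA targets, one per 2M region; count misses."""
    tlb = Iotlb(capacity)
    for _ in range(rounds):
        for b in range(buffers):
            tlb.access(b * PAGES_PER_2M * PAGE_4K, fill_whole_2m)
    return tlb.misses


if __name__ == "__main__":
    print("2.1 misses:", simulate(True))
    print("2.2 misses:", simulate(False))
```

In this model every access under 2.1) misses, because each fill pushes the
other buffers' entries out, while under 2.2) only the first access to each
buffer misses. That fits the reasoning above, but only under the stated
assumptions.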

Thanks
Kevin
