Re: [Xen-devel] [PATCH v2 03/23] x86: zero BSS using stosl instead of stosb
>>> On 22.07.15 at 13:22, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 22/07/15 11:04, Jan Beulich wrote:
>>>>> On 22.07.15 at 10:42, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> In the case of having aligned source and destination on a 16-byte
>>> boundary (which we can trivially arrange), then ERMSB (to give it its
>>> Intel name) and rep stosl differ only in the setup cost; they still
>>> scale at the same rate for changes in length.
>>>
>>> Therefore, assuming we arrange for 16-byte alignment, using rep stosl
>>> would appear to be a single 60ish cycle hit over using ERMSB, but would
>>> be substantially more efficient than using rep stosb on a non-ERMSB system.
>>>
>>> Overall, I think 16 byte alignment and rep stosl is the best compromise.
>> Or leaving such code alone, with the assumption that over time the
>> setup cost (on a growing number of systems) outweighs the benefits
>> (on a shrinking set).
>
> The BSS is large - 295k on the last compile I have from staging. The
> setup cost is lost in the noise compared to the elapsed time to write
> that many zeroes to memory.
>
> Therefore, on an ERMSB-capable system, the two options will complete in
> the same amount of time.
>
> However, on all AMD hardware and Intel hardware older than IvyBridge,
> rep stosl is 4 times faster than rep stosb.

Well, okay then.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
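
For readers following the thread, the approach under discussion (zeroing a region with 32-bit stores via rep stosl rather than byte stores via rep stosb) can be sketched in C roughly as below. This is only an illustration under stated assumptions, not the code from the patch: `zero_dwords` is a hypothetical helper name, the region is assumed to be 4-byte aligned with a length that is a multiple of 4, and the inline assembly assumes a GCC-compatible compiler on x86 (a plain loop is used elsewhere).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the Xen patch): zero `bytes` bytes
 * starting at `dst` using 32-bit stores.
 * Assumptions: `dst` is 4-byte aligned and `bytes` is a multiple of 4.
 */
static void zero_dwords(void *dst, size_t bytes)
{
    assert(((uintptr_t)dst % 4) == 0 && (bytes % 4) == 0);

    size_t count = bytes / 4;   /* number of 32-bit stores, not bytes */

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /*
     * rep stosl stores EAX to [(R|E)DI] and advances by 4, repeating
     * (R|E)CX times. The direction flag is assumed clear, as the C ABI
     * guarantees on function entry.
     */
    __asm__ volatile ("rep stosl"
                      : "+D" (dst), "+c" (count)
                      : "a" (0)
                      : "memory");
#else
    /* Portable fallback: the same sequence of 32-bit stores as a loop. */
    uint32_t *p = dst;
    while (count--)
        *p++ = 0;
#endif
}
```

With a count expressed in dwords, the string instruction runs a quarter as many iterations as a byte-wise rep stosb over the same region, which is where the non-ERMSB speedup mentioned above would come from.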