Re: [Xen-devel] [PATCH v2 03/23] x86: zero BSS using stosl instead of stosb
On 22/07/2015 06:18, Jan Beulich wrote:
>>>> Daniel Kiper <daniel.kiper@xxxxxxxxxx> 07/21/15 8:23 PM >>>
>> On Tue, Jul 21, 2015 at 03:37:48AM -0600, Jan Beulich wrote:
>>>>>> On 20.07.15 at 16:28, <daniel.kiper@xxxxxxxxxx> wrote:
>>> ... because of ??? Nowadays - with X86_FEATURE_ERMS - rep stosb
>>> is expected to be faster than rep stosl.
>> OK, I did not know about that. However, as I know this feature
>> was introduced in 2012 with Ivy Bridge. So, I suppose that there
>> are still a lot of machines in the wild which does not support it.
>> Anyway, because this code is not performance critical I am not going
>> to insist on one or another solution. However, Andrew suggested that
>> thing, so, please agree with him in which direction we should go.
>> I will do what you agree.
> ISTR having seen a similar patch from him(?), maybe in another area
> of code, before (or was it v1 of this one?), which I responded to with the
> same as above.

Indeed you have, several in fact. I had not had a chance to delve into
the optimisation manuals, but have now taken a peek (Section 3.7.6).

With source and destination aligned on a 16-byte boundary (which we can
trivially arrange), ERMSB (to give it its Intel name) and rep stosl
differ only in setup cost; their cost still scales at the same rate as
the length grows.

Therefore, assuming we arrange for 16-byte alignment, rep stosl appears
to cost a one-off hit of roughly 60 cycles compared with ERMSB, but is
substantially more efficient than rep stosb on a non-ERMSB system.

Overall, I think 16-byte alignment plus rep stosl is the best compromise.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
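For illustration only, here is a minimal C sketch of the approach suggested above. It is not the actual patch, which changes x86 boot assembly, and the function name `zero_bss_stosl` is hypothetical. If both ends of the region are 16-byte aligned, the length is a whole number of dwords, so a single `rep stosl` with a count of length/4 clears it and no byte-sized tail loop is needed. The sketch assumes GCC/Clang extended inline asm on x86 and falls back to `memset` on other hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero [start, end) using 32-bit stores.
 * Both bounds are assumed to be 16-byte aligned (e.g. arranged by the
 * linker script), so the length is always a multiple of 4 bytes. */
static void zero_bss_stosl(void *start, void *end)
{
    uintptr_t s = (uintptr_t)start, e = (uintptr_t)end;
    size_t count;

    assert((s & 15) == 0 && (e & 15) == 0 && e >= s);
    count = (e - s) / 4;            /* number of 4-byte stores */

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* rep stosl: store EAX (0) at [EDI/RDI] and advance by 4, ECX/RCX
     * times. The ABI requires DF=0 on function entry, so EDI counts up. */
    __asm__ volatile ("rep stosl"
                      : "+D" (start), "+c" (count)
                      : "a" (0u)
                      : "memory");
#else
    /* Portable stand-in for non-x86 hosts: same effect, no stosl. */
    memset(start, 0, count * 4);
#endif
}
```

The alignment assertion is what makes the division by 4 exact. The ERMSB vs. stosl question above is purely about setup cost; either instruction leaves the region in the same state.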