[XenPPC] Re: copy_4K_page() doesn't use dcbtst?
> A stronger argument would be for using dcbz, but IIRC it actually made
> things slower (on POWER4 at least). I suspect the hardware is
> gathering the stores for the whole of each cache line automatically,
> so using dcbz doesn't provide any benefit.

It seems on 970 at least it still is a nice win. Do you have any
good benchmarks I could run?

> I did a lot of measurements of memory copy speed on POWER4 (using
> different copy loops, copy sizes, alignments, cache hot/cold cases)
> and the copy_4K_page loop is the fastest I could come up with for
> POWER4.

Yeah, POWER4 is quite a different beast (its memory subsystem, anyway).
I'm surprised dcbz hurt though; did you schedule it early enough before
the actual data copy?

Segher

_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel
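
For readers following the dcbz scheduling question, here is a minimal C sketch of what "issuing dcbz ahead of the copy" might look like. It is not the actual copy_4K_page() code. The 128-byte line size, the `DCBZ_AHEAD` distance, and the names `zero_line` and `copy_page_dcbz` are all assumptions made for illustration. On non-PowerPC builds the dcbz step compiles to a no-op, so the function degrades to a plain line-by-line copy.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096u
#define LINE_BYTES 128u  /* assumed cache-block size (POWER4/970); verify per CPU */
#define DCBZ_AHEAD 1u    /* hypothetical tuning knob: how many lines to run ahead */

/* Establish a zeroed destination line in the cache without fetching it
 * from memory. Only meaningful on PowerPC; a no-op everywhere else. */
static inline void zero_line(void *p)
{
#if defined(__powerpc64__) && defined(__GNUC__)
    __asm__ volatile("dcbz 0,%0" : : "r"(p) : "memory");
#else
    (void)p;
#endif
}

/* Copy one 4K page. dst and src are assumed to be page-aligned,
 * non-overlapping, cacheable memory. Every destination line is fully
 * overwritten after its dcbz, and dcbz never runs past the page end. */
void copy_page_dcbz(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t nlines = PAGE_BYTES / LINE_BYTES;

    /* Prime the first DCBZ_AHEAD destination lines. */
    for (size_t i = 0; i < DCBZ_AHEAD && i < nlines; i++)
        zero_line(d + i * LINE_BYTES);

    for (size_t i = 0; i < nlines; i++) {
        /* Zero a later line so it is established before its stores arrive. */
        if (i + DCBZ_AHEAD < nlines)
            zero_line(d + (i + DCBZ_AHEAD) * LINE_BYTES);
        memcpy(d + i * LINE_BYTES, s + i * LINE_BYTES, LINE_BYTES);
    }
}
```

Timing this with different `DCBZ_AHEAD` values, and against a build with the dcbz step removed, would be one way to test on a given CPU whether dcbz helps and how early it needs to be issued.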