[XenPPC] Re: copy_4K_page() doesn't use dcbtst?
Hollis Blanchard writes:

> Hi Paul, some Xen people were just noticing that copy_4K_page
> (arch/powerpc/lib/copypage_64.S) doesn't use the dcbtst instruction. Why
> doesn't it help there?

Why would we want to read the cache lines for the destination from
memory when we're only going to overwrite them completely anyway?

A stronger argument would be for using dcbz, but IIRC it actually made
things slower (on POWER4 at least). I suspect the hardware is gathering
the stores for the whole of each cache line automatically, so using
dcbz doesn't provide any benefit.

I did a lot of measurements of memory copy speed on POWER4 (using
different copy loops, copy sizes, alignments, cache hot/cold cases) and
the copy_4K_page loop is the fastest I could come up with for POWER4.
If anyone can come up with a routine that is measurably faster on
current machines, I'm happy to look at it, of course.

Paul.

_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel
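The two approaches discussed in this message can be sketched in portable C. This is only an illustration and not the kernel's copy_4K_page, which is hand-written assembly. The function names, the 128-byte line size, and the one-line-ahead prefetch distance are assumptions made for the sketch. `__builtin_prefetch(addr, 1)` is the GCC/Clang "prepare for write" hint, which compilers typically lower to dcbtst on PowerPC; dcbz has no portable equivalent and is mentioned only in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096u
#define LINE_BYTES 128u  /* assumption: 128-byte cache lines */
#define WORDS_PER_LINE (LINE_BYTES / sizeof(uint64_t))
#define WORDS_PER_PAGE (PAGE_BYTES / sizeof(uint64_t))

/* Plain copy: every destination line is overwritten completely,
 * and no cache hints are issued for the destination. */
void copy_page_plain(uint64_t *dst, const uint64_t *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < WORDS_PER_PAGE; i++)
        dst[i] = src[i];
}

/* Variant that issues a store-prefetch hint for the next destination
 * line before copying the current one. The hint asks the hardware to
 * fetch a line whose old contents are about to be discarded, which is
 * the cost questioned in the reply above. (dcbz would instead zero the
 * line without reading it; that is not expressible in portable C.) */
void copy_page_touch_dst(uint64_t *dst, const uint64_t *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t line = 0; line < WORDS_PER_PAGE; line += WORDS_PER_LINE) {
        if (line + WORDS_PER_LINE < WORDS_PER_PAGE)
            __builtin_prefetch(&dst[line + WORDS_PER_LINE], 1);
        for (size_t i = 0; i < WORDS_PER_LINE; i++)
            dst[line + i] = src[line + i];
    }
}

/* Helper for checking that both variants produce the same page. */
int pages_equal(const void *a, const void *b)
{
    return memcmp(a, b, PAGE_BYTES) == 0;
}
```

Both variants produce identical results; any difference is in timing only, and would have to be measured on the target machine across cache-hot and cache-cold cases, as described above.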