Xen project Mailing List

Re: [PATCH v2] Reduce assembly code size of exception entry points

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Wed, 14 Feb 2024 17:05:47 +0100

Cc: Jan Beulich <jbeulich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wl@xxxxxxx>, Frediano Ziglio <frediano.ziglio@xxxxxxxxx>

Delivery-date: Wed, 14 Feb 2024 16:06:05 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Feb 14, 2024 at 03:53:24PM +0000, Andrew Cooper wrote:
> On 14/02/2024 3:29 pm, Roger Pau Monné wrote:
> > On Wed, Feb 14, 2024 at 04:08:12PM +0100, Jan Beulich wrote:
> >> On 14.02.2024 16:02, Roger Pau Monné wrote:
> >>> On Wed, Feb 14, 2024 at 10:35:58AM +0000, Frediano Ziglio wrote:
> >>>> We just pushed a 8-bytes zero and exception constants are
> >>>> small so we can just write a single byte saving 3 bytes for
> >>>> instruction.
> >>>> With ENDBR64 this reduces the size of many entry points from 32 to
> >>>> 16 bytes (due to alignment).
> >>>> Similar code is already used in autogen_stubs.
> >>> Will using movb instead of movl have any performance impact? I don't
> >>> think we should trade speed for code size, so this needs to be
> >>> mentioned in the commit message.
> >> That's really what the last sentence is about (it could have been said
> >> more explicitly though): If doing so on interrupt paths is fine, it
> >> ought to be fine on exception paths as well.
> > I might view it the other way around: maybe it's autogen_stubs that
> > needs changing to use movl instead of movb for performance reasons?
> >
> > I think this needs to be clearly stated, and ideally some kind of
> > benchmarks should be provided to demonstrate no performance change if
> > there are doubts whether movl and movb might perform differently.
>
> The push and the mov are overlapping stores either way. Swapping
> between movl and movb will make no difference at all.
>
> However, the shorter instruction ends up halving the size of the entry
> stub when alignment is considered, and that will make a marginal
> difference. Fewer cache misses (to a first approximation, even #PF will
> be L1-cold), and better utilisation of branch prediction resource (~>
> less likely to be BP-cold).
>
> I doubt you'll be able to see a difference without perf counters
> (whatever difference is covered here will be dwarfed by the speculation
> workarounds), but a marginal win is still a win.

I'm happy just stating in the commit message that the change doesn't
make any performance difference.

Thanks, Roger.
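
For reference, the size arithmetic discussed above can be sketched as
follows. This is only an illustration, not taken from the patch: it
assumes an entry stub made of ENDBR64, pushq $0, a mov of the vector
number to 4(%rsp), and a jmp with a 32-bit displacement, and it assumes
16-byte alignment of entry points. The byte counts are the standard
x86-64 encodings; the helper names (slot_after, stub_size, padded) are
hypothetical.

```python
import struct

# Standard x86-64 encoding lengths. The stub layout itself is an assumption.
ENDBR64 = 4        # f3 0f 1e fa
PUSHQ_IMM8 = 2     # 6a 00              pushq $0
MOVL_RSP4 = 8      # c7 44 24 04 imm32  movl $vec, 4(%rsp)
MOVB_RSP4 = 5      # c6 44 24 04 imm8   movb $vec, 4(%rsp)
JMP_REL32 = 5      # e9 rel32


def slot_after(mov_fmt, vector):
    """Model the 8-byte stack slot after pushq $0 and a store at offset 4."""
    slot = bytearray(8)                         # pushq $0 writes eight zero bytes
    struct.pack_into(mov_fmt, slot, 4, vector)  # little-endian store at 4(%rsp)
    return bytes(slot)


def stub_size(mov_bytes):
    """Unpadded size of the assumed entry stub."""
    return ENDBR64 + PUSHQ_IMM8 + mov_bytes + JMP_REL32


def padded(size, align=16):
    """Round size up to the next multiple of align."""
    return (size + align - 1) // align * align


# For any vector below 256, a 1-byte store leaves the same memory contents
# as a 4-byte store, because the upper bytes are already zero.
assert all(slot_after("<I", v) == slot_after("<B", v) for v in range(256))

for name, mov in (("movl", MOVL_RSP4), ("movb", MOVB_RSP4)):
    raw = stub_size(mov)
    print(f"{name}: {raw} bytes raw, {padded(raw)} bytes after alignment")
```

Under these assumptions the movl variant is 19 bytes and pads to 32,
while the movb variant is exactly 16 bytes, which matches the "32 to 16
bytes" figure quoted from the commit message.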
