Re: [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()
On 20/09/2021 10:05, Jan Beulich wrote:
> On 17.09.2021 10:45, Andrew Cooper wrote:
>> It is pointless to write all 6 entries and only consume the useful subset.
>> bloat-o-meter shows quite how obscene the overhead is in
>> vmx_vmexit_handler(), weighing in at 11% of the function arranging
>> unread zeroes on the stack, and 8% for svm_vmexit_handler().
>>
>> add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
>> Function                         old     new   delta
>> hvm_msr_write_intercept         1049    1033     -16
>> vmx_enable_intr_window           238     214     -24
>> svm_enable_intr_window           337     313     -24
>> hvmemul_write_xcr                115      91     -24
>> hvmemul_write_cr                 350     326     -24
>> hvmemul_read_xcr                 115      91     -24
>> hvmemul_read_cr                  146     122     -24
>> hvm_mov_to_cr                    438     414     -24
>> hvm_mov_from_cr                  253     229     -24
>> vmx_intr_assist                 1150    1118     -32
>> svm_intr_assist                  459     427     -32
>> hvm_rdtsc_intercept              138     106     -32
>> hvm_msr_read_intercept           898     866     -32
>> vmx_vmenter_helper              1142    1094     -48
>> vmx_inject_event                 813     765     -48
>> svm_vmenter_helper               238     190     -48
>> hvm_hlt                          197     146     -51
>> svm_inject_event                1678    1614     -64
>> svm_vmexit_handler              5880    5416    -464
>> vmx_vmexit_handler              7281    6473    -808
>> Total: Before=3644184, After=3642317, chg -0.05%
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
Thanks, but this is buggy. There are direct callers of HVMTRACE_ND()
which need adjusting too.
There is also a further optimisation for the 0-argument case which drops
even more code.
>
>> Normally I wouldn't recommend patches like this for backport, but
>> {vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
>> dropped...
> The change in size is indeed unexpectedly large for these two functions.
> However, what I find puzzling is that TRACEBUFFER is enabled by default
> (i.e. also in release builds) in the first place, and that it can only
> be disabled when EXPERT.
It's not surprising in the slightest. TRACEBUFFER long predates Kconfig.
> More gains could be had towards dropped code if
> the option wasn't on by default in at least release builds. "Debugging
> or performance analysis" (as its help text says) after all isn't a
> primary target of release builds.
All performance analysis needs doing on release builds.
> IOW what I'd prefer to consider a backport candidate would be a patch
> changing the option's default. Thoughts?
I very much doubt that XenServer are the only people who use xentrace in
customer environments.
I'm -1 to changing the default in staging, and firmly against doing so
in older releases.
>> --- a/xen/include/asm-x86/hvm/trace.h
>> +++ b/xen/include/asm-x86/hvm/trace.h
>> @@ -67,38 +67,30 @@
>> #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
>> TRACE_6D(_e, d1, d2, d3, d4)
>>
>> -#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
>> +#define HVMTRACE_ND(evt, modifier, cycles, ...) \
>> do { \
>> if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt ) \
>> { \
>> - struct { \
>> - u32 d[6]; \
>> - } _d; \
>> - _d.d[0]=(d1); \
>> - _d.d[1]=(d2); \
>> - _d.d[2]=(d3); \
>> - _d.d[3]=(d4); \
>> - _d.d[4]=(d5); \
>> - _d.d[5]=(d6); \
>> + uint32_t _d[] = { __VA_ARGS__ }; \
>> __trace_var(TRC_HVM_ ## evt | (modifier), cycles, \
>> - sizeof(*_d.d) * count, &_d); \
>> + sizeof(_d), _d); \
>> } \
>> } while(0)
>>
>> #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6) \
>> - HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
>> + HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
>> #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5) \
>> - HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5, 0)
>> + HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
>> #define HVMTRACE_4D(evt, d1, d2, d3, d4) \
>> - HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4, 0, 0)
>> + HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
>> #define HVMTRACE_3D(evt, d1, d2, d3) \
>> - HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3, 0, 0, 0)
>> + HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
>> #define HVMTRACE_2D(evt, d1, d2) \
>> - HVMTRACE_ND(evt, 0, 0, 2, d1, d2, 0, 0, 0, 0)
>> + HVMTRACE_ND(evt, 0, 0, d1, d2)
>> #define HVMTRACE_1D(evt, d1) \
>> - HVMTRACE_ND(evt, 0, 0, 1, d1, 0, 0, 0, 0, 0)
>> + HVMTRACE_ND(evt, 0, 0, d1)
>> #define HVMTRACE_0D(evt) \
>> - HVMTRACE_ND(evt, 0, 0, 0, 0, 0, 0, 0, 0, 0)
>> + HVMTRACE_ND(evt, 0, 0)
> These HVMTRACE_<n>D() aren't this much of a gain anymore; perhaps down
> the road we will want to have just a single wrapper macro adding the
> modifier and cycles arguments, otherwise making use of variable
> arguments as well?
Same on the plain TRACE() side. There is an awful lot of cleanup to do
here.
Other findings include HVM records using the non-HVM helpers (so as to
have cycles included), and examples such as vpic_ack_pending_irq(),
which makes duplicate calls to vlapic_accept_pic_intr(), causing 3 trace
records to be written out.
~Andrew