
Re: [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Mon, 20 Sep 2021 12:02:33 +0100
  • Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 20 Sep 2021 11:02:51 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 20/09/2021 10:05, Jan Beulich wrote:
> On 17.09.2021 10:45, Andrew Cooper wrote:
>> It is pointless to write all 6 entries and only consume the useful subset.
>> bloat-o-meter shows quite how obscene the overhead is in 
>> vmx_vmexit_handler(),
>> weighing in at 11% of the function arranging unread zeroes on the stack, and
>> 8% for svm_vmexit_handler().
>>
>>   add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
>>   Function                                     old     new   delta
>>   hvm_msr_write_intercept                     1049    1033     -16
>>   vmx_enable_intr_window                       238     214     -24
>>   svm_enable_intr_window                       337     313     -24
>>   hvmemul_write_xcr                            115      91     -24
>>   hvmemul_write_cr                             350     326     -24
>>   hvmemul_read_xcr                             115      91     -24
>>   hvmemul_read_cr                              146     122     -24
>>   hvm_mov_to_cr                                438     414     -24
>>   hvm_mov_from_cr                              253     229     -24
>>   vmx_intr_assist                             1150    1118     -32
>>   svm_intr_assist                              459     427     -32
>>   hvm_rdtsc_intercept                          138     106     -32
>>   hvm_msr_read_intercept                       898     866     -32
>>   vmx_vmenter_helper                          1142    1094     -48
>>   vmx_inject_event                             813     765     -48
>>   svm_vmenter_helper                           238     190     -48
>>   hvm_hlt                                      197     146     -51
>>   svm_inject_event                            1678    1614     -64
>>   svm_vmexit_handler                          5880    5416    -464
>>   vmx_vmexit_handler                          7281    6473    -808
>>   Total: Before=3644184, After=3642317, chg -0.05%
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

Thanks, but this is buggy.  There are direct callers of HVMTRACE_ND()
which need adjustments too.

There is also a further optimisation for the 0-argument case which drops even more.
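
As an illustration only (a sketch of one way the 0-argument case could be
tightened further, not the actual follow-up patch): with the variadic form
in the hunk below, an empty argument list could be handed to __trace_var()
as a NULL payload, so no on-stack array is materialised at all.

    /* Illustrative sketch only: when no data words are passed,
     * sizeof(_d) is 0 and the empty array can be dropped by the
     * compiler; hand __trace_var() a NULL payload in that case. */
    uint32_t _d[] = { __VA_ARGS__ };
    __trace_var(TRC_HVM_ ## evt | (modifier), cycles,
                sizeof(_d), sizeof(_d) ? _d : NULL);

Empty initialiser lists and zero-length arrays are GCC/Clang extensions,
so this sketch assumes a GNU-dialect build.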

>
>> Normally I wouldn't recommend patches like this for backport, but
>> {vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache lines
>> dropped...
> The change in size is indeed unexpectedly large for these two functions.
> However, what I find puzzling is that TRACEBUFFER is enabled by default
> (i.e. also in release builds) in the first place, and that it can only
> be disabled when EXPERT.

It's not surprising in the slightest.  TRACEBUFFER long predates Kconfig.

>  More gains could be had towards dropped code if
> the option wasn't on by default in at least release builds. "Debugging
> or performance analysis" (as its help text says) after all isn't a
> primary target of release builds.

All performance analysis needs to be done on release builds.

> IOW what I'd prefer to consider a backport candidate would be a patch
> changing the option's default. Thoughts?

I very much doubt that XenServer are the only people who use xentrace in
customer environments.

I'm -1 to changing the default in staging, and firmly against doing so
in older releases.

>> --- a/xen/include/asm-x86/hvm/trace.h
>> +++ b/xen/include/asm-x86/hvm/trace.h
>> @@ -67,38 +67,30 @@
>>  #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
>>      TRACE_6D(_e, d1, d2, d3, d4)
>>  
>> -#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
>> +#define HVMTRACE_ND(evt, modifier, cycles, ...)                           \
>>      do {                                                                  \
>>          if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )               \
>>          {                                                                 \
>> -            struct {                                                      \
>> -                u32 d[6];                                                 \
>> -            } _d;                                                         \
>> -            _d.d[0]=(d1);                                                 \
>> -            _d.d[1]=(d2);                                                 \
>> -            _d.d[2]=(d3);                                                 \
>> -            _d.d[3]=(d4);                                                 \
>> -            _d.d[4]=(d5);                                                 \
>> -            _d.d[5]=(d6);                                                 \
>> +            uint32_t _d[] = { __VA_ARGS__ };                              \
>>              __trace_var(TRC_HVM_ ## evt | (modifier), cycles,             \
>> -                        sizeof(*_d.d) * count, &_d);                      \
>> +                        sizeof(_d), _d);                                  \
>>          }                                                                 \
>>      } while(0)
>>  
>>  #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6)    \
>> -    HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
>>  #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5)        \
>> -    HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
>>  #define HVMTRACE_4D(evt, d1, d2, d3, d4)            \
>> -    HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
>>  #define HVMTRACE_3D(evt, d1, d2, d3)                \
>> -    HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
>>  #define HVMTRACE_2D(evt, d1, d2)                    \
>> -    HVMTRACE_ND(evt, 0, 0, 2, d1, d2,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2)
>>  #define HVMTRACE_1D(evt, d1)                        \
>> -    HVMTRACE_ND(evt, 0, 0, 1, d1,  0,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0, d1)
>>  #define HVMTRACE_0D(evt)                            \
>> -    HVMTRACE_ND(evt, 0, 0, 0,  0,  0,  0,  0,  0,  0)
>> +    HVMTRACE_ND(evt, 0, 0)
> These HVMTRACE_<n>D() aren't this much of a gain anymore; perhaps down
> the road we will want to have just a single wrapper macro adding the
> modifier and cycles arguments, otherwise making use of variable
> arguments as well?

Same on the plain TRACE() side.  There is an awful lot of cleanup to do
here.
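
For concreteness, the single wrapper being suggested could look something
like this (a sketch only; the HVMTRACE() name is illustrative, not an
actual patch):

    /* Illustrative sketch: one variadic wrapper supplying the fixed
     * modifier/cycles arguments, in place of the HVMTRACE_<n>D() family.
     * ##__VA_ARGS__ (a GNU extension) swallows the trailing comma when
     * no data words are passed. */
    #define HVMTRACE(evt, ...) HVMTRACE_ND(evt, 0, 0, ##__VA_ARGS__)

with a similar variadic TRACE(evt, ...) on the non-HVM side.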

Other findings include HVM records using the non-HVM helpers (to have
cycles included), and examples such as vpic_ack_pending_irq(), which makes
duplicate calls to vlapic_accept_pic_intr(), causing 3 trace records to be
written out.

~Andrew
