
Re: AMD EPYC virtual network performances


  • To: Elliott Mitchell <ehem+xen@xxxxxxx>
  • From: Jürgen Groß <jgross@xxxxxxxx>
  • Date: Fri, 15 Nov 2024 07:46:07 +0100
  • Cc: Andrei Semenov <andrei.semenov@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 15 Nov 2024 06:46:40 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 15.11.24 01:11, Elliott Mitchell wrote:
On Wed, Nov 13, 2024 at 08:20:02PM +0100, Jürgen Groß wrote:
On 13.11.24 18:25, Elliott Mitchell wrote:
On Tue, Jul 09, 2024 at 08:36:18AM +0000, Andrei Semenov wrote:

After some investigation we noticed a huge performance drop (performance divided
by a factor of 5) starting from the 5.10.88 Linux kernel version on AMD EPYC
platforms. The patch introduced in this kernel version that allowed us to pinpoint
the buggy behavior is:

   “xen/netfront: harden netfront against event channel storms”
d31b3379179d64724d3bbfa87bd4ada94e3237de

The patch basically binds the network frontend to the `xen_lateeoi_chip`
irq_chip (instead of `xen_dynamic_chip`), which allows its clients to inform
the chip when spurious interrupts are detected, so that a delay in interrupt
handling is introduced by the chip.
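For reference, the lateeoi pattern that patch introduces looks roughly like the
sketch below. This is not the actual netfront code: bind_evtchn_to_irqhandler_lateeoi(),
xen_irq_lateeoi() and XEN_EOI_FLAG_SPURIOUS are the real kernel interfaces, while
everything named demo_* is purely illustrative.

#include <linux/interrupt.h>
#include <xen/events.h>

struct demo_queue { int dummy; };                       /* hypothetical per-queue state */

static bool demo_handle_tx(struct demo_queue *q) { return false; }      /* stub */
static bool demo_handle_rx(struct demo_queue *q) { return false; }      /* stub */

static irqreturn_t demo_interrupt(int irq, void *dev_id)
{
        struct demo_queue *q = dev_id;
        unsigned int eoi_flags = XEN_EOI_FLAG_SPURIOUS;

        /* Only events that cause real work are counted as non-spurious. */
        if (demo_handle_tx(q) || demo_handle_rx(q))
                eoi_flags = 0;

        /*
         * With the lateeoi irq_chip the event channel stays masked until
         * this call; XEN_EOI_FLAG_SPURIOUS tells the core it may defer
         * re-enabling the channel if spurious events keep arriving.
         */
        xen_irq_lateeoi(irq, eoi_flags);

        return IRQ_HANDLED;
}

static int demo_bind(evtchn_port_t evtchn, struct demo_queue *q)
{
        /* Bind through the lateeoi chip instead of xen_dynamic_chip. */
        return bind_evtchn_to_irqhandler_lateeoi(evtchn, demo_interrupt,
                                                 0, "demo", q);
}

The key point is that an event which causes no RX/TX work is reported to the
core as spurious when the EOI is issued.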

I worry I'm being naive here.

For the heck of it, I took a glance at b27d47950e48.  If my understanding
is correct, b27d47950e48 is making a very basic (and wrong) assumption
about timing/latency.

In particular, any time either side receives an event, it will handle
some number X of incoming payloads and some number Y of acknowledged outgoing
payloads. As such, if X + Y > 1, up to X + Y - 1 spurious events may be
detected. The issue is that there is no synchronization between the event
channel and the work queues.

In particular the network back end could legitimately generate:

work0   signal0 work1   signal1 work2   signal2 work3   signal3

Whereas the network front end may handle this as:

event0  work0   work1   work2   work3   event1  event2  event3

Where b27d47950e48 would interpret events 1-3 as spurious, even though
they're perfectly legitimate.  The same phenomenon could occur in both
directions and also with the Xen block devices.
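Expressed as a toy userspace model (not Xen code, and assuming the pessimistic
one-event-per-work-item pattern described above, which the reply below argues
cannot actually happen), the claimed counting looks like this:

#include <stdio.h>

/*
 * Toy model of the interleaving above: the backend queues four work
 * items and sends four events; the frontend drains everything on the
 * first event, so the remaining three find no work and would be
 * counted as spurious (X + Y - 1 = 3).
 */
int main(void)
{
        int ring[8];
        unsigned int prod = 0, cons = 0;
        int events = 0, spurious = 0;

        /* "Backend": work0..work3, signal0..signal3 */
        for (int i = 0; i < 4; i++) {
                ring[prod++ & 7] = i;
                events++;
        }

        /* "Frontend": event0 is handled only after all work was queued */
        while (events--) {
                int did_work = 0;

                while (cons != prod) {
                        (void)ring[cons++ & 7];
                        did_work = 1;
                }
                if (!did_work)
                        spurious++;
        }

        printf("spurious events seen: %d\n", spurious);        /* prints 3 */
        return 0;
}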

No.

For one, as long as event0 isn't EOI'd, the other events would just be
merged into a single one.

With the 2-level bitfield event channel certainly, but what if FIFO
event channels were in use?

The same applies. The event channel is masked as long as there was no
EOI.
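To make the merging/masking point concrete, a toy model (not actual Xen source,
all names illustrative): a port that is already pending simply stays pending,
and a masked port, which is what the lateeoi chip keeps until xen_irq_lateeoi()
is called, raises no upcall at all. The FIFO ABI has the equivalent masking
behaviour.

#include <stdbool.h>

/* Toy model of event delivery on the hypervisor side; not Xen source. */
struct toy_shared {
        bool pending[4096];     /* stand-in for the pending bitmap */
        bool mask[4096];        /* stand-in for the mask bitmap    */
};

static void toy_send_event(struct toy_shared *s, unsigned int port, bool *upcall)
{
        if (s->pending[port])
                return;                 /* already pending: further events merge  */
        s->pending[port] = true;

        if (s->mask[port])
                return;                 /* masked until the guest EOIs: no upcall */

        *upcall = true;                 /* at most one upcall per pending edge    */
}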


Additionally, as long as work0 isn't acknowledged by incrementing the
consumer index, additional queued work items should NOT result in
additional events being sent. An event is only sent if a work item is
queued to a ring buffer with consumer == producer.

What if the front-end and back-end were running simultaneously on
different processors?

There are (or should be) appropriate barriers around accesses of the consumer
and producer indices, and the order in which they should be accessed is well
defined.
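For reference, the request-push side of the shared-ring protocol makes this
explicit. The sketch below is a simplified paraphrase of the
RING_PUSH_REQUESTS_AND_CHECK_NOTIFY() logic from xen/interface/io/ring.h; the
struct and the toy_* barrier macros are illustrative stand-ins for the shared
ring and virt_wmb()/virt_mb().

#include <stdbool.h>

#define toy_wmb()  __sync_synchronize()         /* stand-in for virt_wmb() */
#define toy_mb()   __sync_synchronize()         /* stand-in for virt_mb()  */

struct toy_front_ring {
        unsigned int req_prod_pvt;      /* requests queued, not yet published   */
        unsigned int sring_req_prod;    /* shared: producer index the peer sees */
        unsigned int sring_req_event;   /* shared: peer asks for an event once
                                           production passes this index         */
};

/* Simplified paraphrase of RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(). */
static bool toy_push_requests_and_check_notify(struct toy_front_ring *r)
{
        unsigned int old = r->sring_req_prod;
        unsigned int new = r->req_prod_pvt;

        toy_wmb();                      /* peer sees requests before the index */
        r->sring_req_prod = new;
        toy_mb();                       /* publish index before reading req_event */

        /*
         * Notify only if the peer asked to be woken in (old, new]; a peer
         * still busy consuming earlier requests has moved req_event ahead
         * and gets no event at all, so events are not sent per work item.
         */
        return (unsigned int)(new - r->sring_req_event) <
               (unsigned int)(new - old);
}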


Ultimately how is the network portion of XSA-391 any different from any
other network DoS?  If an interrupt is generated for every single packet
of a series of runt frames, there will be heavy processor use for little
network traffic.

The problem is that a steady stream of events could keep the other side
in IRQ handling for an arbitrary amount of time, leading to hangs.

I know.  I was pointing out this seems little different from other
typical network DoS behavior.  This sort of situation is also an issue
when network speeds are increasing, since more packets mean more
interrupts.

AMD systems may fare worse than Intel systems due to differing cache
coherence behavior/latency.  Perhaps AMD's NUMA implementation adds
some latency.  (huh, suddenly the RAID1 issue comes to mind)


Hopefully I'm not making naive speculation here.  Might this be the
simplest of issues, just that it was missed due to being too obvious?

I don't agree with your analysis, see above.

Okay.  I was asking since it looked a bit odd and there has been no news
on this issue (unless I missed some patch flying by).

I don't know how large the impact of this is.  I wouldn't be surprised if
this turned out to overwhelm all my other efforts at performance
improvement.

Any news on your efforts to track this down?

ENOTIME up to now.

Did you try to set the spurious threshold to e.g. 2 instead of the default
of 1? In case that helps, it might be a good idea to either change the default
or at least add a boot parameter for setting it.
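For context, the deferral logic being discussed behaves roughly like the sketch
below (illustrative only, not the actual drivers/xen/events/events_base.c code;
the exact delay formula there differs):

#include <stdbool.h>

/*
 * Illustrative sketch of threshold-based EOI deferral; not the actual
 * events_base.c logic, only the shape of it.
 */
static unsigned long toy_lateeoi_delay(unsigned int *spurious_cnt,
                                       unsigned int threshold, bool spurious)
{
        if (!spurious) {
                *spurious_cnt = 0;              /* useful event: reset counter  */
                return 0;                       /* unmask immediately           */
        }

        if (*spurious_cnt < 32)                 /* arbitrary cap for the sketch */
                (*spurious_cnt)++;

        if (*spurious_cnt <= threshold)
                return 0;                       /* tolerated, no delay yet      */

        /* Above the threshold: exponential back-off before unmasking. */
        return 1UL << (*spurious_cnt - threshold - 1);
}

In this sketch a threshold of 1 means the second consecutive spurious event
already triggers a back-off, while a threshold of 2 tolerates one more before
any delay is applied.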


Juergen



 

