Re: AMD EPYC virtual network performances
On 15.11.24 01:11, Elliott Mitchell wrote:
> On Wed, Nov 13, 2024 at 08:20:02PM +0100, Jürgen Groß wrote:
>> On 13.11.24 18:25, Elliott Mitchell wrote:
>>> On Tue, Jul 09, 2024 at 08:36:18AM +0000, Andrei Semenov wrote:
>>>> After some investigations we noticed a huge performance drop
>>>> (performance divided by a factor of 5) starting from Linux kernel
>>>> version 5.10.88 on AMD EPYC platforms. The patch introduced in this
>>>> kernel version that allowed us to pinpoint the buggy behavior is:
>>>> "xen/netfront: harden netfront against event channel storms"
>>>> d31b3379179d64724d3bbfa87bd4ada94e3237de
>>>> The patch basically binds the network frontend to the
>>>> `xen_lateeoi_chip` irq_chip (instead of `xen_dynamic_chip`), which
>>>> allows its clients to inform the chip when spurious interrupts are
>>>> detected, so that the chip introduces a delay in interrupt treatment.
>>>
>>> I worry I'm being naive here. For the heck of it, I took a glance at
>>> b27d47950e48. If my understanding is correct, b27d47950e48 is making
>>> a very basic (and wrong) assumption about timing/latency. In
>>> particular, any time either side receives an event, it will handle X
>>> incoming payloads and Y acknowledged outgoing payloads. As such, if
>>> X + Y > 1, then up to X + Y - 1 spurious events may be detected.
>>>
>>> The issue is there is no synchronization between the event channel
>>> and the work queues. In particular the network back end could
>>> legitimately generate:
>>>
>>>   work0 signal0 work1 signal1 work2 signal2 work3 signal3
>>>
>>> Whereas the network front end may handle this as:
>>>
>>>   event0 work0 work1 work2 work3 event1 event2 event3
>>>
>>> Where b27d47950e48 would interpret events 1-3 as spurious, even
>>> though they're perfectly legitimate. The same phenomenon could occur
>>> in both directions and also with the Xen block devices.
>>
>> No. For one, as long as event0 isn't EOI'd, the other events would
>> just be merged into a single one.
>
> With the 2-level bitfield event channel certainly, but what if FIFO
> event channels were in use?

The same applies. The event channel is masked as long as there was no EOI.

>> Additionally, as long as work0 isn't acknowledged by incrementing the
>> consumer index, additional queued work items should NOT result in
>> additional events being sent. An event is only sent if a work item is
>> queued to a ring buffer with consumer == producer.
>
> What if the front-end and back-end were running simultaneously on
> different processors?

There are (or should be) appropriate barriers around accesses of the
consumer and producer indices, and the sequence in which they should be
accessed is well defined.
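As a purely illustrative sketch (simplified, not the actual Xen shared-ring
macros; struct and function names below are made up), the rule "an event is
only sent when work is queued to a ring whose consumer has already caught
up", together with the ordering of the index accesses, could look roughly
like this:

/*
 * Illustrative sketch only, NOT the real Xen ring code: the producer
 * publishes a work item and only needs to send an event if the consumer
 * had already caught up with everything published before, so a burst of
 * queued items results in at most one pending event.
 */
#include <stdatomic.h>
#include <stdbool.h>

struct ring {
    _Atomic unsigned int prod;   /* producer index, written by the producer */
    _Atomic unsigned int cons;   /* consumer index, written by the consumer */
    /* ... request/response slots would live here ... */
};

/* Producer side: publish one work item, return true if an event is needed. */
bool push_and_check_notify(struct ring *r)
{
    unsigned int old_prod = atomic_load_explicit(&r->prod,
                                                 memory_order_relaxed);

    /* Release barrier: the work item must be visible before the new index. */
    atomic_store_explicit(&r->prod, old_prod + 1, memory_order_release);

    /*
     * Acquire barrier before reading the consumer index; only notify if
     * the consumer had already consumed everything up to old_prod.
     */
    unsigned int cons = atomic_load_explicit(&r->cons, memory_order_acquire);
    return cons == old_prod;
}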
>>> Ultimately, how is the network portion of XSA-391 any different from
>>> any other network DoS? If an interrupt is generated for every single
>>> packet of a series of runt frames, there will be heavy processor use
>>> for little network traffic.
>>
>> The problem is that a steady stream of events could keep the other
>> side in IRQ handling for an arbitrary amount of time, leading to
>> hangups.
>
> I know. I was pointing out this seems little different from other
> typical network DoS behavior. This sort of situation is also an issue
> when network speeds are increasing, since more packets mean more
> interrupts.
>
>>> AMD systems may fare worse than Intel systems due to differing cache
>>> coherence behavior/latency. Perhaps AMD's NUMA implementation adds
>>> some latency. (huh, suddenly the RAID1 issue comes to mind)
>>>
>>> Hopefully I'm not making naive speculation here. Might this be the
>>> simplest of issues, just missed due to being too obvious?
>>
>> I don't agree with your analysis, see above.
>
> Okay.
>
> I was asking since it looked a bit odd and there has been no news on
> this issue (unless I missed some patch flying by). I don't know how
> large the impact of this is. I wouldn't be surprised if this turned
> out to overwhelm all my other efforts at performance improvement.
>
> Any news on your efforts to track this down?

ENOTIME up to now.

Did you try to set the spurious threshold to e.g. 2 instead of the
default of 1? In case that helps, it might be a good idea to either
change the default or at least add a boot parameter for setting it.
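As a rough, self-contained model only (the names below are made up and do
not match the kernel's lateeoi implementation), this is the effect such a
threshold would have: with a threshold of 1 every event that brings no new
work is treated as spurious and delays the EOI, while a threshold of 2
tolerates a single already-handled event before any delay kicks in:

/*
 * Simplified model of spurious-event accounting with a configurable
 * threshold; illustrative only, not the kernel sources.
 */
#include <stdbool.h>
#include <stdio.h>

struct eoi_state {
    unsigned int spurious_cnt; /* consecutive events bringing no new work */
    unsigned int threshold;    /* how many such events are tolerated */
    unsigned int delay_ms;     /* artificial EOI delay currently applied */
};

void account_event(struct eoi_state *s, bool did_work)
{
    if (did_work) {
        s->spurious_cnt = 0;
        s->delay_ms = 0;
        return;
    }
    if (++s->spurious_cnt >= s->threshold) {
        /* Back off: delay the EOI (and thus the next event) a bit longer. */
        s->delay_ms = s->delay_ms ? s->delay_ms * 2 : 10;
    }
}

int main(void)
{
    struct eoi_state s = { .threshold = 2 };

    account_event(&s, true);   /* event with real work */
    account_event(&s, false);  /* already handled together with it */
    printf("delay after one empty event: %u ms\n", s.delay_ms);  /* 0 */

    account_event(&s, false);  /* a second empty event in a row */
    printf("delay after two empty events: %u ms\n", s.delay_ms); /* 10 */
    return 0;
}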

Juergen