
Re: [PATCH] xen/netfront: Fix TX response spurious interrupts



On Wed, Jul 16, 2025 at 11:31:06AM -0700, Elliott Mitchell wrote:
> On Wed, Jul 16, 2025 at 07:47:48AM +0000, Anthoine Bourgeois wrote:
> > On Tue, Jul 15, 2025 at 12:19:34PM -0700, Elliott Mitchell wrote:
> > >On Tue, Jul 15, 2025 at 08:21:40AM +0000, Anthoine Bourgeois wrote:
> > >>
> > >> Thank you for the test!
> > >> Could you send me the domU/dom0 kernel version and xen version ?
> > >
> > >I tend to follow Debian, so kernel 6.1.140 and 4.17.6.  What may be
> > >more notable is AMD processor.
> > >
> > >When initially reported, it was reported as being more severe on systems
> > >with AMD processors.  I've been wondering about the reason(s) behind
> > >that.
> > 
> > AMD processors could make a huge difference. On Ryzen, this patch could
> > almost double the bandwidth and on Epyc close to nothing with low
> > frequency models, there is another bottleneck here I guess.
> > On which one do you test?
> > 
> > Do you know there is also a workaround on AMD processors about remapping
> > grant tables as WriteBack?
> > Upstream patch is 22650d605462 from XenServer.
> > The test package for XCP-ng with the patch:
> > https://xcp-ng.org/forum/topic/10943/network-traffic-performance-on-amd-processors
> > 
> 
> Why are you jumping onto mostly unrelated issues when the current bug is
> unfinished?
> 
> Spurious events continue to be observed on the network backend.  Spurious
> events are also being observed on block and PCI backends.  You identified
> one cause, but others remain.
> 
> (I'm hoping the next one affects all the back/front ends; the PCI backend
> is a bigger issue for me)
> 
> Should add, one VM being observed with these issue(s) is using 6.12.38.

For reference, the following:

# Needs bash: the glob below relies on brace expansion.
for d in /sys/devices/{pci,vbd,vif}-*[0-9]-*[0-9]/xenbus
do      [ -d "$d" ] || continue         # skip patterns that matched nothing
        if [ -f "$d/spurious_events" ]
        then    read -r s < "$d/spurious_events"
        else    s=0
        fi
        if [ "$s" -gt 0 ]
        then    printf "problem %s: %d\n" "$d/spurious_events" "$s"
        else    printf "clean: %s\n" "$d/spurious_events"
        fi
done

Flags all passthrough and virtual devices.  The patch does reduce
spurious events on virtual network devices, but only by 10%.  That is
progress, yet most of the problem remains.
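
To put a number on such a reduction, the per-device counters can be
summed before and after applying a patch and then compared.  The
following is only a rough sketch: the function names sum_spurious and
percent_reduction are hypothetical, the glob is broader than the loop
above (any device directory with a xenbus/spurious_events file), and
the optional root argument exists only so the helper can be pointed at
a test directory instead of /sys/devices.

```shell
# Hypothetical helper: add up every spurious_events counter found under
# a sysfs-like root (defaults to /sys/devices).  Assumes each file holds
# a single non-negative integer; anything else is counted as 0.
sum_spurious() {
        root=${1:-/sys/devices}
        total=0
        for f in "$root"/*/xenbus/spurious_events
        do      [ -f "$f" ] || continue
                s=$(cat "$f")
                case $s in
                ''|*[!0-9]*)    s=0 ;;
                esac
                total=$((total + s))
        done
        printf '%d\n' "$total"
}

# Hypothetical helper: integer percent reduction from a "before" total
# to an "after" total.  Prints 0 when there is nothing to compare.
percent_reduction() {
        before=$1
        after=$2
        if [ "$before" -le 0 ]
        then    printf '0\n'
        else    printf '%d\n' $(( (before - after) * 100 / before ))
        fi
}
```

Run sum_spurious once on the unpatched kernel and once on the patched
one under a comparable workload, then feed both totals to
percent_reduction.  The counters are cumulative since boot, so the two
runs need similar uptime and traffic for the comparison to mean much.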

I mentioned the AMD processor because the initial report stated the
problem was more severe on machines with AMD processors.

This is likely a driver design issue.  For most pieces of hardware,
telling the device to process an empty queue is quite cheap.  It may
waste a little energy, but most hardware isn't (yet) too worried about
being attacked.

Passthrough and virtual devices are quite unusual in that attacks are a
real concern.  There could be major design flaws because the front-ends
were designed much like normal drivers.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





 

