Re: Issue: Networking performance in Xen VM on Arm64
On Mon, 17 Oct 2022, Leo Yan wrote:
> Hi Stefano,
>
> Sorry for the late response. Please see my comments below.
>
> On Tue, Oct 11, 2022 at 02:47:00PM -0700, Stefano Stabellini wrote:
> > On Tue, 11 Oct 2022, Leo Yan wrote:
> > > > > The second question is how to mitigate the long latency when sending
> > > > > data from DomU. A possible solution is for the Xen network frontend
> > > > > driver to copy the skb into an intermediate (bounce) buffer, just like
> > > > > what the Xen net backend driver does with gnttab_batch_copy(); in this
> > > > > way the frontend driver doesn't need to wait for the backend driver's
> > > > > response and can return directly.
> > > >
> > > > About this, I am not super familiar with drivers/net/xen-netfront.c but
> > > > I take it you are referring to xennet_tx_buf_gc? Is that the function
> > > > that is causing xennet_start_xmit to wait?
> > >
> > > No. We can take the whole flow in xen-netfront.c as:
> > >
> > >   xennet_start_xmit()
> > >     ----------> notify Xen Dom0 to process skb
> > >     <---------- Dom0 copies skb and notifies back to DomU
> > >   xennet_tx_buf_gc()
> > >   softirq/NET_TX : __kfree_skb()
> >
> > Let me premise again that I am not an expert in PV netfront/netback.
> > However, I think the above is only true if DomU and Dom0 are running on
> > the same physical CPU. If you use sched=null as I suggested above,
> > you'll get domU and dom0 running at the same time on different physical
> > CPUs and the workflow doesn't work as described.
> >
> > It should be:
> >
> >   CPU1: xennet_start_xmit()             || CPU2: doing something else
> >   CPU1: notify Xen Dom0 to process skb  || CPU2: receive notification
> >   CPU1: return from xennet_start_xmit() || CPU2: Dom0 copies skb
> >   CPU1: do something else               || CPU2: notify back to DomU
> >   CPU1: receive irq, xennet_tx_buf_gc() || CPU2: do something else
>
> Yes, I agree this is the ideal case. I tried to set the option
> "sched=null" but I can still observe the latency in the second step when
> CPU1 notifies Xen Dom0: Dom0 takes 500us+ to receive the notification.
>
> Please see the detailed log below.
>
> DomU log:
>
>   4989078512  pub-321  [003]  101.150966: bprint:
>       xennet_start_xmit: xennet_start_xmit: TSC: 4989078512
>   4989078573  pub-321  [003]  101.150968: bprint:
>       xennet_tx_setup_grant: id=24 ref=1816 offset=2 len=1514 TSC: 4989078573
>   4989078592  pub-321  [003]  101.150969: bprint:
>       xennet_start_xmit: xennet_notify_tx_irq: TSC: 4989078592
>
> Dom0 log:
>
>   4989092169  <idle>-0  [013]  140.121667: bprint:
>       xenvif_tx_interrupt: xenvif_tx_interrupt: TSC: 4989092169
>   4989092331  <idle>-0  [013]  140.121673: bprint:
>       xenvif_tx_build_gops.constprop.0: id=24 ref=1816 offset=2 len=1514 TSC: 4989092331
>
> We can see that DomU sends the notification at timestamp (raw counter)
> 4989078592 and Dom0 receives the interrupt at timestamp 4989092169.
> Since Dom0 and DomU use the same time counter and the counter frequency
> is 25MHz, we can get the delta value (in microseconds):
>
>   (4989092169 - 4989078592) / 25000000 * 1000 * 1000 = 543us
>
> This means it takes 543us for Dom0 to receive the notification.
> You can see that DomU runs on CPU3 and Dom0 runs on CPU13, so there
> should be no contention for CPU resources. It seems to me the Xen
> hypervisor likely takes a long time to deliver the interrupt. Note that
> it doesn't take this long for every skb transfer; sometimes the time to
> respond to a notification is short (about ~10us).

Good find. I think this is worth investigating further. Do you have
vwfi=native in your Xen command line as well?
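As a sanity check on the arithmetic, here is a minimal sketch (illustrative
only, untested, not from the original thread) of the same delta conversion,
assuming both domains sample the same Arm generic timer counter and the
25MHz frequency quoted in the log:

    /* Convert the raw counter delta between the DomU notify and the Dom0
     * interrupt into microseconds. The counter values are taken from the
     * trace log above; the 25 MHz frequency is the one quoted in the mail. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t cntfrq_hz   = 25000000ULL;   /* counter frequency      */
        const uint64_t domu_notify = 4989078592ULL; /* xennet_notify_tx_irq   */
        const uint64_t dom0_irq    = 4989092169ULL; /* xenvif_tx_interrupt    */

        uint64_t delta_us = (dom0_irq - domu_notify) * 1000000ULL / cntfrq_hz;

        printf("notification latency: %llu us\n",
               (unsigned long long)delta_us);       /* prints 543 us          */
        return 0;
    }

The conversion is only meaningful because, as noted above, Dom0 and DomU read
the same physical counter.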
After that, I would add printk also in Xen with the timestamp. The event
channel notification code path is the following:

  # domU side
  xen/arch/arm/vgic-v2.c:vgic_v2_to_sgi
  xen/arch/arm/vgic.c:vgic_to_sgi
  xen/arch/arm/vgic.c:vgic_inject_irq
  xen/arch/arm/vgic.c:vcpu_kick
  xen/arch/arm/gic-v2.c:gicv2_send_SGI

  # dom0 side
  xen/arch/arm/gic.c:do_sgi
  xen/arch/arm/traps.c:leave_hypervisor_to_guest

It would be good to understand why sometimes it takes ~10us and some
other times it takes ~540us.

> > > > I didn't think that waiting for the backend is actually required. I
> > > > mean, in theory xennet_start_xmit could return without calling
> > > > xennet_tx_buf_gc, it is just an optimization. But I am not sure about
> > > > this.
> > >
> > > The function xennet_start_xmit() will not wait and returns directly,
> > > but if we review the whole flow we can see the skb is not freed until
> > > the softirq NET_TX.
> >
> > Is it an issue that the skb is not freed until later? Is that affecting
> > the latency results? It shouldn't, right?
>
> I did an extra experiment in the Xen net frontend driver: I enabled the
> flag "info->bounce = true" so the frontend driver uses a bounce buffer
> to store the data and releases the skb immediately to the network core
> layer.
>
> The throughput is boosted significantly by this: the netperf result
> improves from 107.73 Mbits/s to 300+ Mbits/s.
>
> > What matters is when dom0 is getting those packets on the physical
> > network interface and that happens before the skb is freed. I am just
> > trying to figure out if we are focusing on the right problem.
>
> Good point. I agree that releasing the skb earlier can only benefit
> throughput; we still cannot resolve the latency issue if Dom0 takes a
> long time to relay packets to the physical network interface.
>
> > > In this whole flow, it needs DomU and Dom0 to work together
> > > (including two context switches) to process the skb.
> >
> > There are not necessarily 2 context switches as things should run in
> > parallel.
> >
> > > Here I mean the optimization is to allow Dom0 and DomU to work in
> > > parallel. It could be something like below; the key point is DomU
> > > doesn't need to wait for Dom0's notification.
> >
> > I think it is already the case that domU doesn't need to wait for dom0's
> > notification?
>
> Agree. domU doesn't need to wait for dom0's notification until it uses
> up the skbs that can be allocated by the network core layer. This is
> why I can also tweak the core layer's parameters for buffer size (see
> /proc/sys/net/core/wmem_default and /proc/sys/net/core/wmem_max).
>
> > It is true that domU is waiting for dom0's notification to free the
> > skb but that shouldn't affect latency?
>
> Yeah. I will focus on the issue elaborated above, that Dom0 takes a
> long time to receive the notification.
>
> Will keep you posted if I have any new findings.

Thanks, this is very interesting
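For the timestamped printk suggested above, here is a rough, hypothetical
sketch of a helper that could be dropped into the Xen code path listed
earlier. It assumes the hypervisor can read the Arm generic timer physical
counter (CNTPCT_EL0) directly so the printed values line up with the "TSC"
values in the DomU/Dom0 traces; the helper name is made up for illustration:

    /*
     * Hypothetical instrumentation sketch (not from the original mail):
     * read the Arm generic timer physical counter so Xen-side printks can
     * be correlated with the raw counter timestamps in the DomU/Dom0
     * traces, which come from the same counter.
     */
    static inline uint64_t dbg_read_counter(void)
    {
        uint64_t cnt;

        asm volatile("isb; mrs %0, cntpct_el0" : "=r" (cnt));
        return cnt;
    }

    /*
     * Example use, e.g. at the top of vgic_inject_irq() and in
     * leave_hypervisor_to_guest():
     *
     *   printk("vgic_inject_irq: TSC: %"PRIu64"\n", dbg_read_counter());
     *   printk("leave_hypervisor_to_guest: TSC: %"PRIu64"\n",
     *          dbg_read_counter());
     */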