Re: Issue: Networking performance in Xen VM on Arm64
On Mon, 24 Oct 2022, Leo Yan wrote:
> > If you are really running with the NULL scheduler, then I would
> > investigate why the vCPU has is_running == 0 because it should not
> > happen.
>
> Correct for this: it's my bad that I didn't really enable the NULL
> scheduler in my code base. After I enabled the NULL scheduler, the
> latency caused by context switching disappeared.
>
>   8963          pub-338  [002]  217.777652: bprint: xennet_tx_setup_grant: id=60 ref=1340 offset=2 len=1514 TSC: 7892178799
>   8964          pub-338  [002]  217.777662: bprint: xennet_tx_setup_grant: id=82 ref=1362 offset=2050 len=1006 TSC: 7892179043
>   8965   ksoftirqd/12-75 [012]  255.466914: bprint: xenvif_tx_build_gops.constprop.0: id=60 ref=1340 offset=2 len=1514 TSC: 7892179731
>   8966   ksoftirqd/12-75 [012]  255.466915: bprint: xenvif_tx_build_gops.constprop.0: id=82 ref=1362 offset=2050 len=1006 TSC: 7892179761
>   8967          pub-338  [002]  217.778057: bprint: xennet_tx_setup_grant: id=60 ref=1340 offset=2050 len=1514 TSC: 7892188930
>   8968          pub-338  [002]  217.778072: bprint: xennet_tx_setup_grant: id=53 ref=1333 offset=2 len=1514 TSC: 7892189293
>   8969   containerd-2965 [012]  255.467304: bprint: xenvif_tx_build_gops.constprop.0: id=60 ref=1340 offset=2050 len=1514 TSC: 7892189479
>   8970   containerd-2965 [012]  255.467306: bprint: xenvif_tx_build_gops.constprop.0: id=53 ref=1333 offset=2 len=1514 TSC: 7892189533

I am having difficulty following the messages. Are the two points [a]
and [b] as described in the previous email shown here?

> So xennet (the Xen net frontend driver) and xenvif (the net backend
> driver) work in parallel. Please note, I didn't see any networking
> performance improvement after switching to the NULL scheduler.
>
> Now I will compare the durations for the two directions: one direction
> is sending data from xennet to xenvif, and the other is the reverse
> direction.
> It's very likely the two directions differ significantly when sending
> data with grant tables; you can see in the above log that it takes
> 20~30us to transmit a data block (we can use the id number and the
> grant table's ref number to match a data block between the xennet
> driver and the xenvif driver).
>
> > Now regarding the results, I can see the timestamp 3842008681 for
> > xennet_notify_tx_irq, 3842008885 for vgic_inject_irq, and 3842008935
> > for vcpu_kick. Where is the corresponding TSC for the domain
> > receiving the notification?
> >
> > Also for the other case, starting at 3842016505, can you please
> > highlight the timestamp for vgic_inject_irq, vcpu_kick, and also the
> > one for the domain receiving the notification?
> >
> > The most interesting timestamps would be the timestamp for vcpu_kick
> > in the "notification sending domain" [a], the timestamp for
> > receiving the interrupt in Xen on the pCPU for the "notification
> > receiving domain" [b], and the timestamp for the "notification
> > receiving domain" getting the notification [c].
> >
> > If context switch really is the issue, then the interesting latency
> > would be between [a] and [b].
>
> Understood. I agree that I didn't go into more detail; the main reason
> is that the Xen dmesg buffer is fragile after adding more logs. For
> example, after I added a log in the function gicv3_send_sgi(), Xen got
> stuck during the booting phase, and adding logs in
> leave_hypervisor_to_guest() introduced a huge volume of logs (so I
> need to trace only the first 16 CPUs to mitigate the log flood).
>
> I think it would be better to enable xentrace for my profiling on my
> side. If I have any further data, I will share it back.

Looking forward to it. Without more details it is impossible to
identify the source of the problem and fix it.
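As an aside, the (id, ref) matching described above can be scripted rather
than done by eye. The following is a minimal sketch (not from the thread):
it assumes trace lines in the ftrace bprint format shown in the log, and it
assumes a TSC frequency of 25 MHz (on Arm64 the counter frequency comes from
CNTFRQ_EL0 and varies by board, so adjust TSC_HZ accordingly). It pairs each
frontend xennet_tx_setup_grant event with the next backend
xenvif_tx_build_gops event carrying the same id and grant ref, and reports
the TSC delta in microseconds.

```python
import re
from collections import defaultdict, deque

# Assumed TSC frequency in Hz (Arm generic timer, CNTFRQ_EL0).
# 25 MHz is only an example value -- adjust for the actual board.
TSC_HZ = 25_000_000

LINE_RE = re.compile(
    r'(?P<fn>xennet_tx_setup_grant|xenvif_tx_build_gops\S*):'
    r' id=(?P<id>\d+) ref=(?P<ref>\d+).*TSC: (?P<tsc>\d+)')

def grant_latencies(lines):
    """Pair each frontend grant-setup event with the matching backend
    build_gops event (same id and grant ref, in arrival order) and
    return the TSC deltas converted to microseconds."""
    pending = defaultdict(deque)   # (id, ref) -> queued frontend TSCs
    deltas = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        key = (m['id'], m['ref'])
        tsc = int(m['tsc'])
        if m['fn'].startswith('xennet'):    # frontend: remember the TSC
            pending[key].append(tsc)
        elif pending[key]:                  # backend: pop the oldest match
            delta = tsc - pending[key].popleft()
            deltas.append(delta * 1e6 / TSC_HZ)
    return deltas

trace = """\
8963 pub-338 [002] 217.777652: bprint: xennet_tx_setup_grant: id=60 ref=1340 offset=2 len=1514 TSC: 7892178799
8965 ksoftirqd/12-75 [012] 255.466914: bprint: xenvif_tx_build_gops.constprop.0: id=60 ref=1340 offset=2 len=1514 TSC: 7892179731
""".splitlines()

# Prints one latency in us (932 TSC ticks under the assumed 25 MHz clock).
print(grant_latencies(trace))
```

Because grant ids and refs are reused across the ring, the per-key queue
matches repeated (id, ref) pairs in order instead of overwriting them.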