Issue: Networking performance in Xen VM on Arm64
Hi there,

I tested the networking performance of a Xen virtual machine on my Arm64 platform. Below I will give a summary of the testing results and share some analysis; at the end I want to check a few things with the community and get suggestions before I proceed. First of all, if you want more details on the profiling, you can access the slides:
https://docs.google.com/presentation/d/1iTQRx8-UYnm19eU6CnVUSaAodKZ0JuRiHYaXBomfu3E/edit?usp=sharing

## Testing summary

The TL;DR is that I used two tools, netperf and ddsperf, to test networking latency and throughput for Xen Dom0 and DomU. The results below show the performance when sending data from a Xen domain (Dom0 or DomU) to my x86 PC; performance is poor when transmitting data from Xen DomU. (Note: I used the default networking bridge configuration when launching the Xen VM.)

Throughput result:

  Profile     netperf (Mbits/sec)    ddsperf (Mbits/sec)
  Xen-Dom0    939.41                 > 620
  Xen-DomU    107.73                 4 ~ 12

Latency result:

  Profile     ddsperf max ping/pong latency (us)
  Xen-Dom0    200 ~ 1400
  Xen-DomU    > 60,000

## Analysis

The critical thing for performance is whether the low-level network driver uses synchronous or asynchronous mode for transferring skbs.

When we transfer data from my x86 machine to Xen DomU, the data flow is:

  bridge -> xenif (Xen network backend driver)             => Dom0
              `-> xennet driver (Xen net frontend driver)  => DomU

In this flow, the Xen network backend driver (in Dom0) copies the skb into an intermediate buffer (gnttab_batch_copy()) and notifies the Xen VM by sending an rx irq. The key point is that the backend driver does not wait for the Xen VM to process the skb and returns to user space directly; therefore Xen Dom0 and DomU work in asynchronous mode in this case (Dom0 does not need to wait for DomU). The duration for handling one skb is 30+ us.

Conversely, if we transmit data from Xen DomU, the flow is:

               DomU                |               Dom0
  ---------------------------------+------------------------------------
  xennet driver receives skb       |
   `-> send tx interrupt to Dom0   |
                                   | xenif responds to tx interrupt
                                   | copy skb into intermediate buffer
                                   | notify DomU (send tx irq)
  xennet driver handles tx irq     |
   free skb                        |

So we can see that when DomU sends out packets, it needs to wait for Dom0 to process them: only after Dom0 notifies DomU that a packet has been processed does the net frontend driver in DomU release the skb. This is a long path to process an skb, and Xen DomU and Dom0 work in synchronous mode: the frontend driver in DomU sends out the skb and notifies Dom0, Dom0 handles the skb and notifies DomU back, and only then does DomU know the skb has been processed and release it. The duration between sending and releasing an skb is about 180+ us.

## Questions

Given that the Xen network driver was merged into Linux kernel 2.6 (back in 2007), it is very unlikely that I am the first person to observe this issue. I think this is a common issue and not specific to the Arm64 arch: the long latency is mainly caused by the Xen networking driver, and I did not see anything abnormal in Xen context switching on Arm64 (context switching between Xen domains takes ~10us). Could anyone confirm whether this is a known issue?

The second question is how to mitigate the long latency when sending data from DomU. A possible solution is for the Xen network frontend driver to copy the skb into an intermediate (bounce) buffer, just like the Xen net backend driver does with gnttab_batch_copy(); this way the frontend driver would not need to wait for the backend driver's response and could return directly.
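To make this concrete, below is a rough, untested sketch of what I have in mind for the transmit path. It is deliberately simplified and is not the real xen-netfront code: struct bounce_slot, bounce_pool_get() and xennet_push_tx_request() are hypothetical placeholders for a pool of pages granted to the backend once at connect time and for building the ring request; only skb_copy_bits(), dev_kfree_skb_any() and notify_remote_via_irq() are existing kernel APIs.

```c
/*
 * Rough sketch only (not the real xen-netfront code): transmit by copying
 * the payload into a page that DomU granted to the backend once at connect
 * time, so the skb can be freed immediately instead of being held until the
 * backend's TX completion interrupt (~180+ us round trip today).
 *
 * struct bounce_slot, bounce_pool_get() and xennet_push_tx_request() are
 * hypothetical placeholders, not existing APIs.
 */
struct bounce_slot {
	void        *vaddr;	/* DomU's own mapping of the pre-granted page */
	grant_ref_t  gref;	/* grant ref the backend uses to copy the data */
};

static int xennet_xmit_via_bounce(struct netfront_queue *queue,
				  struct sk_buff *skb)
{
	struct bounce_slot *slot;

	if (skb->len > PAGE_SIZE)
		return -E2BIG;		/* sketch: no multi-page support */

	/* Pick a slot the backend has already consumed; slots are recycled
	 * from the TX completion irq, but the skb no longer depends on it. */
	slot = bounce_pool_get(queue);	/* hypothetical */
	if (!slot)
		return -EBUSY;

	/* Copy linear + paged skb data into the bounce page. */
	skb_copy_bits(skb, 0, slot->vaddr, skb->len);

	/* Queue a TX request referencing slot->gref (ring details elided)
	 * and kick the backend via the event channel. */
	xennet_push_tx_request(queue, slot->gref, skb->len);	/* hypothetical */
	notify_remote_via_irq(queue->tx_irq);

	/*
	 * The point of the change: the data has already been copied out of
	 * the skb, so it can be freed here instead of waiting for Dom0's
	 * notification as the current frontend does.
	 */
	dev_kfree_skb_any(skb);
	return 0;
}
```

With this scheme the TX completion interrupt from Dom0 would only be needed to recycle bounce slots, not to free the skb, so the frontend's transmit path would no longer block on Dom0's progress.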
But here I am not clear about the mechanism of the Xen grant table, in particular whether granted memory is only writable from Dom0; if so, it would be hard to optimize the frontend driver in DomU by copying the skb directly into the granted pages. Any thoughts on this?
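To show what I mean, the optimization above assumes the frontend can do something like the following at connect time and then keep writing into the page from DomU afterwards. gnttab_grant_foreign_access() and virt_to_gfn() are existing kernel APIs; the rest is a simplified, untested sketch, and the write-access assumption is exactly what I would like to confirm.

```c
#include <xen/grant_table.h>
#include <xen/page.h>		/* virt_to_gfn() */
#include <xen/xenbus.h>

/*
 * Untested sketch: grant one DomU-owned page to the backend domain so it
 * can later be used as a TX bounce buffer.  My assumption (to be confirmed)
 * is that DomU keeps normal write access to the page and the grant only
 * lets Dom0 map or grant-copy it.
 */
static int setup_tx_bounce_page(struct xenbus_device *dev, void *page_vaddr,
				grant_ref_t *gref_out)
{
	int ref;

	/* Read-only for the backend is enough, since Dom0 only copies out. */
	ref = gnttab_grant_foreign_access(dev->otherend_id,
					  virt_to_gfn(page_vaddr),
					  1 /* readonly */);
	if (ref < 0)
		return ref;

	*gref_out = ref;
	return 0;
}
```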
Welcome any suggestions and comments.

Thanks!
Leo