
[Xen-devel] Limited range for counters in /sys/class/net/<vifname>/statistics/{r,t}x_bytes - 33 bits?


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Andy Smith <andy@xxxxxxxxxxxxxx>
  • Date: Sun, 3 Sep 2017 03:10:46 +0000
  • Delivery-date: Sun, 03 Sep 2017 03:11:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Openpgp: id=BF15490B; url=http://strugglers.net/~andy/pubkey.asc

Hello,

I stumbled across something I find strange. I'm not sure whether it
is a bug, but even if it is not, it has some unfortunate
repercussions.

I have one guest that does a decent amount of traffic (100–200Mbps
constantly), and it was brought to my attention that the bandwidth
stats measured by SNMP within the guest do not tally with those
measured by SNMP against the vif in dom0: the dom0 stats were
under-reporting the data transferred.

It looked very much like a 32-bit counter wrapping multiple times
and losing the data, so the first thing I checked was that 64-bit
SNMP counters were being used. They were.

I then noticed that the counters in
/sys/class/net/<vifname>/statistics/{r,t}x_bytes were wrapping much
sooner than they ought to.

I dumped out the values from those two files every 10 seconds for a
couple of hours and the largest value I managed to read from them
before they wrapped around was 8408464067. That is larger than 2³²
(4294967296) but slightly smaller than 2³³ (8589934592). They are
wrapping around every couple of minutes.
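
For reference, the sampling loop was essentially the following (a
minimal sketch in Python; the vif name is a stand-in for the real
one):

    import time

    IFACE = "vif1.0"  # stand-in name; substitute the actual vif
    PATHS = ["/sys/class/net/%s/statistics/%s" % (IFACE, f)
             for f in ("rx_bytes", "tx_bytes")]

    prev = None
    while True:
        vals = [int(open(p).read()) for p in PATHS]
        if prev:
            for path, old, new in zip(PATHS, prev, vals):
                if new < old:
                    # Counter went backwards, i.e. it wrapped; 'old'
                    # is the largest value seen before the wrap.
                    print("%s wrapped after %d" % (path, old))
        print(vals)
        prev = vals
        time.sleep(10)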

Guest OS doesn't seem to matter. None of the counters in
/sys/class/net/*/statistics/{r,t}x_bytes are larger than 8589934592
if the interface is a vif, and the guests run several versions of
Debian, Ubuntu and CentOS.

There is no such limit on the counters for real Ethernet interfaces
in dom0 (e.g. eth0 and eth1), nor for bond interfaces in dom0
(bond0), nor for eth0 inside the guests. I haven't observed one of
those with a truly massive 64-bit number but they are happily
showing values in the region of 32914069823827 at the moment.
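
For anyone who wants to eyeball this on their own hosts, a quick
sketch that prints every interface's byte counters:

    import glob

    # Compare the current byte counters across all interfaces; on my
    # hosts only the vif* ones stay below 2**33.
    for path in sorted(glob.glob("/sys/class/net/*/statistics/[rt]x_bytes")):
        print(path, int(open(path).read()))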

The consequence is that SNMP polling of a vif's counters at
5-minute intervals sees multiple wraps per interval, gets confused
and loses data. My bandwidth stats for this guest have been
massively inaccurate for months.

I can get consistent data with 1-minute polling, as at the data
rates in effect here the counter has no chance to wrap more than
once in one minute.
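
To make the failure mode concrete: the usual counter-delta logic
assumes at most one wrap between polls, so each additional wrap's
worth of traffic silently disappears. A small sketch, assuming the
counters wrap at 2³³ (which matches what I observed):

    WRAP = 2**33  # observed wrap point of the vif counters

    def delta(old, new, wrap=WRAP):
        # Standard counter-delta logic: assumes at most one wrap
        # happened between the two samples.
        return (new - old) % wrap

    # If the counter really advanced by 2.5 wraps' worth of traffic
    # within one 5-minute poll, two whole wraps vanish:
    old = 0
    true_bytes = int(2.5 * WRAP)         # 21474836480
    new = (old + true_bytes) % WRAP      # what the poller reads
    print(delta(old, new))               # 4294967296, only 0.5 * WRAP
    print(true_bytes - delta(old, new))  # 17179869184 bytes lost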

Is this intentional? If so, it presents a bit of a gotcha for
anyone running guests which might do more than 100Mbit/s, as they
would reasonably expect the 64-bit SNMP octet counters to be usable
with 5-minute polling; indeed the counters for other interfaces are
usable, while only the ones for the Xen vifs aren't.

Or is it an odd bug specific to my configuration? If so, it has
been present for a long time: at least with the Debian jessie
(linux-image-3.16.0-4-amd64) and stretch (linux-image-4.9.0-3-amd64)
kernels.

Does anyone know more about this?

Thanks,
Andy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

