
[Xen-devel] Limited range for counters in /sys/class/net/<vifname>/statistics/{r,t}x_bytes - 33 bits?


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Andy Smith <andy@xxxxxxxxxxxxxx>
  • Date: Sun, 3 Sep 2017 03:10:46 +0000
  • Delivery-date: Sun, 03 Sep 2017 03:11:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Openpgp: id=BF15490B; url=http://strugglers.net/~andy/pubkey.asc

Hello,

I stumbled across something I find strange. I'm not sure whether it
is a bug, but even if it is not, it has some unfortunate
repercussions.

I have one guest that does a decent amount of traffic (100–200Mbps
constantly), and it was brought to my attention that the bandwidth
stats measured by SNMP within the guest do not tally with those
measured by SNMP against the vif in dom0: the dom0 stats were
under-reporting the data transferred.

It looked very much like a 32-bit counter wrapping multiple times
and losing the data, so the first thing I checked was that 64-bit
SNMP counters were being used. They were.

I then noticed that the counters in
/sys/class/net/<vifname>/statistics/{r,t}x_bytes were wrapping much
sooner than they ought to.

I dumped out the values from those two files every 10 seconds for a
couple of hours and the largest value I managed to read from them
before they wrapped around was 8408464067. That is larger than 2³²
(4294967296) but slightly smaller than 2³³ (8589934592). They are
wrapping around every couple of minutes.
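
For reference, the sampling loop was essentially the following (a
minimal sketch in Python; the vif name is a stand-in for the real
one):

    import time

    IFACE = "vif1.0"  # stand-in name; substitute the actual vif
    PATHS = ["/sys/class/net/%s/statistics/%s" % (IFACE, f)
             for f in ("rx_bytes", "tx_bytes")]

    prev = None
    while True:
        vals = [int(open(p).read()) for p in PATHS]
        if prev:
            for path, old, new in zip(PATHS, prev, vals):
                if new < old:
                    # Counter went backwards, i.e. it wrapped; 'old'
                    # is the largest value seen before the wrap.
                    print("%s wrapped after %d" % (path, old))
        print(vals)
        prev = vals
        time.sleep(10)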

Guest OS doesn't seem to matter. None of the counters in
/sys/class/net/*/statistics/{r,t}x_bytes are larger than 8589934592
if the interface is a vif, and the guests run several versions of
Debian, Ubuntu and CentOS.

There is no such limit on the counters for real Ethernet interfaces
in dom0 (e.g. eth0 and eth1), nor for bond interfaces in dom0
(bond0), nor for eth0 inside the guests. I haven't observed one of
those with a truly massive 64-bit number but they are happily
showing values in the region of 32914069823827 at the moment.
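
For anyone who wants to eyeball this on their own hosts, a quick
sketch that prints every interface's byte counters:

    import glob

    # Compare the current byte counters across all interfaces; on my
    # hosts only the vif* ones stay below 2**33.
    for path in sorted(glob.glob("/sys/class/net/*/statistics/[rt]x_bytes")):
        print(path, int(open(path).read()))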

The consequence is that SNMP polling of a vif's counters at
5-minute intervals sees multiple wraps per interval, gets confused
and loses data. My bandwidth stats for this guest have been
massively inaccurate for months.

I can get consistent data with 1-minute polling, as at the data
rates in effect here the counter has no chance to wrap more than
once in one minute.
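
To make the failure mode concrete: the usual counter-delta logic
assumes at most one wrap between polls, so each additional wrap's
worth of traffic silently disappears. A small sketch, assuming the
counters wrap at 2³³ (which matches what I observed):

    WRAP = 2**33  # observed wrap point of the vif counters

    def delta(old, new, wrap=WRAP):
        # Standard counter-delta logic: assumes at most one wrap
        # happened between the two samples.
        return (new - old) % wrap

    # If the counter really advanced by 2.5 wraps' worth of traffic
    # within one 5-minute poll, two whole wraps vanish:
    old = 0
    true_bytes = int(2.5 * WRAP)         # 21474836480
    new = (old + true_bytes) % WRAP      # what the poller reads
    print(delta(old, new))               # 4294967296, only 0.5 * WRAP
    print(true_bytes - delta(old, new))  # 17179869184 bytes lost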

Is this intentional? If so, it presents a bit of a gotcha for
anyone running guests which might do more than 100Mbit/s, as they
would reasonably expect the 64-bit SNMP octet counters to be usable
with 5-minute polling; indeed the counters for other interfaces are
usable, while only the ones for the Xen vifs aren't.

Or is it an odd bug specific to my configuration? If so, it has
been present for a long time: at least with the Debian jessie
(linux-image-3.16.0-4-amd64) and stretch (linux-image-4.9.0-3-amd64)
kernels.

Does anyone know more about this?

Thanks,
Andy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

