
Re: [Xen-API] [XCP-1.1] High OVS cpu load and unresponsive host network while VMPR archive phase is running

On Wednesday 01 August 2012 19:27:11 Ben Pfaff wrote:
> Christian Fischer writes:

> >> >> 180% CPU load is impossible for OVS 1.0.1, which has only a
> >> >> single process with a single thread.
> >> > 
> >> > Yes, that's right, but we run OVS 1.4.2
> >> > 
> >> > XCP build: 1.1.0-50674c
> >> > OVS build: 1.4.2
> >> > NICs: BCM5709 Gigabit TOE iSCSI Offload
> >> > OVS NIC bonding: active/active
> >> 
> >> Only the as-yet-unreleased post-1.8.0 Open vSwitch has more than
> >> one process, and it still doesn't have multiple threads.
> >> 
> >> I suppose ovsdb-server and ovs-vswitchd could both go crazy at
> >> the same time, but I haven't had any reports of that.
> >> 
> >> What process(es) add up to 180%?
> > 
> > Both the server and vswitchd logs show a lot of poll_loop entries with
> > high CPU usage. You can find some snippets at pastebin. Send a mail if
> > you need the whole logs.
> > 
> > ovsdb-server.log Jul 26 08:00:
> > http://pastebin.com/RaCRyZiz
> > ovs-vswitchd.log Jul 26 08:00:
> > http://pastebin.com/bmXJUWaT
> The ovsdb-server high CPU usage appears to be due to tons of
> activity talking to ovs-vswitchd.  That is very strange; it
> doesn't really make sense.  Is there anything particularly
> unusual going on, such as something modifying the database
> quickly, VMs going up and down at a high rate, etc.?

No, there's nothing special. No VMs are going up or down or migrating to a new
host, and the database shouldn't be modified. Apart from the VMPR snapshot
archive phase, nothing else is running.

> The ovs-vswitchd high CPU usage appears to be due to a lot of
> activity from the OpenFlow controller (I guess that's the VSwitch
> Controller you mention).

Okay, that's the case for the first logs from July 26th. That's with the
VSwitch controller I mentioned.

> The bonding code is unnecessarily shifting around load, but I
> don't think that would cause a lot of CPU usage.

Why is load shifted if it's unnecessary?

> > ovs-vswitchd.log Jul 30 22:30 (180 - 230 % CPU load):
> > http://pastebin.com/xZykK2Ad
> That one doesn't make any sense to me.

That one is without the OpenFlow controller. You said above that the
ovs-vswitchd high CPU usage appears to be due to a lot of OpenFlow controller
activity, but here there's no controller configured and the CPU load is still
elevated.

> What do you see for these processes' CPU usage using some other
> tool, such as "top"?

I cannot run the archive phase again; it crashes the vswitches and sometimes
the whole host(s). Once archiving has finished or failed, and if the host is
still alive and responsive, old flows perform mostly OK, while new flows are
erratic or don't work at all. VM live migration is also affected and fails in
most cases, and the vswitches have stopped sending sFlow to the analyzer,
although hsflowd is still sending sFlow. A host reboot is required.

I hope to get some test servers ready in the next few days.
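
On a test host, the per-process CPU usage and thread counts asked about above
could be captured with something like the following. This is only a rough
sketch under a few assumptions: Python 3.7+ is available (the XCP 1.1 dom0
probably ships something much older, so it may need adapting), the host has a
procps-style ps, and the helper names parse_ps and sample are made up for
illustration. Note that ps reports pcpu averaged over the process lifetime,
so "top" remains the better tool for momentary spikes.

```python
import subprocess

# Process names as they appear in this thread.
OVS_PROCESSES = ("ovs-vswitchd", "ovsdb-server")


def parse_ps(ps_output, names=OVS_PROCESSES):
    """Return {name: [(pid, threads, cpu_percent), ...]} for the given names.

    Expects headerless ps output with the columns pid, nlwp, pcpu, comm.
    """
    result = {name: [] for name in names}
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        pid, nlwp, pcpu, comm = fields
        comm = comm.strip()
        if comm in result:
            result[comm].append((int(pid), int(nlwp), float(pcpu)))
    return result


def sample():
    """Run ps once and return the parsed rows for the OVS processes."""
    # Each "-o name=" selects one column and suppresses its header.
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "nlwp=", "-o", "pcpu=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```

Calling sample() in a loop while the archive phase runs would show which of
the two processes accounts for the load, and whether either one really has
more than one thread (nlwp > 1).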


> > Sometimes there was a VSwitch Controller (Citrix) connected,
> > but it's removed,
> In the first ovs-vswitchd.log paste, the controller certainly
> looks like a culprit.
> _______________________________________________
> Xen-api mailing list
> Xen-api@xxxxxxxxxxxxx
> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
