Re: [Xen-API] [XCP-1.1] High OVS cpu load and unresponsive host network while VMPR archive phase is running
On Wednesday 01 August 2012 19:27:11 Ben Pfaff wrote:
> Christian Fischer writes:
>
> >> >> 180% CPU load is impossible for OVS 1.0.1, which has only a
> >> >> single process with a single thread.
> >> >
> >> > Yes, that's right, but we run OVS 1.4.2
> >> >
> >> > XCP build: 1.1.0-50674c
> >> > OVS build: 1.4.2
> >> > NICs: BCM5709 Gigabit TOE iSCSI Offload
> >> > OVS NIC bonding: active/active
> >>
> >> Only the as-yet-unreleased post-1.8.0 Open vSwitch has more than
> >> one process, and it still doesn't have multiple threads.
> >>
> >> I suppose ovsdb-server and ovs-vswitchd could both go crazy at
> >> the same time, but I haven't had any reports of that.
> >>
> >> What process(es) add up to 180%?
> >
> > Both, server and vswitchd logs, show a lot of poll_loop entries with high
> > CPU usage. You can find some snippets at pastebin. Send a mail if you
> > need the whole logs.
> >
> > ovsdb-server.log Jul 26 08:00:
> > http://pastebin.com/RaCRyZiz
> > ovs-vswitchd.log Jul 26 08:00:
> > http://pastebin.com/bmXJUWaT
>
> The ovsdb-server high CPU usage appears to be due to tons of
> activity talking to ovs-vswitchd. That is very strange; it
> doesn't really make sense. Is there anything particularly
> unusual going on, such as something modifying the database
> quickly, VMs going up and down at a high rate, etc.?

No, there's nothing special. No VMs are going up or down or migrating
to a new host, and the database shouldn't be modified. Only the VMPR
snapshot archive phase is running, nothing else.

> The ovs-vswitchd high CPU usage appears to be due to a lot of
> activity from the OpenFlow controller (I guess that's the VSwitch
> Controller you mention).

Okay, that's the case for the first logs, from July 26th. Those were
taken with the VSwitch Controller I mentioned.

> The bonding code is unnecessarily shifting around load, but I
> don't think that would cause a lot of CPU usage.

Why is load shifted if it's unnecessary?

> > ovs-vswitchd.log Jul 30 22:30 (180 - 230 % CPU load):
> > http://pastebin.com/xZykK2Ad
>
> That one doesn't make any sense to me.

That one is without the OpenFlow controller. You said above that the
ovs-vswitchd high CPU usage appears to be due to a lot of OpenFlow
controller activity, but in this case there's no controller configured
and the CPU load is still increased.

> What do you see for these processes' CPU usage using some other
> tool, such as "top"?

I cannot run the archive phase again; it crashes the vswitches, and
sometimes the whole host(s). Once archiving has finished or aborted,
and if the host is still alive and responsive, old flows perform mostly
OK, while new flows are erratic or don't work at all. VM live migration
is also affected (it is broken in most cases), and the vswitches have
stopped sending sFlow to the analyzer. hsflowd, however, is sending
sFlow. A host reboot is required.

I hope to have some test servers ready in the next few days.

Thanks
Christian

> > Sometimes there was a VSwitch Controller (Citrix) connected,
> > but it's removed,
>
> In the first ovs-vswitchd.log paste, the controller certainly
> looks like a culprit.

_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api