Xen project Mailing List

Re: [Xen-API] [XCP-1.1] High OVS cpu load and unresponsive host network while VMPR archive phase is running

From: Christian Fischer <christian.fischer@xxxxxxxxxxxxxxxxxxx>

Date: Tue, 31 Jul 2012 08:45:22 +0200

Cc: "xen-api@xxxxxxxxxxxxxxxxxxx" <xen-api@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 31 Jul 2012 06:46:13 +0000

List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>

On Monday 30 July 2012 16:36:16 Joseph Hom wrote:
> I've seen the same type of thing when VMs on tagged networks take on a lot
> of traffic. In my case the root cause appears to be ovs being unable to
> handle the amount of flow build up/tear downs. This causes old flows to
> perform somewhat ok, while new flows are erratic or don't work at all.
> This only affects vlan networks. Any VM on networks without a vlan
> tag (e.g. native) don't experience the issue.
>
> I've been able to duplicate the issue all the way up to the latest ovs
> 1.6.1.
>
> When this happens can you check to see if any VM on vlan networks are
> taking on an increased network load (>100k pps)?

We have no tagged VLANs here; all physical switch ports run in access mode.
I wouldn't say that network load is increased when this happens: it is 15 kpps.
Network performance could be poor due to either a vswitch issue (it runs at
180% CPU load, if the vswitch log doesn't lie) or high load on, or cheap
hardware in, the customer's shared backup storage.

I've never seen this stuff.

Christian

> -----Original Message-----
> From: xen-api-bounces@xxxxxxxxxxxxx [mailto:xen-api-bounces@xxxxxxxxxxxxx]
> On Behalf Of Christian Fischer
> Sent: Saturday, July 28, 2012 4:26 PM
> To: xen-api@xxxxxxxxxxxxxxxxxxx
> Subject: [Xen-API] [XCP-1.1] High OVS cpu load and unresponsive host
> network while VMPR archive phase is running
>
> We notice high vswitch CPU load while the VM protection archive phase is
> running, which ends up in broken network connections and unresponsive pool
> servers. Any help to solve this problem is welcome.
>
> XCP build: 1.1.0-50674c
> OVS build: 1.4.2
> NICs: BCM5709 Gigabit TOE iSCSI Offload
> OVS NIC bonding: active/active
> Pool Nodes: Dell R610
> Storage type: LVMoiSCSI
>
> The archive phase starts at 03:00 AM. A short time after that, OVS logs
> poll_loop events and high CPU usage. After some hours (3-4) the whole host
> network becomes unresponsive, except the offloaded iSCSI connections to
> the NetApp guest system image LUN (bnx2i cnic). We snapshot and archive
> only guest system images (mostly 8GB per image); data volumes are mounted
> directly by guest VMs (iSCSI).
>
> We ran an XCP-1.0 pool on Intel servers for the last two years with
> a lot of VLAN trunks, active/active bonds, cheap switches, self-made
> DRBD-replicated storage, and OVS-1.0.1 IIRC. We've never seen such behavior.
>
> Thanks
> Christian
>
> _______________________________________________
> Xen-api mailing list
> Xen-api@xxxxxxxxxxxxx
> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api

-- 
------------------------------------------------------------
EasternGraphics - visualize your business

Christian Fischer
Administration
http://www.EasternGraphics.com
phone: +49 3677 678265

EasternGraphics GmbH - Albert-Einstein-Strasse 1 - DE-98693 Ilmenau
Amtsgericht Jena - HRB304052, Geschaeftsfuehrer:
Ekkehard Beier, Volker Blankenberg, Frank Wicht, Andreas Winkler
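
The packet-rate check discussed in this thread (whether any interface carries more than 100k pps) can be approximated by sampling the per-interface packet counters in /proc/net/dev twice and dividing the difference by the interval. The following is only a sketch under assumptions that are not part of the thread: it assumes a Linux host with a Python 3 interpreter, and the function names (`parse_proc_net_dev`, `packet_rates`, `report`) are made up for illustration. It measures per-interface totals only and says nothing about OVS flow setup rates.

```python
import time


def parse_proc_net_dev(text):
    """Return {interface: (rx_packets, tx_packets)} from /proc/net/dev content."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 10:
            continue
        # Field 1 is received packets, field 9 is transmitted packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_rates(before, after, interval):
    """Return {interface: (rx_pps, tx_pps)} between two counter samples."""
    rates = {}
    for iface, (rx1, tx1) in after.items():
        if iface not in before:
            continue  # interface appeared between the two samples
        rx0, tx0 = before[iface]
        rates[iface] = ((rx1 - rx0) / interval, (tx1 - tx0) / interval)
    return rates


def report(interval=5.0, threshold=100_000, path="/proc/net/dev"):
    """Sample the counters twice and print the rate per interface."""
    with open(path) as f:
        first = parse_proc_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_net_dev(f.read())
    for iface, (rx, tx) in sorted(packet_rates(first, second, interval).items()):
        flag = "  <-- above threshold" if max(rx, tx) > threshold else ""
        print(f"{iface:12s} rx {rx:10.0f} pps  tx {tx:10.0f} pps{flag}")
```

Calling `report()` at the end of the script prints one line per interface. The 100k pps threshold is the figure from Joseph's question; the 5 second interval is an arbitrary choice.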
