
Re: [Xen-API] [XCP-1.1] High OVS cpu load and unresponsive host network while VMPR archive phase is running



On Monday 30 July 2012 16:36:16 Joseph Hom wrote:
> I've seen the same type of thing when VMs on tagged networks take on a lot
> of traffic. In my case the root cause appears to be OVS being unable to
> handle the rate of flow set-ups and tear-downs. This causes old flows to
> perform somewhat OK, while new flows are erratic or don't work at all.
> This only affects VLAN networks; VMs on networks without a VLAN tag
> (e.g. native) don't experience the issue.
> 
> I've been able to duplicate the issue all the way up to the latest ovs
> 1.6.1.
> 
> When this happens, can you check whether any VMs on VLAN networks are
> taking on an increased network load (>100k pps)?
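A quick way to sample a per-interface packet rate in dom0 is to read the kernel's interface counters twice and take the delta; a minimal sketch (the interface name is an assumption, substitute the relevant vif or bond device):

```shell
# Hypothetical sketch: print received packets/sec for an interface by
# sampling its /sys statistics counter over a short interval. The interface
# name "lo" below is only a placeholder; on an XCP host you would point it
# at a vifX.Y, ethN, or bond device.
pps() {
    iface=$1
    interval=${2:-1}
    a=$(cat /sys/class/net/"$iface"/statistics/rx_packets)
    sleep "$interval"
    b=$(cat /sys/class/net/"$iface"/statistics/rx_packets)
    echo $(( (b - a) / interval ))
}

pps lo 1
```

A sustained reading above 100000 on a VLAN-attached vif would match the >100k pps case described above.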

We have no tagged VLANs here; all physical switch ports run in access mode.
I wouldn't say that network load is increased when this happens, about 15 kpps.
Network performance could be poor due either to a vswitch issue (it runs at 180%
CPU load, if the vswitch log doesn't lie) or to high load on, or cheap hardware
in, the customer's shared backup storage. I've never seen this before.
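For what it's worth, a figure like 180% can be cross-checked against /proc directly; a rough sketch (it samples our own shell here so it runs anywhere, for the vswitch you would use the pid of ovs-vswitchd):

```shell
# Rough sketch: estimate a process's CPU usage in percent by sampling
# utime+stime from /proc/<pid>/stat. A multi-threaded daemon such as
# ovs-vswitchd can legitimately exceed 100%. We sample our own shell ($$)
# so the example is self-contained; for the vswitch you would set
# pid=$(pidof ovs-vswitchd) instead.
cpu_pct() {
    pid=$1
    interval=${2:-1}
    hz=$(getconf CLK_TCK)
    t1=$(awk '{print $14 + $15}' /proc/"$pid"/stat)   # utime + stime, in clock ticks
    sleep "$interval"
    t2=$(awk '{print $14 + $15}' /proc/"$pid"/stat)
    echo $(( (t2 - t1) * 100 / (hz * interval) ))
}

cpu_pct $$ 1
```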

Christian

> 
> -----Original Message-----
> From: xen-api-bounces@xxxxxxxxxxxxx [mailto:xen-api-bounces@xxxxxxxxxxxxx]
> On Behalf Of Christian Fischer
> Sent: Saturday, July 28, 2012 4:26 PM
> To: xen-api@xxxxxxxxxxxxxxxxxxx
> Subject: [Xen-API] [XCP-1.1] High OVS cpu load and unresponsive host
> network while VMPR archive phase is running
> 
> We notice high vswitch CPU load while the VM protection archive phase is
> running, which ends in broken network connections and unresponsive pool
> servers. Any help solving this problem is welcome.
> 
> XCP build: 1.1.0-50674c
> OVS build: 1.4.2
> NICs: BCM5709 Gigabit TOE iSCSI Offload
> OVS NIC bonding: active/active
> Pool Nodes: Dell R610
> Storage type: LVMoiSCSI
> 
> The archive phase starts at 03:00 AM; shortly afterwards OVS logs
> poll_loop events and high CPU usage, and after some hours (3-4) the whole
> host network becomes unresponsive, except for the offloaded iSCSI
> connections to the NetApp guest system image LUN (bnx2i/cnic). We snapshot
> and archive only guest system images (mostly 8 GB per image); data volumes
> are mounted directly by the guest VMs (iSCSI).
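When OVS's main loop stalls it also logs timeval warnings, and counting those gives a rough timeline of the overload. A sketch, assuming the stock log location on an XCP dom0:

```shell
# Sketch: count OVS main-loop stall warnings in the vswitch log. The log
# path below is an assumption for XCP dom0; adjust if logs rotate elsewhere.
# OVS emits lines of the form:
#   ...|timeval|WARN|Unreasonably long 2859ms poll interval (0ms user, 0ms system)
count_poll_warnings() {
    grep -c 'Unreasonably long [0-9]*ms poll interval' "$1"
}

LOG=/var/log/openvswitch/ovs-vswitchd.log
[ -r "$LOG" ] && count_poll_warnings "$LOG" || echo "log not readable: $LOG"
```

A burst of these warnings starting right at 03:00 would tie the CPU spike to the archive job.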
> 
> We had an XCP-1.0 pool running on Intel servers for the last two years,
> with a lot of VLAN trunks, active/active bonds, cheap switches, self-made
> DRBD-replicated storage, and OVS-1.0.1 IIRC. We've never seen such
> behavior.
> 
> Thanks
> Christian
> 
> _______________________________________________
> Xen-api mailing list
> Xen-api@xxxxxxxxxxxxx
> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api

-- 
------------------------------------------------------------
EasternGraphics - visualize your business

Christian Fischer
Administration
http://www.EasternGraphics.com
phone: +49 3677 678265

EasternGraphics GmbH - Albert-Einstein-Strasse 1 - DE-98693 Ilmenau
Amtsgericht Jena - HRB304052, Geschaeftsfuehrer:
Ekkehard Beier, Volker Blankenberg, Frank Wicht, Andreas Winkler
