Re: [Xen-API] [XCP-1.1] High OVS cpu load and unresponsive host network while VMPR archive phase is running
We use anti-spoofing protection (I published the patches a few months ago on xen-api@) based on rules applied from /etc/xensource/scripts/vif. The rules look like this:

    $ofctl add-flow $p_bridge "in_port=$port priority=39000 dl_type=0x0800 nw_src=$IP dl_src=$mac idle_timeout=0 action=normal"
    $ofctl add-flow $p_bridge "in_port=$port priority=38500 dl_type=0x0806 dl_src=$mac nw_src=$IP idle_timeout=0 action=normal"
    $ofctl add-flow $p_bridge "in_port=$port priority=38000 idle_timeout=0 action=drop"

During abnormal activity the counter of the last rule (DROP) grows quickly, and migrating the VM to another host causes the same symptom on the new host. We simply shut down such VMs (an attempt to use a non-assigned IP violates the terms of our services) and message the owner asking them to fix the problem. It happens rarely (less than once a month), but it does happen.

03.08.2012 12:01, Christian Fischer wrote:
> On Thursday 02 August 2012 23:46:18 George Shuklin wrote:
>> In production I have seen that behavior a few times. The ovs-* processes
>> start to consume a lot of CPU (over 100%) and start to cause packet drops.
>> That usually happens with 'hacked' customer VMs (a sudden spike of outgoing
>> traffic and CPU; in the few cases where we assisted in the investigation,
>> actual trojans were running on the server because of some stupid PHP
>> misconfiguration in yet another phpbb/cms/drupal/etc).
>
> We have no customer VMs there, and we watch the VM traffic. Nothing unusual.
> The archive phase is running. It's 100% reproducible. I suppose that, in my
> case, it has something to do with the OpenFlow controller (Citrix DVS
> Controller) we tried to evaluate. We are currently running tests in a test
> environment to work out the problem.
>
> But by the way, what do you do to protect your production environment
> against crashes caused by network flooding? IIRC Jesse Gross mentioned some
> work on patches to prevent a single VM from rendering the network
> unresponsive, maybe a year ago. What's the state?
>
>> I'm not sure what exactly happens, but my hypothesis is that it is related
>> to the number of flows. When a trojan starts to flood traffic out to
>> different servers (smtp/www spam/etc), it causes lots of new connections.
>>
>> On 01.08.2012 04:08, Ben Pfaff wrote:
>>> Christian Fischer writes:
>>>> On Tuesday 31 July 2012 18:08:18 Ben Pfaff wrote:
>>>>> Christian Fischer writes:
>>>>>> We have no tagged VLANs here; all physical switch ports run in access
>>>>>> mode. I wouldn't say that network load is increased when this happens:
>>>>>> 15 kpps. Network performance could be poor due to either a vswitch
>>>>>> issue (it runs at 180% CPU load, if the vswitch log doesn't lie) or
>>>>>> high load on, or cheap hardware in, the customer's shared backup
>>>>>> storage. I've never seen this stuff.
>>>>>
>>>>> 180% CPU load is impossible for OVS 1.0.1, which has only a single
>>>>> process with a single thread.
>>>>
>>>> Yes, that's right, but we run OVS 1.4.2
>>>>
>>>> XCP build: 1.1.0-50674c
>>>> OVS build: 1.4.2
>>>> NICs: BCM5709 Gigabit TOE iSCSI Offload
>>>> OVS NIC bonding: active/active
>>>
>>> Only the as-yet-unreleased post-1.8.0 Open vSwitch has more than one
>>> process, and it still doesn't have multiple threads. I suppose
>>> ovsdb-server and ovs-vswitchd could both go crazy at the same time, but I
>>> haven't had any reports of that. What process(es) add up to 180%?

_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
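For anyone who wants to try the anti-spoofing rules from the top of this message, here is a minimal dry-run sketch. It only prints the three ovs-ofctl commands for one VIF so they can be reviewed first. The function name antispoof_flows and the argument order are my own assumptions for illustration and are not taken from the actual /etc/xensource/scripts/vif patch; the flow specs themselves are copied from the rules quoted above.

```shell
#!/bin/sh
# Dry-run sketch: prints the ovs-ofctl commands instead of executing them.
# antispoof_flows is a hypothetical helper name, not part of XCP or Open vSwitch.
# Arguments: bridge, OpenFlow port number of the VIF, assigned MAC, assigned IP.
antispoof_flows() {
    bridge=$1; port=$2; mac=$3; ip=$4

    # Highest priority: allow IPv4 (dl_type=0x0800) only from the assigned IP/MAC pair.
    printf '%s\n' "ovs-ofctl add-flow $bridge \"in_port=$port priority=39000 dl_type=0x0800 nw_src=$ip dl_src=$mac idle_timeout=0 action=normal\""

    # Next: allow ARP (dl_type=0x0806) only when it announces the assigned IP/MAC pair.
    printf '%s\n' "ovs-ofctl add-flow $bridge \"in_port=$port priority=38500 dl_type=0x0806 dl_src=$mac nw_src=$ip idle_timeout=0 action=normal\""

    # Lowest of the three: drop everything else arriving on this port.
    # The packet counter of this rule is the one that grows during spoofing attempts.
    printf '%s\n' "ovs-ofctl add-flow $bridge \"in_port=$port priority=38000 idle_timeout=0 action=drop\""
}
```

Example call, with made-up values: `antispoof_flows xenbr0 5 aa:bb:cc:dd:ee:ff 10.0.0.5`. Because the drop rule has the lowest priority of the three, it only catches traffic that did not match the assigned IP/MAC pair.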