Re: [Xen-users] [Xen-devel] vif-bridge errors when creating and destroying dozens of VMs simultaneously
On Wed, May 17, 2017 at 11:10 AM, George Dunlap <george.dunlap@xxxxxxxxxx> wrote:
> On 17/05/17 10:45, Roger Pau Monné wrote:
>> On Wed, May 17, 2017 at 10:04:40AM +0100, George Dunlap wrote:
>>> cc'ing xen-devel & some relevant people
>>
>> Please bear with me, my knowledge of iptables is 0.
>>
>>> On Tue, May 16, 2017 at 4:21 PM, Antony Saba <awsaba@xxxxxxxxx> wrote:
>>>> Hello xen-users,
>>>>
>>>> We are seeing the following errors repeatedly while trying to create
>>>> domains using a script, with the end result that 2 or 3 out of about
>>>> 20 VMs fail to start, and there are stale entries in the iptables for
>>>> domains that have been destroyed.
>>>>
>>>>
>>>> 2017-05-10 11:45:40 UTC libxl: error:
>>>> libxl_exec.c:118:libxl_report_child_exitstatus:
>>>> /etc/xen/scripts/vif-bridge remove [18767] exited with error status 4
>>>> 2017-05-10 11:50:52 UTC libxl: error:
>>>> libxl_exec.c:118:libxl_report_child_exitstatus:
>>>> /etc/xen/scripts/vif-bridge offline [1554] exited with error status 4
>>>>
>>>> I've been testing the following patch of vif-common.sh over the last
>>>> day and it appears to resolve the issue.  iptables exits with status 4
>>>> when "Another app is currently holding the xtables lock."
>>
>> So, an iptables command can fail randomly because there's someone else
>> holding an iptables internal lock?
>>
>> Isn't there anyway to tell the iptables command to just block until it
>> can get the lock? This seems extremely racy, isn't people then forced
>> to use something like:
>>
>> while true; do
>>     iptables <...>
>>     if [ $? == 0 ]; then
>>         break;
>>     elif [ $? != 4 ]; then
>>         error ...
>>     fi
>> done
>>
>> When dealing with iptables?
>
> This seems to be a common problem ([1][2][3] come up right away).
>
> The basic solution seems to be to add the '-w' option to have it wait
> for the lock.  It does seem like that should be the default though.
> Having commands normally run inside of scripts randomly fail unless you
> add the special "don't randomly fail" option seems a bit mad.

Hmm, looking more into it:

* The -w option was introduced at the same time that the locking was
  introduced [1].  So any version that has locking will have the -w
  option.

* The bare -w option doesn't introduce a timeout, so in the case that
  the xtables lock wasn't released, the script will hang indefinitely.
  A '-W' option was introduced in 2016 to introduce a timeout, but this
  is on even fewer systems than the -w option.  (My desktop, running
  Debian Jessie, doesn't seem to have the -W option for instance.)

* The return code, RESOURCE_PROBLEM, is returned for other reasons; but
  it looks like for our purposes in most cases retrying might not be a
  bad strategy in those cases either.

* But the -w option was only introduced in 2013, so it's likely there
  are still old versions of iptables around that don't have it.  The
  good news is that versions without the -w option will *also* not fail
  with error code 4 (although they may fail in other ways in the case
  of concurrent accesses instead).

So we have three options:

1. Always add -w.  This will effectively drop support for systems which
   don't have iptables -w.  It also wouldn't allow us to reliably set a
   timeout.

2. Always do a loop.  This should work on all systems, but is redundant
   for systems with -w and unnecessary on systems without.  On the
   other hand, it would allow us to implement our own timeout even on
   systems without the -W option.

3. Try to check to see if the version of iptables we have supports -w,
   and use it if available.  This should also work on all systems, but
   introduces a bit of complication.  It also doesn't allow us to
   reliably use a timeout.

Any thoughts?
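For illustration, option 2 might look roughly like the sketch below. This is not the patch under discussion and not code from vif-common.sh: the function name retry_on_lock and the XT_RETRY_MAX / XT_RETRY_DELAY variables are made up for the example, and the attempt limit and delay are arbitrary. It assumes, as described above, that exit status 4 is the case worth retrying and that any other status should be passed straight back to the caller.

```shell
# Hypothetical helper (names are illustrative, not from vif-common.sh).
# Re-runs a command while it exits with status 4, and gives up after a
# fixed number of attempts so that a lock which is never released
# cannot hang the hotplug script forever.

XT_RETRY_MAX=${XT_RETRY_MAX:-10}      # assumed limit on attempts
XT_RETRY_DELAY=${XT_RETRY_DELAY:-1}   # assumed pause between attempts (seconds)

retry_on_lock()
{
    attempt=1
    while :; do
        "$@"
        rc=$?    # save the status at once; a later [ ... ] test overwrites $?
        if [ "$rc" -ne 4 ]; then
            return "$rc"    # success, or a failure other than status 4
        fi
        if [ "$attempt" -ge "$XT_RETRY_MAX" ]; then
            return 4        # still status 4 after the last attempt: give up
        fi
        attempt=$((attempt + 1))
        sleep "$XT_RETRY_DELAY"
    done
}
```

It would be called as "retry_on_lock iptables <...>". Since the wrapper only looks at the exit status, it should behave the same whether or not the installed iptables understands -w, and the worst-case wait is bounded by roughly XT_RETRY_MAX times XT_RETRY_DELAY plus the run time of the command itself.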
 -George

[1] https://git.netfilter.org/iptables/commit/?id=93587a04d0f2511e108bbc4d87a8b9d28a5c5dd8

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users