
Re: [Xen-devel] [PATCH] xend: do not polling vcpus info if guest state is not RUNNING or PAUSED



On Tue, Nov 19, 2013 at 06:41:37PM +0800, Joe Jin wrote:
> On 11/19/13 16:03, Roger Pau Monné wrote:
> > On 19/11/13 07:13, Joe Jin wrote:
> >> When creating a new guest on a NUMA server, xend tries to pick the best
> >> node by collecting the vcpu info of every other guest. This races with a
> >> guest that is rebooting: the guest is still in the list when
> >> find_relaxed_node() is entered, but it has been terminated by the time
> >> getVCPUInfo() is called, so getVCPUInfo() fails with the error below:
> >>
> >> [2013-09-04 20:01:26 6254] ERROR (XendDomainInfo:496) VM start failed
> >> Traceback (most recent call last):
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 482, in start
> >>     XendTask.log_progress(31, 60, self._initDomain)
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendTask.py", line 209, in log_progress
> >>     retval = func(*args, **kwds)
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2918, in _initDomain
> >>     node = self._setCPUAffinity()
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2835, in _setCPUAffinity
> >>     best_node = find_relaxed_node(candidate_node_list)[0]
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2803, in find_relaxed_node
> >>     cpuinfo = dom.getVCPUInfo()
> >>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1600, in getVCPUInfo
> >>     raise XendError(str(exn))
> >> XendError: (3, 'No such process')
> >>
> >> This patch makes find_relaxed_node() poll vcpu info only for guests in
> >> the RUNNING or PAUSED state, to avoid the race.
> >>
> >> Signed-off-by: Joe Jin <joe.jin@xxxxxxxxxx>
> >> ---
> >>  tools/python/xen/xend/XendDomainInfo.py |    2 ++
> >>  1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
> >> index e9d3e7e..66e4b9f 100644
> >> --- a/tools/python/xen/xend/XendDomainInfo.py
> >> +++ b/tools/python/xen/xend/XendDomainInfo.py
> >> @@ -2734,6 +2734,8 @@ class XendDomainInfo:
> >>                  from xen.xend import XendDomain
> >>                  doms = XendDomain.instance().list('all')
> >>                  for dom in filter (lambda d: d.domid != self.domid, doms):
> >> +                    if dom._stateGet() not in (DOM_STATE_RUNNING,DOM_STATE_PAUSED):
> >> +                        continue
> > 
> > Isn't it possible that the domain has rebooted and is no longer there
> > between these two calls?
> > 
> > IMHO it's very unlikely, but there's still a window where getVCPUInfo
> > could fail.
> > 
> 
> Yes, you're right, this patch only narrows the window.
> I have created a new patch for this, please comment!
> 
> [PATCH] xend: getVCPUInfo should handle died domain
> 
> When creating a new guest on a NUMA server, xend tries to pick the best
> node by collecting the vcpu info of every other guest. This races with a
> guest that is rebooting: the guest is still in the list when
> find_relaxed_node() is entered, but it has already been terminated by the
> time getVCPUInfo() is called, so getVCPUInfo() fails with the error below:
> 
> [2013-09-04 20:01:26 6254] ERROR (XendDomainInfo:496) VM start failed
> Traceback (most recent call last):
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 482, in start
>     XendTask.log_progress(31, 60, self._initDomain)
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendTask.py", line 209, in log_progress
>     retval = func(*args, **kwds)
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2918, in _initDomain
>     node = self._setCPUAffinity()
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2835, in _setCPUAffinity
>     best_node = find_relaxed_node(candidate_node_list)[0]
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2803, in find_relaxed_node
>     cpuinfo = dom.getVCPUInfo()
>   File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1600, in getVCPUInfo
>     raise XendError(str(exn))
> XendError: (3, 'No such process')
> 
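> This patch makes getVCPUInfo() return the sxpr built so far, instead of
> raising XendError, when the error is ESRCH (the domain has already died).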
> 
> Signed-off-by: Joe Jin <joe.jin@xxxxxxxxxx>
> ---
>  tools/python/xen/xend/XendDomainInfo.py |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
> index e9d3e7e..c6414ed 100644
> --- a/tools/python/xen/xend/XendDomainInfo.py
> +++ b/tools/python/xen/xend/XendDomainInfo.py
> @@ -34,6 +34,7 @@ import os
>  import stat
>  import shutil
>  import traceback
> +import errno
>  from types import StringTypes
>  
>  import xen.lowlevel.xc
> @@ -1541,6 +1542,9 @@ class XendDomainInfo:
>              return sxpr
>  
>          except RuntimeError, exn:
> +            # Domain already died.
> +            if exn.args[0] == errno.ESRCH:
> +                return sxpr
>              raise XendError(str(exn))
>  
>  

Adding Matt as he has stepped up to be the bug-fix maintainer of Xend
(I think? Is that correct - should that be reflected in the MAINTAINERS file?)
> -- 
> 1.7.1
> 
> 
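
For readers following the thread, here is a minimal, self-contained sketch of
the difference between the two patches. It is written for Python 3 rather than
the Python 2.4 that xend runs on, and it does not use the real xend API:
FakeDomain, is_running(), vcpu_getinfo(), check_then_poll() and
get_vcpu_info() are all hypothetical names invented for this illustration.
The only assumption taken from the thread is that the low-level call fails
with a RuntimeError whose first argument is errno.ESRCH once the domain is
gone.

```python
import errno


class FakeDomain:
    """Hypothetical stand-in for a domain object (not XendDomainInfo)."""

    def __init__(self, domid, vcpus, alive=True):
        self.domid = domid
        self.vcpus = vcpus
        self.alive = alive

    def is_running(self):
        return self.alive

    def vcpu_getinfo(self, index):
        # Mimic the low-level call failing with ESRCH once the domain died.
        if not self.alive:
            raise RuntimeError(errno.ESRCH, "No such process")
        return {"number": index, "cpu": index % 2}


def check_then_poll(dom, between=None):
    """Approach of the first patch: check the state, then poll.

    The optional 'between' callback simulates the domain dying in the
    window between the state check and the poll.
    """
    if not dom.is_running():
        return []
    if between is not None:
        between()
    return [dom.vcpu_getinfo(i) for i in range(dom.vcpus)]


def get_vcpu_info(dom):
    """Approach of the second patch: poll, and tolerate ESRCH."""
    info = []
    try:
        for i in range(dom.vcpus):
            info.append(dom.vcpu_getinfo(i))
    except RuntimeError as exn:
        if exn.args and exn.args[0] == errno.ESRCH:
            # Domain already died: return whatever was gathered so far.
            return info
        raise
    return info
```

Calling check_then_poll() with a 'between' callback that sets dom.alive to
False still raises RuntimeError, which is the remaining window Roger points
out. get_vcpu_info() on the same dead domain returns an empty list instead,
so a caller iterating over all domains can carry on.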
