[Xen-devel] A question about changeset 20621:f9392f6eda79 and Discontinuous online node
Hi Andre and Keir,

We hit a divide-by-zero bug in xend when creating a guest. After checking the code, it seems to be caused by changeset 20621 (see below for the patch). That changeset removes the check for len(info['node_to_cpu'][i]) > 0 before nodeload[i] = int(nodeload[i] * 16 / len(info['node_to_cpu'][i])), so the computation fails if a node has no CPU populated.

A closer look at the code reveals a problem that goes beyond this changeset. As I understand the code, the Xen API and the control panel currently assume that online nodes are always contiguous. The XEN_SYSCTL_physinfo hypercall returns only the number of online nodes, and control panel code such as tools/python/xen/xend/XendDomainInfo.py iterates from 0 to nr_nodes. However, this assumption does not always hold, for instance when no memory is populated behind some socket.

For example, take a NUMA system with 4 pxm domains, where pxm 0/2 have both CPU and memory populated, while pxm 1/3 have only CPU. The Xen hypervisor will set up the pxm-to-node mapping for all 4 domains (assume pxm maps 1:1 to node), but only nodes 0/2 are online (as I understand the current memory allocation mechanism, only a node with memory populated is online). When the control panel calls XEN_SYSCTL_physinfo, it gets nr_nodes as 2 and assumes that nodes 0/1 are online. This is wrong and may cause various issues.

The same contiguity assumption applies to the CPU side. Currently nr_cpus is returned as num_online_cpus(), which will cause problems if some CPUs are offlined.

I'm considering whether we can pass this discontinuity information to user space too, but that requires changing the sysctl interface. Worse, even if we can change the interface, we may run past the 128-byte limit for the xen_sysctl hypercall if we change NR_CPUS == 128 in the future (struct xen_sysctl_physinfo is already 104 bytes).

I'd like to get some input from you and the community before I try to fix this issue. Any suggestions?
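To make the 4-pxm example concrete, here is a minimal sketch of the mismatch. The data and the helper name nodes_assumed_by_tools are hypothetical and only illustrate the scenario described above; this is not xend code or the real physinfo interface.

```python
# Hypothetical data for the 4-pxm example: every node has CPUs,
# but only nodes 0 and 2 have memory and are therefore online.
node_to_cpu = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
online_nodes = [0, 2]          # what the hypervisor actually has online
nr_nodes = len(online_nodes)   # a bare count, as physinfo reports today


def nodes_assumed_by_tools(nr_nodes):
    """Node IDs a tool iterates over if it assumes IDs are contiguous."""
    return list(range(nr_nodes))


assumed = nodes_assumed_by_tools(nr_nodes)
# Nodes the tools would treat as online although they are not.
wrongly_used = [n for n in assumed if n not in online_nodes]
# Online nodes the tools would never look at.
missed = [n for n in online_nodes if n not in assumed]

print(assumed, wrongly_used, missed)
```

With only a count available, the tools end up treating node 1 as online and never consider node 2, which is why passing the actual set of online node IDs (rather than nr_nodes alone) would be needed to avoid the problem.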
Thanks
Yunhong Jiang

diff -r a50c1cbf08ec -r f9392f6eda79 tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py   Fri Dec 11 08:58:06 2009 +0000
+++ b/tools/python/xen/xend/XendDomainInfo.py   Fri Dec 11 08:59:54 2009 +0000
@@ -2670,10 +2670,9 @@ class XendDomainInfo:
                         nodeload[i] += 1
                         break
             for i in range(0, nr_nodes):
-                if len(info['node_to_cpu'][i]) > 0 and i in node_list:
-                    nodeload[i] = int(nodeload[i] * 16 / len(info['node_to_cpu'][i]))
-                else:
-                    nodeload[i] = sys.maxint
+                nodeload[i] = int(nodeload[i] * 16 / len(info['node_to_cpu'][i]))
+                if len(info['node_to_cpu'][i]) == 0 or i not in node_list:
+                    nodelist[i] += 8
             return map(lambda x: x[0], sorted(enumerate(nodeload), key=lambda x:x[1]))
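For reference, the divide-by-zero goes away as soon as the CPU count is checked before dividing. The sketch below is a standalone illustration, not a proposed patch: the function name rank_nodes and its arguments are hypothetical, it restores the behaviour of the lines removed by the changeset (excluded nodes sort last) rather than the +8 penalty the changeset introduced, and it uses float("inf") where the old code used sys.maxint, which only exists in Python 2.

```python
def rank_nodes(nodeload, node_to_cpu, node_list):
    """Return node indices ordered from least to most loaded.

    nodeload:    per-node count of VCPUs already placed on that node
    node_to_cpu: per-node list of physical CPU ids (may be empty)
    node_list:   nodes the guest is allowed to use
    """
    worst = float("inf")  # assumption: stands in for sys.maxint
    scores = []
    for i, load in enumerate(nodeload):
        ncpus = len(node_to_cpu[i])
        if ncpus > 0 and i in node_list:
            # Same scaling as the original code: load per CPU, times 16.
            scores.append(load * 16 // ncpus)
        else:
            # No CPUs or not allowed: never divide, always sort last.
            scores.append(worst)
    return [i for i, _ in sorted(enumerate(scores), key=lambda x: x[1])]


# Nodes 1 and 3 have no CPUs; the guest may use nodes 0 and 2.
order = rank_nodes([4, 0, 2, 0], [[0, 1], [], [4, 5, 6, 7], []], [0, 2])
print(order)
```

Note that this only addresses the crash. It still indexes nodes from 0 to len(nodeload) - 1, so it does not fix the contiguity assumption discussed in the message.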