Re: [Xen-devel] _PXM, NUMA, and all that goodnesss
On 02/12/2014 07:50 PM, Konrad Rzeszutek Wilk wrote:
> A warning that the PCI device is not in the NUMA affinity of the
> guest might be nice.
>
> Interestingly enough, one can also read this from sysfs:
> /sys/bus/pci/devices/<BDF>/numa_node,local_cpus,local_cpulist.
> Except that we don't expose the NUMA topology to the initial domain,
> so the 'numa_node' is all -1. And local_cpus depends on seeing _all_
> of the CPUs -- and of course it assumes that vCPU == pCPU.
>
> Anyhow, if this were "tweaked" such that the initial domain saw the
> hardware NUMA topology and parsed it (via Elena's patches), we could
> potentially have at least the 'numa_node' information present, and
> could figure out whether a guest is using a PCIe device from the
> right socket.

I don't think we want to go down the path of pretending that dom0 is
the hypervisor. This is the same reason I objected to Boris's approach
to perf integration last year. I can understand the idea of wanting to
use the same tools in the same way; but the fact is that dom0 is a
guest, and its virtual hardware (including #cpus, topology, &c) isn't
(and shouldn't be required to be) in any way related to the host's.

On the other hand... just tossing this out there, but how hard would
it be for dom0 to report information about the *physical* topology for
certain things in sysfs, rather than the *virtual* topology? I.e., no
matter what dom0's virtual topology was, to report the physical
numa_node, local_cpus, &c in sysfs?

I suppose this might cause problems if the scheduler then tried to run
a process / tasklet on the node to which the device was attached, only
to find out that no such (virtual) node existed. If that would be a
no-go, then I think we need to expose that information via libxl
somehow, so the toolstack can make reasonable decisions.

I think in general, we should:
* Do something reasonable when no NUMA topology has been specified.
* Do what the user asks (but help them make good decisions) when they
  do specify a topology.
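For reference, the sysfs lookup described above can be sketched as a
small helper. This is illustrative only, not part of any Xen tooling;
the sysfs_root parameter exists just so the behaviour can be exercised
outside a real /sys. On a dom0 that is not shown the host NUMA
topology, the real attribute reads as -1, as noted above.

```python
import os

def pci_numa_node(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node of a PCI device as reported by sysfs.

    -1 means the kernel does not know the device's locality -- which
    is what today's dom0 reports, since the hypervisor does not expose
    the host NUMA topology to it.
    """
    path = os.path.join(sysfs_root, bdf, "numa_node")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return -1
```

For example, `pci_numa_node("0000:03:00.0")` (a made-up BDF) would
return the node the kernel recorded for that device, or -1 when the
locality is unknown.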
A couple of things that might mean:
* Having the NUMA placement algorithm take into account the location
  of assigned PCI devices is probably a good idea.
* Having a warning when a device is outside a VM's soft cpu affinity
  or NUMA affinity. (I think we do something similar when the soft cpu
  affinity doesn't intersect the NUMA affinity.)
* Exposing the NUMA affinity of a device when doing xl
  pci-assignable-list might be a good idea as well, just to give
  people a hint that they should maybe be thinking about this. Maybe
  have xl pci-assignable-add print what node a device is on as well?
  (Maybe only on NUMA boxes?)

Just as an aside, can I take it that a lot of your customers have /
are expected to have such NUMA boxes? The accepted wisdom (at least in
some circles) seems to be that NUMA isn't particularly important for
cloud, because cloud providers will generally use a larger number of
smaller boxes and a cloud orchestration layer to tie them all
together.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
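[Editor's note: the toolstack-side warning proposed in the message
could be a check along these lines. The function name and arguments
are hypothetical sketches, not the libxl or xl API.]

```python
def check_device_affinity(device_node, vm_numa_affinity):
    """Return a warning string if an assigned device's NUMA node lies
    outside the VM's NUMA affinity, else None.

    device_node: node reported for the device (-1 if unknown, e.g.
    when dom0 cannot see the host NUMA topology).
    vm_numa_affinity: set of node IDs the VM is confined to; an empty
    set means no affinity was specified.
    """
    if device_node < 0 or not vm_numa_affinity:
        return None  # nothing is known, so there is nothing to warn about
    if device_node not in vm_numa_affinity:
        return ("WARNING: assigned device is on node %d, outside the "
                "VM's NUMA affinity %s"
                % (device_node, sorted(vm_numa_affinity)))
    return None
```

For example, a device on node 1 assigned to a VM pinned to node 0
would produce a warning, while an unknown (-1) node stays silent,
matching the "do something reasonable when nothing is specified"
principle above.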