Re: [Xen-devel] _PXM, NUMA, and all that goodnesss
On 13/02/14 10:08, Jan Beulich wrote:
>> Interestingly enough one can also read this from SysFS:
>> /sys/bus/pci/devices/<BDF>/numa_node,local_cpu,local_cpulist.
>>
>> Except that we don't expose the NUMA topology to the initial
>> domain so the 'numa_node' is all -1. And the local_cpu depends
>> on seeing _all_ of the CPUs - and of course it assumes that
>> vCPU == pCPU.
>>
>> Anyhow, if this was "tweaked" such that the initial domain
>> was seeing the hardware NUMA topology and parsing it (via
>> Elena's patches) we could potentially have at least the
>> 'numa_node' information present and figure out if a guest
>> is using a PCIe device from the right socket.
> I think you're mixing up things here. Afaict Elena's patches
> are to introduce _virtual_ NUMA, i.e. it would specifically _not_
> expose the host NUMA properties to the Dom0 kernel. Don't
> we have interfaces to expose the host NUMA information to
> the tools already?

I have recently looked into this when playing with xen support in hwloc.

Xen can export its vcpu_to_{socket,node,core} mappings for the
toolstack to consume, and for each node expose a count of used and free
pages, along with a square matrix of distances from the SRAT table.

The counts of used pages are problematic because they include pages
mapping MMIO regions, which is different to the logical expectation of
their just being RAM.

>
>> So what I am wondering is:
>> 1) Were there any plans for the XEN_PCI_DEV_PXM in the
>> hypervisor? Were there some prototypes for exporting the
>> PCI device BDF and NUMA information out.
> As said above: Intentions (I wouldn't call it plans) yes, prototypes
> no.
>
>> 2) Would it be better to just look at making the initial domain
>> be able to figure out the NUMA topology and assign the
>> correct 'numa_node' in the PCI fields?
> As said above, I don't think this should be exposed to and
> handled in Dom0's kernel. It's the tool stack to have the overall
> view here.

This is where things get awkward. Dom0 has the real ACPI tables and is
the only entity with the ability to evaluate the _PXM() attributes to
work out which PCI devices belong to which NUMA nodes. On the other
hand, its idea of CPUs and NUMA is stifled by being virtual and by
generally not having access to all the CPUs it can see as present in
the ACPI tables.

It would certainly be nice for dom0 to report the _PXM() attributes
back up to Xen, but I have no idea how easy/hard it would be.

>
>> 3). If either option is used, would taking that information into
>> advisement when launching a guest with either 'cpus' or 'numa-affinity'
>> or 'pci' and informing the user of a better choice be good?
>> Or would it be better if there was some diagnostic tool to at
>> least tell the user whether their PCI device assignment made
>> sense or not? Or perhaps program the 'numa-affinity' based on
>> the PCIe socket location?
> I think issuing hint messages would be nice. Automatic placement
> could clearly also take assigned devices' localities into consideration,
> i.e. one could expect assigned devices to result in the respective
> nodes to be picked in preference (as long as CPU and memory
> availability allow doing so).
>
> Jan

A diagnostic tool is arguably in the works, having been developed in my
copious free time, and rather more actively on the hwloc-devel list than
xen-devel, given the current code freeze:

http://xenbits.xen.org/gitweb/?p=people/andrewcoop/hwloc.git;a=shortlog;h=refs/heads/hwloc-xen-topology-v4
http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/hwloc-support-experimental-v2
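In the meantime, the sysfs attribute mentioned at the top of the thread
is easy enough to poke at from userspace. A minimal C sketch (the BDF
below is a made-up example; this only reports whatever numa_node the
dom0 kernel already has, i.e. -1 today as noted above):

#include <stdio.h>

/* Read /sys/bus/pci/devices/<BDF>/numa_node for one device. */
static int pci_numa_node(const char *bdf)
{
    char path[128];
    FILE *f;
    int node = -1;

    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node", bdf);
    f = fopen(path, "r");
    if (!f)
        return -1;                  /* no such device or attribute */
    if (fscanf(f, "%d", &node) != 1)
        node = -1;
    fclose(f);
    return node;                    /* -1 means "no locality information" */
}

int main(void)
{
    const char *bdf = "0000:03:00.0";   /* hypothetical example device */

    printf("%s is on NUMA node %d\n", bdf, pci_numa_node(bdf));
    return 0;
}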
One vague idea I had was to see about using hwloc's placement algorithms
to help advise domain placement, but I have not yet done any
investigation into the feasibility of this.

~Andrew