
Re: [Xen-devel] HVM CPU enumeration, mapping to VCPU ID (Was: Re: [Xen-users] FreeBSD PVHVM call for testing)



On Mon, Jun 03, 2013 at 08:57:26AM -0700, Matt Wilson wrote:
> On Mon, Jun 03, 2013 at 10:44:43AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Fri, May 31, 2013 at 11:53:22AM -0700, Matt Wilson wrote:
> > > On Fri, May 31, 2013 at 09:21:50AM +0100, Ian Campbell wrote:
> > > > On Thu, 2013-05-30 at 10:16 -0700, Matt Wilson wrote:
> > > > > 
> > > > > On bare metal x86 Linux, the kernel enumerates CPUs based on an order
> > > > > defined by the BIOS.
> > > > >  Typically this means that all the cores are
> > > > > enumerated first, followed by logical processors (HT/SMT). For Linux,
> > > > > maxcpus=N/2 should disable HT on systems that enumerate processors in
> > > > > the recommended order. Some history:
> > > > >   https://bugzilla.kernel.org/show_bug.cgi?id=2317
> > > > 
> > > > How the guest chooses to enumerate the CPUs is not terribly relevant so
> > > > long as the Xen specific code for that OS knows how to invert that
> > > > mapping to get at the underlying ABI which determines Xen's VCPUID for a
> > > > CPU.
> > > 
> > > Indeed.
> > > 
> > > > I think I was wrong to focus on the guest enumeration scheme before,
> > > > what actually matters is where in our ABI we expose the VCPUID, which
> > > > isn't at all clear to me.
> > > 
> > > Agreed.
> > > 
> > > > > The virtual BIOS provides both ACPI tables and a legacy MP-table that
> > > > > gives the LAPIC id mapping. The guest could infer the Xen vCPU ID from
> > > > > a processor's position in these tables.
> > > > 
> > > > Do we consider the ordering given in any of those tables to be an HVM
> > > > guest ABI? What about the lapic_id == 2*vcpuid -- is that multiplication
> > > > factor part of the ABI (i.e. is the guest expected to pass lapic_id/2 to
> > > > vcpuop)?
> > > 
> > > I strongly prefer the order in the BIOS tables, *not* the
> > > lapic_id = 2*vcpuid formula. Once I've done some libxl work, I'll be
> > > proposing a patch that makes the LAPIC / x2APIC IDs configurable,
> > > and that will break this assumption.
> > > 
> > > > >  Or we could add a VCPUOP that an enlightened guest could use to get
> > > > > the information more directly.
> > > > 
> > > > I'm hoping that there is some existing interface which I simply don't
> > > > know about, but yes this could be the answer if such a thing doesn't
> > > > exist.
> > > 
> > > I don't know of one that provides the information explicitly. It might
> > > be easiest to provide this as a hypervisor CPUID leaf so it can be
> > > used in early boot.
> > > 
> > > > > One question: why does a hypercall take a parameter that only has one
> > > > > valid value? That value can be determined by looking at the current
> > > > > running vCPU.
> > > > 
> > > > The generic prototype is:
> > > >         vcpu_op(int cmd, int vcpuid, void *extra_args)
> > > > Some cmds can act on any vcpuid and others can only act on the current
> > > > vcpu. In an ideal world we would have had VCPUID_SELF or something but
> > > > it's a bit late for that.
> > > 
> > > Yea, that makes sense.
> > > 
> > > > > The *2 is just for assigning the LAPIC ID, and I'm pretty sure that
> > > > > Linux is assigning processor IDs sequentially at ACPI parse time.
> > > > 
> > > > That probably doesn't matter, what matters is the Xen specific parts of
> > > > the kernel's ability to reverse that assignment to get at the underlying
> > > > APIC ID, assuming that is actually an ABI from which we can infer the
> > > > VCPU ID...
> > > 
> > > Indeed. This seems to be loosely defined so far, and easy to get wrong
> > > as happened with this FreeBSD work.
> > > 
> > > Konrad, Keir - any thoughts here?
> > 
> > I am a bit confused by 'I strongly prefer the order in the BIOS tables'.
> > The way I understand it, Linux sets up the vCPUs based on the LAPICs
> > that are created by hvmloader. There are no hypercalls or any
> > lapic_id = 2*vcpuid formula in the Linux kernel. I presume what you meant
> > by lapic_id = 2 * vcpuid is more of this:
> > 
> >     for ( i = 0; i < nr_processor_objects; i++ )
> >     {
> >         memset(lapic, 0, sizeof(*lapic));
> >         lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
> >         lapic->length  = sizeof(*lapic);
> >         /* Processor ID must match processor-object IDs in the DSDT. */
> >         lapic->acpi_processor_id = i;
> >         lapic->apic_id = LAPIC_ID(i);
> > 
> > Which sets this up.
> 
> Right, all of the LAPIC information is provided to the guest OS via
> the MADT. I believe what I'm observing is that Linux and Windows use
> the order of entries to enumerate processors in the system.
> 
> What we typically see on bare metal Intel systems is something like
> this (example system has 16 cores with HT):
> 
> All of the "cores"...
> [    0.000000] ACPI: Local APIC address 0xfee00000
> [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x08] enabled)
> ...
> 
> Followed by all of the "threads"...
> [    0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x01] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x11] lapic_id[0x03] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x12] lapic_id[0x05] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x13] lapic_id[0x07] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x14] lapic_id[0x09] enabled)
> 
> Since Xen hard codes the LAPIC ID (and x2APIC ID) to 0, 2, 4, 6, 8,
> etc. (vCPUID * 2), everything looks like a core.

OK.
> 
> > So .. assuming this was thought out, why are we starting on vCPUs
> > that don't match this? That seems like a bug? (Note, this is
> > with maxvcpus=32, vcpus=1 and the starting of a VCPU1 actually
> > ended up starting at VCPU4?!).
> 
> I'm lost. What?

http://lists.xen.org/archives/html/xen-devel/2013-05/msg00941.html
> 
> > I think all of this can be sorted out if the hvmloader sets the
> > LAPIC CPU == VCPU ID properly.
> 
> No, that's not the right answer. Or, at least, not completely. Right
> now Xen provides the same ID for both the LAPIC and x2APIC. In order
> for cpu topology discovery to work, the x2APIC needs to follow a
> particular structure. See the Intel whitepaper on processor topology
> enumeration:
>   
> http://software.intel.com/sites/default/files/m/d/4/1/d/8/Kuo_CpuTopology_rc1.rh1.final.pdf

Nice explanation. However, I was under the impression that right now we
don't virtualize the x2APIC registers?
> 
> > So perhaps a better question is: why is it not set up properly
> > nowadays? If the formula is baked in for the PVHVM guests, somewhere
> > the formula is not being evaluated properly?
> 
> The "LAPIC ID = 2 * vCPUID" formula is not baked into any OS that I
> know of, and it shouldn't be. It should all be discovered via
> firmware/BIOS tables. The enumeration order in the tables should,
> under best practices, match the logical processor ID assignment in the
> OS.

OK, good. That is my understanding too.

> 
> > The new hypercall to figure this out could be used, but that wouldn't
> > explain why we are failing to start on the correct VCPU?
> 
> I didn't follow the jump here. Can you provide an example?

http://lists.xen.org/archives/html/xen-devel/2013-05/msg00941.html

> 
> --msw

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

