
Re: [Xen-devel] [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature



>>> On 02.04.14 at 14:40, <dongxiao.xu@xxxxxxxxx> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: Wednesday, April 02, 2014 4:37 PM
>> To: Xu, Dongxiao
>> Cc: andrew.cooper3@xxxxxxxxxx; Ian.Campbell@xxxxxxxxxx;
>> Ian.Jackson@xxxxxxxxxxxxx; stefano.stabellini@xxxxxxxxxxxxx;
>> xen-devel@xxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx; dgdegra@xxxxxxxxxxxxx;
>> keir@xxxxxxx 
>> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
>> feature
>> 
>> >>> On 02.04.14 at 04:17, <dongxiao.xu@xxxxxxxxx> wrote:
>> >> Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in MADT,
>> >> including their LAPIC ID (which the socket number is being calculated
>> >> from).
>> >
>> > I dumped one MADT table from the system, and the following is a piece of
>> > it: when a CPU is present, its APIC ID is shown as a valid value, while
>> > when a CPU is not present, the APIC ID is shown as 0xFF.
>> > I know we can get the socket info when a CPU is present (by a certain
>> > calculation), but when a CPU is not present, could you explain how we can
>> > get the socket information from it?
>> 
>> Now a fundamental question is whether this was on a system where
>> CPUs are actually hotpluggable - it's quite common for APIC IDs to be
>> set to invalid values when the firmware just allocates a large table
>> and only fills (and marks enabled) the slots for existing CPUs.
>> 
>> In any event, with sparse APIC ID allocations like this
>> 
>> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
>> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
>> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
>> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
>> ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
>> ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
>> ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled)
>> ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled)
>> 
>> making assumptions about the socket ID range being a subset of
>> the NR_CPUS one is wrong - you'll either need some data structure
>> other than a flat array to deal with this, or at least dimension the
>> thing using MAX_APICS (but you'll then realize that you end up
>> with an even larger array, i.e. it'll become less acceptable). And
>> yes, other users making invalid assumptions on the socket ID
>> ranges should be fixed as well (i.e. likely you would want a cleanup
>> patch first, and then put your QoS stuff on top).
> 
> From your above explanation, it seems that we still cannot get the exact
> socket count from the MADT table,

Why not? The socket IDs are derived from the LAPIC IDs, and as long
as you have the full set of the latter, you can also determine the full
set of the former.
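
To make the derivation concrete, here is a minimal sketch (not Xen's
actual code): the socket, i.e. package, ID is simply the upper bits of
the APIC ID above the thread and core fields, whose combined width
comes from CPUID (leaf 0xB, or derived from leaves 1 and 4). The
apicid_to_socket() helper name is made up for the example:

    /* Sketch: package (socket) ID from an APIC ID.  core_plus_smt_bits
     * is the combined width of the thread and core fields as reported
     * by CPUID; the helper name is illustrative only. */
    static unsigned int apicid_to_socket(unsigned int apicid,
                                         unsigned int core_plus_smt_bits)
    {
        return apicid >> core_plus_smt_bits;
    }

With the sparse MADT quoted above (LAPIC IDs 0x00-0x06 and 0x20-0x26)
and, say, five core+thread bits, this yields socket IDs 0 and 1 for the
two groups, regardless of how many CPUs are actually online.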

> what we can get is the number of possible CPUs,
> which may also be a very big value.
> As you mentioned, NR_CPUS may not be able to cover the socket_id, but it can
> surely cover the socket count, so the following allocation code is OK from a
> correctness point of view.

That depends on what you intend to use for indexing the array.

> Do you agree? I think it doesn't differ a lot compared with getting the
> upper bound from the ACPI MADT table.
> 
>> +    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
>> +    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
>> +    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);
> 
> But for the socket_id issue, I will introduce another dimension in the CQM
> buffer data structure in my code, in case the socket_id is bigger than
> NR_CPUS.

Another dimension for something that can reasonably be sparse? I
was thinking of using e.g. a radix tree instead.
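
For illustration, a sparse, socket-indexed map along those lines might
look like the sketch below. It uses the radix_tree_* interface in the
style of Xen's xen/radix-tree.h (ported from Linux); the exact
signatures should be checked against the tree, and the cqm_sockets root
and helper names are invented for the example:

    #include <xen/radix-tree.h>       /* interface assumed; sketch only */

    /* Sparse map: socket ID -> per-socket CQM buffer.  Unlike a flat
     * array, this does not need to be dimensioned for the largest
     * possible socket ID. */
    static struct radix_tree_root cqm_sockets;

    void __init cqm_sockets_init(void)
    {
        radix_tree_init(&cqm_sockets);
    }

    static unsigned long *cqm_socket_buffer(unsigned int socket)
    {
        return radix_tree_lookup(&cqm_sockets, socket);
    }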

> As we implemented dynamic CQM buffer memory allocation in this v10
> patch, when hot-plugging a new CPU into a socket we need to dynamically
> allocate a certain amount of contiguous memory for it, which might fail. But
> this case is very rare, and even an allocation failure will not impact other
> components.
> 
> Is it acceptable for you?

Not really, no. I hate to repeat myself: We should be able to get
away without multi-page runtime allocations.
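
One way to satisfy that (again only a sketch, building on the
radix-tree example above): cap the per-socket buffer at a single page
and allocate it when the first CPU of a socket comes online, so no
runtime allocation is ever larger than order 0. alloc_xenheap_page(),
free_xenheap_page() and clear_page() are existing Xen helpers; the rest
is illustrative:

    static int cqm_socket_alloc(unsigned int socket)
    {
        unsigned long *buf;
        int rc;

        /* Only the first CPU of a socket to come online allocates. */
        if ( radix_tree_lookup(&cqm_sockets, socket) )
            return 0;

        buf = alloc_xenheap_page();   /* order 0: no contiguity to fail */
        if ( !buf )
            return -ENOMEM;           /* rare, and confined to CQM */

        clear_page(buf);

        rc = radix_tree_insert(&cqm_sockets, socket, buf);
        if ( rc )
            free_xenheap_page(buf);

        return rc;
    }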

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

