Re: [Xen-devel] [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring feature
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Wednesday, April 02, 2014 10:48 PM
> To: Xu, Dongxiao
> Cc: andrew.cooper3@xxxxxxxxxx; Ian.Campbell@xxxxxxxxxx;
> Ian.Jackson@xxxxxxxxxxxxx; stefano.stabellini@xxxxxxxxxxxxx;
> xen-devel@xxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx; dgdegra@xxxxxxxxxxxxx;
> keir@xxxxxxx
> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS Monitoring
> feature
>
> >>> On 02.04.14 at 14:40, <dongxiao.xu@xxxxxxxxx> wrote:
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Wednesday, April 02, 2014 4:37 PM
> >> To: Xu, Dongxiao
> >> Cc: andrew.cooper3@xxxxxxxxxx; Ian.Campbell@xxxxxxxxxx;
> >> Ian.Jackson@xxxxxxxxxxxxx; stefano.stabellini@xxxxxxxxxxxxx;
> >> xen-devel@xxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx; dgdegra@xxxxxxxxxxxxx;
> >> keir@xxxxxxx
> >> Subject: RE: [PATCH v10 1/6] x86: detect and initialize Cache QoS
> >> Monitoring feature
> >>
> >> >>> On 02.04.14 at 04:17, <dongxiao.xu@xxxxxxxxx> wrote:
> >> >> Iirc disabled CPUs (i.e. hot-pluggable ones) are still listed in the MADT,
> >> >> including their LAPIC ID (which the socket number is being calculated
> >> >> from).
> >> >
> >> > I dumped one MADT table from the system, and the following is one piece
> >> > from it: if the CPU is present, its APIC ID is shown as a valid value,
> >> > while if the CPU is not present, its APIC ID is shown as 0xFF.
> >> > I know we can get the socket info when the CPU is present (by a certain
> >> > calculation), but if the CPU is not present, could you help to explain
> >> > how we can get the socket information from it?
> >>
> >> Now a fundamental question is whether this was on a system where
> >> CPUs are actually hotpluggable - it's quite common for APIC IDs to be
> >> set to invalid values when the firmware just allocates a large table
> >> and only fills (and marks enabled) the slots for existing CPUs.
> >>
> >> In any event, with sparse APIC ID allocations like this
> >>
> >> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> >> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
> >> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
> >> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
> >> ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
> >> ACPI: LAPIC (acpi_id[0x12] lapic_id[0x22] enabled)
> >> ACPI: LAPIC (acpi_id[0x14] lapic_id[0x24] enabled)
> >> ACPI: LAPIC (acpi_id[0x16] lapic_id[0x26] enabled)
> >>
> >> making assumptions about the socket ID range being a subset of
> >> the NR_CPUS one is wrong - you'll either need some data structure
> >> other than a flat array to deal with this, or at least dimension the
> >> thing using MAX_APICS (but you'll then realize that you end up
> >> with an even larger array, i.e. it'll become less acceptable). And
> >> yes, other users making invalid assumptions about the socket ID
> >> range should be fixed as well (i.e. likely you would want a cleanup
> >> patch first, and then put your QoS stuff on top).
> >
> > From your above explanation, it seems that we still cannot get the exact
> > socket number from the MADT table,
>
> Why not? The socket IDs are derived from the LAPIC IDs, and as long
> as you have the full set of the latter, you can also determine the full
> set of the former.

I dumped the MADT table from a machine that supports CPU hotplug. However,
if I remove some CPUs from their sockets, those disabled CPUs' APIC IDs are
shown as 0xFF. Therefore it is difficult for me to get the full set of LAPIC
IDs, and hence the full set of socket IDs.
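Just to make sure we mean the same derivation, the calculation I have in mind
is something like the following (an illustrative sketch only; the 4-bit shift
width is an assumed example value - the real width comes from CPUID, and iirc
the resulting package ID is what cpu_to_socket() reports for present CPUs):

/*
 * Illustration only: how a socket (package) ID falls out of a LAPIC ID.
 * The low bits of the APIC ID identify the thread/core within a package;
 * shifting them away leaves the package (socket) ID.  The 4-bit width
 * used below is an assumed example - real code derives it from CPUID.
 */
#include <stdint.h>
#include <stdio.h>

static unsigned int socket_from_apicid(uint32_t apicid,
                                       unsigned int core_smt_width)
{
    return apicid >> core_smt_width;
}

int main(void)
{
    /*
     * With an assumed 4-bit core/thread field, the LAPIC IDs from the
     * dump above map to sockets 0 and 2 - i.e. socket IDs can be sparse
     * and are not bounded by the number of present CPUs.
     */
    printf("lapic 0x06 -> socket %u\n", socket_from_apicid(0x06, 4));
    printf("lapic 0x26 -> socket %u\n", socket_from_apicid(0x26, 4));
    return 0;
}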
>
> > what we can get is the possible CPU numbers,
> > which may also be a very big value.
> > As you mentioned, NR_CPUS may not be able to cover the socket_id, but it
> > can surely cover the socket number, so the following allocation code is OK
> > from a correctness point of view.
>
> That depends on what you intend to use for indexing the array.

NR_CPUS only covers the number of sockets; I will not use it to index the
array.

> > Do you agree? I think it doesn't differ a lot compared with getting the
> > upper value from the ACPI MADT table.
> >
> >> +    /* Allocate the buffer that holds MFNs of per-socket CQM LLC */
> >> +    order = get_order_from_bytes(sizeof(unsigned long) * NR_CPUS);
> >> +    cqm->socket_l3c_mfn = alloc_xenheap_pages(order, 0);
> >
> > But for the socket_id issue, I will introduce another dimension in the CQM
> > buffer data structure in my code, in case socket_id may be bigger than
> > NR_CPUS.
>
> Another dimension for something that can reasonably be sparse? I
> was thinking of using e.g. a radix tree instead.

Not a sparse dimension. I was thinking of adding a data structure, something
like the below:

struct cqm_data {
    unsigned int socket_id;
    unsigned long l3c;
};

The socket_id field in this structure can be sparse, which avoids the issue
where socket_id might be bigger than NR_CPUS.

In the current implementation, the CQM buffer is shared between Xen and the
Dom0 tool stack. If we use a radix tree, it is implemented as a linked
structure whose nodes are all allocated by xmalloc(), and it is difficult to
share these radix nodes between Xen and the Dom0 tool stack. Besides, the
current libxl/libxc doesn't have such radix APIs.

> > As we implemented dynamic CQM buffer memory allocation in this v10
> > patch, when hot-plugging a new CPU into a socket we need to dynamically
> > allocate a certain amount of contiguous memory for it, which might fail.
> > But this case is very rare, and even an allocation failure will not impact
> > other components.
> >
> > Is it acceptable to you?
>
> Not really, no. I hate to repeat myself: We should be able to get
> away without multi-page runtime allocations.

In the CQM case, the order of the per-socket LLC buffer is 1, i.e. 2
contiguous pages. But if you are unhappy with that, I can allocate the pages
twice with order=0 instead.
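For concreteness, the kind of change I have in mind is roughly the following
(a sketch against the xenheap allocator only, not the actual patch code; the
container structure and field names are made up for illustration, and the
exact headers may differ):

/*
 * Rough sketch: replace the single order-1 (2-page) xenheap allocation
 * with two independent order-0 allocations, so that no multi-page
 * contiguous allocation is needed at runtime.
 */
#include <xen/errno.h>
#include <xen/mm.h>
#include <asm/page.h>

struct cqm_socket_buf {                 /* hypothetical container */
    void *page[2];                      /* two separately allocated pages */
};

static int alloc_cqm_socket_buf(struct cqm_socket_buf *buf)
{
    unsigned int i;

    for ( i = 0; i < 2; i++ )
    {
        buf->page[i] = alloc_xenheap_pages(0, 0);   /* one page at a time */
        if ( !buf->page[i] )
        {
            /* Roll back the pages already allocated. */
            while ( i-- )
                free_xenheap_pages(buf->page[i], 0);
            return -ENOMEM;
        }
        clear_page(buf->page[i]);
    }

    return 0;
}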
Dongxiao

>
> Jan