Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.
On 17-04-13 19:11:54, Yi Sun wrote:
> On 17-04-13 04:58:06, Jan Beulich wrote:
> > >>> On 13.04.17 at 12:49, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> > > On 17-04-13 03:41:44, Jan Beulich wrote:
> > >> >>> On 13.04.17 at 10:11, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> > >> > On 17-04-12 06:42:01, Jan Beulich wrote:
> > >> >> >>> On 12.04.17 at 14:23, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> > >> >> > On 17-04-12 03:09:56, Jan Beulich wrote:
> > >> >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> > >> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
> > >> >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> > >> >> >> >> Furthermore I'm not at all convinced this is appropriate to do
> > >> >> >> >> in the context of a CPU_UP_CANCELED / CPU_DEAD notification: If
> > >> >> >> >> you have a few thousand VMs, the loop above may take a while.
> > >> >> >> >>
> > >> >> >> > Hmm, that may be a potential issue. I have two proposals below.
> > >> >> >> > Could you please help to check which one you prefer? Or provide
> > >> >> >> > another solution?
> > >> >> >> >
> > >> >> >> > 1. Start a tasklet in free_socket_resources() to restore
> > >> >> >> >    'psr_cos_ids[socket]' of all domains. The action is protected
> > >> >> >> >    by 'ref_lock' to avoid conflicts with 'psr_set_val'. We can
> > >> >> >> >    reduce 'info->cos_ref[cos]' in the tasklet or memset the
> > >> >> >> >    array to 0 in free_socket_resources().
> > >> >> >> >
> > >> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and
> > >> >> >> >    change the index from 'socket' to 'domain_id'. So we keep all
> > >> >> >> >    domains' COS IDs per socket and can memset the array to 0
> > >> >> >> >    when the socket goes offline. But there is an issue: we do
> > >> >> >> >    not know how many members this array should have. I cannot
> > >> >> >> >    find a macro like 'DOMAIN_MAX_NUMBER'. So, I prefer to use
> > >> >> >> >    reallocation in 'psr_alloc_cos' if the newly created domain's
> > >> >> >> >    id is bigger than the current array size.
> > >> >> >>
> > >> >> >> The number of domains is limited by the special DOMID_* values.
> > >> >> >> However, allocating an array with 32k entries doesn't sound very
> > >> >> >> reasonable.
> > >> >> >
> > >> >> > I think 32K entries should be the extreme case. I can allocate e.g.
> > >> >> > 100 entries when the first domain is created. If a new domain's id
> > >> >> > exceeds 100, reallocate another 100 entries. The total number of
> > >> >> > entries allocated should be less than 32K. This is a functional
> > >> >> > requirement which cannot be avoided. What do you think?
> > >> >>
> > >> >> So how many entries would your array have once I start the 32,000th
> > >> >> domain (having at any one time at most a single one running, besides
> > >> >> Dom0)?
> > >> >>
> > >> > In such a case, we have to keep a 32K array because the domain_id is
> > >> > the index to access the array. But this array is per socket, so the
> > >> > total memory used should not be too much.
> > >>
> > >> We carefully avoid any runtime allocations of order > 0, so if you
> > >> were to set up such an array, you'd need to use vmalloc()/vzalloc().
> > >> But I continue to be unconvinced that we want such a large array
> > >> in the first place.
> > >>
> > >> > After considering this issue more, I think the original code might
> > >> > not be so unacceptable. To my knowledge, the Intel Xeon Phi chip can
> > >> > support at most 288 CPUs. So, I think the number of domains running
> > >> > at the same time in reality may not be so large (there are not enough
> > >> > resources for more). If this hypothesis is right, a loop to write
> > >> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time.
> > >> > If I am wrong, please correct me. Thanks!
> > >>
> > >> What relationship does the number of CPUs have to the number of
> > >> domains on a host? There could be thousands with just a few dozen
> > >> CPUs, provided none or very few of them have high demands on
> > >> CPU resources. Additionally please never forget that system sizes
> > >> basically only ever grow. Plus we wouldn't want a latent issue here
> > >> in case we ever end up needing to widen domain IDs beyond 16 bits.
> > >>
> > > How about a per socket array like this:
> > >     uint32_t domain_switch[1024];
> > >
> > > Every bit represents a domain id. Then, we can handle this case as below:
> > > 1. In 'psr_cpu_init()', clear the array to 0. I think this place is
> > >    enough to cover the socket offline case. We do not need to clear it
> > >    in 'free_socket_resources'.
> > >
> > > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch)
> > >    to set the bit to 1 according to domain_id. If the old value is 0 and
> > >    'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to 0.
> > >
> > > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set
> > >    the bit to 1 too. Then, update 'psr_cos_ids[socket]' according to the
> > >    find/pick flow.
> > >
> > > Then, we only use 4KB for one socket.
> >
> > This looks to come closer to something I'd consider acceptable, but
> > I may not understand your intentions in full yet: For one, there's
> > nowhere you clear the bit (other than presumably during socket
> > cleanup).
>
> Actually, clearing the array in 'free_socket_resources' has the same
> effect. I can move the clear action into it.
>
> > And then I don't understand the test_and_ parts of the
> > constructs above, i.e. you don't clarify what the return values
> > would be used/needed for.
> >
> Sorry, 0 means this domain has not been scheduled to the socket yet. If
> test_and_ returns 0, that is the first time the domain runs on the socket
> (the first time the socket is online). So, we need to restore
> 'psr_cos_ids[socket]'
  ^ missed 'after'. I mean the first time the domain is scheduled to the
    socket after the socket is online.
> to 0 in 'psr_ctxt_switch_to()'.
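
To make steps 1-3 of the 'domain_switch' proposal above a bit more concrete,
here is a rough, untested userspace model of the intended logic. It is only a
sketch, not the actual patch: apart from the names quoted above
('domain_switch', 'psr_cos_ids', 'psr_ctxt_switch_to', 'psr_set_val',
'free_socket_resources'), all names and sizes are made up for the example, and
the real code would use Xen's test_and_set_bit() rather than the plain-C
stand-in below.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_SOCKETS   4                 /* illustrative value, not from Xen */
#define MAX_DOMID     32768             /* upper bound for 16-bit domain ids */
#define BITS_PER_WORD 32

/* One bit per domain id: 32768 bits = 1024 uint32_t = 4KB per socket. */
static uint32_t domain_switch[MAX_SOCKETS][MAX_DOMID / BITS_PER_WORD];

/* Stands in for the per-domain 'psr_cos_ids[socket]' discussed above. */
static unsigned int psr_cos_ids[MAX_DOMID][MAX_SOCKETS];

/* Plain-C stand-in for test_and_set_bit(): set the bit, return old value. */
static int model_test_and_set_bit(unsigned int nr, uint32_t *addr)
{
    uint32_t mask = 1u << (nr % BITS_PER_WORD);
    uint32_t old = addr[nr / BITS_PER_WORD];

    addr[nr / BITS_PER_WORD] = old | mask;
    return !!(old & mask);
}

/* Socket cleanup: wipe the bitmap when the socket goes offline. */
static void model_free_socket_resources(unsigned int socket)
{
    memset(domain_switch[socket], 0, sizeof(domain_switch[socket]));
}

/* Step 2: the first time a domain runs on a socket after the socket came
 * (back) online, its stale COS id for that socket is reset to 0. */
static void model_psr_ctxt_switch_to(unsigned int domid, unsigned int socket)
{
    if ( !model_test_and_set_bit(domid, domain_switch[socket]) &&
         psr_cos_ids[domid][socket] )
        psr_cos_ids[domid][socket] = 0;
}

/* Step 3: setting a value also marks the domain as "seen" on this socket
 * before a new COS id is assigned (the real find/pick flow is omitted). */
static void model_psr_set_val(unsigned int domid, unsigned int socket,
                              unsigned int new_cos)
{
    model_test_and_set_bit(domid, domain_switch[socket]);
    psr_cos_ids[domid][socket] = new_cos;
}

int main(void)
{
    model_psr_set_val(5, 0, 3);      /* domain 5 gets COS 3 on socket 0 */
    model_free_socket_resources(0);  /* socket 0 goes offline */
    model_psr_ctxt_switch_to(5, 0);  /* first run after socket re-online */
    printf("COS id after socket re-online: %u\n", psr_cos_ids[5][0]); /* 0 */
    return 0;
}

The return-value question above maps to model_psr_ctxt_switch_to(): a 0 from
the test_and_set means this is the first time the domain is scheduled to the
socket after the socket came online, which is exactly when the stale
'psr_cos_ids[socket]' needs to go back to 0.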