RE: [Xen-devel] "cpus" config parameter broken?
Thanks for the reply, and sorry for the delay in mine... I've been having
email problems.  Please note the proposal and request for comments below
(marked with >>>>>> Comments? <<<<<).

> > 1) Is the "cpus" parameter expected to work in a config file or is it
> > somehow deprecated?  (I see there is an "xm vcpu-pin" command so
> > perhaps this is the accepted way to pin cpus?)
>
> It's expected to work.

Yes, indeed it does work.  There were some syntax variations in the cpus
param that I didn't quite understand.  However, my misunderstanding
uncovered another interesting problem.  See below.

> > 3) Does "cpus" really have any real-world usage anyhow?  E.g. are most
> > uses probably just user misunderstanding where "vcpu_avail" should be
> > used instead?
>
> I'm sure some admins use it to good effect in hand-placing domains on
> CPUs, especially in a NUMA context.  In most cases it's typically best
> to be fully work conserving and give Xen's scheduler full flexibility.

Yeah, I guess if you think of it as "poor man's hard partitioning" it
makes a lot of sense.  But if you think of it in a utility data center
context, true affinity rather than restriction may make more sense.  And
vcpu_avail should cover most app licensing/pricing concerns.

> > What happens if the vcpu is ready to schedule but none of the
> > restricted set of pcpus is available?
>
> It's a restriction.  Each of the values in the mask is processed modulo
> the number of physical CPUs.

The output from "xm vcpu-list" observes the "modulo" but apparently the
scheduler does not.  For example, on a 2-pcpu system, launching a 2-vcpu
guest with cpus="0,3" (noting that 3 mod 2 = 1), "xm vcpu-list" shows
each of the guest's 2 vcpus with "any cpu" in the "CPU Affinity" column,
reflecting the fact that 0,3 is, modulo 2, the same as 0,1, which is the
same as 0-1, which is the same as all.  However, the cpu_mask is saved
as 0,3 and the scheduler takes it literally, ignoring any pcpus other
than 0 and 3 (and pcpu 3 doesn't exist).  This can be observed in
"xm vcpu-list" in the above example by seeing that both guest vcpus are
sharing processor 0.

So the results displayed by "xm vcpu-list" and the actual scheduler
placement are different, but which one is the bug?  Consider: if a
2-vcpu guest is running on an 8-pcpu machine and has been restricted to
cpus="2,3,4,5", and this 2-vcpu guest gets migrated to a 4-pcpu system,
to which pcpus should the migrated guest be restricted?  Using the
"xm vcpu-list" logic it gets all 4 pcpus, but (if the cpu_mask were
preserved, which it currently isn't) the scheduler logic would give it
just two (2 and 3).  And suppose this 2-vcpu guest on the 8-pcpu system
were restricted to "5-8" and migrated to a 4-pcpu system: it wouldn't
get any processor time at all (though "xm vcpu-list" would say each
vcpu's CPU Affinity is "any").
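To make the two readings concrete, here's a quick Python sketch (purely
illustrative, not code from the Xen tree, and the parsing is simplified)
of what the display and the scheduler each appear to be doing with the
mask:

    # Illustration only -- simplified parsing, not Xen code.
    def parse_cpus(spec):
        # Expand a "0,3" / "2-5" style string into a list of ints.
        cpus = []
        for part in spec.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                cpus.extend(range(int(lo), int(hi) + 1))
            else:
                cpus.append(int(part))
        return cpus

    def display_view(spec, nr_pcpus):
        # What "xm vcpu-list" appears to report: values taken modulo
        # the number of physical CPUs.
        return sorted(set(c % nr_pcpus for c in parse_cpus(spec)))

    def scheduler_view(spec, nr_pcpus):
        # What the scheduler appears to use: the literal mask,
        # intersected with the pcpus that actually exist.
        return sorted(set(c for c in parse_cpus(spec) if c < nr_pcpus))

    print display_view("0,3", 2), "vs", scheduler_view("0,3", 2)
    # [0, 1] vs [0]        (hence both vcpus sharing pcpu 0)
    print display_view("2,3,4,5", 4), "vs", scheduler_view("2,3,4,5", 4)
    # [0, 1, 2, 3] vs [2, 3]
    print display_view("5-8", 4), "vs", scheduler_view("5-8", 4)
    # [0, 1, 2, 3] vs []   (no pcpu time at all)

Whichever reading is the intended one, the display and the scheduler
ought to at least agree.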
Because affinity/cpu restriction is not currently preserved across
save/restore or migration, this is a moot discussion.  But if I were to
"fix" it so it were preserved, the decision is important.

My opinion: CPU affinity/restriction should NOT be preserved across
migration.  Or, if it is, it should only be preserved when the source
and target have the same number of pcpus (thus allowing save/restore to
work OK).  Or maybe it should only be preserved for save/restore and not
for migration.

>>>>>>>>>>>>>>>>> Comments? <<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Note that vcpu_avail would still work across migration.  (Hmmm... I'll
have to look to see whether vcpu_avail is currently preserved across
save/restore/migration.  If not, I will definitely need to find and fix
that one.)

> There was an extension to the cpus= syntax proposed at one point that
> I'm not sure whether it ever got checked in.  The idea was to allow the
> cpus= parameter to be a list of strings, enabling a different mask to
> be specified for each VCPU.  This would enable an admin to pin
> individual VCPUs to CPUs rather than just at a domain level.

It looks like the internal vcpu data structure supports this and
"xm vcpu-pin" supports it, but afaict there's no way to specify
per-vcpu affinity at "xm create".

> I'm not a huge fan of the cpus= mechanism.  It would likely be more
> user-friendly to allow physical CPUs to be put into groups and then
> allow domains to be assigned to CPU groups.  It would also be better if
> you could specify physical CPUs by a node.socket.core.thread hierarchy
> rather than the enumerated CPU number.

Agreed, though I'll bet that would take major scheduler surgery.  And
this would also further increase the confusion for migration!  I'd also
like to see affinity and restriction teased apart, because they are
separate concepts with different uses.
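Back on the per-vcpu pinning point above: until cpus= grows a per-vcpu
syntax, the same effect can be scripted right after creation with
"xm vcpu-pin".  A rough sketch (the domain name and the vcpu-to-cpu
mapping here are made up, just to show the shape of it):

    # Sketch of a post-"xm create" workaround for per-vcpu affinity;
    # "mydomain" and the vcpu->cpus mapping are placeholders.
    import subprocess

    def pin_vcpus(domain, vcpu_to_cpus):
        # vcpu_to_cpus maps a vcpu number to a cpu-list string as
        # accepted by "xm vcpu-pin", e.g. {0: "2", 1: "3"}.
        for vcpu, cpus in sorted(vcpu_to_cpus.items()):
            subprocess.check_call(["xm", "vcpu-pin", domain,
                                   str(vcpu), cpus])

    pin_vcpus("mydomain", {0: "2", 1: "3"})

Not pretty, but it keeps per-vcpu placement out of the config file until
(if) the list-of-strings syntax lands.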
Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel