[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.
On 17-04-19 03:00:29, Jan Beulich wrote: > >>> On 19.04.17 at 10:22, <yi.y.sun@xxxxxxxxxxxxxxx> wrote: > > On 17-04-18 05:46:43, Jan Beulich wrote: > >> >>> On 18.04.17 at 12:55, <yi.y.sun@xxxxxxxxxxxxxxx> wrote: > >> > I made a test patch based on v10 and attached it in mail. Could you > >> > please > >> > help to check it? Thanks! > >> > >> This looks reasonable at the first glance, albeit I continue to be > >> unconvinced that this is the only (reasonable) way of solving the Do you have any other suggestion on this? Thanks! > >> problem. After all we don't have to go through similar hoops for > >> any other of the register state associated with a vCPU. There > > > > Sorry, I do not understand your meaning clearly. > > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this > > patch, > > we check 'dom_ids' array to know if the domain's cos id has not been set > > but > > its 'psr_cos_ids[socket]' already has a non zero value. This case means > > the > > socket offline has happened so that we have to restore the domain's > > 'psr_cos_ids[socket]' to default value 0 which points to default COS MSR. > > I think we have discussed this in previous mails and achieved agreement > > on > > such logic. > > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when > > socket is online, the registers values may be modified by FW and others. > > So, we have to restore them to default value which is being done in > > 'cat_init_feature'. > > > > So, what is your exactly meaning here? Thanks! > > Neither of the two. I'm comparing with COS/PSR-_unrelated_ > handling of register state. After all there are other MSRs which > we need to put into the right state after a core comes online. > For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket is offline. The values in it are all 0 when socket is online again. This causes error when user shows the CBMs. So, we have two options when socket is online: 1. Read COS MSRs values and save them into 'cos_reg_val[]'. 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM. Per our discussion on v9 patch 05, we decided to adopt option 2. Below are what we discussed. FYR. [v9 patch 05] >> >> > + /* cos=0 is reserved as default cbm(all bits within cbm_len are >> >> > 1). */ >> >> > + feat->cos_reg_val[0] = cat_default_val(cat.cbm_len); >> >> > + /* >> >> > + * To handle cpu offline and then online case, we need read MSRs >> >> > back to >> >> > + * save values into cos_reg_val array. >> >> > + */ >> >> > + for ( i = 1; i <= cat.cos_max; i++ ) >> >> > + { >> >> > + rdmsrl(MSR_IA32_PSR_L3_MASK(i), val); >> >> > + feat->cos_reg_val[i] = (uint32_t)val; >> >> > + } >> >> [Jan] >> >> You mention this in the changes done, but I don't understand why >> >> you do this. What meaning to these values have to you? If you want >> >> hardware and cached values to match up, the much more conventional >> >> way of enforcing this would be to write the values you actually >> >> want (normally all zero). >> >> [Sun Yi] >> > When all cpus on a socket are offline, the free_feature() is called >> > to free features resources so that the values saved in >> > cos_reg_val[] are lost. When the socket is online again, features >> > are allocated again so that cos_reg_val[] members are all >> > initialized to 0. Only is cos_reg_val[0] initialized to default value in >> > this function in old codes. >> > >> > But domain is still alive so that its cos id on the socket is kept. >> > The corresponding MSR value is kept too per test. To make >> > cos_reg_val[] values be same as HW to not to mislead user, we >> > should read back the valid values on HW into cos_reg_val[]. >> [Jan] >> Okay, I understand the background, but I don't view this solution as >> viable: Once the last core on a socket goes offline, all references >> to it should be cleaned up. After all what will be brought back >> online may be a different physical CPU altogether; you can't assume >> MSR values to have survived even if it is the same CPU which comes >> back online, as it may have undergone a reset cycle, or BIOS/SMM may >> have played with the MSRs. >> That's even a possibility for a single core coming back online, so >> you have to reload MSRs explicitly anyway if implicit reloading (i.e. >> once vCPU-s get scheduled onto it) doesn't suffice. >> [Sun Yi] > So, you think the MSRs values may not be valid after such process and > reloading (write MSRs to default value) is needed. If so, I would like > to do more operations in 'free_feature()': > 1. Iterate all domains working on the offline socket to change > 'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init > status. > 2. Restore 'socket_info[socket].cos_ref[]' to all 0. > > These can make the socket's info be totally restored back to init status. [Jan] Yes, that's what I think is needed. > >> are a number of cosmetic issues, but commenting on an attached > >> (rather than inlined) patch is a little difficult. > >> > > Sorry for that, I will directly send patch out next time. > > > >> One remark regarding the locking though: Acquiring a lock in the > >> context switch path should be made as low risk of long stalls as > >> possible. Therefore you will want to consider using r/w locks > >> instead of spin locks here, which would allow parallelism on all > >> cores of a socket as long as COS IDs aren't being updated. > >> > > In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions. > > So, I do not understand why read-write lock is better? Anything I don't > > consider? Please help to point out. Thanks! > > Hmm, looking again I can see that r/w locks indeed may not help > here. However, what you say still doesn't look correct to me: You > acquire the lock depending on _only_ psra->cos_mask being non- > zero. This means that all cores on one socket are being serialized > here. Quite possibly all you need is for some of the checks done > inside the locked region to be replicated (but _not_ moved) to the > outer if(), to limit the number of times where the lock is to be > acquired. > I think your suggestions is to check old_cos outer the lock region. If it is not 0 which means the domain's cos id does not need restore to 0, we can directly set it into ASSOC register. That can avoid unnecessary lock. I will send out the test patch again to ask your help to provide review comments (you said there are also 'a number of cosmetics issues'). Thanks! > Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |