|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.
On 17-04-19 03:00:29, Jan Beulich wrote:
> >>> On 19.04.17 at 10:22, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> > On 17-04-18 05:46:43, Jan Beulich wrote:
> >> >>> On 18.04.17 at 12:55, <yi.y.sun@xxxxxxxxxxxxxxx> wrote:
> >> > I made a test patch based on v10 and attached it in mail. Could you
> >> > please
> >> > help to check it? Thanks!
> >>
> >> This looks reasonable at the first glance, albeit I continue to be
> >> unconvinced that this is the only (reasonable) way of solving the
Do you have any other suggestion on this? Thanks!
> >> problem. After all we don't have to go through similar hoops for
> >> any other of the register state associated with a vCPU. There
> >
> > Sorry, I do not understand your meaning clearly.
> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this
> > patch,
> > we check 'dom_ids' array to know if the domain's cos id has not been set
> > but
> > its 'psr_cos_ids[socket]' already has a non zero value. This case means
> > the
> > socket offline has happened so that we have to restore the domain's
> > 'psr_cos_ids[socket]' to default value 0 which points to default COS MSR.
> > I think we have discussed this in previous mails and achieved agreement
> > on
> > such logic.
> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
> > socket is online, the registers values may be modified by FW and others.
> > So, we have to restore them to default value which is being done in
> > 'cat_init_feature'.
> >
> > So, what is your exactly meaning here? Thanks!
>
> Neither of the two. I'm comparing with COS/PSR-_unrelated_
> handling of register state. After all there are other MSRs which
> we need to put into the right state after a core comes online.
>
For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket
is offline. The values in it are all 0 when socket is online again. This causes
error when user shows the CBMs. So, we have two options when socket is online:
1. Read COS MSRs values and save them into 'cos_reg_val[]'.
2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM.
Per our discussion on v9 patch 05, we decided to adopt option 2. Below are
what we discussed. FYR.
[v9 patch 05]
>> >> > + /* cos=0 is reserved as default cbm(all bits within cbm_len are
>> >> > 1). */
>> >> > + feat->cos_reg_val[0] = cat_default_val(cat.cbm_len);
>> >> > + /*
>> >> > + * To handle cpu offline and then online case, we need read MSRs
>> >> > back to
>> >> > + * save values into cos_reg_val array.
>> >> > + */
>> >> > + for ( i = 1; i <= cat.cos_max; i++ )
>> >> > + {
>> >> > + rdmsrl(MSR_IA32_PSR_L3_MASK(i), val);
>> >> > + feat->cos_reg_val[i] = (uint32_t)val;
>> >> > + }
>> >>
[Jan]
>> >> You mention this in the changes done, but I don't understand why
>> >> you do this. What meaning to these values have to you? If you want
>> >> hardware and cached values to match up, the much more conventional
>> >> way of enforcing this would be to write the values you actually
>> >> want (normally all zero).
>> >>
[Sun Yi]
>> > When all cpus on a socket are offline, the free_feature() is called
>> > to free features resources so that the values saved in
>> > cos_reg_val[] are lost. When the socket is online again, features
>> > are allocated again so that cos_reg_val[] members are all
>> > initialized to 0. Only is cos_reg_val[0] initialized to default value in
>> > this function in old codes.
>> >
>> > But domain is still alive so that its cos id on the socket is kept.
>> > The corresponding MSR value is kept too per test. To make
>> > cos_reg_val[] values be same as HW to not to mislead user, we
>> > should read back the valid values on HW into cos_reg_val[].
>>
[Jan]
>> Okay, I understand the background, but I don't view this solution as
>> viable: Once the last core on a socket goes offline, all references
>> to it should be cleaned up. After all what will be brought back
>> online may be a different physical CPU altogether; you can't assume
>> MSR values to have survived even if it is the same CPU which comes
>> back online, as it may have undergone a reset cycle, or BIOS/SMM may
>> have played with the MSRs.
>> That's even a possibility for a single core coming back online, so
>> you have to reload MSRs explicitly anyway if implicit reloading (i.e.
>> once vCPU-s get scheduled onto it) doesn't suffice.
>>
[Sun Yi]
> So, you think the MSRs values may not be valid after such process and
> reloading (write MSRs to default value) is needed. If so, I would like
> to do more operations in 'free_feature()':
> 1. Iterate all domains working on the offline socket to change
> 'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
> status.
> 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
>
> These can make the socket's info be totally restored back to init status.
[Jan]
Yes, that's what I think is needed.
> >> are a number of cosmetic issues, but commenting on an attached
> >> (rather than inlined) patch is a little difficult.
> >>
> > Sorry for that, I will directly send patch out next time.
> >
> >> One remark regarding the locking though: Acquiring a lock in the
> >> context switch path should be made as low risk of long stalls as
> >> possible. Therefore you will want to consider using r/w locks
> >> instead of spin locks here, which would allow parallelism on all
> >> cores of a socket as long as COS IDs aren't being updated.
> >>
> > In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions.
> > So, I do not understand why read-write lock is better? Anything I don't
> > consider? Please help to point out. Thanks!
>
> Hmm, looking again I can see that r/w locks indeed may not help
> here. However, what you say still doesn't look correct to me: You
> acquire the lock depending on _only_ psra->cos_mask being non-
> zero. This means that all cores on one socket are being serialized
> here. Quite possibly all you need is for some of the checks done
> inside the locked region to be replicated (but _not_ moved) to the
> outer if(), to limit the number of times where the lock is to be
> acquired.
>
I think your suggestions is to check old_cos outer the lock region.
If it is not 0 which means the domain's cos id does not need restore
to 0, we can directly set it into ASSOC register. That can avoid
unnecessary lock. I will send out the test patch again to ask your
help to provide review comments (you said there are also 'a number
of cosmetics issues'). Thanks!
> Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |