Re: [Xen-devel] [PATCH RFC V2 42/45] xen/sched: add fall back to idle vcpu when scheduling item
On 17/05/2019 08:57, Jan Beulich wrote:
>>>> On 17.05.19 at 07:13, <jgross@xxxxxxxx> wrote:
>> On 16/05/2019 16:41, Jan Beulich wrote:
>>>>>> On 16.05.19 at 15:51, <jgross@xxxxxxxx> wrote:
>>>> On 16/05/2019 15:05, Jan Beulich wrote:
>>>>>>>> On 06.05.19 at 08:56, <jgross@xxxxxxxx> wrote:
>>>>>> --- a/xen/arch/x86/domain.c
>>>>>> +++ b/xen/arch/x86/domain.c
>>>>>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>>>>>      }
>>>>>>  }
>>>>>> +/*
>>>>>> + * Idle loop for siblings of active schedule items.
>>>>>> + * We don't do any standard idle work like tasklets, page scrubbing or
>>>>>> + * livepatching.
>>>>>> + * Use default_idle() in order to simulate v->is_urgent.
>>>>>
>>>>> I guess I'm missing a part of the description which explains all this:
>>>>> What's wrong with doing scrubbing work, for example? Why is
>>>>> doing tasklet work not okay, but softirqs are? What is the deal with
>>>>> v->is_urgent, i.e. what justifies not entering a decent power
>>>>> saving mode here on Intel, but doing so on AMD?
>>>>
>>>> One of the reasons for using core scheduling is to avoid running vcpus
>>>> of different domains on the same core, in order to minimize the chances
>>>> of side channel attacks on other domains' data. Not allowing scrubbing
>>>> or tasklets here is meant to avoid touching other domains' data.
>>>
>>> So how is running softirqs okay then? And how is scrubbing accessing
>>> other domains' data?
>>
>> Right now I'm not sure whether it is a good idea to block any softirqs.
>> We definitely need to process scheduling requests, and I believe RCU and
>> tasklets, too. The tlbflush one should be uncritical, so timers is the
>> remaining one which might be questionable. This can be fine-tuned later
>> IMO, e.g. by defining a softirq mask of critical softirqs to block, and
>> eventually splitting up e.g. the timer and tasklet softirqs into critical
>> and uncritical ones.
>
> Well, okay, but please add an abridged version of this to the patch
> description then.

Okay.
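To make the "softirq mask" idea just discussed a bit more concrete, here is a
minimal sketch. The mask name and the helper are hypothetical and not part of
the posted series; TIMER_SOFTIRQ and softirq_pending() are the existing
definitions from xen/include/xen/softirq.h and xen/include/xen/irq_cpustat.h.

    #include <xen/softirq.h>   /* TIMER_SOFTIRQ, softirq_pending() */

    /*
     * Hypothetical mask of softirqs to be blocked while the sibling of an
     * active schedule item idles.  Only the timer softirq, the "questionable"
     * one above, is listed; scheduling, RCU, tasklet and TLB-flush softirqs
     * remain serviced.
     */
    #define SIBLING_IDLE_BLOCKED_SOFTIRQS  (1UL << TIMER_SOFTIRQ)

    static bool sibling_idle_has_work(unsigned int cpu)
    {
        /* Pending work outside the blocked set wakes the sibling idle loop;
         * blocked softirqs wait until a regular idle vcpu runs again. */
        return softirq_pending(cpu) & ~SIBLING_IDLE_BLOCKED_SOFTIRQS;
    }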
>
>> Scrubbing will probably pull the cache lines of the dirty pages into
>> the L1 cache of the cpu. For me this sounds problematic. In case we
>> are fine to do scrubbing as there is no risk associated, I'm fine to
>> add it back in.
>
> Well, of course there's going to be a brief period of time where
> a cache line will be present in CPU internal buffers (it's not just the
> cache after all, as we've learned with XSA-297). So I can certainly
> buy that when using core granularity you don't want to scrub on
> the other thread. But what about the socket granularity case?
> Scrubbing on fully idle cores should still be fine, I would think.

I think this would depend on the reason for selecting socket scheduling.
I'd at least want to have a way to select that, as I could think of e.g.
L3-cache side channel attacks, too.

So maybe I could add a patch on top adding a sub-option to the sched-gran
parameter which will allow (or disallow?) scrubbing on idle cores or
threads.

>
>>>> As with core scheduling we can be sure the other thread is active
>>>> (otherwise we would schedule the idle item), and hoping to save power
>>>> by using mwait is moot.
>>>
>>> Saving power may be indirect, by the CPU re-arranging
>>> resource assignment between threads when one goes idle.
>>> I have no idea whether they do this when entering C1, or
>>> only when entering deeper C states.
>>
>> SDM Vol. 3 chapter 8.10.1 "HLT instruction":
>>
>> "Here shared resources that were being used by the halted logical
>> processor become available to active logical processors, allowing them
>> to execute at greater efficiency."
>
> To be honest, this is too broad/generic a statement to fully
> trust, judging from other areas of the SDM. And then, as
> per above, what about the socket granularity case? Putting
> entirely idle cores to sleep is surely worthwhile?

Yes, I assume it is. OTOH this might affect context switches badly, as
the reaction time for the coordinated switch will rise. Maybe a good
reason for another sub-option?

>>> And anyway - I'm still none the wiser as to the v->is_urgent
>>> relationship.
>>
>> With v->is_urgent set, today's idle loop will drop into default_idle().
>> I can remove this sentence in case it is just confusing.
>
> I'd prefer if the connection would become more obvious. One
> needs to go from ->is_urgent via ->urgent_count to
> sched_has_urgent_vcpu() to find where the described
> behavior really lives.
>
> What's worse though: This won't work as intended on AMD
> at all. I don't think it's correct to fall back to default_idle() in
> this case. Instead sched_has_urgent_vcpu() returning true
> should amount to the same effect as max_cstate being set
> to 1. There's
> (a) no reason not to use MWAIT on Intel CPUs in this case,
> if MWAIT can enter C1, and
> (b) a strong need to use MWAIT on (at least) AMD Fam17,
> or else it won't be C1 that gets entered.
> I'll see about making a patch in due course.

Thanks. Would you mind doing it in a way that the caller can specify
max_cstate? This would remove the need to call sched_has_urgent_vcpu()
deep down in the idle handling, and I could re-use it for my purpose.


Juergen
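The interface asked for in the last paragraph could look roughly like the
sketch below. cpuidle_enter_capped() and both callers are hypothetical names,
not code from this series or from Jan's eventual patch; sched_has_urgent_vcpu()
is the existing helper referred to above, and the point is only that the
C-state cap gets passed down instead of being derived deep inside the cpuidle
handling.

    #include <xen/sched.h>   /* sched_has_urgent_vcpu() */

    /*
     * Hypothetical entry point: enter the deepest C-state whose index does
     * not exceed the caller-supplied cap (1 == C1), using the MWAIT-based
     * path where available so that C1 is really entered on AMD Fam17 and
     * Intel stays efficient, per points (a) and (b) above.
     */
    void cpuidle_enter_capped(unsigned int cstate_cap);

    /* Regular idle vcpu: an urgent vcpu caps the sleep at C1. */
    static void normal_idle(void)
    {
        cpuidle_enter_capped(sched_has_urgent_vcpu() ? 1 : ~0u);
    }

    /*
     * Sibling of an active schedule item: the other thread of the core is
     * known to be busy, so cap at C1 unconditionally.
     */
    static void sibling_idle(void)
    {
        cpuidle_enter_capped(1);
    }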
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel