Re: sched=null vwfi=native and call_rcu()
On Fri, 14 Jan 2022, Dario Faggioli wrote:
> On Thu, 2022-01-06 at 17:52 -0800, Stefano Stabellini wrote:
> > On Thu, 6 Jan 2022, Julien Grall wrote:
> > >
> > > This issue and solution were discussed numerous times on the ML. In
> > > short, we want to tell the RCU that CPUs running in guest context are
> > > always quiesced.
> > > For more details, you can read the previous thread (which also
> > > contains a link to the one before):
> > >
> > > https://lore.kernel.org/xen-devel/fe3dd9f0-b035-01fe-3e01-ddf065f182ab@xxxxxxxxx/
> >
> > Thanks Julien for the pointer!
> >
> > Dario, I forward-ported your three patches to staging:
> > https://gitlab.com/xen-project/people/sstabellini/xen/-/tree/rcu-quiet
> >
> Hi Stefano!
>
> I definitely would like to see the end of this issue, so thanks a lot
> for your interest and your help with the patches.
>
> > I can confirm that they fix the bug. Note that I had to add a small
> > change on top to remove the ASSERT at the beginning of
> > rcu_quiet_enter:
> > https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/6fc02b90814d3fe630715e353d16f397a5b280f9
> >
> Yeah, that should be fine.
>
> > Would you be up for submitting them for upstreaming? I would prefer
> > if you send out the patches because I cannot claim to understand them
> > completely (except for the one doing renaming :-P )
> >
> Haha! So, I am up for properly submitting, but there's one problem. As
> you've probably got, the idea here is to use transitions toward the
> guest and inside the hypervisor as RCU quiescence and "activation"
> points.
>
> Now, on ARM, that just meant calling rcu_quiet_exit() in
> enter_hypervisor_from_guest() and calling rcu_quiet_enter() in
> leave_hypervisor_to_guest(). Nice and easy, and even myself --and I'm
> definitely not an ARM person-- could understand it (although with some
> help from Julien) and put the patches together.
>
> However, the problem is really arch independent and, despite not
> surfacing equally frequently, it affects x86 as well. And for x86 the
> situation is by far not equally nice, when it comes to identifying all
> the places from where to call rcu_quiet_{enter,exit}().
>
> And finding out where to put them, among the various functions that we
> have in the various entry.S variants is where I stopped. The plan was
> to get back to it, but as shameful as it sounds, I could not do that
> yet.
>
> So, if anyone wants to help with this, handing over suggestions for
> potential good spots, that would help a lot.

Unfortunately I cannot volunteer due to time and also because I wouldn't
know where to look and I don't have a reproducer or a test environment
on x86. I would be flying blind.

> Alternatively, we can submit the series as ARM-only... But I fear that
> the x86 side of things would then be easily forgotten. :-(

I agree with you on this, but at the same time we are having problems
with customers in the field -- it is not like we can wait to solve the
problem on ARM any longer. And the issue is certainly far less likely to
happen on x86 (there is no vwfi=native, right?)

In other words, I think it is better to have half of the solution now to
solve the worst part of the problem than to wait more months for a full
solution.
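
For readers following the thread, the ARM approach described above can be
sketched roughly as below. This is a toy model, not the patches themselves:
the four function names come from the thread, but their signatures, the
cpu_quiet[] array, NR_CPUS and grace_period_can_complete() are assumptions
made up for illustration and do not reflect Xen's actual RCU code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4   /* hypothetical CPU count for the sketch */

/* Hypothetical per-CPU flag: true while the CPU runs guest code. */
static bool cpu_quiet[NR_CPUS];

/* Names from the thread; bodies are illustrative only. */
static void rcu_quiet_enter(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    cpu_quiet[cpu] = true;    /* RCU may ignore this CPU from now on */
}

static void rcu_quiet_exit(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    cpu_quiet[cpu] = false;   /* CPU may hold RCU-protected references again */
}

/* Guest -> hypervisor transition (simplified signature, an assumption). */
static void enter_hypervisor_from_guest(unsigned int cpu)
{
    rcu_quiet_exit(cpu);
}

/* Hypervisor -> guest transition (simplified signature, an assumption). */
static void leave_hypervisor_to_guest(unsigned int cpu)
{
    rcu_quiet_enter(cpu);
}

/*
 * Hypothetical helper: a grace period can end once every CPU is either
 * marked quiet (in guest context) or has reported a quiescent state.
 */
static bool grace_period_can_complete(const bool *passed_qs)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        if (!cpu_quiet[cpu] && !passed_qs[cpu])
            return false;
    return true;
}
```

In this model, a CPU that sits in the guest indefinitely (as can happen with
sched=null and vwfi=native) no longer holds up the grace period, because it
was marked quiet on its last leave_hypervisor_to_guest().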