Re: sched=null vwfi=native and call_rcu()
> On Jan 14, 2022, at 9:01 PM, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:
>
> On Fri, 14 Jan 2022, Dario Faggioli wrote:
>> On Thu, 2022-01-06 at 17:52 -0800, Stefano Stabellini wrote:
>>> On Thu, 6 Jan 2022, Julien Grall wrote:
>>>>
>>>> This issue and solution were discussed numerous times on the ML. In
>>>> short, we want to tell the RCU that CPUs running in guest context are
>>>> always quiesced.
>>>> For more details, you can read the previous thread (which also
>>>> contains a link to the one before):
>>>>
>>>> https://lore.kernel.org/xen-devel/fe3dd9f0-b035-01fe-3e01-ddf065f182ab@codiax.se/
>>>
>>> Thanks Julien for the pointer!
>>>
>>> Dario, I forward-ported your three patches to staging:
>>> https://gitlab.com/xen-project/people/sstabellini/xen/-/tree/rcu-quiet
>>>
>> Hi Stefano!
>>
>> I definitely would like to see the end of this issue, so thanks a lot
>> for your interest and your help with the patches.
>>
>>> I can confirm that they fix the bug.
>>> Note that I had to add a small
>>> change on top to remove the ASSERT at the beginning of
>>> rcu_quiet_enter:
>>> https://gitlab.com/xen-project/people/sstabellini/xen/-/commit/6fc02b90814d3fe630715e353d16f397a5b280f9
>>>
>> Yeah, that should be fine.
>>
>>> Would you be up for submitting them for upstreaming? I would prefer
>>> if you send out the patches because I cannot claim to understand them
>>> completely (except for the one doing renaming :-P )
>>>
>> Haha! So, I am up for properly submitting, but there's one problem. As
>> you've probably got, the idea here is to use transitions toward the
>> guest and inside the hypervisor as RCU quiescence and "activation"
>> points.
>>
>> Now, on ARM, that just meant calling rcu_quiet_exit() in
>> enter_hypervisor_from_guest() and calling rcu_quiet_enter() in
>> leave_hypervisor_to_guest(). Nice and easy, and even myself --and I'm
>> definitely not an ARM person-- could understand it (although with some
>> help from Julien) and put the patches together.
>>
>> However, the problem is really arch independent and, despite not
>> surfacing equally frequently, it affects x86 as well. And for x86 the
>> situation is by far not equally nice, when it comes to identifying all
>> the places from where to call rcu_quiet_{enter,exit}().
>>
>> And finding out where to put them, among the various functions that we
>> have in the various entry.S variants is where I stopped. The plan was
>> to get back to it, but as shamefully as it sounds, I could not do that
>> yet.
>>
>> So, if anyone wants to help with this, handing over suggestions for
>> potential good spots, that would help a lot.
>
> Unfortunately I cannot volunteer due to time and also because I wouldn't
> know where to look and I don't have a reproducer or a test environment
> on x86. I would be flying blind.
>
>
>> Alternatively, we can submit the series as ARM-only... But I fear that
>> the x86 side of things would then be easily forgotten. :-(
>
> I agree with you on this, but at the same time we are having problems
> with customers in the field -- it is not like we can wait to solve the
> problem on ARM any longer. And the issue is certainly far less likely to
> happen on x86 (there is no vwfi=native, right?) In other words, I think
> it is better to have half of the solution now to solve the worst part of
> the problem than to wait more months for a full solution.

An x86 equivalent of vwfi=native could be implemented easily, but AFAIK nobody has asked for it yet.

I agree that we need to fix it for ARM, and so in the absence of someone with the time to fix up the x86 side, I think fixing ARM-only is the way to go.

It would be good if we could add appropriate comments warning anyone who implements `hlt=native` on x86 of the problems they’ll face and how to fix them. Not sure of the best place to do that; in the VMX / SVM code that sets the exit for HLT &c?

 -George
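
For readers following the thread, the idea quoted above (treat a CPU that is running guest code as permanently quiescent, and make it "active" again when it re-enters the hypervisor) can be sketched as a toy user-space model. This is not the code from the patches: only the four function names rcu_quiet_enter(), rcu_quiet_exit(), enter_hypervisor_from_guest() and leave_hypervisor_to_guest() come from the discussion. The bitmasks, the `cpu` parameter, and the helpers start_grace_period(), report_quiescent() and grace_period_done() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4u

/* Hypothetical state for illustration only; not Xen's real data structures. */
static uint32_t online_mask = (1u << NR_CPUS) - 1; /* all CPUs online */
static uint32_t quiet_mask;    /* CPUs currently running guest code */
static uint32_t pending_mask;  /* CPUs the current grace period still waits for */

/* Going back to the guest: this CPU cannot hold RCU-protected references. */
static void rcu_quiet_enter(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    quiet_mask |= 1u << cpu;
    pending_mask &= ~(1u << cpu);  /* entering the guest counts as quiescent */
}

/* Entering the hypervisor: this CPU may use RCU again. */
static void rcu_quiet_exit(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    quiet_mask &= ~(1u << cpu);
}

/* The two hook points named in the thread for ARM (toy versions). */
static void leave_hypervisor_to_guest(unsigned int cpu)   { rcu_quiet_enter(cpu); }
static void enter_hypervisor_from_guest(unsigned int cpu) { rcu_quiet_exit(cpu); }

/* Start a grace period: wait only for CPUs that are not in guest context. */
static void start_grace_period(void)
{
    pending_mask = online_mask & ~quiet_mask;
}

/* A CPU in hypervisor context reports that it passed a quiescent state. */
static void report_quiescent(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    pending_mask &= ~(1u << cpu);
}

/* Callbacks queued with call_rcu() may run once nothing is pending. */
static bool grace_period_done(void)
{
    return pending_mask == 0;
}
```

In this model, a CPU that stays in the guest indefinitely never blocks a grace period, because start_grace_period() skips it. A CPU that is inside the hypervisor when the grace period starts must either call report_quiescent() or return to the guest before grace_period_done() becomes true.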