|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH RESEND] x86/vpt: Replace per-guest pt_migrate lock with per pt lock
On 3/25/21 10:48 AM, Roger Pau Monné wrote:
> On Wed, Mar 24, 2021 at 05:05:05PM -0400, Boris Ostrovsky wrote:
>> Commit 8e76aef72820 ("x86/vpt: fix race when migrating timers between
>> vCPUs") addressed XSA-336 by introducing a per-domain rwlock that was
>> intended to protect periodic timer during VCPU migration. Since such
>> migration is an infrequent event no performance impact was expected.
>>
>> Unfortunately this turned out not to be the case: on a fairly large
>> guest (92 VCPUs) we've observed as much as 40% TPCC performance regression
>> with some guest kernels. Further investigation pointed to pt_migrate
>> read lock taken in pt_update_irq() as the largest contributor to this
>> regression. With large number of VCPUs and large number of VMEXITs
>> (from where pt_update_irq() is always called) the update of an atomic in
>> read_lock() is thought to be the main cause.
> Right, seems like a very plausible analysis.
>
> Since I suspect most (if not all?)
pt_restore_timer() (called from the scheduler) also contributes a couple of
percent. But yes.
> of the performance regression is
> from the read_lock in pt_update_irq I think we can remove that without
> doing such a big change to the current locking logic, and instead
> merging part of the logic that you detail in the cover letter without
> moving to a per-timer lock.
>
>> Stephen Brennan examined the locking pattern and suggested using a
>> per-timer lock instead. This lock will need to be held whenever there is
>> a chance that pt->vcpu field may change (thus avoiding XSA-336
>> condition).
>>
>> Suggested-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> So while I think this is indeed a working solution, I'm not convinced
> we require a per-timer lock, I think we can find some middle ground
> using both a per-domain rwlock and the more fine grained per-vcpu
> lock.
>
> Basically for type 1 accesses (pt_vcpu_{un}lock) I think we can safely
> drop the read_{un}lock call,
Yes, if that's the case then current rwlock should be fine.
> and remove the performance bottleneck
> while slightly adjusting the functions that modify the per-vcpu timer
> lists to take the per-vcpu locks when doing so.
>
> I've tried to convey that in the comments below, while also pointing
> out some suitable places where comments can be added based on the text
> from your cover letter.
>
> Overall this should result in a smaller patch that will be both easier
> to review and argue in terms of inclusion into 4.15.
Sure. Thanks for the review/suggestions.
-boris
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |