[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.14 v3] x86/tlb: fix assisted flush usage





On 26/06/2020 14:21, Paul Durrant wrote:
-----Original Message-----
From: Jan Beulich <jbeulich@xxxxxxxx>
Sent: 26 June 2020 14:11
To: Roger Pau Monné <roger.pau@xxxxxxxxxx>; paul@xxxxxxx; Andrew Cooper 
<andrew.cooper3@xxxxxxxxxx>
Cc: Julien Grall <julien@xxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; Wei Liu 
<wl@xxxxxxx>; George Dunlap
<george.dunlap@xxxxxxxxxx>; Ian Jackson <ian.jackson@xxxxxxxxxxxxx>; Stefano 
Stabellini
<sstabellini@xxxxxxxxxx>; Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
Subject: Re: [PATCH for-4.14 v3] x86/tlb: fix assisted flush usage

On 26.06.2020 12:07, Roger Pau Monné wrote:
On Fri, Jun 26, 2020 at 10:38:11AM +0100, Julien Grall wrote:
Hi Roger,

Sorry I didn't manage to answer to your question before you sent v3.

On 25/06/2020 12:30, Roger Pau Monne wrote:
diff --git a/xen/include/asm-arm/flushtlb.h b/xen/include/asm-arm/flushtlb.h
index ab1aae5c90..7ae0543885 100644
--- a/xen/include/asm-arm/flushtlb.h
+++ b/xen/include/asm-arm/flushtlb.h
@@ -27,6 +27,7 @@ static inline void page_set_tlbflush_timestamp(struct 
page_info *page)
   /* Flush specified CPUs' TLBs */
   void flush_tlb_mask(const cpumask_t *mask);
+#define flush_tlb_mask_sync flush_tlb_mask

Dropping the parameter 'sync' from filtered_flush_tlb_mask() is a nice
improvement, but it unfortunately doesn't fully address my concern.

After this patch there is exactly one use of flush_tlb_mask() in common code
(see grant_table.c). But without looking at the x86 code, it is not clear
why this requires a different flush compare to the two other sites.

It's not dealing with page allocation or page type changes directly,
and hence doesn't need to use an IPI in order to prevent races with
spurious_page_fault.

IOW, if I want to modify the common code in the future, how do I know which
flush to call?

Unless you modify one of the specific areas mentioned above (page
allocation or page type changes) you should use flush_tlb_mask.

This is not ideal, and my aim will be to be able to use the assisted
flush everywhere if possible, so I would really like to get rid of the
interrupt disabling done in spurious_page_fault and this model where
x86 relies on blocking interrupts in order to prevent page type
changes or page freeing.

Such change however doesn't feel appropriate for a release freeze
period, and hence went with something smaller that restores the
previous behavior. Another option is to just disable assisted flushes
for the time being and re-enable them when a suitable solution is
found.

As I can understand Julien's concern, maybe this would indeed be
the better approach for now? Andrew, Paul - thoughts?


Julien's concern seems to be about long term usage whereas IIUC this patch does 
fix the issue at hand, so can we put this patch in now on the basis that Roger 
will do the re-work described after 4.14 (which I think will address Julien's 
concern)?
Bear in mind that while this may be properly fixed in the next release, the hack will stay forever in Xen 4.14.

While I understand that disabling assisted flush is going to have a performance impact, we also need to make sure the interface make senses.

From a generic perspective, a TLB flush should have the exact same guarantee regardless where we call it in common/. So I would still strongly prefer if we have a single helper.

Is it possible to consider to replace all the flush_tlb_mask() in common code by arch_flush_tlb_mask()? On Arm, this would just be a rename. On x86, this would be an alias to flush_tlb_mask_sync()?

Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.