Re: [Xen-devel] [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
>>> On 25.04.17 at 12:59, <tim@xxxxxxx> wrote:
> Hi,
>
> At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
>> Jann's explanation of the problem:
>>
>> "start situation:
>>  - domain A and domain B are PV domains
>>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
>>    are not scheduled away
>>  - domain A has XSM_TARGET access to domain B
>>  - page X is owned by domain B and has no mappings
>>  - page X is zeroed
>>
>> steps:
>>  - domain A uses do_mmu_update() to map page X in domain A as writable
>>  - domain A accesses page X through the new PTE, creating a TLB entry
>>  - domain A removes its mapping of page X
>>    - type count of page X goes to 0
>>    - tlbflush_timestamp of page X is bumped
>>  - domain B maps page X as L1 pagetable
>>    - type of page X changes to PGT_l1_page_table
>>    - TLB flush is forced using domain_dirty_cpumask of domain B
>>    - page X is mapped as L1 pagetable in domain B
>>
>> At this point, domain B's vCPUs are guaranteed to have no
>> incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
>> vCPUs can still have stale TLB entries that map page X as writable,
>> permitting domain A to control a live pagetable of domain B."
>
> AIUI this patch solves the problem by immediately flushing domain A's
> TLB entries at the point where domain A removes its mapping of page X.
>
> Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
> domain B's domain_dirty_cpumask at the same point?
>
> Then when domain B flushes TLBs in the last step (in __get_page_type())
> it will catch any stale TLB entries from domain A as well. But in the
> (hopefully common) case where there's a delay between domain A's
> __put_page_type() and domain B's __get_page_type(), the usual TLB
> timestamp filtering will suppress some of the IPIs/flushes.

So I've given this a try, and failed miserably (including losing an
XFS volume on the test machine).
The problem is the BUG_ON() at the top of domain_relinquish_resources() -
there will, very likely, be bits remaining set if the code added to
put_page_from_l1e() set some pretty recently (irrespective of avoiding
to set any once ->is_dying has been set).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
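
The interaction described above (folding domain A's dirty CPU mask into
domain B's when A drops its writable reference, and the teardown check
then finding leftover bits) can be sketched as a toy model. This is only
an illustration in C, not Xen code: toy_cpumask, struct toy_domain and
every toy_* function are hypothetical stand-ins, and the assumption that
the BUG_ON() requires an empty dirty mask is inferred from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one bit per physical CPU (cpu < 64); not Xen's cpumask_t. */
typedef uint64_t toy_cpumask;

/* Hypothetical stand-in for the per-domain state discussed in the thread. */
struct toy_domain {
    toy_cpumask dirty_cpumask; /* CPUs that may hold TLB entries for the domain */
    bool is_dying;             /* set once teardown of the domain has begun */
};

/* A vCPU of d starts running on cpu: that CPU may now cache d's mappings. */
static void toy_schedule_in(struct toy_domain *d, unsigned int cpu)
{
    d->dirty_cpumask |= (toy_cpumask)1 << cpu;
}

/* A vCPU of d leaves cpu: the domain clears only the bit it set itself. */
static void toy_schedule_out(struct toy_domain *d, unsigned int cpu)
{
    d->dirty_cpumask &= ~((toy_cpumask)1 << cpu);
}

/*
 * The suggested alternative: when 'mapper' drops a writable reference to a
 * page owned by 'owner', OR the mapper's mask into the owner's, so that a
 * later flush driven by the owner's mask also reaches the mapper's CPUs.
 * Nothing is added once the owner is dying.
 */
static void toy_put_foreign_writable_ref(const struct toy_domain *mapper,
                                         struct toy_domain *owner)
{
    if (!owner->is_dying)
        owner->dirty_cpumask |= mapper->dirty_cpumask;
}

/* Assumed shape of the teardown check: true means the assertion would hold. */
static bool toy_teardown_mask_is_empty(const struct toy_domain *d)
{
    return d->dirty_cpumask == 0;
}
```

In this model, with A running on CPU 1 and B on CPU 2, the fold leaves
both bits set in B's mask. Once B starts dying and its own vCPU is
descheduled, bit 1 is still set, because nothing on B's side ever clears
a bit that was borrowed from A. The emptiness check therefore fails even
though no bit was added after is_dying was set.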