Re: [Xen-devel] [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
On 18/12/15 08:21, Han, Huaitong wrote:
> On Wed, 2015-12-16 at 15:36 +0000, George Dunlap wrote:
>> [Adding Tim, the previous mm maintainer]
>>
>> On 11/12/15 09:16, Wu, Feng wrote:
>>>>> +{
>>>>> +    void *xsave_addr;
>>>>> +    unsigned int pkru = 0;
>>>>> +    bool_t pkru_ad, pkru_wd;
>>>>> +
>>>>> +    bool_t uf = !!(pfec & PFEC_user_mode);
>>>>> +    bool_t wf = !!(pfec & PFEC_write_access);
>>>>> +    bool_t ff = !!(pfec & PFEC_insn_fetch);
>>>>> +    bool_t rsvdf = !!(pfec & PFEC_reserved_bit);
>>>>> +    bool_t pkuf = !!(pfec & PFEC_prot_key);
>>>>
>>>> So I'm just wondering out loud here -- is there actually any
>>>> situation in which we would want guest_walk_tables to act
>>>> differently than the real hardware?
>>>>
>>>> That is, is there actually any situation where pku is enabled, the
>>>> vcpu is in long mode, PFEC_write_access and/or PFEC_page_present is
>>>> set, and the pkey is non-zero, that we want guest_walk_tables() to
>>>> only check the write-protect bit for the pte, and not also check
>>>> the pkru?
>>>>
>>>> Because if not, it seems like it would be much more robust to
>>>> simply *always* check for pkru_ad if PFEC_page_present is set, and
>>>> for pkru_wd if PFEC_write_access is set.
>>>
>>> guest_walk_tables() is also used by shadow code, though we don't
>>> plan to support pkeys for shadow now. In that case, however, the
>>> 'pfec' is generated by hardware, and the pkuf bit may be 0 or 1
>>> depending on the real page fault. If we unconditionally check pkeys
>>> in guest_walk_tables(), it is not a good idea for someone who may
>>> want to implement pkeys for shadow in future, since we only need to
>>> check pkeys when the pkuf is set in pfec.
>>
>> So you'll have to forgive me for being a bit slow here -- I stepped
>> away from the mm code for a while, and a lot has changed since I
>> agreed to become mm maintainer earlier this year.
>>
>> So here is what I see in the tree; please correct me if I missed
>> something important.
>>
>> The function guest_walk_tables():
>> 1. Accepts a pfec argument which is meant to describe the *access
>>    type* that is happening
>> 2. Returns a value with the "wrong" flags (i.e., flags which needed
>>    to be set that were missing, or flags that needed to be clear
>>    that were set).
>> 3. It checks reserved bits in the pagetable regardless of whether
>>    PFEC_reserved_bit is set in the caller, and returns
>>    _PAGE_INVALID_BITS if so.
>> 4. If PFEC_insn_fetch is set, it will only check for nx and smep if
>>    those features are enabled for the guest.
>>
>> The main callers are the various gva_to_gfn(), which accept a
>> *pointer* to a pfec argument. This pointer is actually
>> bidirectional: its value is passed to guest_walk_tables() to
>> determine what access types are checked; what is returned is meant
>> to be a pfec value which can be passed to a guest as part of a fault
>> (and is by several callers). But it may also return the Xen-internal
>> flags, PFEC_page_{paged,shared}. And, importantly, it modifies the
>> pfec value based on the bits that are returned from
>> guest_walk_tables(): it will clear PFEC_page_present if
>> guest_walk_tables() returns _PAGE_PRESENT, and it will set
>> PFEC_reserved_bit if guest_walk_tables() returns _PAGE_INVALID_BITS.
>>
>> The next logical level up are the
>> hvm_{copy,fetch}_{to,from}_guest_virt(), which also take a pfec
>> argument: but really the only purpose of the pfec argument is to
>> allow the caller to add the PFEC_user_mode flag; the other
>> access-type flags are set automatically by the functions themselves
>> (e.g., "to" sets PFEC_write_access, "fetch" sets PFEC_insn_fetch,
>> &c). Several callers set PFEC_page_present as well, but no callers
>> set any other bits.
>>
>> (hvm_fetch_from_guest_virt() seems to only set PFEC_insn_fetch if nx
>> or smep are enabled in the guest. This seems inconsistent to me with
>> the treatment of PFEC_reserved_bit: it seems like
>> hvm_fetch_from_guest_virt() should always pass in PFEC_insn_fetch,
>> particularly as guest_walk_tables() will already gate the checks
>> based on whether nx or smep is enabled in the guest. Tim, do you
>> know of any reason for this?)
>>
>> So there seems to me to be no reason to pass PFEC_prot_key into
>> guest_walk_tables() or gva_to_gfn(). The pfec value passed *into*
>> those should simply indicate the type of memory access being done:
>> present, write, instruction fetch, user.
>>
>> With your current series, guest_walk_tables() already checks for
>> pkeys being enabled in the guest before checking for them in the
>> pagetables. For shadow mode, these will be false, and so no checks
>> will be done. If anyone ever implements pkeys for shadow mode, then
>> these will be enabled, and the checks will be done, without any
>> intervention on the part of the caller.
>
> I have understood it, but the problem with shadow mode is that pfec
> may come from regs->error_code (hardware), just like:
>     rc = sh_walk_guest_tables(v, va, &gw, regs->error_code);
> so, when regs->error_code does not have PFEC_prot_key,
> guest_walk_tables may still check PKEY when the code is written
> according to what you said, and it may return a different result.

Under what situation would the *hardware* not generate PFEC_prot_key,
but guest_walk_tables() would, under exactly the same conditions,
generate PFEC_prot_key? Shouldn't guest_walk_tables() be made to behave
identically to the hardware?

The only situation where that might legitimately happen is when the
shadow pagetables and the guest pagetables diverge; for instance, when
a page had been write-protected because Xen thought it was a pagetable.
Then the guest pte might have write permission, but the pkey read-only
permission; but the shadow pte would not have write permission.

In that case, the hardware might give PFEC_write_access but not
PFEC_prot_key, I suppose (haven't read the manual to be sure); but in
that case, we *want* guest_walk_tables() to detect and add
PFEC_prot_key, don't we?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
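
[Editorial sketch, not part of the posted patch.] The suggestion above, that the pfec passed in should only describe the access type and that the walker itself should decide whether a protection-key violation occurs, could look roughly like the following. This is a minimal sketch under stated assumptions: `struct walk_ctx` and `pkey_fault()` are hypothetical names invented for illustration and are not Xen's actual code. The PFEC bit positions follow the x86 page-fault error-code layout, and the PKRU layout (two bits per key, access-disable then write-disable) follows the architecture definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error code bits (x86 architectural layout). */
#define PFEC_page_present  (1u << 0)
#define PFEC_write_access  (1u << 1)
#define PFEC_user_mode     (1u << 2)
#define PFEC_reserved_bit  (1u << 3)
#define PFEC_insn_fetch    (1u << 4)
#define PFEC_prot_key      (1u << 5)

/* Per-key PKRU bits: key i occupies bits 2i (AD) and 2i+1 (WD). */
#define PKRU_AD 1u  /* access disable */
#define PKRU_WD 2u  /* write disable */

/* Hypothetical container for the guest state the check depends on. */
struct walk_ctx {
    bool pku_enabled;  /* assumed: CR4.PKE set and vcpu in long mode */
    bool cr0_wp;       /* guest CR0.WP */
    uint32_t pkru;     /* guest PKRU value */
};

/*
 * Return true if the access described by pfec would violate the
 * protection key of a user-mode mapping. The caller never passes
 * PFEC_prot_key in; it only describes the access type.
 */
bool pkey_fault(const struct walk_ctx *ctx, uint32_t pfec,
                bool pte_user, unsigned int pkey)
{
    assert(pkey < 16);  /* the pkey is a 4-bit field in the pte */

    /* Keys apply only to data accesses to user-mode mappings. */
    if (!ctx->pku_enabled || !pte_user || (pfec & PFEC_insn_fetch))
        return false;

    unsigned int bits = (ctx->pkru >> (pkey * 2)) & 3u;

    if (bits & PKRU_AD)
        return true;

    /* WD blocks user writes, and supervisor writes only if CR0.WP. */
    if ((bits & PKRU_WD) && (pfec & PFEC_write_access) &&
        ((pfec & PFEC_user_mode) || ctx->cr0_wp))
        return true;

    return false;
}
```

With a check shaped like this, a caller such as gva_to_gfn() would OR PFEC_prot_key into the returned pfec when the function reports a violation, in the same way it already sets PFEC_reserved_bit, and shadow mode would simply see `pku_enabled` as false.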