[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
I think I know what's going wrong: Pages that are pinned via gfn_to_pfn() and friends take a references, however are often released via kvm_release_pfn_clean()/kvm_release_pfn_dirty()/kvm_release_page_clean()... E.g., in arch/x86/kvm/x86.c:reexecute_instruction() ... pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa)); ... kvm_release_pfn_clean(pfn); void kvm_release_pfn_clean(kvm_pfn_t pfn) { if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) put_page(pfn_to_page(pfn)); } This function makes perfect sense as the counterpart for kvm_get_pfn(): void kvm_get_pfn(kvm_pfn_t pfn) { if (!kvm_is_reserved_pfn(pfn)) get_page(pfn_to_page(pfn)); } As all ZONE_DEVICE pages are currently reserved, pages pinned via gfn_to_pfn() and friends will often not see a put_page() AFAIKS.Assuming gup() takes a reference for ZONE_DEVICE pages, yes, this is a KVM bug. Yes, it does take a reference AFAIKs. E.g., mm/gup.c:gup_pte_range(): ... if (pte_devmap(pte)) { if (unlikely(flags & FOLL_LONGTERM)) goto pte_unmap; pgmap = get_dev_pagemap(pte_pfn(pte), pgmap); if (unlikely(!pgmap)) { undo_dev_pagemap(nr, nr_start, pages); goto pte_unmap; } } else if (pte_special(pte)) goto pte_unmap; VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); head = try_get_compound_head(page, 1); try_get_compound_head() will increment the reference count. Now, my patch does not change that, the result of kvm_is_reserved_pfn(pfn) will be unchanged. A proper fix for that would probably be a) To drop the reference to ZONE_DEVICE pages in gfn_to_pfn() and friends, after you successfully pinned the pages. (not sure if that's the right thing to do but you're the expert) b) To not use kvm_release_pfn_clean() and friends on pages that were definitely pinned.This is already KVM's intent, i.e. the purpose of the PageReserved() check is simply to avoid putting a non-existent reference. The problem is that KVM assumes pages with PG_reserved set are never pinned, which AFAICT was true when the code was first added.(talking to myself, sorry) Thinking again, dropping this patch from this series could effectively also fix that issue. E.g., kvm_release_pfn_clean() and friends would always do a put_page() if "pfn_valid() and !PageReserved()", so after patch 9 also on ZONDE_DEVICE pages.Yeah, this appears to be the correct fix.But it would have side effects that might not be desired. E.g.,: 1. kvm_pfn_to_page() would also return ZONE_DEVICE pages (might even be the right thing to do).This should be ok, at least on x86. There are only three users of kvm_pfn_to_page(). Two of those are on allocations that are controlled by KVM and are guaranteed to be vanilla MAP_ANONYMOUS. The third is on guest memory when running a nested guest, and in that case supporting ZONE_DEVICE memory is desirable, i.e. KVM should play nice with a guest that is backed by ZONE_DEVICE memory.2. kvm_set_pfn_dirty() would also set ZONE_DEVICE pages dirty (might be okay)This is ok from a KVM perspective. What about void kvm_get_pfn(kvm_pfn_t pfn) { if (!kvm_is_reserved_pfn(pfn)) get_page(pfn_to_page(pfn)); } Is a pure get_page() sufficient in case of ZONE_DEVICE?(asking because of the live references obtained via get_dev_pagemap(pte_pfn(pte), pgmap) in mm/gup.c:gup_pte_range() somewhat confuse me :) ) The scarier code (for me) is transparent_hugepage_adjust() and kvm_mmu_zap_collapsible_spte(), as I don't at all understand the interaction between THP and _PAGE_DEVMAP. The x86 KVM MMU code is one of the ugliest code I know (sorry, but it had to be said :/ ). Luckily, this should be independent of the PG_reserved thingy AFAIKs. -- Thanks, David / dhildenb _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |