Re: Xen-unstable Linux-6.1.0-rc5 BUG: unable to handle page fault for address: ffff8880083374d0
On 21.11.22 09:18, Yu Zhao wrote:
> On Mon, Nov 21, 2022 at 12:10 AM Juergen Gross <jgross@xxxxxxxx> wrote:
>> On 19.11.22 09:28, Sander Eikelenboom wrote:
>>> Hi Yu / Juergen,
>
> Hi Sander / Juergen,
>
> Thanks for the report and the analysis.
>
>>> This night I got a dom0 kernel crash on my new Ryzen box running Xen-unstable and a Linux-6.1.0-rc5 kernel.
>>> I did enable the new and shiny MGLRU, could this be related?
>>
>> It might be related, but I think it could happen independently from it.
>
> Yes, I think it's related.
>
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.647371] BUG: unable to handle page fault for address: ffff8880083374d0
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.663555] #PF: supervisor write access in kernel mode
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.677542] #PF: error_code(0x0003) - permissions violation
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.691181] PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.705084] Oops: 0003 [#1] PREEMPT SMP NOPTI
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.718710] CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr-mac80211debug+ #1
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.732457] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4 R2.0, BIOS P5.60 10/20/2022
>>> Nov 19 06:30:11 serveerstertje kernel: [68959.746391] RIP: e030:pmdp_test_and_clear_young+0x25/0x40
>>
>> The kernel tried to reset the "accessed" bit in the pmd entry.
>
> Correct.
>
>> It does so only since commit eed9a328aa1ae. Before that pmdp_test_and_clear_young()
>> could be called only for huge pages, which are disabled in Xen PV guests.
>
> Correct. After that commit, we also can clear the accessed bit in non-leaf
> PMD entries (pointing to PTE tables).
>
>> pmdp_test_and_clear_young() does a test_and_clear_bit() of the pmd entry, which
>> is failing since the hypervisor is emulating pte entry modifications only (pmd
>> and pud entries can be set via hypercalls only).
>>
>> Could you please test the attached patch whether it fixes the issue for you?
>
> There is a runtime kill switch for ARCH_HAS_NONLEAF_PMD_YOUNG, since I wasn't
> able to verify this capability on all x86 varieties. The following should do it:
>
>   # cat /sys/kernel/mm/lru_gen/enabled
>   0x0007
>   # echo 3 >/sys/kernel/mm/lru_gen/enabled
>
> Details are in Documentation/admin-guide/mm/multigen_lru.rst.
>
> Alternatively, we could make ARCH_HAS_NONLEAF_PMD_YOUNG a runtime check similar
> to arch_has_hw_pte_young() on arm64.

I like this idea. The patch should be rather trivial. Let me have a try ...

Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Attachment: OpenPGP_signature
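
Editorial note on the kill switch quoted above: the value in /sys/kernel/mm/lru_gen/enabled is a bitmask, so writing 3 in place of 0x0007 clears only bit 0x0004, the non-leaf accessed-bit clearing that triggers this fault. The following is a minimal sketch that decodes such a value. The bit descriptions paraphrase Documentation/admin-guide/mm/multigen_lru.rst, and the function name decode_lru_gen is hypothetical, not part of any kernel tooling.

```shell
#!/bin/sh
# Decode an lru_gen "enabled" mask into the features it switches on.
# decode_lru_gen is a hypothetical helper; the bit meanings are
# paraphrased from Documentation/admin-guide/mm/multigen_lru.rst.
decode_lru_gen() {
    mask=$(( $1 ))   # accepts decimal (3) or hex (0x0007)
    if [ $(( mask & 1 )) -ne 0 ]; then
        echo "0x0001 multi-gen LRU main switch"
    fi
    if [ $(( mask & 2 )) -ne 0 ]; then
        echo "0x0002 batched clearing of the accessed bit in leaf page table entries"
    fi
    if [ $(( mask & 4 )) -ne 0 ]; then
        echo "0x0004 clearing of the accessed bit in non-leaf page table entries"
    fi
    return 0
}

# Value reported before the workaround, then the value written by "echo 3".
decode_lru_gen 0x0007
decode_lru_gen 3
```

On a running system the argument could instead be taken from the sysfs file, for example `decode_lru_gen "$(cat /sys/kernel/mm/lru_gen/enabled)"`.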