Re: Xen-unstable Linux-6.1.0-rc5 BUG: unable to handle page fault for address: ffff8880083374d0
On Mon, Nov 21, 2022 at 12:10 AM Juergen Gross <jgross@xxxxxxxx> wrote:
>
> On 19.11.22 09:28, Sander Eikelenboom wrote:
> > Hi Yu / Juergen,

Hi Sander / Juergen,

Thanks for the report and the analysis.

> > This night I got a dom0 kernel crash on my new Ryzen box running Xen-unstable
> > and a Linux-6.1.0-rc5 kernel.
> > I did enable the new and shiny MGLRU, could this be related ?
>
> It might be related, but I think it could happen independently from it.

Yes, I think it's related.

> > Nov 19 06:30:11 serveerstertje kernel: [68959.647371] BUG: unable to handle page fault for address: ffff8880083374d0
> > Nov 19 06:30:11 serveerstertje kernel: [68959.663555] #PF: supervisor write access in kernel mode
> > Nov 19 06:30:11 serveerstertje kernel: [68959.677542] #PF: error_code(0x0003) - permissions violation
> > Nov 19 06:30:11 serveerstertje kernel: [68959.691181] PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
> > Nov 19 06:30:11 serveerstertje kernel: [68959.705084] Oops: 0003 [#1] PREEMPT SMP NOPTI
> > Nov 19 06:30:11 serveerstertje kernel: [68959.718710] CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr-mac80211debug+ #1
> > Nov 19 06:30:11 serveerstertje kernel: [68959.732457] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4 R2.0, BIOS P5.60 10/20/2022
> > Nov 19 06:30:11 serveerstertje kernel: [68959.746391] RIP: e030:pmdp_test_and_clear_young+0x25/0x40
>
> The kernel tried to reset the "accessed" bit in the pmd entry.

Correct.

> It does so only since commit eed9a328aa1ae. Before that
> pmdp_test_and_clear_young() could be called only for huge pages, which are
> disabled in Xen PV guests.

Correct. After that commit, we can also clear the accessed bit in
non-leaf PMD entries (those pointing to PTE tables).
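
To make the fault details in the log easier to read, here is a minimal sketch in plain Python (userspace, not kernel code) that decodes the error code and the PTE value from the oops. The bit positions are the architectural x86-64 ones; the table and function names are hypothetical, chosen only for this illustration.

```python
# Architectural x86-64 page-fault error code bits.
PF_ERROR_BITS = {
    0: "PROTECTION",   # 1 = fault on a present page, 0 = page not present
    1: "WRITE",        # 1 = write access, 0 = read access
    2: "USER",         # 1 = access from user mode
    3: "RSVD",         # reserved bit set in a paging structure
    4: "INSTR",        # instruction fetch
}

# Architectural flag bits of a 4 KiB x86-64 page table entry.
PTE_FLAG_BITS = {
    0: "PRESENT",
    1: "RW",
    2: "USER",
    3: "PWT",
    4: "PCD",
    5: "ACCESSED",
    6: "DIRTY",
    8: "GLOBAL",
    63: "NX",
}


def decode_bits(value, names):
    """Return the names of all bits from `names` that are set in `value`."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


if __name__ == "__main__":
    # Values copied from the oops above.
    print(decode_bits(0x0003, PF_ERROR_BITS))
    print(decode_bits(0x8010000008337065, PTE_FLAG_BITS))
```

The PTE mapping the faulting address is present but does not have RW set, which matches the "supervisor write access" and "permissions violation" lines in the log.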

> pmdp_test_and_clear_young() does a test_and_clear_bit() of the pmd entry, which
> is failing since the hypervisor is emulating pte entry modifications only (pmd
> and pud entries can be set via hypercalls only).
>
> Could you please test the attached patch whether it fixes the issue for you?

There is a runtime kill switch for ARCH_HAS_NONLEAF_PMD_YOUNG, since I
wasn't able to verify this capability on all x86 varieties. The
following should do it:

  # cat /sys/kernel/mm/lru_gen/enabled
  0x0007
  # echo 3 >/sys/kernel/mm/lru_gen/enabled

Details are in Documentation/admin-guide/mm/multigen_lru.rst.

Alternatively, we could make ARCH_HAS_NONLEAF_PMD_YOUNG a runtime
check similar to arch_has_hw_pte_young() on arm64.
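
For reference, the value written to /sys/kernel/mm/lru_gen/enabled is a bitmask. The sketch below (plain Python, only an illustration) shows why writing 3 turns off the non-leaf accessed-bit clearing while leaving the rest of MGLRU on. The bit meanings follow my reading of multigen_lru.rst; the constant names mirror the kernel's capability enum as I recall it and should be treated as assumptions, and the helper names are hypothetical.

```python
# Bit meanings as described in Documentation/admin-guide/mm/multigen_lru.rst.
# Constant names are assumed to mirror the kernel enum; verify before relying on them.
LRU_GEN_CORE = 0x0001           # main switch
LRU_GEN_MM_WALK = 0x0002        # batched accessed-bit clearing in leaf entries
LRU_GEN_NONLEAF_YOUNG = 0x0004  # accessed-bit clearing in non-leaf entries

CAP_NAMES = {
    LRU_GEN_CORE: "core",
    LRU_GEN_MM_WALK: "mm_walk",
    LRU_GEN_NONLEAF_YOUNG: "nonleaf_young",
}


def describe(mask):
    """List the capabilities enabled in an lru_gen 'enabled' mask."""
    return [name for bit, name in sorted(CAP_NAMES.items()) if mask & bit]


def without(mask, cap):
    """Return the mask with one capability switched off."""
    return mask & ~cap


if __name__ == "__main__":
    current = int("0x0007", 16)          # value read from the sysfs file
    wanted = without(current, LRU_GEN_NONLEAF_YOUNG)
    print(describe(current), "->", describe(wanted), "write:", wanted)
```

With the mask at 3, only the main switch and the leaf-level batching remain enabled.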