Re: XSA-446 relevance on Intel
On Tue, Dec 12, 2023 at 10:56:48AM +0000, Andrew Cooper wrote:
> On 12/12/2023 9:43 am, James Dingwall wrote:
> > Hi,
> >
> > We were experiencing a crash during PV domU boot on several different
> > models of hardware but all with Intel CPUs.  The Xen version was based
> > on stable-4.15 at 4a4daf6bddbe8a741329df5cc8768f7dec664aed (XSA-444)
> > with some local patches.  Since updating the branch to
> > b918c4cdc7ab2c1c9e9a9b54fa9d9c595913e028 (XSA-446) we have not observed
> > the same crash.
>
> That range covers:
>
> 1f5f515da0f6 - iommu/amd-vi: use correct level for quarantine domain
>                page tables
> b918c4cdc7ab - x86/spec-ctrl: Remove conditional IRQs-on-ness for INT
>                $0x80/0x82 paths
>
> so yeah - not much in the way of change.
>
> > The occurrence was on 1-2% of boots and we couldn't determine a
> > particular sequence of events that would trigger it.  The kernel is
> > based on Ubuntu's 5.15.0-91 tag but we also observed the same with -85.
> > Due to the low frequency it is possible that we simply haven't observed
> > it again since updating our Xen build.
> >
> > If I have followed the early startup this is happening shortly after
> > detection of possible CPU vulnerabilities and patching in alternative
> > instructions.  As the RIP was native_irq_return_iret and XSA-446 related
> > to interrupt management I wondered if it was possible that despite "Xen
> > is not believed to be vulnerable in default configurations on CPUs from
> > other hardware vendors." there could be some conditions in which an
> > Intel CPU is affected?
>
> In short, XSA-446 isn't plausibly related.  It's completely internal to
> Xen, with no alteration on guest state.
>
> It is an error that Linux has ended up in native_irq_return_iret.  Linux
> cannot return to itself with an IRET instruction, and must use
> HYPERCALL_iret instead.
>
> In recent versions of Linux, this is fixed up as about the earliest
> action a PV kernel takes, but on older versions of Linux, any
> interrupt/exception early enough on boot was fatal in this way.
>
> This part of the backtrace is odd:
>
> [    0.398962]  ? native_iret+0x7/0x7
> [    0.398967]  ? insn_decode+0x79/0x100
> [    0.398975]  ? insn_decode+0xcf/0x100
> [    0.398980]  optimize_nops+0x68/0x150
>
> as it's not clear how we've ended up in a case wanting to return back to
> the kernel to begin with.  However, it's most likely a pagefault, as
> optimize_nops() is making changes in arbitrary locations.
>
> It is possible that a change in visible features has altered the
> behaviour enough not to crash, but if everything is still the same as
> far as you can tell, then it's likely just chance that you haven't seen
> it again.
>
> This is definitely a Linux bug, so I suspect something bad has been
> backported into Ubuntu.
>
> ~Andrew

Thanks for the response.  I had a look at the more recent kernels and
managed to backport "x86/entry,xen: Early rewrite of
restore_regs_and_return_to_kernel()" without too much trouble.  It may
still be a coincidence that we haven't encountered the problem but it
seems to have gone away for now.

Regards,
James
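
As a rough check on the "coincidence" question in this thread: with a crash on 1-2% of boots, a fair number of clean boots is needed before the absence of the crash says much. The sketch below is only an illustration and is not from the original messages. It assumes each boot is an independent trial with a fixed crash probability, and the function names and the 95% threshold are hypothetical choices.

```python
import math


def prob_no_crash(p: float, boots: int) -> float:
    """Probability of seeing zero crashes in `boots` boots, assuming each
    boot crashes independently with probability `p` (an assumption)."""
    return (1.0 - p) ** boots


def boots_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest number of consecutive clean boots for which, if the bug
    were still present at rate `p`, a crash-free run that long would
    happen with probability at most 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    # The two rates quoted in the thread: 1% and 2% of boots.
    for rate in (0.01, 0.02):
        print(
            f"rate={rate:.0%}: "
            f"P(no crash in 100 boots)={prob_no_crash(rate, 100):.3f}, "
            f"clean boots for 95% confidence={boots_needed(rate)}"
        )
```

Under these assumptions, roughly 150 to 300 consecutive clean boots would be needed before "it has gone away" is reasonably distinguishable from luck.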