Re: [PATCH for-4.16] x86/cpuid: do not shrink number of leaves in max policies
On 24/11/2021 18:07, Ian Jackson wrote:
> (Hoisting Roger and Jan to the To:)
>
> Andrew Cooper writes ("Re: [PATCH for-4.16] x86/cpuid: do not shrink number
> of leaves in max policies"):
>> On 24/11/2021 16:24, Ian Jackson wrote:
>>> Questions from my RM hat:
>>>
>>> Is there a workaround ?
>> No.
>>
>> The safety check being tripped is intended to prevent the VM crashing on
>> resume, and is functioning correctly.
>>
>>> What proportion of machines do we think this might affect ?
>> Any pre-xsave machines (~2012 and older), and any newer machines booted
>> with no-xsave.
>>
>> All AMD machines are actually broken by this, except that failure is
>> being masked by other changes in 4.16. Future AMD machines will break
>> in the same way.
> This is quite bad, then, I think. I'm inclined to treat this as a
> blocker for the release.

I would also classify it as a blocker.

>
>>> Jan, Andy, do you have an opinion ?
>> The reversion doesn't go far enough.
>>
>> While the shrinking of the max policies manifests as a concrete breakage
>> here, there is further breakage caused by shrinking the default
>> policies, because it renders some cpuid= settings in VM config files broken.
>>
>> There is still no feedback or error checking from individual cpuid=
>> settings, so this will manifest as the VM admin settings silently no
>> longer taking effect.
>>
>>
>> I recommend a full and complete reversion of 540d911c28. The
>> justification for it in the first place is especially weak because it is
>> explicitly contrary to how real hardware behaves, and this is the 3rd
>> ABI breakage it has caused, with more expected in the future based on
>> the analysis of what has gone wrong so far.
> I would like to collect as many opinions as possible. Do we have
> other options besides (a) reverting 540d911c28, or (b) releasing with
> this bug ?
There is a 3rd option of taking this patch as-is, which is halfway between
(a) and (b), but anything other than (a) leaves us with known breakages
that have no workaround.

Shutting the VM down on the old host, copying its disks and config file
manually, then booting it clean would avoid this specific breakage on
migrate, but you'd still be subject to the silent breakage from certain
cpuid= settings not taking effect.

> What bad consequences follow, for users of Xen, from reverting
> 540d911c28 ?

Nothing. It will take everything back to the same behaviour as 4.15 and
older.

> Presumably it had some purpose which will be undermined
> by reverting it. The commit message speaks of details but doesn't
> explain the ultimate impact, at least not to someone like me who only
> dimly perceives the underlying technical aspects.

540d911c28 "fixes" an issue which is theoretical at best. Real hardware
does not trim the max leaf when certain features are turned off; it
reports blocks of trailing zeros instead. None of the software manuals
permit any inference based on the max leaf, which is why the 4.15
behaviour has been fine for the lifetime of Xen so far.

> I did an experimental git-revert. It seemed to go cleanly.
> If we go for the revert, we would need a commit message.

It may revert cleanly, but it won't build because of the first hunk in
81da2b544cbb00. That hunk needs reverting too, because it also breaks
some cpuid= settings in VM config files.

In principle, the *final* thing the toolstack should do, *for brand new
VMs only*, is a shrink of that form, but this depends on a whole load
more toolstack work before it can be done safely.

There is a plan to fix CPUID handling in a safe way, and it is ongoing
(subject to all the security interruptions), but it has a long way to go
yet.

~Andrew