[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.16] Revert "x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents"
On 25.11.2021 11:15, Roger Pau Monné wrote: > On Thu, Nov 25, 2021 at 10:52:12AM +0100, Jan Beulich wrote: >> On 25.11.2021 10:24, Roger Pau Monné wrote: >>> On Thu, Nov 25, 2021 at 09:57:31AM +0100, Jan Beulich wrote: >>>> On 24.11.2021 22:11, Andrew Cooper wrote: >>>>> OSSTest has identified a 3rd regression caused by this change. Migration >>>>> between Xen 4.15 and 4.16 on the nocera pair of machines (AMD Opteron >>>>> 4133) >>>>> fails with: >>>>> >>>>> xc: error: Failed to set CPUID policy: leaf 00000000, subleaf ffffffff, >>>>> msr ffffffff (22 = Invalid argument): Internal error >>>>> xc: error: Restore failed (22 = Invalid argument): Internal error >>>>> >>>>> which is a safety check to prevent resuming the guest when the CPUID data >>>>> has >>>>> been truncated. The problem is caused by shrinking of the max policies, >>>>> which >>>>> is an ABI that needs handling compatibly between different versions of >>>>> Xen. >>>> >>>> Would you mind pointing me to the flight which has hit this problem? I >>>> don't think I've seen any respective failure. I also don't think I've >>>> seen any respective discussion on irc, so I really wonder where all >>>> this is coming from all of the sudden. It's not like the change in >>>> question would have gone in just yesterday. >>> >>> It's from a pair of newish boxes that Credativ and Ian where >>> attempting to commission yesterday. Since the boxes are not yet in >>> production Ian wasn't sure if the issue could be on the testing or >>> hardware side, so emailed the details in private for us to provide >>> some feedback on the issue. The error is at: >>> >>> http://logs.test-lab.xenproject.org/osstest/logs/166296/test-amd64-amd64-migrupgrade/info.html >> >> I see. Quite lucky timing then. > > Indeed, it was pure luck that we got this just yesterday. > >> >>>>> Furthermore, shrinking of the default policies also breaks things in some >>>>> cases, because certain cpuid= settings in a VM config file which used to >>>>> have >>>>> an effect will now be silently discarded. >>>> >>>> I'm afraid I don't see what you're talking about here. Could you give >>>> an example? Is this about features the flags of which live in the >>>> higher leaves, which would have got stripped from the default policies? >>>> Which would mean the stripping really should happen on the max policies >>>> only (where it may not have much of an effect). >>> >>> I think there are two separate issues, which I tried to clarify in my >>> reply to this same patch. >>> >>> Options set using cpuid= with xl could now be rejected when in >>> previous versions they were accepted just fine. That's because the >>> shrinking to the policies can now cause find_leaf calls in >>> xc_cpuid_xend_policy to fail, and thus the whole operation would >>> fail. >> >> Okay, this could be addressed by merely dropping the calls from >> calculate_{pv,hvm}_def_policy(). Thinking about it, I can surely >> agree they shouldn't have been put there in the first place. >> Which would be quite the opposite of your initial proposal, where >> you did drop them from calculate_{pv,hvm}_max_policy(). A guest >> migrating in with a larger max leaf value should merely have that >> max leaf value retained, but that ought to be possible without >> dropping the shrinking from calculate_{pv,hvm}_max_policy(). > > I won't argue it's not possible to do that without dropping the shrink > from calculate_{pv,hvm}_max_policy(), but given the point we are on > the release we should consider the safest option, and IMO that's the > revert of the shrinking from there in order to restore the previous > behavior and have working migrations from 4.15 -> 4.16. > > We can discuss other likely better approaches to solve this issue > after the release. Sure; as said earlier on I merely would like to understand things sufficiently well before giving my ack. Jan >> Even >> leaving aside migration, I guess an explicit request for a large >> max leaf value should be honored; those possibly many trailing >> leaves then would simply all be blank. >> >>> There's another likely error that only affects callers of >>> xc_cpuid_apply_policy that pass a featureset (so not the upstream >>> toolstack), where some leaves of the featureset could now be ignored >>> by the guest if the max leaves value doesn't cover them anymore. Note >>> this was already an issue even before 540d911c2813, as applying the >>> featureset doesn't check that the set feature leaves are below the max >>> leaf. >> >> If this was an issue before the commit to be reverted, I take it >> the revert isn't going to help it? > > I think the commit makes it more likely to hit the above scenario by > shrinking max leaves. > >> In which case this information >> is interesting, but not applicable as justification for the >> revert? > > As said above, while the commit at hand is not introducing the issue > with the featuresets, it makes it more likely by shrinking the max > leaves, and IMO it's a regression from behavior in 4.15. > > Ie: options set on the featureset on 4.15 would be exposed, while the > same options could be hidden in 4.16 because of the shrinking to the > default domain policies, if the user happens to set an option that's > on an empty trailing featureset with a tail of zeroed leaves. > > Thanks, Roger. >
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |