[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 5/5] x86/cpuid: Fix handling of xsave dynamic leaves
On 05.05.2021 18:59, Andrew Cooper wrote: > On 05/05/2021 09:33, Jan Beulich wrote: >> On 04.05.2021 16:17, Andrew Cooper wrote: >>> On 04/05/2021 13:56, Jan Beulich wrote: >>>> On 03.05.2021 17:39, Andrew Cooper wrote: >>>>> +unsigned int xstate_compressed_size(uint64_t xstates) >>>>> +{ >>>>> + unsigned int i, size = XSTATE_AREA_MIN_SIZE; >>>>> + >>>>> + xstates &= ~XSTATE_FP_SSE; >>>>> + for_each_set_bit ( i, &xstates, 63 ) >>>>> + { >>>>> + if ( test_bit(i, &xstate_align) ) >>>>> + size = ROUNDUP(size, 64); >>>>> + >>>>> + size += xstate_sizes[i]; >>>>> + } >>>>> + >>>>> + /* In debug builds, cross-check our calculation with hardware. */ >>>>> + if ( IS_ENABLED(CONFIG_DEBUG) ) >>>>> + { >>>>> + unsigned int hwsize; >>>>> + >>>>> + xstates |= XSTATE_FP_SSE; >>>>> + hwsize = hw_compressed_size(xstates); >>>>> + >>>>> + if ( size != hwsize ) >>>>> + printk_once(XENLOG_ERR "%s(%#"PRIx64") size %#x != hwsize >>>>> %#x\n", >>>>> + __func__, xstates, size, hwsize); >>>>> + size = hwsize; >>>> To be honest, already on the earlier patch I was wondering whether >>>> it does any good to override size here: That'll lead to different >>>> behavior on debug vs release builds. If the log message is not >>>> paid attention to, we'd then end up with longer term breakage. >>> Well - our options are pass hardware size, or BUG(), because getting >>> this wrong will cause memory corruption. >> I'm afraid I'm lost: Neither passing hardware size nor BUG() would >> happen in a release build, so getting this wrong does mean memory >> corruption there. And I'm of the clear opinion that debug builds >> shouldn't differ in behavior in such regards. > > The point of not cross-checking with hardware in release builds is to > remove a bunch of very expensive operations from path which is hit > several times on every fork(), so isn't exactly rare. > > But yes - the consequence of being wrong, for whatever reason, is memory > corruption (especially as the obvious way it goes wrong is having an > xsave_size[] of 0, so we end up under-reporting). > > So one way or another, we need to ensure that every newly exposed option > is tested in a debug build first. The integration is either complete, > or it isn't, so I don't think this terribly onerous to do. > > > As for actually having a behaviour difference between debug and release, > I'm not concerned. > > Hitting this failure means "there is definitely a serious error in Xen, > needing fixing", but it will only be encountered the during development > of a new feature, so is for all practical concerns, limited to > development of the new feature in question. > > BUG() gets the point across very obviously, but is unhelpful when in > practice the test system wont explode because the dom0 kernel won't be > using this new state yet. > >> If there wasn't an increasing number of possible combinations I >> would be inclined to suggest that in all builds we check during >> boot that calculation and hardware provided values match for all >> possible (valid) combinations. > > I was actually considering an XTF test on the matter, which would be a > cleaning up of the one generating the example in the cover letter. > > As above, logic is only liable to be wrong during development of support > for a new state component, which is why it is reasonable to take away > the overhead in release builds. Well, okay then - let's hope all bugs here are obviously exposed during initial development, and no corner cases get missed. >>> The BUG() option is a total pain when developing new support - the first >>> version of this patch did use BUG(), but after hitting it 4 times in a >>> row (caused by issues with constants elsewhere), I decided against it. >> I can fully understand this aspect. I'm not opposed to printk_once() >> at all. My comment was purely towards the override. >> >>> If we had something which was a mix between WARN_ONCE() and a proper >>> printk() explaining what was going on, then I would have used that. >>> Maybe it's time to introduce one... >> I don't think there's a need here - what you have in terms of info >> put into the log is imo sufficient. > > Well - it needs to be sufficiently obvious to people who aren't you and > me. There are also other areas in Xen which would benefit from changing > their diagnostics to be as described. I generally disagree: A log message of this kind needs to be detailed enough to easily find where it originates in source. Then the source there should have enough information. Things are different when a log message implies an admin may be able to take some corrective action without actually changing source code in any way. That's not the case here afaict. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |