[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs
On 14/01/2021 10:14, Jan Beulich wrote: > On 14.01.2021 00:16, Andrew Cooper wrote: >> On 05/01/2021 15:56, Jan Beulich wrote: >>> On 23.12.2020 17:34, Andrew Cooper wrote: >>>> RFC: >>>> * This probably wants to be less fatal in release builds >>> I'm not even convinced this wants to be a panic() in debug builds. >> Any memory leak spotted by this is an XSA, except in the narrow case of >> being due exclusively to a weird and non-default order of hypercalls. >> >> It absolutely needs to be something fatal in debug builds, for avoid >> going unnoticed by testing. > This is a perfectly valid position to take, but I'm afraid once > again isn't the only possible one. Since I do routinely look at > logs, I'm personally in favor of avoiding crashing the host > even for debug builds. The more that I've had pretty bad fallout > from crashes in the past, due to (afaict) bugs in a file system > driver in Linux that persisted over a longer period of time. By that logic, we should replace most assertions with printk()s. The only reason I didn't use assert in this patch is because the backtrace/etc is totally useless (we're in an RCU callback). No-one, not even you, reviews all the logs from OSSTest. It's sometimes hard enough to get anyone to look at the logs even when there are emails saying "$X looks broken". The point at which we can spot the resource leak is too late to leave a zombie domain around (we've already removed the domain from the domlist), and a printk() is not an acceptably robust way of signalling the problem. Any change in the wording will render detection useless, and some testing scenarios (stress in particular) will manage to wrap the console ring in short order which risks hiding all traces of the problem. We're talking about something which will only occur on staging for ill-reviewed patches, along with a (hopefully) reliable test developers can use themselves, *and* in due course, this test being run pre-push when we sort out the Gitlab CI. The bottom line is that making this fatal is the only way we have of getting people to pay attention to the severity of the issue, and I don't think it reasonable to hamper automated testing's ability to spot these issues, simply because you believe you're better than average at reading your log files. ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |