Re: [Xen-devel] [PATCH] xen: only clobber multicall elements without error
On 26/11/2018 15:58, Jan Beulich wrote:
>>>> On 26.11.18 at 15:23, <jgross@xxxxxxxx> wrote:
>> On 26/11/2018 15:01, Jan Beulich wrote:
>>>>>> On 26.11.18 at 14:52, <jgross@xxxxxxxx> wrote:
>>>> I don't think the hypervisor should explicitly try to make it as hard as
>>>> possible for the guest to find problems in the code.
>>>
>>> That's indeed not the hypervisor's goal. Instead it tries to make
>>> it as hard as possible for the guest (developer) to make wrong
>>> assumptions.
>>
>> Let's look at the current example why I wrote this patch:
>>
>> The Linux kernel's use of multicalls should never trigger any single
>> call to return an error (return value < 0). A kernel compiled for
>> productive use will catch such errors, but has no knowledge which
>> single call has failed, as it doesn't keep track of the single entries
>> (non-productive kernels have an option available in the respective
>> source to copy the entries before doing the multicall in order to have
>> some diagnostic data available in case of such an error). Catching an
>> error from a multicall right now means a WARN() with a stack backtrace
>> (for the multicall itself, not for the entry causing the error).
>>
>> I have a customer report for a case where such a backtrace was produced
>> and a kernel crash some seconds later, obviously due to illegally
>> unmapped memory pages resulting from the failed multicall. Unfortunately
>> there are multiple possibilities what might have gone wrong and I don't
>> know which one was the culprit. The problem can't be a very common one,
>> because there is only one such report right now, which might depend on
>> a special driver.
>>
>> Finding this bug without a known reproducer and the current amount of
>> diagnostic data is next to impossible. So I'd like to have more data
>> available without having to hurt performance for the 99.999999% of the
>> cases where nothing bad happens.
>>
>> In case you have an idea how to solve this problem in another way I'd be
>> happy to follow that route. I'd really like to be able to have a better
>> clue in case such an error occurs in future.
>
> Since you have a production kernel, I assume you also have a
> production hypervisor. This hypervisor doesn't clobber the
> arguments if I'm not mistaken. Therefore
> - in the debugging scenario you (can) have all data available by
>   virtue of the information getting copied in the kernel,
> - in the release scenario you have all data available since it's
>   left un-clobbered.
> Am I missing anything (I don't view mixed debug/release setups
> of kernel and hypervisor as overly important here)?

No, you are missing nothing here. OTOH a debug hypervisor destroying
debug data is kind of weird, so I posted this patch.

I'll add the related Linux kernel patch (in case it is acked by Boris)
with or without this hypervisor patch, but I thought it would be better
to have the hypervisor patch in place, especially as e.g. a hypervisor
from xen-unstable might have a bug which could be easier to diagnose
with this patch in place.


Juergen
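[Editor's note: to illustrate the kernel-side scheme discussed above (taking a
shadow copy of the multicall entries before issuing the batch, so a failing
entry can still be reported even if the hypervisor clobbers the arguments),
here is a minimal, self-contained C sketch. The struct layout mirrors Xen's
public struct multicall_entry, but issue_multicall(), MC_BATCH, the op/arg
values and the error reporting are hypothetical illustrations, not the actual
Linux implementation.]

    #include <stdio.h>
    #include <string.h>

    /* Mirrors the layout of Xen's public struct multicall_entry:
     * an operation number, a result slot, and up to six arguments. */
    struct multicall_entry {
        unsigned long op;
        long result;
        unsigned long args[6];
    };

    #define MC_BATCH 32   /* hypothetical batch size for this sketch */

    /* Hypothetical stand-in for the real multicall hypercall; a kernel
     * would issue HYPERVISOR_multicall() here.  For the sketch it simply
     * marks entry 1 as failed. */
    static int issue_multicall(struct multicall_entry *calls, unsigned int nr)
    {
        for (unsigned int i = 0; i < nr; i++)
            calls[i].result = (i == 1) ? -22 /* -EINVAL */ : 0;
        return 0;
    }

    int main(void)
    {
        /* Illustrative op and argument values only. */
        struct multicall_entry calls[MC_BATCH] = {
            { .op = 1, .args = { 0x1000, 2 } },
            { .op = 2, .args = { 0x2000, 4 } },
        };
        unsigned int nr = 2;

        /* Debug-only shadow copy taken *before* the batch is issued, so
         * the original arguments survive even if the (debug) hypervisor
         * clobbers the entries. */
        struct multicall_entry saved[MC_BATCH];
        memcpy(saved, calls, nr * sizeof(calls[0]));

        issue_multicall(calls, nr);

        /* On error, report which entry failed and what it originally
         * asked for, instead of only warning about the batch as a whole. */
        for (unsigned int i = 0; i < nr; i++) {
            if (calls[i].result < 0)
                fprintf(stderr,
                        "multicall entry %u failed: result %ld, op %lu, arg0 %#lx\n",
                        i, calls[i].result, saved[i].op, saved[i].args[0]);
        }
        return 0;
    }

The memcpy() of the whole batch is exactly the per-call cost one wants to avoid
in a production kernel, which is why the thread turns on whether the debug
hypervisor should clobber successful entries at all.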