[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v3] x86/mm: Short circuit damage from "fishy" ref/typecount failure
On 29/01/2021 16:31, Jan Beulich wrote: > On 29.01.2021 17:17, Andrew Cooper wrote: >> On 29/01/2021 11:29, Jan Beulich wrote: >>> On 25.01.2021 18:59, Andrew Cooper wrote: >>>> On 20/01/2021 08:06, Jan Beulich wrote: >>>>> Also, as far as "impossible" here goes - the constructs all >>>>> anyway exist only to deal with what we consider impossible. >>>>> The question therefore really is of almost exclusively >>>>> theoretical nature, and hence something like a counter >>>>> possibly overflowing imo needs to be accounted for as >>>>> theoretically possible, albeit impossible with today's >>>>> computers and realistic timing assumptions. If a counter >>>>> overflow occurred, it definitely wouldn't be because of a >>>>> bug in Xen, but because of abnormal behavior elsewhere. >>>>> Hence I remain unconvinced it is appropriate to deal with >>>>> the situation by BUG(). >>>> I'm not sure how to be any clearer. >>>> >>>> I am literally not changing the current behaviour. Xen *will* hit a >>>> BUG() if any of these domain_crash() paths are taken. >>>> >>>> If you do not believe me, then please go and actually check what happens >>>> when simulating a ref-acquisition failure. >>> So I've now also played the same game on the ioreq path (see >>> debugging patch below, and again with some non-"//temp" >>> changes actually improving overall behavior in that "impossible" >>> case). No BUG()s hit, no leaks (thanks to the extra changes), >>> no other anomalies observed. >>> >>> Hence I'm afraid it is now really up to you to point out the >>> specific BUG()s (and additional context as necessary) that you >>> either believe could be hit, or that you have observed being hit. >> The refcounting logic was taken verbatim from ioreq, with the only >> difference being an order greater than 0. The logic is also identical >> to the vlapic logic. >> >> And the reason *why* it bugs is obvious - the cleanup logic >> unconditionally put()'s refs it never took to begin with, and hits >> underflow bugchecks. > In current staging, neither vmx_alloc_vlapic_mapping() nor > hvm_alloc_ioreq_mfn() put any refs they couldn't get. Hence > my failed attempt to repro your claimed misbehavior. I think I've figured out what is going on. They *look* as if they do, but the logic is deceptive. We skip both puts in free_*() if the typeref failed, and rely on the fact that the frame(s) are *also* on the domheap list for relinquish_resources() to put the acquire ref. Yet another bizzare recounting rule/behaviour which isn't written down. My bug really was setting v->vmtrace.buf too early. Furthermore, this pattern is not safe for use in the domain_create() path, because it currently depends on the domain being put on the domain list and relinquish_resources() being called to avoid leaking memory. I still argue that a typeref failure in these circumstances is the same kind of impossibility that we use BUG for elsewhere, and therefore that is what we should be doing. ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |