[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid
George Dunlap writes ("Re: [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid"): > On Nov 28, 2018, at 5:02 PM, Ian Jackson <ian.jackson@xxxxxxxxxx> wrote: > > Also I think you need to handle errors properly ? Ie set and check > > errno. > > Don’t I want to pass up the errno values set by the getpwnam functions? By `set' I meant zero beforehand. See the manpage for getpw*. > Although I suppose I should also make sure that the uid I return is not > zero... Yes. > > I'm also not convinced that it is sensible to handle the case where we > > have multiple per-domain uids but no reaper uid. This turns host > > configuration errors into systems that are less secure than they > > should be in a really obscure way. > > At this point, we have a target_uid but no way of getting a lock for > reaper_uid. We have three options: > > 1. Don’t kill(-1) at all. > 2. Try to kill(-1) with setresuid(target_uid, target_uid, 0) > 3. kill(-1) with setresuid(reaper_uid, target_uid, 0) without holding the > lock. > > #1 means that a rogue qemu will not be destroyed. > > #2 means that there’s a race, whereby sometimes the rogue qemu is destroyed, > and sometimes the reaper process is destroyed by the rogue qemu first. > > #3 means there’s a race, whereby sometimes everything works fine, sometimes > both the rogue qemu and *the reaper process from another domain* is > destroyed, and sometimes this reaper process is killed by the reaper process > from another domain (leaving the rogue qemu alive). > > I think [#2] is obviously the best option. Yes, but it still needs to count as an error. > > CODING_STYLE, no call inside the condition please. > > I c&p’d this from libxl_internal.c:libxl__lock_domain_userdata(). > I take it you’re referring to the “ERROR HANDLING” section of CODING_STYLE? > It wasn’t obvious to me that refers to while() loops. > I take it you’d prefer "while(true) { rc = flock(); if(!rc) break; …}” > instead? Except that the return value from flock() is `r' according to CODING_STYLE. (And I'm not sure `true' is in scope and anyway personally I prefer `for (;;)' but that's a matter of taste so whatever.) > And if I may make a minor suggestion: This is the second time in this series > you’ve said “don’t do X” for fairly common code idioms without giving me > guidance as to what you’d like to see instead. I'm sorry that `no call inside the condition please' was not clear enough. I mean that the flock call should be outside any condition, because it is a call that might fail, and that consequently the loop termination will have to be done with an early loop exit using `break' rather than a condition. (Since saving the return value so that it could be used in a while() would be perverse.) > > * But crucially in such situations (i) overall destroy ao should > > return a failure error code (ii) the domain itself should not be > > destroyed in Xen. This means that `xl destroy' fails, and can be > > repeated after the problem is corrected. > > This means that in such a situation, we might successfully kill the > devicemodel, but leave a zombie domain lying around. Yes, precisely. > I suppose that might be the least-bad option, as 1) would be more > likely to alert the administrator to fix the configuration error, > and 2) the domain would hold the domid, such that any unkilled qemu > processes wouldn’t be able to cause mischief on other domains. Precisely. In more formal terms, they prevent the system getting into a completely uncontrolled and forbidden state, namely a state where there is possibly a qemu and maybe other stuff hanging about, which is not visible in xl and may cause future mischief. Since we cannot destroy everything at once, our system model must include a `half destroyed' state. In that state the domain must still show up in `xl'. We have chosen to `hang everything off' the Xen list of active domains, which means that we mustn't remove a domain from Xen until we have *successfully* destroyed all its other stuff. > Probably some of these principles should be in a comment somewhere. Yes. I think they apply to all of domain destruction in libxl. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |