[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Xen 4.18 release: Reminder about code freeze
Hi George, On 13/10/2023 12:32, George Dunlap wrote: On Fri, Oct 13, 2023 at 12:22 PM Julien Grall <julien@xxxxxxx> wrote:On 13/10/2023 11:16, George Dunlap wrote:On Thu, Oct 12, 2023 at 11:36 PM Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:On Thu, 12 Oct 2023, George Dunlap wrote:Stop tinkering in the hope that it hides the problem. You're only making it harder to fix properly.Making it harder to fix properly would be a valid reason not to commit the (maybe partial) fix. But looking at the fix again: diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c index a6cd199fdc..9cd6678015 100644 --- a/tools/xenstored/domain.c +++ b/tools/xenstored/domain.c @@ -989,6 +989,7 @@ static struct domain *introduce_domain(const void *ctx, talloc_steal(domain->conn, domain); if (!restore) { + domain_conn_reset(domain); /* Notify the domain that xenstore is available */ interface->connection = XENSTORE_CONNECTED; xenevtchn_notify(xce_handle, domain->port); @@ -1031,8 +1032,6 @@ int do_introduce(const void *ctx, struct connection *conn, if (!domain) return errno; - domain_conn_reset(domain); - send_ack(conn, XS_INTRODUCE); It is a 1-line movement. Textually small. Easy to understand and to revert. It doesn't seem to be making things harder to fix? We could revert it any time if a better fix is offered. Maybe we could have a XXX note in the commit message or in-code comment?It moves a line from one function (do_domain_introduce()) into a completely different function (introduce_domain()), nested inside two if() statements; with no analysis on how the change will impact things.I am not the original author of the patch, and I am not the maintainer of the code, so I don't feel I have the qualifications to give you the answers you are seeking. Julien as author of the patch and xenstore reviewer might be in a better position to answer. Or Juergen as xenstore maintainer.I understand that; my main point is that the change is more complex than you're characterizing it. This is information necessary to understand whether the patch is correct, but it's not in the patch description, nor in the subsequent thread back in May.Are there any paths through do_domain_introduce() that now *won't* get a domain_conn_reset() call? Is that OK?Yes, the already-introduced and the restore code paths. The operations in the already-introduced or the restore code paths seem simple enough not to require a domain_conn_reset. Julien and Juergen should confirm.There is no "restore" codepath through do_domain_introduce(); it passes "false" for the "restore" argument. So we only have two paths to consider through do_domain_introduce(): The "not introduced and not restoring" path, and the "already-introduced" path. I'm not sure what the "simple" elements on the branch in introduce_domain() have to do with whether the content of the page needs to be cleaned up. As I said, I don't 100% understand this code, but it seems like if anything, the reset would be *more* important to have in the "reintroduce" case than in the "initial introduction" case, since I'd expect the "initial introduction" case to be empty already.Indeed, there should be no watches/transactions/buffered I/O for the initial introduction. However, the function is also clear part of the interface because we can't guaranteed it was zeroed. The latter matter for the initial introduction. I believe the rest is just called for simplicity.Doesn't it seem weird to you that we set a connection to CONNECTED, notify the domain that it is ready to go, and only *after* that we reset the connection to zero? What happens if a domain starts using the connection as soon as it receives the event channel notification and before domain_conn_reset is called?Yes, it does seem weird, which is why I said the following. :-)I mean, it certainly seems strange to set the state to CONNECTED, send off an event channel, and then after that delete all watches / transactions / buffered data and so on;But just because the current code is probably wrong, doesn't mean that the modified code is probably correct. If the problem is the delay between the xenevtchn_notify() in introduce_domain() and the domain_conn_reset() afterwards in do_domain(), would it make sense instead to move the notification into do_introduce(), after the domain_conn_reset()? It is, after all, in response to XS_INTRODUCE that we want to send the notification, not in dom0_init() or read_state_connection() (which seems to be more about restoring a domain).I understand that the event channel notification was specifically added for dom0less. But I don't see why we don't want to send it to dom0 as well. Technically, dom0 has exactly the same problem as dom0less domains it boots before Xenstored is running and therefore it may need to know when it is ready to receive commands.It seemed to work fine before 2022, when the notification wasintroduced. Indeed. But my point is that in theory the behavior between dom0 and dom0less domain should not be different. With your proposal we would continue to diverge with seems rather strange. Do you at least agree that on the paper, Xenstored should notify it is ready to accept commands the same way for every domain? How was that coordination done? I don't know but Juergen/Stefano might know. Will dom0 honor the "XENSTORE_RECONNECT" state during bring-up as described in xenstore-ring.txt? Same here. Personally I'd just take the patch as I wrote it, restoring dom0's pre-2022 behavior (after review and testing of course); but if you wanted to test & resubmit with a similar notification inside dom0_init(), I wouldn't object to it. Just to clarify, you suggest the revert because you are concerned that it could break dom0. Is that correct? Naturally it would be ideal if we could avoid the code duplication; but that's not possible without a lot more refactoring, which I don't think we should be doing at this stage in the release. You can by moving the code just at the end of introduce_domain(). Cheers, -- Julien Grall
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |