[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: dom0less vs xenstored setup race Was: xen | Failed pipeline for staging | 6a47ba2f
On 03/05/2023 4:22 pm, Juergen Gross wrote: > On 03.05.23 17:15, Julien Grall wrote: >> Hi, >> >> On 03/05/2023 15:38, andrew.cooper3@xxxxxxxxxx wrote: >>> Hello, >>> >>> After what seems like an unreasonable amount of debugging, we've >>> tracked >>> down exactly what is going wrong here. >>> >>> https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/4219721944 >>> >>> Of note is the smoke.serial log around: >>> >>> io: IN 0xffff90fec250 d0 20230503 14:20:42 INTRODUCE (1 233473 1 ) >>> obj: CREATE connection 0xffff90fff1f0 >>> *** d1 CONN RESET req_cons 00000000, req_prod 0000003a rsp_cons >>> 00000000, rsp_prod 00000000 >>> io: OUT 0xffff9105cef0 d0 20230503 14:20:42 WATCH_EVENT >>> (@introduceDomain domlist ) >>> >>> XS_INTRODUCE (in C xenstored at least, not checked O yet) always >>> clobbers the ring pointers. The added pressure on dom0 that the >>> xensconsoled adds with it's 4M hypercall bounce buffer occasionally >>> defers xenstored long enough that the XS_INTRODUCE clobbers the first >>> message that dom1 wrote into the ring. >>> >>> The other behaviour seen was xenstored observing a header looking >>> like this: >>> >>> *** d1 HDR { ty 0x746e6f63, rqid 0x2f6c6f72, txid 0x74616c70, len >>> 0x6d726f66 } >>> >>> which was rejected as being too long. That's "control/platform" in >>> ASCII, so the XS_INTRODUCE intersected dom1 between writing the header >>> and writing the payload. >>> >>> >>> Anyway, it is buggy for XS_INTRODUCE to be called on a live an >>> unsuspecting connection. It is ultimately init-dom0less's fault for >>> telling dom1 it's good to go before having waited for XS_INTRODUCE to >>> complete. >> >> So the problem is xenstored will set interface->connection to >> XENSTORE_CONNECTED before finalizing the connection. Caqn you try the >> following, for now, very hackish patch: >> >> diff --git a/tools/xenstore/xenstored_domain.c >> b/tools/xenstore/xenstored_domain.c >> index f62be2245c42..bbf85bbbea3b 100644 >> --- a/tools/xenstore/xenstored_domain.c >> +++ b/tools/xenstore/xenstored_domain.c >> @@ -688,6 +688,7 @@ static struct domain *introduce_domain(const void >> *ctx, >> talloc_steal(domain->conn, domain); >> >> if (!restore) { >> + domain_conn_reset(domain); >> /* Notify the domain that xenstore is >> available */ >> interface->connection = XENSTORE_CONNECTED; > > I think there are barriers missing (especially in order to work on Arm)? Yes there are. I think x86 skates by on side effects of hypercalls. > > And I think you will break dom0 with calling domain_conn_reset(), as the > kernel might already have written data into the xenbus page. So you might > want to make the call depend on !is_master_domain. And this is why I am very deliberately not doing anything until the documentation is matches reality, and is safe to use. For starters, shuffling this doesn't make any difference for a domU which hasn't been taught about this optional extension. Ignoring such cases is not an acceptable fix. ~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |