
Re: dom0less vs xenstored setup race Was: xen | Failed pipeline for staging | 6a47ba2f

  • To: Juergen Gross <jgross@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, alejandro.vallejo@xxxxxxxxx
  • From: andrew.cooper3@xxxxxxxxxx
  • Date: Wed, 3 May 2023 16:33:58 +0100
  • Cc: committers@xxxxxxxxxxxxxx, michal.orzel@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Julien Grall <jgrall@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Edwin Török <edwin.torok@xxxxxxxxx>
  • Delivery-date: Wed, 03 May 2023 15:34:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03/05/2023 4:22 pm, Juergen Gross wrote:
> On 03.05.23 17:15, Julien Grall wrote:
>> Hi,
>> On 03/05/2023 15:38, andrew.cooper3@xxxxxxxxxx wrote:
>>> Hello,
>>> After what seems like an unreasonable amount of debugging, we've
>>> tracked
>>> down exactly what is going wrong here.
>>> https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/4219721944
>>> Of note is the smoke.serial log around:
>>> io: IN 0xffff90fec250 d0 20230503 14:20:42 INTRODUCE (1 233473 1 )
>>> obj: CREATE connection 0xffff90fff1f0
>>> *** d1 CONN RESET req_cons 00000000, req_prod 0000003a rsp_cons
>>> 00000000, rsp_prod 00000000
>>> io: OUT 0xffff9105cef0 d0 20230503 14:20:42 WATCH_EVENT
>>> (@introduceDomain domlist )
>>> XS_INTRODUCE (in C xenstored at least, not checked O yet) always
>>> clobbers the ring pointers.  The added pressure on dom0 that the
>>> xenconsoled adds with its 4M hypercall bounce buffer occasionally
>>> defers xenstored long enough that the XS_INTRODUCE clobbers the first
>>> message that dom1 wrote into the ring.
>>> The other behaviour seen was xenstored observing a header looking
>>> like this:
>>> *** d1 HDR { ty 0x746e6f63, rqid 0x2f6c6f72, txid 0x74616c70, len
>>> 0x6d726f66 }
>>> which was rejected as being too long.  That's "control/platform" in
>>> ASCII, so the XS_INTRODUCE intersected dom1 between writing the header
>>> and writing the payload.
>>> Anyway, it is buggy for XS_INTRODUCE to be called on a live and
>>> unsuspecting connection.  It is ultimately init-dom0less's fault for
>>> telling dom1 it's good to go before having waited for XS_INTRODUCE to
>>> complete.
>> So the problem is xenstored will set interface->connection to
>> XENSTORE_CONNECTED before finalizing the connection. Can you try the
>> following, for now, very hackish patch:
>> diff --git a/tools/xenstore/xenstored_domain.c
>> b/tools/xenstore/xenstored_domain.c
>> index f62be2245c42..bbf85bbbea3b 100644
>> --- a/tools/xenstore/xenstored_domain.c
>> +++ b/tools/xenstore/xenstored_domain.c
>> @@ -688,6 +688,7 @@ static struct domain *introduce_domain(const void
>> *ctx,
>>                  talloc_steal(domain->conn, domain);
>>                  if (!restore) {
>> +                       domain_conn_reset(domain);
>>                          /* Notify the domain that xenstore is
>> available */
>>                          interface->connection = XENSTORE_CONNECTED;
> I think there are barriers missing (especially in order to work on Arm)?

Yes there are.  I think x86 skates by on side effects of hypercalls.
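
To make the ordering concern concrete, here is a minimal standalone sketch of
"reset the ring indices, then publish the connected state" with explicit
ordering.  Everything in it is an assumption for illustration: the struct is
not the real xenstore interface layout, the SKETCH_* constants and function
names are made up, and C11 atomics stand in for Xen's own barrier macros
purely so the fragment compiles on its own.  It illustrates where the
ordering matters and is not a proposed fix.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared ring page; not the real layout. */
struct sketch_iface {
    uint32_t req_cons, req_prod;
    uint32_t rsp_cons, rsp_prod;
    _Atomic uint32_t connection;
};

/* Illustrative state values only. */
enum { SKETCH_CONNECTED = 0, SKETCH_RECONNECT = 1 };

/* Daemon side: reset the ring indices, then publish "connected". */
static void sketch_publish_connected(struct sketch_iface *intf)
{
    intf->req_cons = intf->req_prod = 0;
    intf->rsp_cons = intf->rsp_prod = 0;

    /*
     * Release store: the index writes above are ordered before the
     * flag update as seen by any reader that acquires the flag.
     */
    atomic_store_explicit(&intf->connection, SKETCH_CONNECTED,
                          memory_order_release);
}

/* Guest side: only touch the ring once an acquire load sees "connected". */
static int sketch_guest_may_write(struct sketch_iface *intf)
{
    return atomic_load_explicit(&intf->connection,
                                memory_order_acquire) == SKETCH_CONNECTED;
}
```

Without the release/acquire pair, a weakly ordered CPU may let the other side
observe the flag before the index reset, which is the kind of reordering the
missing barriers would otherwise prevent.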

> And I think you will break dom0 by calling domain_conn_reset(), as the
> kernel might already have written data into the xenbus page. So you might
> want to make the call depend on !is_master_domain.

And this is why I am very deliberately not doing anything until the
documentation matches reality, and is safe to use.

For starters, shuffling this doesn't make any difference for a domU
which hasn't been taught about this optional extension.  Ignoring such
cases is not an acceptable fix.
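
For anyone wanting to check the bogus header quoted above, a small sketch of
the decoding: treat each 32-bit field as four little-endian bytes and read
them as ASCII.  The helper name words_to_ascii is made up for this example
and is not part of xenstored.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack n 32-bit words into out as little-endian bytes and
 * NUL-terminate the result.  out must hold at least 4*n + 1 bytes.
 */
static void words_to_ascii(const uint32_t *words, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++)
        for (int b = 0; b < 4; b++)
            out[4 * i + b] = (char)((words[i] >> (8 * b)) & 0xff);

    out[4 * n] = '\0';
}
```

Feeding it the four values from the log (0x746e6f63, 0x2f6c6f72, 0x74616c70,
0x6d726f66) with a 17-byte buffer gives "control/platform", i.e. payload
bytes were being read where a header was expected.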



