
Re: dom0less vs xenstored setup race Was: xen | Failed pipeline for staging | 6a47ba2f


  • To: Juergen Gross <jgross@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, alejandro.vallejo@xxxxxxxxx
  • From: andrew.cooper3@xxxxxxxxxx
  • Date: Wed, 3 May 2023 16:33:58 +0100
  • Cc: committers@xxxxxxxxxxxxxx, michal.orzel@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, Julien Grall <jgrall@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Edwin Török <edwin.torok@xxxxxxxxx>
  • Delivery-date: Wed, 03 May 2023 15:34:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03/05/2023 4:22 pm, Juergen Gross wrote:
> On 03.05.23 17:15, Julien Grall wrote:
>> Hi,
>>
>> On 03/05/2023 15:38, andrew.cooper3@xxxxxxxxxx wrote:
>>> Hello,
>>>
>>> After what seems like an unreasonable amount of debugging, we've
>>> tracked down exactly what is going wrong here.
>>>
>>> https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/4219721944
>>>
>>> Of note is the smoke.serial log around:
>>>
>>> io: IN 0xffff90fec250 d0 20230503 14:20:42 INTRODUCE (1 233473 1 )
>>> obj: CREATE connection 0xffff90fff1f0
>>> *** d1 CONN RESET req_cons 00000000, req_prod 0000003a rsp_cons 00000000, rsp_prod 00000000
>>> io: OUT 0xffff9105cef0 d0 20230503 14:20:42 WATCH_EVENT (@introduceDomain domlist )
>>>
>>> XS_INTRODUCE (in C xenstored at least, not checked O yet) always
>>> clobbers the ring pointers.  The added pressure on dom0 that
>>> xenconsoled adds with its 4M hypercall bounce buffer occasionally
>>> defers xenstored long enough that the XS_INTRODUCE clobbers the first
>>> message that dom1 wrote into the ring.
>>>
>>> The other behaviour seen was xenstored observing a header looking
>>> like this:
>>>
>>> *** d1 HDR { ty 0x746e6f63, rqid 0x2f6c6f72, txid 0x74616c70, len 0x6d726f66 }
>>>
>>> which was rejected as being too long.  That's "control/platform" in
>>> ASCII, so the XS_INTRODUCE intersected dom1 between writing the header
>>> and writing the payload.
>>>
>>>
>>> Anyway, it is buggy for XS_INTRODUCE to be called on a live and
>>> unsuspecting connection.  It is ultimately init-dom0less's fault for
>>> telling dom1 it's good to go before having waited for XS_INTRODUCE to
>>> complete.
>>
>> So the problem is xenstored will set interface->connection to
>> XENSTORE_CONNECTED before finalizing the connection. Can you try the
>> following, for now, very hackish patch:
>>
>> diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
>> index f62be2245c42..bbf85bbbea3b 100644
>> --- a/tools/xenstore/xenstored_domain.c
>> +++ b/tools/xenstore/xenstored_domain.c
>> @@ -688,6 +688,7 @@ static struct domain *introduce_domain(const void *ctx,
>>                  talloc_steal(domain->conn, domain);
>>
>>                  if (!restore) {
>> +                       domain_conn_reset(domain);
>>                          /* Notify the domain that xenstore is available */
>>                          interface->connection = XENSTORE_CONNECTED;
>
> I think there are barriers missing (especially in order to work on Arm)?

Yes there are.  I think x86 skates by on side effects of hypercalls.

>
> And I think you will break dom0 with calling domain_conn_reset(), as the
> kernel might already have written data into the xenbus page. So you might
> want to make the call depend on !is_master_domain.

And this is why I am very deliberately not doing anything until the
documentation matches reality, and is safe to use.

For starters, shuffling this doesn't make any difference for a domU
which hasn't been taught about this optional extension.  Ignoring such
cases is not an acceptable fix.

~Andrew
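
The corrupt header quoted above (ty 0x746e6f63, rqid 0x2f6c6f72, txid
0x74616c70, len 0x6d726f66) can be checked by hand.  The following is a
minimal sketch, not xenstored code: both function names are hypothetical, and
it assumes the guest stored the bytes in little-endian order, so each 32-bit
field is unpacked least-significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of xenstored): unpack 32-bit words into
 * bytes, least-significant byte first, which is how a little-endian guest
 * would have laid them out in the ring.  'out' must hold nwords * 4 + 1
 * bytes.
 */
static void words_to_ascii(const uint32_t *words, size_t nwords, char *out)
{
    size_t i, b;

    for (i = 0; i < nwords; i++)
        for (b = 0; b < 4; b++)
            out[i * 4 + b] = (char)((words[i] >> (8 * b)) & 0xff);

    out[nwords * 4] = '\0';
}

/* Decode the four header fields from the log: ty, rqid, txid, len. */
static const char *decode_logged_header(void)
{
    static const uint32_t hdr[4] = {
        0x746e6f63, 0x2f6c6f72, 0x74616c70, 0x6d726f66
    };
    static char text[sizeof(hdr) + 1];

    words_to_ascii(hdr, 4, text);
    return text;
}
```

decode_logged_header() yields the string "control/platform", i.e. 16 bytes of
payload sitting where a message header was expected.  Read as a length, the
last word is 0x6d726f66, which is consistent with the message being rejected
as too long.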

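The barrier point raised above can be illustrated in isolation.  This is a
sketch under stated assumptions, not the xenstored implementation: the struct
layout, the SKETCH_* values and both function names are made up for the
example, and C11 atomics stand in for whatever barrier primitives the Xen tree
would actually use.  It shows only the ordering: all ring setup first, then
the flag is published with release semantics, and the reader uses acquire.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative values and layout only; not copied from the Xen headers. */
enum { SKETCH_CONNECTED = 0, SKETCH_RECONNECT = 1 };

struct sketch_ring {
    uint32_t req_cons, req_prod;
    uint32_t rsp_cons, rsp_prod;
    _Atomic uint32_t connection;
};

/* Daemon side: finish all ring setup, then publish with release ordering. */
static void sketch_publish(struct sketch_ring *r)
{
    r->req_cons = r->req_prod = 0;
    r->rsp_cons = r->rsp_prod = 0;

    /* Release: the index writes above become visible before the flag flips. */
    atomic_store_explicit(&r->connection, SKETCH_CONNECTED,
                          memory_order_release);
}

/* Guest side: only touch the ring once the flag is observed (acquire). */
static int sketch_guest_may_write(struct sketch_ring *r)
{
    return atomic_load_explicit(&r->connection, memory_order_acquire)
           == SKETCH_CONNECTED;
}
```

As noted in the reply, ordering alone does not help a domU that never looks at
the connection flag: such a guest may write into the ring before
sketch_publish() runs and have its message clobbered regardless.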