Re: [PATCH v3] x86/mm: Short circuit damage from "fishy" ref/typecount failure

  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 29 Jan 2021 16:17:30 +0000
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Paul Durrant <paul@xxxxxxx>, Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 29 Jan 2021 16:17:46 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29/01/2021 11:29, Jan Beulich wrote:
> On 25.01.2021 18:59, Andrew Cooper wrote:
>> On 20/01/2021 08:06, Jan Beulich wrote:
>>> Also, as far as "impossible" here goes - the constructs all
>>> anyway exist only to deal with what we consider impossible.
>>> The question therefore really is of almost exclusively
>>> theoretical nature, and hence something like a counter
>>> possibly overflowing imo needs to be accounted for as
>>> theoretically possible, albeit impossible with today's
>>> computers and realistic timing assumptions. If a counter
>>> overflow occurred, it definitely wouldn't be because of a
>>> bug in Xen, but because of abnormal behavior elsewhere.
>>> Hence I remain unconvinced it is appropriate to deal with
>>> the situation by BUG().
>> I'm not sure how to be any clearer.
>> I am literally not changing the current behaviour.  Xen *will* hit a
>> BUG() if any of these domain_crash() paths are taken.
>> If you do not believe me, then please go and actually check what happens
>> when simulating a ref-acquisition failure.
> So I've now also played the same game on the ioreq path (see
> debugging patch below, and again with some non-"//temp"
> changes actually improving overall behavior in that "impossible"
> case). No BUG()s hit, no leaks (thanks to the extra changes),
> no other anomalies observed.
> Hence I'm afraid it is now really up to you to point out the
> specific BUG()s (and additional context as necessary) that you
> either believe could be hit, or that you have observed being hit.

The refcounting logic was taken verbatim from ioreq, with the only
difference being an order greater than 0.  The logic is also identical
to the vlapic logic.

And the reason *why* it bugs is obvious - the cleanup logic
unconditionally put()'s refs it never took to begin with, and hits
underflow bugchecks.
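
To illustrate that failure mode outside of Xen, here is a minimal toy model in C. It is only a sketch: every name in it (toy_page, toy_get_page, toy_put_page, toy_alloc, toy_free_all) is hypothetical and none of it is Xen's actual API. Where the real code would hit a bugcheck, the toy put reports an underflow instead, so the effect can be counted.

```c
/* Toy model only: all names here are hypothetical, not Xen's API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
    unsigned int refs;          /* references currently held */
};

/* Take a reference; 'fail' simulates the "impossible" acquisition failure. */
static bool toy_get_page(struct toy_page *pg, bool fail)
{
    if ( fail )
        return false;
    pg->refs++;
    return true;
}

/* Drop a reference.  Returns false where a real bugcheck would fire. */
static bool toy_put_page(struct toy_page *pg)
{
    if ( pg->refs == 0 )
        return false;           /* underflow: ref was never taken */
    pg->refs--;
    return true;
}

/*
 * Take one ref on each of pg[0..n), stopping at the first failure.
 * Pass fail_at == n for "no failure".  Returns the number of refs taken.
 */
static size_t toy_alloc(struct toy_page *pg, size_t n, size_t fail_at)
{
    size_t i;

    assert(pg != NULL);
    for ( i = 0; i < n; i++ )
        if ( !toy_get_page(&pg[i], i == fail_at) )
            break;

    return i;
}

/* Put one ref on each of pg[0..n).  Returns the number of underflows. */
static size_t toy_free_all(struct toy_page *pg, size_t n)
{
    size_t i, underflows = 0;

    for ( i = 0; i < n; i++ )
        if ( !toy_put_page(&pg[i]) )
            underflows++;

    return underflows;
}
```

With four pages and a failure at index 0, toy_alloc() takes no refs, so a cleanup that calls toy_free_all() over all four pages underflows on every one of them. Passing the count returned by toy_alloc() to toy_free_all() instead puts only the refs that were actually taken.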

For specifics, a simulated regular ref failure:

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 1051d86a20..314d258e31 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -171,9 +171,14 @@ static int vmtrace_alloc_buffer(struct vcpu *v)
     v->vmtrace.buf = pg;
     for ( i = 0; i < d->vmtrace_frames; i++ )
+    {
+        if ( i == 0 )
+            return -ENOMEM;
         /* Domain can't know about this page yet - something fishy going on. */
         if ( !get_page_and_type(&pg[i], d, PGT_writable_page) )
+    }
     return 0;

and the simulated typeref failure:

diff --git a/xen/common/domain.c b/xen/common/domain.c
index db845ccc81..bd810157f4 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -172,8 +172,16 @@ static int vmtrace_alloc_buffer(struct vcpu *v)
     for ( i = 0; i < d->vmtrace_frames; i++ )
+        get_page(&pg[i], d);
         if ( i == 0 )
+        {
+            put_page(&pg[i]);
             return -ENOMEM;
+        }
+        get_page_type(&pg[i], PGT_writable_page);
+        continue;
         /* Domain can't know about this page yet - something fishy going on. */
         if ( !get_page_and_type(&pg[i], d, PGT_writable_page) )

both yield:

(XEN) Xen BUG at /local/xen.git/xen/include/xen/mm.h:610
(XEN) RIP:    e008:[<ffff82d04020423e>]
(XEN) Xen call trace:
(XEN)    [<ffff82d04020423e>] R
(XEN)    [<ffff82d040205497>] F vcpu_create+0x245/0x32b
(XEN)    [<ffff82d04023ae5b>] F do_domctl+0xb48/0x1964
(XEN)    [<ffff82d04030c6b2>] F pv_hypercall+0x2e4/0x53d
(XEN)    [<ffff82d04039045d>] F lstar_enter+0x12d/0x140
