Xen project Mailing List

Re: [PATCH 1/4] xen/dmalloc: Introduce dmalloc() APIs

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Thu, 14 Jan 2021 15:30:16 +0000

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ncNNet4xyHdOK7B8MXAMYE66qwAWGPRpf7xuUEtwNmI=; b=oKiUKsVKHzhAf/5CdGCX72qrDKzBuPb71A43a7DTXkG0TrD9+YQNFQDdLnie45qUrs61tYz61SUbGmlJZ627knrD7KOzGV7kXuaH/8C1lCw7qFcjh/58fjnbhOTzmR4hh5S6dTvtbW7sHPwiJ7tp4Nj/IVG9oT5o7VI9W37jopBXsGtak9UDvH6NMvji4A4ydhivOW+G4YzintaI7K12bbT9wOt3yR9C+9OYcKLCIWzX/+FTmd1LhEjU4HmPMJHUrWfP+ZCRikXimbLEKbXjAMO1p+p+W+Zb53mPE7rVxb1QQ7nbcb3L/RXAi8smtRwOGRYWMVZD9Eeuu46qhZV1JQ==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=J+KRl8MxHVHM+Z/HgVtAEFCT3H1YmDW6t70v9mqEnjQJT/MQSiALQ6dU4xP6MfyYOEI0wghxJZ8OnYc2xDFOYmpCxtviYy3A2JMPW4BsOBEZ0qbPlVGA0mv6xad7GuAfETtRo6qZ6CcqQUvkq4T+snk9p7zc3bXar5kaZ7IobET6gcy/JJn48pV/P9MjoKGlHQyJesNh3cuunuVHarsx0I2pEGVqCfNkDQzYLbhqSQpxJHJ06CCQb3k5oTKos4p+inlhwZTBLoti0c50D/R35imUDXLM9w3fDX4/THWiCwuPVS5GhBSqOtcGsyrqKURT7mCYAwBaTUsP5tgz7sVxtQ==

Authentication-results: esa5.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com

Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 14 Jan 2021 15:30:34 +0000

Ironport-sdr: GRO0lCI4QVeiOwkaj2fyg/fsVj+vzRsbWBnmB078hJCqe0V3aeZ1/MKQKfU5aN2Obcx1yPprPN VYVXLihJ9ktfGc1sILdLIX0hPVIeqi5+Yh8n+RnUBhJnl88HYl/Kp5X5dZzhAmo1Pvcc8Ersq3 CpZ4/iL547uT+EkFgdGfpb3l+TiNnui5gB5U6o4N019hL5AzevKirUjgmlnIkP1L3sNEtKM0UJ 4+ZCYgZzowfROXYVKS8nrYeVs7dcb2SpPnra+uSr4R9YOj3TPme2QD8no+9+3VE9iUCfEQd6lJ V2Y=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14/01/2021 10:14, Jan Beulich wrote: > On 14.01.2021 00:16, Andrew Cooper wrote: >> On 05/01/2021 15:56, Jan Beulich wrote: >>> On 23.12.2020 17:34, Andrew Cooper wrote: >>>> RFC: >>>> * This probably wants to be less fatal in release builds >>> I'm not even convinced this wants to be a panic() in debug builds. >> Any memory leak spotted by this is an XSA, except in the narrow case of >> being due exclusively to a weird and non-default order of hypercalls. >> >> It absolutely needs to be something fatal in debug builds, for avoid >> going unnoticed by testing. > This is a perfectly valid position to take, but I'm afraid once > again isn't the only possible one. Since I do routinely look at > logs, I'm personally in favor of avoiding crashing the host > even for debug builds. The more that I've had pretty bad fallout > from crashes in the past, due to (afaict) bugs in a file system > driver in Linux that persisted over a longer period of time. By that logic, we should replace most assertions with printk()s. The only reason I didn't use assert in this patch is because the backtrace/etc is totally useless (we're in an RCU callback). No-one, not even you, reviews all the logs from OSSTest. It's sometimes hard enough to get anyone to look at the logs even when there are emails saying "$X looks broken". The point at which we can spot the resource leak is too late to leave a zombie domain around (we've already removed the domain from the domlist), and a printk() is not an acceptably robust way of signalling the problem. Any change in the wording will render detection useless, and some testing scenarios (stress in particular) will manage to wrap the console ring in short order which risks hiding all traces of the problem. We're talking about something which will only occur on staging for ill-reviewed patches, along with a (hopefully) reliable test developers can use themselves, *and* in due course, this test being run pre-push when we sort out the Gitlab CI. The bottom line is that making this fatal is the only way we have of getting people to pay attention to the severity of the issue, and I don't think it reasonable to hamper automated testing's ability to spot these issues, simply because you believe you're better than average at reading your log files. ~Andrew

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.