Re: HVM/PVH Balloon crash


  • To: Elliott Mitchell <ehem+xen@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 7 Sep 2021 10:03:51 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 07 Sep 2021 08:04:01 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.09.2021 22:47, Elliott Mitchell wrote:
> On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
>> On 06.09.2021 00:10, Elliott Mitchell wrote:
>>> I brought this up a while back, but it still appears to be present and
>>> the latest observations appear rather serious.
>>>
>>> I'm unsure of the entire set of conditions for reproduction.
>>>
>>> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
>>> this is an older AMD IOMMU).
>>>
>>> This has been confirmed with Xen 4.11 and Xen 4.14.  This includes
>>> Debian's patches, but those are mostly backports or environment
>>> adjustments.
>>>
>>> Domain 0 is presently using a 4.19 kernel.
>>>
>>> The trigger is creating a HVM or PVH domain where memory does not equal
>>> maxmem.
>>
>> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
>> allocations" submitted very early this year? There you said the issue
>> was with a guest's maxmem exceeding host memory size. Here you seem to
>> be talking of PoD in its normal form of use. Personally I use this
>> all the time (unless enabling PCI pass-through for a guest, as the two
>> are incompatible). I've not observed any badness as severe as you've
>> described.
> 
> I've got very little idea what is occurring as I'm expecting to be doing
> ARM debugging, not x86 debugging.
> 
> I was starting to wonder whether this was widespread or not.  As such I
> was reporting the factors which might be different in my environment.
> 
> The one which sticks out is the computer has an older AMD processor (are
> you a 100% Intel shop?).

No, AMD is as relevant to us as is Intel.

>  The processor has the AMD NPT feature, but a very
> early/limited IOMMU (according to Linux "AMD IOMMUv2 functionality not
> available").
> 
> Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
> IOMMU).

That sounds odd at first glance - PVH simply requires that there be
an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
doesn't enable the IOMMU in the first place for some reason.

Jan




 

