
Re: HVM/PVH Ballon crash


  • To: Elliott Mitchell <ehem+xen@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 6 Sep 2021 09:52:17 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 06 Sep 2021 07:52:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.09.2021 00:10, Elliott Mitchell wrote:
> I brought this up a while back, but it still appears to be present and
> the latest observations appear rather serious.
> 
> I'm unsure of the entire set of conditions for reproduction.
> 
> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
> this is an older AMD IOMMU).
> 
> This has been confirmed with Xen 4.11 and Xen 4.14.  This includes
> Debian's patches, but those are mostly backports or environment
> adjustments.
> 
> Domain 0 is presently using a 4.19 kernel.
> 
> The trigger is creating a HVM or PVH domain where memory does not equal
> maxmem.

I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
allocations" submitted very early this year? There you said the issue
was with a guest's maxmem exceeding host memory size. Here you seem to
be talking of PoD in its normal form of use. Personally I use this
all the time (except when enabling PCI pass-through for a guest, with
which it is incompatible). I've not observed any badness as severe as
you've described.

> New observations:
> 
> I discovered this occurs with PVH domains in addition to HVM ones.
> 
> I got PVH GRUB operational.  PVH GRUB appeared to operate normally
> and not trigger the crash/panic.
> 
> The crash/panic occurred some number of seconds after the Linux kernel
> was loaded.
> 
> 
> Mitigation by not using ballooning with HVM/PVH is workable, but this is
> quite a large mine in the configuration.
> 
> I'm wondering if perhaps it is actually the Linux kernel in Domain 0
> which is panicking.
> 
> The crash/panic occurring AFTER the main kernel loads suggests that
> some action by the user domain is the actual trigger of the
> crash/panic.

All of this is pretty vague: If you don't even know what component it
is that crashes / panics, I don't suppose you have any logs. Yet what
do you expect us to do without any technical detail?

Jan




 

