[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue


  • To: Julien Grall <julien@xxxxxxx>
  • From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Date: Thu, 21 Mar 2024 07:41:57 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=lists.xenproject.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com])
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+gMWqMyEG57HtI+Nne8AiZgBeldtDAfIj+AIysJCgmw=; b=mFcKrfiH6ooB//+JoK0ATjhOTR/Kp+RSkja6JgMqvT+TQ2NB9wKAffnDmKSGoL6PEkqEKD/IN/iNuozCtVfCWm8hLlA08kocvkx2MY/05ztLsFp7yAtXuNZEoOOucTdgD1DInpTpIb4kWg4AG5TB6zR2wWWlpMRoaVI0HlK5ZeAY7RTdlS+0vnCuhYfGbF7K6mb4L//8eag1BjATLcGfj0eYUNBTlbb8PJqXF8y7ip5Hn0HPxECsejo4wn61n98RsXgHCRknM4RnyHuXxwqmK99M6Z+LtwyaJPTZsVchKgN7qICLZw3WAdmfWxHvjNSyFMsqbuDYokgCWASE+JmUPA==
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+gMWqMyEG57HtI+Nne8AiZgBeldtDAfIj+AIysJCgmw=; b=L6jHudM5NxUULmxYlMQaNvgDOY+TpOrRuy8LG70PbsooBfYaD8OuM3gv9brT1/x4/i3KJUMrhcJsEig9MLazL7TGkY1llyPSzSoIdx4/k796e6l++va8WMPDnN+rx07wJw0WbwrTQ8+xgpJp5ewNZj+nXqKGtnj4NB+r8+o6euMSp+gM9iJ/AOY+AYN31H2nEVu5II57Mv1uaM4pDbIwH39oCrSqIPql81loOd7PeBO4GaVox0sda8ZuStZvsGgjbL0oAcV84E4AXKqHy4OG3VBU0mTEWJ8S/Rpn3+6hKB9L7zpDkRzKcOxhy/cF9wBhowlsnxHlAiy8AYBMlDtuNQ==
  • Arc-seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; b=inlfm12s9c5/+JuxtDprhFlfEZ+gQ1LhE5NJHRwq44FBViLh/b6yBJODTbaXn5XV1VVaKRDGkX02jEThnMFI9tfA2rL7ximWFuUj10LKsLx5E3XNpTx9c4w/xKFNeOJsspw9rC6hABrdtChl30/pgFv6YQ5jrR1Gd7e9v0m4jQPTmf4zDTf/ManZEofCsomG5W0BtbBJIF0mZVWLoPSn3HFKa7w6QnYHT7CDWcBHuXiX/edmC2jfnU+yONRWkd6AboGYRIyoJJ27rIHgWXWR5+u8CKlN0UynTu2FwnRxhDf/rd4WmG2GaL/HFC8Y8KwKFmAE0LBnFigqZncEX5+7mQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=EnnGPfVZ3Wy5xfWW5sEZ4XmzlFxoZBSs8iW6lXJABIJH7dNvZej9J/ebdjbTtqRfWQ5qXHPy5Fx9de0HFE/ydR2jGyr0fxD5+zVnGT3OC8IXpP0ShSq6z81fIut4yPAj74YdlQiS3v0Emk7psd39sQ91YrAsT8NACtGPCMZpOg3XIjoMKl57QXMmozXqh2K1egl0vWDc1KL/AAhJDZ4cIxgRTb6mq2a4Me9zisT55yusKHt4eIPCaLsa9saCXWqqLEMci/3fNdrukyX4zO81XSqRurNHWP3MOeeoctemB/Tf9SZewaCL3u8YXSFUTt3f477F67PrXn7Mdgabmx9Ujw==
  • Authentication-results-original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Cc: John Ernberg <john.ernberg@xxxxxxxx>, Peng Fan <peng.fan@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Jonas Blixt <jonas.blixt@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 21 Mar 2024 07:42:47 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com;
  • Thread-index: AQHaVDuqVQ/kgxLZgUK/m8/hE0ez5bDz198AgAA1LICAAVx4AIAAQjaAgAEdtQCAAynKwIABnCuAgAZ8E4CAKNxCgIAApf0AgAKGQwCAAAbHgIAHmWEAgAtpgACAABVOgIAA6w+A
  • Thread-topic: [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue

Hi,

> On 20 Mar 2024, at 18:40, Julien Grall <julien@xxxxxxx> wrote:
> 
> Hi John,
> 
> On 20/03/2024 16:24, John Ernberg wrote:
>> Hi Bertrand,
>> On 3/13/24 11:07, Bertrand Marquis wrote:
>>> Hi,
>>> 
>>>> On 8 Mar 2024, at 15:04, Julien Grall <julien@xxxxxxx> wrote:
>>>> 
>>>> Hi John,
>>>> 
>>>> Thank you for the reply.
>>>> 
>>>> On 08/03/2024 13:40, John Ernberg wrote:
>>>>> On 3/7/24 00:07, Julien Grall wrote:
>>>>>>   > Ping on the watchdog discussion bits.
>>>>>> 
>>>>>> Sorry for the late reply.
>>>>>> 
>>>>>> On 06/03/2024 13:13, John Ernberg wrote:
>>>>>>> On 2/9/24 14:14, John Ernberg wrote:
>>>>>>>> 
>>>>>>>>>      * IMX_SIP_TIMER_*:  This seems to be related to the watchdog.
>>>>>>>>> Shouldn't dom0 rely on the watchdog provided by Xen instead? So those
>>>>>>>>> call will be used by Xen.
>>>>>>>> 
>>>>>>>> That is indeed a watchdog SIP, and also for setting the SoC internal 
>>>>>>>> RTC
>>>>>>>> if it is being used.
>>>>>>>> 
>>>>>>>> I looked around if there was previous discussion and only really
>>>>>>>> found [3].
>>>>>>>> Is the xen/xen/include/watchdog.h header meant to be for this kind of
>>>>>>>> watchdog support or is that more for the VM watchdog? Looking at the 
>>>>>>>> x86
>>>>>>>> ACPI NMI watchdog it seems like the former, but I have never worked 
>>>>>>>> with
>>>>>>>> x86 nor ACPI.
>>>>>> 
>>>>>> include/watchdog.h contains helper to configure the watchdog for Xen. We
>>>>>> also have per-VM watchdog and this is configured by the hypercall
>>>>>> SCHEDOP_watchdog.
>>>>>> 
>>>>>>>> 
>>>>>>>> Currently forwarding it to Dom0 has not caused any watchdog resets with
>>>>>>>> our watchdog timeout settings, our specific Dom0 setup and VM count.
>>>>>> 
>>>>>> IIUC, the SMC API for the watchdog would be similar to the ACPI NMI
>>>>>> watchdog. So I think it would make more sense if this is not exposed to
>>>>>> dom0 (even if Xen is doing nothing with it).
>>>>>> 
>>>>>> Can you try to hide the SMCs and check if dom0 still behave properly?
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>> This SMC manages a hardware watchdog, if it's not pinged within a
>>>>> specific interval the entire board resets.
>>>> 
>>>> Do you know what's the default interval? Is it large enough so Xen + dom0 
>>>> can boot (at least up to when the watchdog driver is initialized)?
>>>> 
>>>>> If I block the SMCs the watchdog driver in Dom0 will fail to ping the
>>>>> watchdog, triggering a board reset because the system looks to have
>>>>> become unresponsive. The reason this patch set started is because we
>>>>> couldn't ping the watchdog when running with Xen.
>>>>> In our specific system the bootloader enables the watchdog as early as
>>>>> possible so that we can get watchdog protection for as much of the boot
>>>>> as we possibly can.
>>>>> So, if we are to block the SMC from Dom0, then Xen needs to take over
>>>>> the pinging. It could be implemented similarly to the NMI watchdog,
>>>>> except that the system will reset if the ping is missed rather than
>>>>> backtrace.
>>>>> It would also mean that Xen will get a whole watchdog driver-category
>>>>> due to the watchdog being vendor and sometimes even SoC specific when it
>>>>> comes to Arm.
>>>>> My understanding of the domain watchdog code is that today the domain
>>>>> needs to call SCHEDOP_watchdog at least once to start the watchdog
>>>>> timer. Since watchdog protection through the whole boot process is
>>>>> desirable we'd need some core changes, such as an option to start the
>>>>> domain watchdog on init. >
>>>>> It's quite a big change to make
>>>> 
>>>> For clarification, above you seem to mention two changes:
>>>> 
>>>> 1) Allow Xen to use the HW watchdog
>>>> 2) Allow the domain to use the watchdog early
>>>> 
>>>> I am assuming by big change, you are referring to 2?
>>>> 
>>>> , while I am not against doing it if it
>>>>> makes sense, I now wonder if Xen should manage hardware watchdogs.
>>>>> Looking at the domain watchdog code it looks like if a domain does not
>>>>> get enough execution time, the watchdog will not be pinged enough and
>>>>> the guest will be reset. So either watchdog approach requires Dom0 to
>>>>> get execution time. Dom0 also needs to service all the PV backends it's
>>>>> responsible for. I'm not sure it's valuable to add another layer of
>>>>> watchdog for this scenario as the end result (checking that the entire
>>>>> system works) is achieved without it as well.
>>>>> So, before I try to find the time to make a proposal for moving the
>>>>> hardware watchdog bit to Xen, do we really want it?
>>>> 
>>>> Thanks for the details. Given that the watchdog is enabled by the 
>>>> bootloader, I think we want Xen to drive the watchdog for two reasons:
>>>> 1) In true dom0less environment, dom0 would not exist
>>>> 2) You are relying on Xen + Dom0 to boot (or at least enough to get the 
>>>> watchdog working) within the watchdog interval.
>>> 
>>> Definitely we need to consider the case where there is no Dom0.
>>> 
>>> I think there are in fact 3 different use cases here:
>>> - watchdog fully driven in a domain (dom0 or another): would work if it is 
>>> expected
>>>    that Xen + Domain boot time is under the watchdog initial refresh rate. 
>>> I think this
>>>    could make sense on some applications where your system depends on a 
>>> specific
>>>    domain to be properly booted to work. This would require an initial 
>>> refresh time
>>>    configurable in the boot loader probably.
>> This is our use-case. ^
>> Our dom0 is monitoring and managing the other domains in our system.
>> Without dom0 working the system isn't really working as a whole.
>> @Julien: Would you be ok with the patch set continuing in the direction
>> of the
>> original proposal, letting another party (or me at a later time) implement
>> the fully driven by Xen option?
> I am concerned about this particular point from Bertrand:
> 
> "would work if it is expected that Xen + Domain boot time is under the 
> watchdog initial refresh rate."
> 
> How will a user be able to figure out how to initially configure the 
> watchdog? Is this something you can easily configure in the bootloader at 
> runtime?

Definitely here it would be better to have the watchdog turned off by default 
and document how to enable it in the firmware.

Even if a long timeout is configured by default, a user could run into trouble 
if using a linux without watchdog or not running linux or using dom0less 
without a linux having access to it.
I agree with Julien here that the concern could be that users would come to us 
instead of NXP if there is system is doing a reset without reasons after some 
seconds or minutes.

> 
> 
> Overall, I am not for this approach at least in the current status. I would 
> be more inclined if there are some documentations explaining how this is 
> supposed to be configured on NXP, so others can use the code.
> 
> Anyway, this is why we have multiple Arm maintainers for this kind of 
> situation. If they other agrees with you, then they can ack the patch and 
> this can be merged.

I agree with Stefano that it would be good to have those board supported.

One thing i could suggest until there is a watchdog driver inside Xen is to 
have a clear Warning at Xen boot on those boards in the console so that we 
could at least identify the reason easily if a reset occurs for someone.

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.