|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Issue with shared information page on Xen/ARM 4.17
On Wed, Oct 04, 2023 at 11:55:05AM +0100, Julien Grall wrote:
> Hi Roger,
>
> On 04/10/2023 09:13, Roger Pau Monné wrote:
> > On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote:
> > > On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote:
> > > > On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote:
> > > > > I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue
> > > > > is
> > > > > the changes with the handling of the shared information page appear to
> > > > > have broken things for me.
> > > > >
> > > > > With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build
> > > > > of the 4.17 release, mapping the shared information page doesn't work.
> > > >
> > > > This is due to 71320946d5edf AFAICT.
> > >
> > > Yes. While the -EBUSY line may be the one triggering, I'm unsure why.
> > > This seems a fairly reasonable change, so I had no intention of asking
> > > for a revert (which likely would have been rejected). There is also a
> > > real possibility the -EBUSY comes from elsewhere. Could also be
> > > 71320946d5edf caused a bug elsewhere to be exposed.
> >
> > A good way to know would be to attempt to revert 71320946d5edf and see
> > if that fixes your issue.
> >
> > Alternatively you can try (or similar):
> >
> > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > index 6ccffeaea57d..105ef3faecfd 100644
> > --- a/xen/arch/arm/mm.c
> > +++ b/xen/arch/arm/mm.c
> > @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one(
> > page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
> > }
> > else
> > + {
> > + printk("%u already mapped\n", space);
> > /*
> > * Mandate the caller to first unmap the page before mapping
> > it
> > * again. This is to prevent Xen creating an unwanted hole in
> > @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one(
> > * to unmap it afterwards.
> > */
> > rc = -EBUSY;
> > + }
> > p2m_write_unlock(p2m);
> > }
> >
> > > > > I'm using Tianocore as the first stage loader. This continues to work
> > > > > fine. The build is using tag "edk2-stable202211", commit fff6d81270.
> > > > > While Tianocore does map the shared information page, my reading of
> > > > > their
> > > > > source is that it properly unmaps the page and therefore shouldn't
> > > > > cause
> > > > > trouble.
> > > > >
> > > > > Notes on the actual call is gpfn was 0x0000000000040072. This is
> > > > > outside
> > > > > the recommended address range, but my understanding is this is
> > > > > supposed
> > > > > to be okay.
> > > > >
> > > > > The return code is -16, which is EBUSY.
> > > > >
> > > > > Ideas?
> > > >
> > > > I think the issue is that you are mapping the shared info page over a
> > > > guest RAM page, and in order to do that you would fist need to create
> > > > a hole and then map the shared info page. IOW: the issue is not with
> > > > edk2 not having unmapped the page, but with FreeBSD trying to map the
> > > > shared_info over a RAM page instead of a hole in the p2m. x86
> > > > behavior is different here, and does allow mapping the shared_info
> > > > page over a RAM gfn (by first removing the backing RAM page on the
> > > > gfn).
> > >
> > > An interesting thought. I thought I'd tried this, but since I didn't see
> > > such in my experiments list. What I had tried was removing all the pages
> > > in the suggested mapping range. Yet this failed.
> >
> > Yeah, I went too fast and didn't read the code correctly, it is not
> > checking that the provided gfn is already populated, but whether the
> > mfn intended to be mapped is already mapped at a different location.
> >
> > > Since this seemed reasonable, I've now tried and found it fails. The
> > > XENMEM_remove_from_physmap call returns 0.
> >
> > XENMEM_remove_from_physmap returning 0 is fine, but it seems to me
> > like edk2 hasn't unmapped the shared_info page. The OS has no idea
> > at which position the shared_info page is currently mapped, and hence
> > can't do anything to attempt to unmap it in order to cover up for
> > buggy firmware.
> >
> > edk2 should be the entity to issue the XENMEM_remove_from_physmap
> > against the gfn where it has the shared_info page mapped. Likely
> > needs to be done as part of ExitBootServices() method.
> >
> > FWIW, 71320946d5edf is an ABI change, and as desirable as such
> > behavior might be, a new hypercall should have introduced that had the
> > behavior that the change intended to retrofit into
> > XENMEM_add_to_physmap.
> I can see how you think this is an ABI change but the previous behavior was
> incorrect. Before this patch, on Arm, we would allow the shared page to be
> mapped twice. As we don't know where the firmware had mapped it this could
> result to random corruption.
>
> Now, we could surely decide to remove the page as x86 did. But this could
> leave a hole in the RAM. As the OS would not know where the hole is, this
> could lead to page fault randomly during runtime.
I would say it's the job of the firmware to notify the OS where the
hole is, by modifying the memory map handled to the OS. Or else
mapping the shared_info page in an unpopulated p2m hole.
When using UEFI there's RAM that will always be in-use by the
firmware, as runtime services cannot be shut down, and hence the
firmware must already have a way to remove/reserve such region(s) on
the memory map.
> Neither of the two behaviors help the users. In fact, I think they only make
> the experience worse because you don't know when the issue will happen.
>
> AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in
> all the cases, I can't think of a way for the OS to workaround properly
> buggy firmware. Therefore, returning -EBUSY is the safest we can do for our
> users and I don't view it as a ABI change (someone rely on the previous
> behavior is bound to failure).
I fully agree the current behavior might not be the best one, but I do
consider this part of the ABI, specially as booting guests using edk2
has now stopped working after this change.
Introducing a different hypercall, or even using
XENMAPSPACE_shared_info with idx = 1 to signal the usage of the new
behavior should be used instead.
This would also allow unifying the behavior between x86 and Arm.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |