Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL





On 6/4/19 10:17 AM, Jan Beulich wrote:
On 04.06.19 at 11:01, <julien.grall@xxxxxxx> wrote:
On 6/4/19 8:06 AM, Jan Beulich wrote:
On 03.06.19 at 19:15, <anthony.perard@xxxxxxxxxx> wrote:
It turns out that the first commit that fails to boot on rochester is
    e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
(even with the "eb8acba82a xen: Fix backport of .." applied)

Now that's particularly odd a regression candidate. It doesn't
touch any Arm code at all (nor does the fixup commit). And the
common code changes don't look "risky" either; the one thing that
jumps out as the most likely of all the unlikely candidates would
seem to be the xen/common/efi/boot.c change, but if there was
a problem there then the EFI boot on Arm would be latently
broken in other ways as well. Plus, of course, you say that the
same change is no problem on 4.12.

Of course the commit itself could be further "bisected" - all
changes other than the introduction of cmdline_strcmp() are
completely independent of one another.
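
For readers unfamiliar with the construct named in the commit title, here is a minimal sketch of why `strncmp(s, LITERAL, ss - s)` is fragile. It assumes `s` points at the start of a command-line fragment and `ss` at its delimiter. The helper names `buggy_match` and `strict_match` are hypothetical and are not the actual cmdline_strcmp() from the Xen tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * The pattern from the commit title: only the first (end - frag) bytes
 * are compared, so any prefix of the literal matches, and so does an
 * empty fragment (strncmp with n == 0 returns 0).
 */
static int buggy_match(const char *frag, const char *end, const char *name)
{
    return strncmp(frag, name, (size_t)(end - frag)) == 0;
}

/*
 * Hypothetical stricter check, for illustration only: the fragment must
 * have exactly the same length as the option name and the same bytes.
 */
static int strict_match(const char *frag, const char *end, const char *name)
{
    size_t len;

    assert(frag != NULL && end != NULL && name != NULL && end >= frag);
    len = (size_t)(end - frag);

    return strlen(name) == len && memcmp(frag, name, len) == 0;
}
```

With the fragment "no" taken from "no,x", `buggy_match` accepts "noreboot" while `strict_match` rejects it and accepts only "no".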

I think this is just a red herring. The commit probably modifies the
layout of Xen enough that a TLB conflict appears.

Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
for Xen mappings earlier on" makes staging-4.11 boot. This patch
removes some of the potential causes of TLB conflicts.

I haven't suggested a backport of this patch so far, because TLB
conflicts are still possible within the function it modifies. It might
also expose more TLB conflicts, as more work in Xen is needed (see my
MM-PARTn series).

I don't know whether backporting this patch is worth it compared to the
risk it introduces.

Well, if you don't backport this, what's the alternative road towards a
solution here? I'm afraid the two of you will need to decide one way or
another.

The "two" being?

Looking at the code again, we now avoid replacing a 4KB entry with a 2MB block entry without respecting the Break-Before-Make sequence. So this is one (actually two) fewer potential sources of TLB conflict.
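
As a rough illustration of the Break-Before-Make ordering mentioned above, here is a simplified simulation in C. It is only a sketch: `pte_t`, `flush_tlb_stub` and `update_entry_bbm` are hypothetical names, the TLB invalidation is a counter rather than a real TLBI instruction, and the barriers that real hardware needs around each step are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;   /* simplified page-table entry (assumption) */

static int tlb_flushes;   /* counts simulated TLB invalidations */

/* Stand-in for the real barrier + TLB invalidate sequence. */
static void flush_tlb_stub(void)
{
    tlb_flushes++;
}

/*
 * Replace a live entry without ever letting the old and the new
 * translation be valid at the same time:
 *   1. break: write an invalid entry
 *   2. invalidate any cached copy of the old translation
 *   3. make: write the new entry
 */
static void update_entry_bbm(volatile pte_t *entry, pte_t new_val)
{
    assert(entry != NULL);

    *entry = 0;           /* 1. break */
    flush_tlb_stub();     /* 2. invalidate */
    *entry = new_val;     /* 3. make */
}
```

Writing the new value directly over a valid entry skips steps 1 and 2, which is the situation in which both the old and the new translation can end up in the TLB at once.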

This patch may introduce more sources of TLB conflict if the processor is caching intermediate walks. But this was already the case before, so it may not be as bad as I first thought.

I would definitely like to hear an opinion from Stefano here.


In any event this sounds to me as if a similar problem could appear at
any time on any branch. Not a very nice state to be in ...

Thankfully most of those issues will appear at boot time. The update of Xen page-tables at runtime is sort of correct (it is missing a couple of locks).

But the failure will depend on your code. I expect that we would not see the failure on any of the Arm platforms used in osstest other than Thunder-X.

It is not a nice state to be in, but the task is quite substantial, as Xen was designed on wrong assumptions. This implies reworking most of the boot and memory management code.

Cheers,

--
Julien Grall

