
Re: [Xen-devel] Bringing up sequence for non-boot CPU fails



On Tue, 2014-02-18 at 14:36 +0200, Andrii Anisov wrote:
>         > +GLOBAL(enter_hyp_mode)
>         > +enter_hyp_mode:
>         > +        adr   r0, save
>         > +        stmea r0, {r4-r13,lr}
>         > +        ldr   r12, =0x102
>         > +        adr   r0, hyp_return
>         > +        dsb
>         > +        isb
>         > +        dmb
>         > +        smc   #0
>         
>         
>         Who/what implements this handler?
> 
> 
> Ian, this handler is implemented by ROM code, and this is the common
> OMAP sequence to switch to HYP mode. On our side we decided to leave
> switch to hyp in XEN for now.

OK, fair enough. I was wondering whether it might have left some cache
lines dirty, or left the caches enabled, or something like that. It might
be worth adding a full cache flush (i.e. the loop over set/way stuff).
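
For reference, the set/way loop boils down to walking every (way, set)
pair of a cache level and issuing one maintenance operation per pair. The
sketch below is a host-side Python helper, not Xen code: it only computes
the operand values from an ARMv7 CCSIDR value so they can be compared with
what the real loop generates. The function name and the example CCSIDR
value are made up for illustration.

```python
def setway_operands(ccsidr, level=0):
    """Yield every set/way operand for one cache level (hypothetical helper).

    ccsidr is the ARMv7 CCSIDR value for that level; level is the cache
    level minus one (0 for L1). On the target each operand would be
    written with DCCISW, i.e. mcr p15, 0, rX, c7, c14, 2.
    """
    line_shift = (ccsidr & 0x7) + 4          # log2(line size in bytes)
    ways = ((ccsidr >> 3) & 0x3FF) + 1       # Associativity field is ways - 1
    sets = ((ccsidr >> 13) & 0x7FFF) + 1     # NumSets field is sets - 1
    way_bits = (ways - 1).bit_length()       # ceil(log2(ways))
    way_shift = 32 - way_bits                # way index sits in the top bits
    for way in range(ways):
        for s in range(sets):
            yield (way << way_shift) | (s << line_shift) | (level << 1)


# Illustrative CCSIDR only: 64-byte lines, 2 ways, 256 sets.
EXAMPLE_CCSIDR = 2 | (1 << 3) | (255 << 13)
```

Calling list(setway_operands(EXAMPLE_CCSIDR)) gives one operand per cache
line of that level, which is a quick cross-check that a hand-written
assembly loop covers the whole cache.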

>         Do you have any hardware debugging tools which could give some
>         insight?
>  
> Yep, we have one (TI's Code Composer Studio with STM560v2 JTAG) but it
> has no proper HYP mode debug support yet, TI says it will have in 6
> months or so :( So the only thing we can do with it is stop the CPU at
> some moment and look at some registers; no breakpoints or stepping.

Hrm, I guess that might be sufficient to gain some insight.

> What we have discovered so far is that the last command executed by
> CPU1 before the hang is
>         mcr   CP32(r0, HSCTLR)       /* now paging is enabled */
> After this PC contains 0x00000004 and CPSR.M is b11010, which is HYP
> mode, not abort.

I think this is correct for a trap taken from HYP mode -- you jump to
the corresponding vector but there is no actual mode change (since ABT
mode is PL1 and HYP mode is PL2, switching to ABT would mean dropping a
privilege level).
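
To make it easier to read the CPSR values the debugger reports, here is a
small host-side Python sketch that decodes the mode field (CPSR[4:0]) and
the privilege level that goes with it. The table follows the ARMv7 mode
encodings; the function name is made up, and the privilege levels are
simplified (Monitor mode is listed as PL1 without regard to security
state).

```python
# ARMv7 CPSR.M encodings -> (mode name, privilege level), simplified.
CPSR_MODES = {
    0b10000: ("usr", 0),
    0b10001: ("fiq", 1),
    0b10010: ("irq", 1),
    0b10011: ("svc", 1),
    0b10110: ("mon", 1),
    0b10111: ("abt", 1),
    0b11010: ("hyp", 2),
    0b11011: ("und", 1),
    0b11111: ("sys", 1),
}


def decode_cpsr_mode(cpsr):
    """Return (mode, privilege level) for a raw CPSR value (hypothetical helper)."""
    return CPSR_MODES.get(cpsr & 0x1F, ("unknown", None))
```

With the value from the report, decode_cpsr_mode(0b11010) gives
("hyp", 2), consistent with an exception taken from HYP mode staying in
HYP mode.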

The patch at
http://lists.xen.org/archives/html/xen-devel/2013-09/msg00886.html might
help confirm this.

> It looks like we have broken MMU translation.

What do the fault status and fault address registers say?
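
If the debugger can show HSR, the sketch below splits it into its
architectural fields, since the exception class is usually the most
informative part. It is a host-side Python helper, not Xen code: the
function name and the small table of class names are illustrative and
deliberately incomplete. The faulting address would be in HIFAR or HDFAR,
which it does not touch.

```python
# A few ARMv7 HSR exception classes relevant here (not a complete list).
HSR_EC_NAMES = {
    0x00: "unknown reason",
    0x20: "prefetch abort routed to Hyp mode",
    0x21: "prefetch abort taken from Hyp mode",
    0x24: "data abort routed to Hyp mode",
    0x25: "data abort taken from Hyp mode",
}


def decode_hsr(hsr):
    """Split a raw HSR value into EC, IL and ISS (hypothetical helper)."""
    ec = (hsr >> 26) & 0x3F        # exception class, HSR[31:26]
    il = (hsr >> 25) & 0x1         # instruction length bit, HSR[25]
    iss = hsr & 0x1FFFFFF          # instruction specific syndrome, HSR[24:0]
    return {
        "ec": ec,
        "ec_name": HSR_EC_NAMES.get(ec, "not in this table"),
        "il": il,
        "iss": iss,
        "fsc": iss & 0x3F,         # fault status code, meaningful for aborts only
    }
```

For example, decode_hsr(0x86000006) reports exception class 0x21, i.e. a
prefetch abort taken from Hyp mode.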

Offset 0x4 is the undefined instruction vector, not prefetch abort, which
suggests that there is at least some mapping present, but apparently not
the expected one, so the instruction that was fetched is invalid.
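
As a quick cross-check of that reading, this host-side Python sketch maps
a stopped PC to an entry in the Hyp vector table. It assumes the PC is
sitting exactly on a vector and that you know the vector base (HVBAR); the
thread does not say what HVBAR is on CPU1, so the default of 0 here is
only an assumption that happens to match the reported PC. The function
name is made up.

```python
# ARMv7 Hyp mode vector table offsets.
HYP_VECTORS = {
    0x00: "not used",
    0x04: "undefined instruction",
    0x08: "hypervisor call",
    0x0C: "prefetch abort",
    0x10: "data abort",
    0x14: "hyp trap / hyp mode entry",
    0x18: "IRQ",
    0x1C: "FIQ",
}


def classify_vector(pc, hvbar=0):
    """Name the Hyp vector that pc points at, or None (hypothetical helper)."""
    offset = (pc - hvbar) & 0xFFFFFFFF   # wrap like a 32-bit subtraction
    return HYP_VECTORS.get(offset)
```

classify_vector(0x00000004) returns "undefined instruction", whereas a
prefetch abort would have left the PC at offset 0xC.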

Is the debugger able to tell you what bytes it read instead of the real
instruction?

>         Usually these things are down to either missing cache flushes
>         or barriers, but tracking them down has historically been a
>         total pain.
> I suspected missing flushes during CPU1 MMU tables preparation but
> that code looks correct, I do not see any issues there.

Right, that's why I was wondering about firmware leaving dirty cache
lines around.

On the Cortex-A15 we (actually, firmware) need to set a bit in the ACTLR
to enable cache coherency etc -- I suppose it is worth checking that the
OMAP doesn't have anything similar but I expect that this would be
handled in the firmware already.
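
If the debugger can read ACTLR on CPU1 (mrc p15, 0, rX, c1, c0, 1), the
check is a single bit test. The sketch below assumes the Cortex-A15
layout, where the SMP bit is bit 6; other cores may differ, so treat the
bit position as an assumption to confirm against the TRM for the actual
part. The names are made up.

```python
ACTLR_SMP = 1 << 6   # SMP bit position, assuming the Cortex-A15 ACTLR layout


def smp_enabled(actlr):
    """True if the (assumed) SMP/coherency bit is set in a raw ACTLR value."""
    return bool(actlr & ACTLR_SMP)
```

Comparing smp_enabled() for the ACTLR values read on CPU0 and CPU1 would
show whether the firmware set the bit on both cores.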

Ian.


