[Xen-ia64-devel] EFI Mapping Windows Install Crash Bug



Hi,

I'm a bit hesitant to jump the gun, but I think that I might have
isolated the cause of win2k3-sp2 crashing during install when my EFI
Mapping patches are applied. Well, perhaps not the cause, but I think I
know where it is dying.

    Quickly, as background: the EFI Mapping patches change the mapping
    that EFI is taught at boot time so that memory is mapped where Linux
    places it (basically pa + (0xe << 60)) instead of where Xen usually
    places it (basically pa + (0xf << 60)). In order to protect this
    mapping from HVM domains, a special region id is used. The
    hypervisor switches to that region id just before making any
    PAL, SAL or EFI call, and switches back to the previous region
    id once the call completes. As region 7 has to be changed,
    entries that are pinned into the TLB have to be repinned. And
    that is roughly where the fun begins.
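
    To make the address arithmetic above concrete, here is a minimal
    sketch. The macro and function names are made up for illustration
    and are not taken from the Xen or Linux sources. It relies only on
    the fact that ia64 selects the region from the top three bits of a
    virtual address, which is why both placements land in region 7.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define EFI_MAP_BASE_LINUX 0xe000000000000000ULL /* 0xe << 60 */
#define EFI_MAP_BASE_XEN   0xf000000000000000ULL /* 0xf << 60 */

/* Where the patched hypervisor would map physical address pa. */
static inline uint64_t efi_va_linux(uint64_t pa)
{
    return pa + EFI_MAP_BASE_LINUX;
}

/* Where Xen usually maps physical address pa. */
static inline uint64_t efi_va_xen(uint64_t pa)
{
    return pa + EFI_MAP_BASE_XEN;
}

/* ia64 virtual region number: bits 63..61 of the address. */
static inline unsigned region_of(uint64_t va)
{
    return (unsigned)(va >> 61);
}
```

    Both efi_va_linux() and efi_va_xen() yield addresses in region 7;
    they differ only in bit 60.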

As for the problem? It seems to be caused by ia64_mca_cpe_int_caller()
calling ia64_log_queue(), which calls ia64_sal_get_state_info(). I
believe that the hypervisor dies in ia64_log_queue() somewhere after
ia64_sal_get_state_info() completes. That is, I suspect that the
call to ia64_sal_get_state_info() is returning bogus data.

Furthermore, my traces seem to indicate that the problem arises when the
call to ia64_log_queue(), and in turn to ia64_sal_get_state_info(), is
made while the region id is already switched to make some other PAL, SAL
or EFI call (though I doubt it is particularly important which one).

This seems to make sense to me, as ia64_mca_cpe_int_caller() is
"Triggered by sw interrupt from CPE polling routine." and so can run
in the middle of another firmware call.
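
One way to confirm the nesting described above in traces might be a
depth counter around the region id switch. The following is only a toy
model under stated assumptions: the variables, the region id values and
the fw_call_enter()/fw_call_exit() helpers are all hypothetical and do
not correspond to anything in the actual patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; none of these names come from the Xen sources. */
static const uint64_t EFI_RID = 0x99; /* assumed special region id */
static uint64_t current_rid = 0x10;   /* stand-in for the region 7 rid */
static uint64_t saved_rid;            /* rid to restore after the call */
static int fw_depth;                  /* firmware calls in progress */

/* Returns 1 if a firmware call was already in progress (nested). */
static int fw_call_enter(void)
{
    if (fw_depth++ > 0)
        return 1;             /* already switched: do not save again */
    saved_rid = current_rid;
    current_rid = EFI_RID;
    return 0;
}

static void fw_call_exit(void)
{
    assert(fw_depth > 0);
    if (--fw_depth == 0)
        current_rid = saved_rid; /* only the outermost exit restores */
}
```

A non-zero return from fw_call_enter() would mark exactly the case I
suspect: a SAL call issued from the interrupt path while another PAL,
SAL or EFI call already has the region id switched.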

I am unsure about what to do about this problem, but for testing
purposes I simply removed the call to ia64_log_queue() from
ia64_mca_cpe_int_caller() and things seem to work.

I say "seem to work" because this bug does not manifest every time I
install win2k3-sp2, so it can be hard to tell whether a change has
improved things or not. But for now, I have not seen a crash occur with
this hack in place (plus various other changes which may or may not be
relevant, but this one seems to be particularly important).

I will investigate my theory that things die in ia64_log_queue()
further. But I wonder if there might be a way to permanently remove/move
the call to ia64_log_queue() out of ia64_mca_cpe_int_caller() and
possibly other PAL, SAL or EFI calls inside other MCA code.
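
As a rough illustration of what moving the call out of the interrupt
handler could look like, here is a sketch in which the handler only
records that work is pending and the log is drained later, outside any
firmware call. All names here are hypothetical, and the counter is just
a placeholder for the real ia64_log_queue() work; I have not tried this
in the hypervisor.

```c
#include <stdbool.h>

/* Hypothetical sketch; not code from the hypervisor. */
static volatile bool cpe_log_pending; /* set by the interrupt path */
static int logs_drained;              /* placeholder for real logging */

/* Interrupt path: note that there is work to do, and nothing else. */
static void cpe_int_handler_sketch(void)
{
    cpe_log_pending = true;
}

/* Called later from a safe context. Returns true if the log was drained. */
static bool drain_cpe_log_sketch(bool in_firmware_call)
{
    if (in_firmware_call || !cpe_log_pending)
        return false;
    cpe_log_pending = false;
    logs_drained++; /* the ia64_log_queue() work would go here */
    return true;
}
```

The point of the design is that no SAL call is made while the region id
is switched; the cost is that the error record is fetched somewhat
later than it is today.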

-- 
Horms

