[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check



On 13.05.13 17:21, Liu, Jinsong wrote:
> Christoph Egger wrote:
>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>> Christoph Egger wrote:
>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>
>>>>>>>     * For dom0, if really need check, it should check whether
>>>>>>>     vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>     check, which has been done at inject_vmce()), not check dom0
>>>>>>>     mcelog ready (which has been done at mce_softirq() before
>>>>>>> send global virq to dom0). 
>>>>>>>
>>>>>>> Which means before hypervisor send error log via virq to dom0,
>>>>>>> current code has checked whether mcelog ready at dom0 or not -->
>>>>>>> that's the right place for your concern, and it has indeed done
>>>>>>> check.
>>>>>>
>>>>>> I think, I do not understand the patch description.
>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>
>>>>>> The mcelog driver in Dom0 registers itself to the virq handler to
>>>>>> provide the machine check logging service.
>>>>>
>>>>> Yes.
>>>>>
>>>>>> Xen checks if a virq handler has been registered
>>>>>
>>>>> Yes.
>>>>>
>>>>>> but does not check
>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>> This patch fixes this.
>>>>>>
>>>>>
>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>> your concern at this point?
>>>>
>>>> Yes, this is exactly my question. You got it.
>>>>
>>>> Christoph
>>>
>>> Hmm, seems you misunderstand my word. What I meant is,
>>> I don't know what you are asking by 'does not check if the dom0
>>> handler is actually ready to take the errors'. Could you elaborate
>>> more your question? 
>>
>> I reread your patch description:
>>
>>> * For dom0, if really need check, it should check whether vMCE
>>                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> injection for dom0 ready (say, exception trap bounce check, which
>>   ^^^^^^^^^^^^^^^^^^^^^^^^
>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>>                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> has been done at mce_softirq() before send global virq to dom0).
>>
>> My question is: Is it possible when mcelog driver registers
>> the virq handler that it cannot deal with machine check errors
>> immediately? 
>>
>> Christoph
> 
> Yes, there is a small time window when dom0 mcelog driver init:
> 
>   The last step of xen_late_init_mcelog()
>     --> bind_virq_for_mce
>       --> bind_virq_to_irqhandler
> 
>         irq = bind_virq_to_irq(virq, cpu);
>         if (irq < 0)
>                 return irq;
>         retval = request_irq(irq, handler, irqflags, devname, dev_id);
> 
> Time window: between bind_virq_to_irq and request_irq
> 
> If hypervisor inject virq to notify error log to dom0 exactly during
> this init time window, dom0 no-ops handler will not fetch error log.
> However, it's OK since error log in hypervisor is still there, until
> next time when hypervisor inject virq again, dom0 mcelog driver will
> fetch them. Considering it occurs rarely (and no harm), I think it's OK.
> 
> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
> is_vmce_ready() check since it's problematic (wrong check), overkilled
> (kill system unnecessary), deprecated (vMCE is not bound to host MCE
any more)
> and redundant (both vmce trap callback and mcelog virq has been checked
> in other hypervisor code points).
> 
> I agree the description is not clear at some points, so I will
> re-write the description later.

Thanks. From reading the new description
I think, I got it now why this check is wrong:

Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
installed an virq handler for logging and in case dom0 does no logging
dom0 is crashed instead.

Assuming above is right then:
the code path in mcaction.c not only runs when a vmce is injected into
dom0, it also runs when a vmce is injected into a domU.
So that means when dom0 does no logging then it is crashed whenever
a vMCE is injected into any guest. OUCH!

Make Jan happy with his 'goto vmce_failed' concern and you have my ack.

Christoph


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.