
Re: [Xen-devel] [xen-unstable test] 110009: regressions - FAIL



On Tue, Jun 13, 2017 at 10:30 AM, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi Jan,
>
>
> On 12/06/2017 15:57, Jan Beulich wrote:
>>>>>
>>>>> On 12.06.17 at 16:30, <julien.grall@xxxxxxx> wrote:
>>>
>>> On 09/06/17 09:19, Jan Beulich wrote:
>>>>>>>
>>>>>>> On 07.06.17 at 10:12, <JBeulich@xxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On 06.06.17 at 21:19, <sstabellini@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, 6 Jun 2017, Jan Beulich wrote:
>>>>>>>>>>
>>>>>>>>>> On 06.06.17 at 16:00, <ian.jackson@xxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Looking at the serial logs for that and comparing them with 10009,
>>>>>>>> it's not terribly easy to see what's going on because the kernel
>>>>>>>> versions are different and so produce different messages about
>>>>>>>> xenbr0 (and I think may have a different bridge port management
>>>>>>>> algorithm).
>>>>>>>>
>>>>>>>> But the messages about promiscuous mode seem the same, and of course
>>>>>>>> promiscuous mode is controlled by userspace, rather than by the
>>>>>>>> kernel (so should be the same in both).
>>>>>>>>
>>>>>>>> However, in the failed test we see extra messages about promisc:
>>>>>>>>
>>>>>>>>   Jun  5 13:37:08.353656 [ 2191.652079] device vif7.0-emu left promiscuous mode
>>>>>>>>   ...
>>>>>>>>   Jun  5 13:37:08.377571 [ 2191.675298] device vif7.0 left promiscuous mode
>>>>>>>
>>>>>>>
>>>>>>> Wouldn't those be another result of the guest shutting down /
>>>>>>> being shut down?
>>>>>>>
>>>>>>>> Also, the qemu log for the guest in the failure case says this:
>>>>>>>>
>>>>>>>>   Log-dirty command enable
>>>>>>>>   Log-dirty: no command yet.
>>>>>>>>   reset requested in cpu_handle_ioreq.
>>>>>>>
>>>>>>>
>>>>>>> So this would seem to call for instrumentation on the qemu side
>>>>>>> then, as the only path via which this can be initiated is - afaics -
>>>>>>> qemu_system_reset_request(), which doesn't have very many
>>>>>>> callers that could possibly be of interest here. Adding Stefano ...
>>>>>>
>>>>>>
>>>>>> I am pretty sure that those messages come from qemu traditional:
>>>>>> "reset requested in cpu_handle_ioreq" is not printed by qemu-xen.
>>>>>
>>>>>
>>>>> Oh, indeed - I didn't pay attention to this being a *-qemut-*
>>>>> test. I'm sorry.
>>>>>
>>>>>> In any case, the request comes from qemu_system_reset_request, which
>>>>>> is called by hw/acpi.c:pm_ioport_writew. It looks like the guest OS
>>>>>> initiated the reset (or resume)?
>>>>>
>>>>>
>>>>> Right, this and hw/pckbd.c look to be the only possible
>>>>> sources. Yet then it's still unclear what makes the guest go
>>>>> down.
>>>>
>>>>
>>>> So with all of the above in mind I wonder whether we shouldn't
>>>> revert 933f966bcd then - that debugging code is unlikely to help
>>>> with any further analysis of the issue, as reaching that code
>>>> for a dying domain is only a symptom as far as we understand it
>>>> now, not anywhere near the cause.
>>>
>>>
>>> Are you suggesting to revert on Xen 4.9?
>>
>>
>> Yes, if we revert now, then I'd say on both master and 4.9.
>
>
> I would be ok with that.

Reverting 933f966bcd sounds fine to me.

Acked-by: George Dunlap <george.dunlap@xxxxxxxxxx>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel