Re: [Xen-devel] [PATCH] x86/ioreq server: Fix DomU couldn't reboot when using p2m_ioreq_server p2m_type
On 09/05/17 06:21, Yu Zhang wrote:
>
>
> On 5/8/2017 7:12 PM, George Dunlap wrote:
>> On 08/05/17 11:52, Zhang, Xiong Y wrote:
>>>>>>> On 06.05.17 at 03:51, <xiong.y.zhang@xxxxxxxxx> wrote:
>>>>>>>>> On 05.05.17 at 05:52, <xiong.y.zhang@xxxxxxxxx> wrote:
>>>>>>> 'commit 1679e0df3df6 ("x86/ioreq server: asynchronously reset
>>>>>>> outstanding p2m_ioreq_server entries")' will call
>>>>>>> p2m_change_entry_type_global(), which sets entry.recalc=1. Then
>>>>>>> the following get_entry() on a p2m_ioreq_server entry will return
>>>>>>> the p2m_ram_rw type.
>>>>>>> But 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
>>>>>>> outstanding p2m_ioreq_server entries when an ioreq server unmaps")'
>>>>>>> assumes get_entry() on a p2m_ioreq_server entry will return the
>>>>>>> p2m_ioreq_server type, and only then resets p2m_ioreq_server
>>>>>>> entries. In fact that assumption isn't true, so the synchronous
>>>>>>> reset function doesn't work. Then ioreq.entry_count is still
>>>>>>> larger than zero after an ioreq server unmaps, and finally this
>>>>>>> results in the DomU being unable to reboot.
>>>>>> I've had trouble understanding this part already on v1 (btw, why is
>>>>>> this one not tagged v2?), and since I still can't figure it out I
>>>>>> have to ask:
>>>>>> Why is it that guest reboot is being impacted here? From what I
>>>>>> recall, a non-zero count should only prevent migration.
>>>>> [Zhang, Xiong Y] Sorry, although they solve the same issue, the
>>>>> solution is totally different, so I didn't mark this as v2; I will
>>>>> mark the following one as v2 with this solution.
>>>>> During DomU reboot, it will first unmap the ioreq server in the
>>>>> shutdown process, then map the ioreq server again in the boot
>>>>> process. The following statement in p2m_set_ioreq_server() makes
>>>>> mapping the ioreq server fail, so the DomU can't continue booting:
>>>>>     if ( read_atomic(&p->ioreq.entry_count) )
>>>>>         goto out;
>>>> It is clear that this statement would be the problem one, but I
>>>> continue to not see why this would affect reboot: the rebooted
>>>> guest runs in another VM with, hence, a different p2m. I cannot see
>>>> why there would be a non-zero ioreq.entry_count the first time an
>>>> ioreq server claims the p2m_ioreq_server type for this new domain.
>>>>
>>> [Zhang, Xiong Y] This is what I see from xl dmesg when a DomU reboots:
>>> 1) unmap ioreq server with old domid
>>> 2) map ioreq server with old domid
>>> 3) unmap ioreq server with old domid
>>> 4) map ioreq server with new domid
>>>
>>> 1) and 2) are triggered by our device reset handler in qemu: it
>>> destroys the old device handler, then creates a device handler with
>>> the old domid again, so we see ioreq.entry_count > 0 with the old
>>> domid and the reboot process is terminated.
>> Oh, so it prevents reboot of XenGT, but not of normal guests?
>>
>> Why does a reboot cause the device to detach, re-attach, and then
>> re-detach again?
>>
>> Also, I'm sorry for missing the bug during review, but it's a bit
>> annoying to find out that the core functionality of the patch --
>> detaching and re-attaching -- wasn't tested at all before submission.
>
> Thanks for your reply, George.
> This error was introduced in the last version of the patch "x86/ioreq
> server: Asynchronously reset outstanding p2m_ioreq_server entries.",
> which included 2 changes from its previous version:
>
> 1> Do not reset p2m_ioreq_server back to p2m_ram_rw when an ioreq
> server is mapped;
>
> 2> Use a helper function to return the p2m type - this new helper
> routine returns p2m_ram_rw instead of p2m_ioreq_server if there is no
> ioreq server mapped. In previous versions I just returned
> p2m_ioreq_server, and the XenGT tests all passed successfully. But I
> had not realized this issue in the last version, and did not have
> enough time to test that version at the last moment before the code
> freeze...

Indeed, we were under a lot of time pressure and you did spend a lot of
time outside of your normal office hours to try to get things fixed; and
your (plural) testing did eventually turn this up before the release.

Sorry for being a bit irritable.

 -George
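For readers following the thread, below is a minimal, self-contained model
of the interaction described above. It is not the actual Xen code: the
structures and helpers are illustrative stand-ins for the recalc flag set
by p2m_change_entry_type_global(), the effective type reported by
get_entry(), and the ioreq.entry_count check quoted from
p2m_set_ioreq_server(). The point it demonstrates is that a reset pass
keyed on the reported type finds nothing to reset once the lazy
recalculation is pending, so the count stays non-zero and the later map
attempt is rejected.

/*
 * Standalone sketch (NOT the real Xen code) of the failure mode under
 * the assumption that:
 *  - get_entry() reports the *effective* type, i.e. p2m_ram_rw once a
 *    recalculation is pending for a p2m_ioreq_server entry;
 *  - the synchronous reset only handles entries whose reported type is
 *    still p2m_ioreq_server, so entry_count never reaches zero.
 * All identifiers below are illustrative.
 */
#include <stdio.h>
#include <stdbool.h>

typedef enum { p2m_ram_rw, p2m_ioreq_server } p2m_type_t;

struct entry {
    p2m_type_t stored;   /* type recorded in the page table */
    bool recalc;         /* marked by the global type change, not yet applied */
};

static unsigned int entry_count;   /* mirrors ioreq.entry_count */

/* Effective type, as a lookup would report it. */
static p2m_type_t get_entry(const struct entry *e)
{
    if ( e->recalc && e->stored == p2m_ioreq_server )
        return p2m_ram_rw;         /* already reclaimed lazily */
    return e->stored;
}

/* Synchronous reset pass: it only resets entries it still *sees* as
 * p2m_ioreq_server, which is exactly what goes wrong here. */
static void sync_reset(struct entry *t, unsigned int n)
{
    for ( unsigned int i = 0; i < n; i++ )
        if ( get_entry(&t[i]) == p2m_ioreq_server )
        {
            t[i].stored = p2m_ram_rw;
            entry_count--;
        }
}

int main(void)
{
    struct entry table[4] = { { p2m_ioreq_server, false },
                              { p2m_ioreq_server, false } };
    entry_count = 2;

    /* Unmap: the global type change marks all entries for recalculation... */
    for ( unsigned int i = 0; i < 4; i++ )
        table[i].recalc = true;
    /* ...so the synchronous pass then finds nothing left to reset. */
    sync_reset(table, 4);

    /* Re-map attempt, mirroring the guard quoted from
     * p2m_set_ioreq_server(): a stale non-zero count blocks it. */
    if ( entry_count )
        printf("map fails: entry_count = %u still non-zero\n", entry_count);
    return 0;
}

Compiled and run, this prints the "map fails" line, which corresponds to
the failed step 2) in the xl dmesg sequence Xiong quoted.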