[Xen-devel] Xen Crashes when releasing gnttab mappings - of a crashed domain.
Observation:
------------
When connecting two MiniOs domains (using a shared ring), Xen itself (not a domain) crashes when the MiniOs domains exit. Xen produces the following:

(XEN) Xen call trace:
(XEN)    [<ff11d20d>] __bug+0x29/0x45
(XEN)    [<ff107cb3>] gnttab_release_mappings+0xcb/0x2e5
(XEN)    [<ff1046dd>] domain_kill+0x29/0x62
(XEN)    [<ff10349a>] do_domctl+0x6d6/0xfbc
(XEN)    [<ff165755>] hypercall+0x95/0xb5
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) BUG at grant_table.c:1122
(XEN) ****************************************

The cause:
----------
Xen tries to release the grant table mappings by accessing the remote domain's grant table. But the remote domain no longer seems to exist, and consequently Xen fails: find_domain_by_id (in gnttab_release_mappings) returns NULL.

Analysis:
---------
The situation described above should never happen: if I understand correctly, a domain should not be completely destroyed until there are no more references to it. See:

    put_domain(d)   // sched.h

which is defined as follows:

    if ( atomic_dec_and_test(&(_d)->refcnt) ) domain_destroy(_d)

It does, however, happen when a domain crashes. Note that there are two ways to "finish" with a domain (domain.c):

1. domain_kill (which calls domain_destroy) - releases all resources in a graceful manner.
2. __domain_crash (which calls domain_shutdown) - seems to kill the domain without properly releasing the resources that refer to it. (This function is called in extreme cases.)

Our scenario:
-------------
We are running two MiniOs domains with the same profile:

    Open a ring (share a page with a grant ref and map a page from a remote domain)
    Write
    Read
    Close the ring (dealloc, unmap*)
    do_exit()

Timeline ->
MiniOs 1: .......... calls do_exit() -> .. domain_kill() -> .. gnttab_release_mappings() -> .. BUG()
MiniOs 2: crashes**

* When we unmap, we use Xen's hypercall to unmap a grant reference, together with the gnttab_unmap_grant_ref structure. Note that we have a bug: we do NOT set unmap_op.dev_bus_addr to 0 as we should. Xen's API (in public/grant_table.h) explicitly says that it should be 0; otherwise the grant reference will be treated as a valid device mapping.

** Because of the bug described in *, we cause the domain to crash. We observe:

(XEN) grant_table.c:394: Bad frame number doesn't match gntref
(XEN) mm.c:760: Attempt to implicitly unmap a granted PTE
(XEN) domain_crash called from mm.c:761

Summary:
-----------
1. Setting unmap_op.dev_bus_addr to 0 removes the BUG and all is well.
2. But crashing Xen - even with our error - doesn't seem to be a healthy choice. :)

Micha.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
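
For reference, the guest-side fix from point 1 of the summary can be sketched roughly as below. This is only an illustration, not the actual MiniOs code: struct unmap_op_sketch is a local stand-in whose layout is assumed to resemble gnttab_unmap_grant_ref from public/grant_table.h, and prepare_host_unmap is a hypothetical helper name. The idea is simply to zero the whole request before filling it in, so that dev_bus_addr cannot carry stack garbage into the hypercall.

```c
#include <stdint.h>
#include <string.h>

/*
 * Local stand-in for gnttab_unmap_grant_ref. The field layout is an
 * assumption made for this sketch; real code should use the definition
 * from Xen's public/grant_table.h.
 */
struct unmap_op_sketch {
    uint64_t host_addr;     /* address of the host mapping to remove */
    uint64_t dev_bus_addr;  /* left at 0: we hold no device mapping */
    uint32_t handle;        /* handle returned by the earlier map operation */
    int16_t  status;        /* result field, written by the hypervisor */
};

/*
 * Hypothetical helper: build an unmap request for a host-only mapping.
 * Zeroing the structure first guarantees dev_bus_addr == 0 even if the
 * caller allocated the request on the stack without initializing it.
 */
static void prepare_host_unmap(struct unmap_op_sketch *op,
                               uint64_t host_addr, uint32_t handle)
{
    memset(op, 0, sizeof(*op));
    op->host_addr = host_addr;
    op->handle    = handle;
}
```

The filled-in request would then be passed to the grant-table unmap hypercall exactly as before; only the initialization changes.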