[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] bug in hvmloader xenbus_shutdown logic



Hi,

A couple of months ago, I run into a xenstored IO ring issue ( see 
http://lists.xen.org/archives/html/xen-devel/2013-07/msg00134.html) and at that 
time we add a protection on oxenstored side.

Somehow I meet this problem again. A suse11 hvm domain start and hang with 
error:

    XENBUS request ring is not quiescent (00000010:00000000)!
    XENBUS response ring is not quiescent (00000000:00000013): fixing up

It happens when pv-driver init xenbus modules

...
int xb_init_comms(void)
{
        struct xenstore_domain_interface *intf = xen_store_interface;
        int err;

        if (intf->req_prod != intf->req_cons)
                pr_err("XENBUS request ring is not quiescent "
                       "(%08x:%08x)!\n", intf->req_cons, intf->req_prod);

        if (intf->rsp_prod != intf->rsp_cons) {
                pr_warning("XENBUS response ring is not quiescent"
                           " (%08x:%08x): fixing up\n",
                           intf->rsp_cons, intf->rsp_prod);
                /* breaks kdump */
                if (!reset_devices)
                        intf->rsp_cons = intf->rsp_prod;
        }
...

With the protection logic, oxenstored will treat this domain as " malicious 
domains", but apparently it's not.

And this time I review all the related source code, and found that it may be 
caused by a bug in hvmloader.

In hvmloader after all the setup, there will be a xenbus_shutdown call which 
will clear up the ring,

void xenbus_shutdown(void)
{
...
    /* We zero out the whole ring -- the backend can handle this, and it's 
     * not going to surprise any frontends since it's equivalent to never 
     * having used the rings. */
    memset(rings, 0, sizeof *rings); // <-- clear it here
...
} 

And  memset is implemented as this 
void *
memset(void *s, int c, unsigned n)
{
    uint8_t b = (uint8_t) c;
    uint8_t *p = (uint8_t *)s;
    int i;
    for ( i = 0; i < n; i++ )
        *p++ = b;
    return s;
}

The ring structure is put in a shared page and definition is 
struct xenstore_domain_interface
{
    char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
    char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
    XENSTORE_RING_IDX req_cons, req_prod;
    XENSTORE_RING_IDX rsp_cons, rsp_prod;
};

So here is the problem:
memset can not set all the page to zero in an atomic way, and during the clear 
up process oxenstored may access this ring.
So memset set every 8bit in this page to zero one by one and when req_cons set 
to zero oxenstored read req_cons and req_prod to compare, 
After that oxenstored will process as a normal incoming request and memset will 
zero out the left page. 
Finally oxenstored will increase req_cons but req_prod will remain as 0 and 
that status will be treated as a "bad connection".

To fix that we need to make memset atomicly (clear up req_cons and req_prod the 
same time at least) or we can make oxenstored do not to access the ring when 
hvmloader clearing up.

Any suggestion?






_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.