
Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

> Hi,
> > I've had a few occasions where tapdisk has segfaulted:
> >
> > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> > tapdisk:9180 blocked for more than 120 seconds.
> > tapdisk         D ffff88043fc13540     0  9180      1 0x00000000
> >
> > and then like:
> >
> > end_request: I/O error, dev tdc, sector 472008
> >
> > I can't be sure but I suspect that when this happened either one OSD was
> > offline, or the cluster lost quorum briefly.
> Interesting. There might be an issue if a request ends in error, I'll
> have to check that.
> I'll have a look on Monday.

You say in tdrbd_finish_aiocb:

        while (1) {
                /* POSIX says write will be atomic or blocking */
                rv = write(prv->pipe_fds[1], (void*)&req, sizeof(req));

but from what I've read in "man 7 pipe", the statement about being atomic only
applies if the pipe is open in non-blocking mode. You open it with a call to
pipe() (same as pipe2() with flags set to 0) and you never call fcntl() to
change it. This would be consistent with the random crashes I'm seeing. I
thought they were related to transient errors, but my Ceph cluster has been
perfectly stable for a few days now and it's still happening.
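
In case it helps to check this outside the driver, here is a minimal standalone
sketch. It is not the tapdisk code and the helper names are made up for
illustration. It reads the pipe's O_NONBLOCK flag with fcntl(F_GETFL), sets it
with fcntl(F_SETFL), and wraps write() so that EINTR is retried and a short
write is reported instead of ignored. The PIPE_BUF check is there because
pipe(7) states its atomicity rules in terms of that size.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 1 if fd has O_NONBLOCK set, 0 if it is blocking, -1 on error. */
static int fd_is_nonblocking(int fd)
{
        int flags = fcntl(fd, F_GETFL);

        if (flags == -1)
                return -1;
        return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Sets O_NONBLOCK on fd and keeps the other status flags. 0 on success. */
static int fd_set_nonblocking(int fd)
{
        int flags = fcntl(fd, F_GETFL);

        if (flags == -1)
                return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Hypothetical helper: write one small fixed-size message to a pipe.
 * Retries on EINTR. Returns 0 on success, -1 with errno set otherwise
 * (EAGAIN when a non-blocking pipe is full). A short write is reported
 * as EIO rather than silently accepted.
 */
static int pipe_write_msg(int fd, const void *msg, size_t len)
{
        ssize_t rv;

        /* pipe(7) describes atomic writes in terms of PIPE_BUF. */
        assert(len <= PIPE_BUF);

        do {
                rv = write(fd, msg, len);
        } while (rv == -1 && errno == EINTR);

        if (rv == -1)
                return -1;
        if ((size_t)rv != len) {
                errno = EIO;
                return -1;
        }
        return 0;
}
```

Calling fd_is_nonblocking() on prv->pipe_fds[1] right after pipe() should
return 0 if I'm reading the code correctly, and logging the return value and
errno from the write path might show whether a request is being lost there.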

What do you think?


