[Xen-devel] Solaris and barriers
Hi,

I'm part of the team integrating Xen support into Solaris, and I'm trying to add barrier support to our implementation so that we can improve performance on our back-end disk devices. There are some differences in the way that I/O is performed by Solaris and Linux that make this a little more challenging than it might appear. Hence this email, to try to get some feedback on possible solutions.

We are trying to map the semantics of BLKIF_OP_WRITE_BARRIER onto the behaviour of the Solaris I/O subsystem. The first thing to observe is that we believe that when a BLKIF_OP_WRITE_BARRIER returns to the front end, all previously issued write operations (including the write which triggered the barrier) are complete. Secondly, Solaris separates the notion of a "barrier" from the write operation and provides the DKIOCFLUSHWRITECACHE ioctl, which can be used to request that all previously issued I/Os are flushed to disk.

We are currently implementing the behaviour of BLKIF_OP_WRITE_BARRIER in two ways:

1 If a write is requested and the front-end write cache is not enabled, then we issue a BLKIF_OP_WRITE_BARRIER, which causes the back end to wait for completion of the write and then to issue a DKIOCFLUSHWRITECACHE ioctl on the underlying device to ensure the write cache is flushed before returning to the front end. If the write cache is enabled, then we issue a BLKIF_OP_WRITE request, which doesn't require a DKIOCFLUSHWRITECACHE ioctl in the back end. Clearly performance here is greater.

2 If we receive a DKIOCFLUSHWRITECACHE in the front end, then we have a problem: we have received a requirement to ensure previous writes are flushed, but we have no write associated with the request with which to issue a BLKIF_OP_WRITE_BARRIER.
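The write path in case 1 can be sketched roughly as follows. This is only an illustration in C, not the actual Solaris driver code: the enum, the function names and the boolean flag are hypothetical, and FE_OP_WRITE / FE_OP_WRITE_BARRIER merely stand in for BLKIF_OP_WRITE / BLKIF_OP_WRITE_BARRIER.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two ring operations named in the text. */
enum fe_op {
    FE_OP_WRITE,         /* stands in for BLKIF_OP_WRITE */
    FE_OP_WRITE_BARRIER  /* stands in for BLKIF_OP_WRITE_BARRIER */
};

/*
 * Front end, case 1: pick the operation for an ordinary write.
 * With the write cache disabled, every write must be stable when it
 * completes, so it is sent as a barrier write.
 */
enum fe_op fe_choose_write_op(bool write_cache_enabled)
{
    return write_cache_enabled ? FE_OP_WRITE : FE_OP_WRITE_BARRIER;
}

/*
 * Back end: after the write itself has completed, decide whether a
 * cache flush (DKIOCFLUSHWRITECACHE on Solaris) must be issued on the
 * underlying device before responding to the front end.
 */
bool be_must_flush_after(enum fe_op op)
{
    return op == FE_OP_WRITE_BARRIER;
}
```

Case 2 does not fit this shape, because a flush request arrives with no write to pass to fe_choose_write_op().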
We have modified our Solaris front end so that we can issue zero-byte writes. The existing Solaris back end receives the write and passes it on to the lower-level drivers, which return success; this eventually results in an ioctl to flush pending writes.

This is where we hit the problem I'm trying to solve. On Linux, if a zero-byte write is received, the blkback device returns a failure response to the front end, presumably because a zero-byte write will not be accepted by the lower-level drivers. So I need to work out what my options are for making Solaris work correctly as a domU on a Linux dom0.

Things I am considering include:

a Caching a previously issued write and, when running on Linux, issuing that write in place of a zero-byte write so as to succeed. This is a bit of a nasty hack, and is not efficient.

b Investigating removing the restriction on zero-byte writes in the Linux blkback driver. I'm not knowledgeable enough about the Linux kernel to know if this would work and would appreciate feedback on this suggestion. This would require zero-byte write support in the Linux block layer, which I am led to believe is not currently allowed.

c Adding a new protocol operation, BLKIF_OP_BARRIER, which would require support in Linux blkback to return success on receipt. In Solaris, we would issue the DKIOCFLUSHWRITECACHE ioctl on our layered device and return. In Linux, I've been informed that it might be implemented via blkdev_issue_flush() functionality.

There are pros and cons for each of the above suggestions and they are not mutually exclusive. I imagine we will need to implement (a) in order to work with existing installations. (b) and (c) are alternatives to get around the perceived problem that Linux doesn't like zero-byte writes and to provide a clean solution to the problem which would minimise the requirement for the hack described in (a).
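Since the options are not mutually exclusive, a front end could choose between them per back end. The following C sketch shows one way that choice might look. Everything in it is an assumption made for illustration: the capability flags, the type names, and the preference order (c), then (b), then (a) are not part of any existing protocol or driver.

```c
#include <stdbool.h>

/* How the front end satisfies a flush request that carries no data. */
enum flush_strategy {
    FLUSH_REPLAY_CACHED_WRITE,  /* option (a): reissue a cached write as a barrier */
    FLUSH_ZERO_BYTE_BARRIER,    /* option (b): zero-byte barrier write */
    FLUSH_STANDALONE_BARRIER    /* option (c): proposed BLKIF_OP_BARRIER */
};

/* Hypothetical capabilities a back end might advertise to the front end. */
struct backend_caps {
    bool standalone_barrier;  /* understands the proposed BLKIF_OP_BARRIER */
    bool zero_byte_write;     /* accepts zero-byte writes */
};

/*
 * Prefer the cleanest mechanism the back end supports and fall back to
 * the cached-write hack only for back ends that support neither.
 */
enum flush_strategy choose_flush_strategy(const struct backend_caps *caps)
{
    if (caps->standalone_barrier)
        return FLUSH_STANDALONE_BARRIER;
    if (caps->zero_byte_write)
        return FLUSH_ZERO_BYTE_BARRIER;
    return FLUSH_REPLAY_CACHED_WRITE;
}
```

With both flags false, which is how an existing installation would look to such a front end, the sketch falls back to option (a).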
(b) would be simpler in some ways, but we still would have issues with the installed base unless we installed a new flag in xenstore which indicated that it was acceptable for a client to pass a zero-byte write to the back end.

(c) is probably the cleanest approach and would, I think, provide a complete solution when coupled with (a).

Ok, those are the alternatives which seem viable to me right now. I've considered and discarded other schemes/alternatives, none of which were as desirable as the ones I've listed.

I'd really appreciate some feedback from the community at this point.

Thanks,

Gary

--
Gary Pennington
Solaris Core OS
Sun Microsystems
Gary.Pennington@xxxxxxx

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel