Re: [Xen-devel] Shouldn't backend devices for VMX domain disks be opened with O_DIRECT?
Hi,

On Thu, 2006-02-02 at 18:09 -0600, Anthony Liguori wrote:

> Referring to the original question, which has been quoted away,
> journaling doesn't require that data be written to disk per se but that
> writes occur in a particular order. A journal is always recoverable
> given that writes occur in the expected order.

Sure... it's *internally* consistent, maybe. But you need more than that. You need guarantees that things are on disk, else external consistency guarantees will be broken.

Consider things like sendmail fsync()ing a spool file before telling the sender that the email has been accepted. After that acknowledgement, the sender can delete the mail from its queues knowing that the recipient MTA definitely has the data, and even if the recipient crashes, the mail won't be lost. Databases frequently have similar consistency requirements. If a power failure loses writes that you have told the domU have completed (even if you maintain write ordering), then you *are* putting application correctness at risk; there's no doubt about it.

> A buffer cache will have no effect on that order so you're no more
> likely to have corruption than if you disabled the buffer cache.

That holds only if it's being used as a write-through cache. If it's write-back, it will have a major impact on ordering.

> You especially want the buffer cache if you have LVM partitions.
> Sectors on an LVM disk are not necessarily contiguous and can even span
> multiple disks. You definitely want the IO scheduler involved there.

That does not at all imply the use of the buffer cache. All that you need to satisfy this is AIO (asynchronous *submission* of the IO) combined with O_DIRECT IO (synchronous *completion*), i.e. you can submit multiple IOs concurrently, but you know for sure when each one completes. That still lets the elevator get strongly involved in the scheduling and reordering of the IOs, but lets you know reliably when things hit disk.
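A minimal user-space sketch of the "asynchronous submission, synchronous completion" pattern, assuming Linux/glibc and a filesystem that accepts O_DIRECT with 4096-byte-aligned buffers, offsets and lengths. If the O_DIRECT open is rejected with EINVAL, the code falls back to plain O_DSYNC. It uses POSIX AIO (`aio_write`, `aio_error`, `aio_suspend`, `aio_return`), not the in-kernel `submit_bio`/`bi_end_io` path that blkback uses. glibc implements POSIX AIO with helper threads, so the sketch shows the split between submission and completion rather than Linux's native kernel AIO. The function name `write_blocks_async`, the request count and the block size are hypothetical. Adding O_DSYNC, so that completion also covers the drive's write cache, is an assumption that goes beyond the original discussion.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux/glibc */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NREQ  4              /* hypothetical number of concurrent writes */
#define BLOCK 4096           /* assumed to satisfy O_DIRECT alignment rules */

/* Hypothetical helper: queue NREQ block writes without waiting, then wait
 * for every one to complete. Returns 0 only if all writes fully succeeded. */
int write_blocks_async(const char *path)
{
    /* O_DSYNC is an assumption added here so that completion also implies
     * the data reached stable storage, not just that it bypassed the cache. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_DSYNC, 0644);
    if (fd < 0 && errno == EINVAL)   /* filesystem may not support O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);
    if (fd < 0)
        return -1;

    struct aiocb cbs[NREQ];
    const struct aiocb *list[NREQ];
    void *bufs[NREQ] = {0};
    int rc = 0, submitted = 0;

    for (int i = 0; i < NREQ; i++) {
        /* O_DIRECT requires aligned buffers, offsets and lengths. */
        if (posix_memalign(&bufs[i], BLOCK, BLOCK) != 0) { rc = -1; break; }
        memset(bufs[i], 'A' + i, BLOCK);

        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = BLOCK;
        cbs[i].aio_offset = (off_t)i * BLOCK;
        cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;  /* we poll instead */
        list[i] = &cbs[i];

        /* Asynchronous submission: returns without waiting for the write. */
        if (aio_write(&cbs[i]) != 0) { rc = -1; break; }
        submitted++;
    }

    /* Completion: block until each queued request finishes, then check it. */
    for (int i = 0; i < submitted; i++) {
        while (aio_error(&cbs[i]) == EINPROGRESS)
            aio_suspend(&list[i], 1, NULL);   /* retried if interrupted */
        if (aio_return(&cbs[i]) != BLOCK)
            rc = -1;
    }

    for (int i = 0; i < NREQ; i++)
        free(bufs[i]);                        /* free(NULL) is a no-op */
    close(fd);
    return rc;
}
```

On glibc older than 2.34, link with `-lrt`. All four writes are queued before the first wait. Success is reported only after every `aio_return` has confirmed a full write, and that is the point at which it would be safe to tell a guest the writes have completed.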
Fortunately, that's just what blkback is doing: it's using submit_bio to submit the write IOs without waiting for completion, and is using the bio's bi_end_io callback to process the IO completion once it is hard on disk.

--Stephen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel