[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH] xen-block: introduces extra request to pass-through SCSI commands



On 29/02/16 04:37, Bob Liu wrote:
> 1) What is this patch about?
> This patch introduces an new block operation (BLKIF_OP_EXTRA_FLAG).
> A request with BLKIF_OP_EXTRA_FLAG set means the following request is an
> extra request which is used to pass through SCSI commands.
> This is like a simplified version of XEN_NETIF_EXTRA_* in netif.h.
> It can be extended easily to transmit other per-request/bio data from frontend
> to backend e.g Data Integrity Field per bio.
> 
> 2) Why we need this?
> Currently only raw data segments are transmitted from blkfront to blkback, 
> which
> means some advanced features are lost.
>  * Guest knows nothing about features of the real backend storage.
>       For example, on bare-metal environment INQUIRY SCSI command can be used
>       to query storage device information. If it's a SSD or flash device we
>       can have the option to use the device as a fast cache.
>       But this can't happen in current domU guests, because blkfront only
>       knows it's just a normal virtual disk
> 
>  * Failover Clusters in Windows
>       Failover clusters require SCSI-3 persistent reservation target disks,
>       but now this can't work in domU.
> 
> 3) Known issues:
>  * Security issues, how to 'validate' this extra request payload.
>    E.g SCSI operates on LUN bases (the whole disk) while we really just want 
> to
>    operate on partitions

It's not only validation: some operations just affect the whole LUN
(e.g. Reserve/Release). And what about "multi-LUN" commands like
"report LUNs"?

>  * Can't pass SCSI commands through if the backend storage driver is bio-based
>    instead of request-based.
> 
> 4) Alternative approach: Using PVSCSI instead:
>  * Doubt PVSCSI can support as many type of backend storage devices as 
> Xen-block.

pvSCSI won't need to support all types of backends. It's enough to
support those where passing through SCSI commands makes sense.

Seems to be a similar issue as the above mentioned problem with
bio-based backend storage drivers.

>  * Much longer path:
>    ioctl() -> SCSI upper layer -> Middle layer -> PVSCSI-frontend -> 
> PVSCSI-backend -> Target framework(LIO?) ->
> 
>    With xen-block we only need:
>    ioctl() -> blkfront -> blkback ->

I'd like to see performance numbers before making a decision.

>  * xen-block has been existed for many years, widely used and more stable.

Adding another SCSI passthrough capability wasn't accepted for pvSCSI
(that's the reason I used the Target Framework). Why do you think it
will be accepted for pvblk?

This is not my personal opinion, just a heads up from someone who had a
try already. ;-)


Juergen

> Welcome any input, thank you!
> 
> Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx>
> ---
>  xen/include/public/io/blkif.h |   73 
> +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
> 
> diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
> index 99f0326..7c10bce 100644
> --- a/xen/include/public/io/blkif.h
> +++ b/xen/include/public/io/blkif.h
> @@ -635,6 +635,28 @@
>  #define BLKIF_OP_INDIRECT          6
>  
>  /*
> + * Recognised only if "feature-extra-request" is present in backend xenbus 
> info.
> + * A request with BLKIF_OP_EXTRA_FLAG indicates an extra request is followed
> + * in the shared ring buffer.
> + *
> + * By this way, extra data like SCSI command, DIF/DIX and other 
> per-request/bio
> + * data can be transmitted from frontend to backend.
> + *
> + * The 'wire' format is like:
> + *  Request 1: xen_blkif_request
> + * [Request 2: xen_blkif_extra_request]    (only if request 1 has 
> BLKIF_OP_EXTRA_FLAG)
> + *  Request 3: xen_blkif_request
> + *  Request 4: xen_blkif_request
> + * [Request 5: xen_blkif_extra_request]    (only if request 4 has 
> BLKIF_OP_EXTRA_FLAG)
> + *  ...
> + *  Request N: xen_blkif_request
> + *
> + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* 
> create the
> + * "feature-extra-request" node!
> + */
> +#define BLKIF_OP_EXTRA_FLAG (0x80)
> +
> +/*
>   * Maximum scatter/gather segments per request.
>   * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
>   * NB. This could be 12 if the ring indexes weren't stored in the same page.
> @@ -703,10 +725,61 @@ struct blkif_request_indirect {
>  };
>  typedef struct blkif_request_indirect blkif_request_indirect_t;
>  
> +enum blkif_extra_request_type {
> +     BLKIF_EXTRA_TYPE_SCSI_CMD = 1,          /* Transmit SCSI command.  */
> +};
> +
> +struct scsi_cmd_req {
> +     /*
> +      * Grant mapping for transmiting SCSI command to backend, and
> +      * also receive sense data from backend.
> +      * One 4KB page is enough.
> +      */
> +     grant_ref_t cmd_gref;
> +     /* Length of SCSI command in the grant mapped page. */
> +     unsigned int cmd_len;
> +
> +     /*
> +      * SCSI command may require transmiting data segment length less
> +      * than a sector(512 bytes).
> +      * Record num_sg and last segment length in extra request so that
> +      * backend can know about them.
> +      */
> +     unsigned int num_sg;
> +     unsigned int last_sg_len;
> +};
> +
> +/*
> + * Extra request, must follow a normal-request and a normal-request can
> + * only be followed by one extra request.
> + */
> +struct blkif_request_extra {
> +     uint8_t type;           /* BLKIF_EXTRA_TYPE_* */
> +     uint16_t _pad1;
> +#ifndef CONFIG_X86_32
> +     uint32_t _pad2;         /* offsetof(blkif_...,u.extra.id) == 8 */
> +#endif
> +     uint64_t id;
> +     struct scsi_cmd_req scsi_cmd;
> +} __attribute__((__packed__));
> +typedef struct blkif_request_extra blkif_request_extra_t;
> +
> +struct scsi_cmd_res {
> +     unsigned int resid_len;
> +     /* Length of sense data returned in grant mapped page. */
> +     unsigned int sense_len;
> +};
> +
> +struct blkif_response_extra {
> +     uint8_t type;  /* BLKIF_EXTRA_TYPE_* */
> +     struct scsi_cmd_res scsi_cmd;
> +} __attribute__((__packed__));
> +
>  struct blkif_response {
>      uint64_t        id;              /* copied from request */
>      uint8_t         operation;       /* copied from request */
>      int16_t         status;          /* BLKIF_RSP_???       */
> +    struct blkif_response_extra extra;
>  };
>  typedef struct blkif_response blkif_response_t;
>  
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.