 Re: [Xen-devel] [PATCH] xen/blkback: Check for insane amounts of request on the ring.
 
 
On 6/11/2013 3:42 AM, Jan Beulich wrote:
 
On 10.06.13 at 18:43, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
 
On Mon, Jun 10, 2013 at 04:52:35PM +0100, Jan Beulich wrote:
 
On 07.06.13 at 22:11, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
 
On Tue, Jun 04, 2013 at 03:57:06PM -0400, Konrad Rzeszutek Wilk wrote:
 
+       /* N.B. 'rp', not 'rc'. */
+       if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rp)) {
+               pr_warn(DRV_PFX "Frontend provided bogus ring requests (%d - %d = %d). Halting ring processing on dev=%04x\n",
+                       rp, rc, rp - rc, blkif->vbd.pdevice);
 
Hm, I seem to be able to get:
[  189.398095] xen-blkback:Frontend provided bogus ring requests (125 - 115 = 10). Halting ring processing on dev=800011
or:
[  478.558699] xen-blkback:Frontend provided bogus ring requests (95 - 94 = 1). Halting ring processing on dev=800011
Which is clearly wrong. Piggybacking on the rsp_prod_pvt does not seem to
cut it.
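
To make the arithmetic behind those two log lines concrete, here is a rough sketch that models the check as "index minus rsp_prod_pvt, compared with >= against the ring size". It is not the kernel code: the 32-entry ring size, the function names and the search window are assumptions made for illustration only. It asks how far rsp_prod_pvt would have to lag behind rc for the logged rp to trip the check.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 32u /* assumption: a 32-entry ring */

/* Simplified model of the check in the patch (hypothetical helper):
 * unsigned subtraction, so index wraparound is handled naturally. */
static int cons_overflow(unsigned int rsp_prod_pvt, unsigned int idx)
{
    return (idx - rsp_prod_pvt) >= RING_SIZE;
}

/* Walk rsp_prod_pvt backwards from rc and report the first (highest)
 * value for which the check trips for the given rp.
 * Returns 1 and stores the value in *out, or 0 if none is found
 * within two ring sizes below rc. */
static int highest_tripping_rsp(unsigned int rp, unsigned int rc,
                                unsigned int *out)
{
    unsigned int back;

    assert(out != NULL);
    for (back = 0; back <= 2 * RING_SIZE; back++) {
        unsigned int rsp = rc - back;

        if (cons_overflow(rsp, rp)) {
            *out = rsp;
            return 1;
        }
    }
    return 0;
}
```

Under these assumptions, the first log line (rp=125, rc=115) only trips once rsp_prod_pvt is at or below 93, i.e. at least 22 consumed requests have no response queued yet; the second (rp=95, rc=94) needs rsp_prod_pvt at or below 63, a lag of 31.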
 
We see that too, but not very frequently. One thing is that
rsp_prod_pvt doesn't get printed along with rc and rp, thus
making it not immediately obvious how this can be off in any way.
Among the instances there are cases where the printed
difference is 32, which makes me wonder whether part of the
problem is the >= in the macro (we may want > here).
And then we might have been living with some sort of issue in the
past, because the existing use of the macro just causes the loop
to be exited, with it getting re-entered subsequently (i.e. at worst
causing performance issues).
 
My observation was that the rsp_prod_pvt was lagging behind b/c the
READ requests weren't completed. In other words, the processing
of the ring was stalled b/c 'make_response' hadn't been called yet.
Which meant that rsp_prod was not updated to rsp_prod_pvt (backend
does not care about that value, only frontend does).
 
I don't buy this: rsp_prod is being updated by the backend just for
the frontend's sake, so this value really doesn't need looking at (or
else we'd become susceptible to the guest maliciously writing that
field).
rsp_prod_pvt, otoh, is never behind rsp_prod, and if the guest
produces requests that don't have matching space for responses,
the guest is doing something bogus (and perhaps malicious).
 
I believe this is what I saw with the rsp_prod_pvt added in the printk. 
Unfortunately I did not save the logs. 
 
Going back to the rc and rp check solves the immediate 'insane ring
check'.
 
Consequently, while this check is better than none at all, I think it
is still too lax, and we really want to check against the produced
responses. Just that other than for the rc check using >=, we'd
need > for the rp one.
But first of all let me see if I can get the original broken check to
trigger wrongly here (so far only our stage testing caught these),
and look at by how much rsp_prod_pvt really lags.
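
A minimal sketch of the distinction described above, checking both indices against the produced responses but with different comparisons. The helper names and the 32-entry ring size are assumptions for illustration, not the macros from the ring header.

```c
#include <stdbool.h>

#define RING_SIZE 32u /* assumption: a 32-entry ring */

/* Consumer index (rc): before consuming another request it must be
 * strictly less than RING_SIZE ahead of rsp_prod_pvt, hence >=. */
static bool req_cons_overflow(unsigned int rsp_prod_pvt, unsigned int cons)
{
    return (cons - rsp_prod_pvt) >= RING_SIZE;
}

/* Producer index (rp): a completely full ring, exactly RING_SIZE
 * requests ahead of rsp_prod_pvt, is still legitimate, hence >. */
static bool req_prod_overflow(unsigned int rsp_prod_pvt, unsigned int prod)
{
    return (prod - rsp_prod_pvt) > RING_SIZE;
}
```

With rsp_prod_pvt at 0 and rp at 32, the producer check passes while the >= form would flag it, which would be consistent with the reports where the printed difference is 32. The unsigned subtraction keeps both checks valid across index wraparound.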
 
OK.
 
Jan
 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
 