
Re: [win-pv-devel] xenvbd (8.x) - blkback/tapdisk3 problems



> -----Original Message-----
> From: win-pv-devel [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On
> Behalf Of Martin Cerveny
> Sent: 28 October 2016 10:40
> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: [win-pv-devel] xenvbd (8.x) - blkback/tapdisk3 problems
> 
> Hello.
> 
> I have problems with xenvbd (8.x). There was no such problem with the
> older PV drivers, xenvbd (7.2x).
> Questions at the bottom.
> 
> I use a remote raw disk as the source (multipath+iSCSI+iSER+IB).
> Two configs:
> 
> --------------------------
> 
> 1) use direct blkback (format=raw, vdev=hda, access=rw,
> target=/dev/mapper/3600144f07a0542580000568ba94a0001)
> 
> Performance is good, but the setup is __unusable__ for real work.
> 
> Every few seconds/minutes (randomly, depending on disk load) Windows
> hangs on I/O operations. I usually see this more often during writes.
> 
> Sometimes (1 in 10) I see "PdoReset" in "DebugView" (DomU):
> 
> 00003034        10:12:32        XENVBD|__PdoReset:Target[0] ====>
> 00003035        10:12:32        XENVBD|__PdoPauseDataPath:Target[0] : Waiting
> for 5 Submitted requests
> 00003036        10:12:52        XENVBD|NotifierDpc:Target[0] : Paused, 5
> outstanding
> 00003037        10:12:53        XENVBD|NotifierDpc:Target[0] : Paused, 4
> outstanding
> 00003038        10:12:53        XENVBD|NotifierDpc:Target[0] : Paused, 3
> outstanding
> 00003039        10:12:53        XENVBD|NotifierDpc:Target[0] : Paused, 2
> outstanding
> 00003040        10:12:53        XENVBD|NotifierDpc:Target[0] : Paused, 1
> outstanding
> 00003041        10:12:53        XENVBD|__PdoPauseDataPath:Target[0] : 0/5
> Submitted requests left (21711 iterrations)
> 00003042        10:12:53        XENVBD|__FrontendSetState:Target[0] : ENABLED
> ----> CLOSING
> 00003043        10:12:53        XENVBD|__FrontendSetState:Target[0] : in state
> CONNECTED
> 00003044        10:12:53        XENVBD|__FrontendSetState:Target[0] : in state
> CLOSING
> 00003045        10:12:53        XENVBD|__FrontendSetState:Target[0] : CLOSING 
> -
> ---> CLOSED
> 00003046        10:12:53        XENVBD|__FrontendSetState:Target[0] : in state
> CLOSED
> 00003047        10:12:53        XENVBD|__FrontendSetState:Target[0] : CLOSED 
> --
> --> ENABLED
> 00003048        10:12:53        XENVBD|FrontendWriteUsage:Target[0] : DUMP
> NOT_HIBER PAGE
> 00003049        10:12:53        XENVBD|PdoUpdateInquiryData:Target[0] : VDI-
> UUID = {00000000-0000-0000-0000-000000000000}
> 00003050        10:12:53        XENVBD|FrontendPrepare:Target[0] : BackendId 0
> (/local/domain/0/backend/vbd/3/768)
> 00003051        10:12:53        XENVBD|__FrontendSetState:Target[0] : in state
> PREPARED
> 00003052        10:12:53        XENVBD|__FrontendSetState:Target[0] : in state
> CONNECTED
> 00003053        10:12:53        XENVBD|__FrontendSetState:Target[0] : in state
> ENABLED
> 00003054        10:12:53        XENVBD|__PdoReset:Target[0] <====
> 
> There is also restart log in Dom0, but no errors on disks/iscsi:
> 
> [ 3919.034421] xen-blkback:backend/vbd/3/768: prepare for reconnect
> [ 3919.039869] xen-blkback:ring-ref 32, event-channel 40, protocol 1 (x86_64-
> abi)
> 

Yes, XENVBD is being asked to reset because Windows thinks the storage has 
stalled, and it looks like Windows was probably right. This suggests a loss 
of event notification somewhere.
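If you want to chase the suspected lost notification from the Dom0 side, one possibility (a sketch, not tested against your setup; the xenstore path below is the backend path from your log, and these commands only exist in a Xen dom0) is to dump the hypervisor's event-channel state and the vbd's xenstore nodes around the time of a stall:

```shell
#!/bin/sh
# Sketch: inspect event-channel and xenstore state for the stalled vbd.
# The backend path is the one reported in the XENVBD log above; adjust
# the domain/device numbers for your guest.
if command -v xl >/dev/null 2>&1; then
    xl debug-keys e        # debug key 'e' asks Xen to dump event-channel info
    xl dmesg | tail -n 40  # the dump appears on the hypervisor console
    xenstore-ls /local/domain/0/backend/vbd/3/768
else
    echo "not a Xen dom0: xl not found"
fi
```

Comparing the state (masked/pending) of the channel number from the blkback reconnect message (event-channel 40 above) before and after a stall would show whether an event was left pending but never delivered.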

> Sometimes (1:1000) the system hangs completely (this is from a screenshot;
> I was not able to save the log):
> 
> XENDISK:PdoSendTrimSynchronous:fail2
> XENDISK:PdoSendTrimSynchronous:fail1 (c0000185)
> 

That means that XENVBD has failed the trim SRB (c0000185 is 
STATUS_IO_DEVICE_ERROR), probably because the backend doesn't handle it.
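Whether the backend advertises trim support can be checked from Dom0; blkback signals it through the feature-discard xenstore key. A sketch, reusing the backend path from the log above (Xen dom0 only):

```shell
#!/bin/sh
# Sketch: see whether the backend advertises discard/trim for this vbd.
if command -v xenstore-read >/dev/null 2>&1; then
    xenstore-read /local/domain/0/backend/vbd/3/768/feature-discard
else
    echo "xenstore-read not available outside a Xen dom0"
fi
```

A value of 0, or a missing key, would be consistent with XENVBD failing the trim SRB.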

> When using the OLDER PV drivers (7.2x) there is no hanging, but there are
> also some interesting logs in "DebugView" (DomU):
> 
> 00000035        556.56622314    XENVBD|__BufferReaperThread:Reaping
> Buffers (185 > 32)
> 00000036        557.56567383    XENVBD|__BufferReaperThread:Reaping
> Buffers (362 > 32)
> 00000037        558.56231689    XENVBD|__BufferReaperThread:Reaping
> Buffers (209 > 32)
> 

That is normal and expected. Just some caches being flushed.

> ---------------------
> 
> 2) use tapdisk3 (format=raw, vdev=hda, access=rw, script=block-tap,
> target=aio:/dev/mapper/3600144f07a0542580000568ba94a0001)
> 
> Performance is __bad__, but the setup is usable for real work.
> 
> There are __no__ errors, but performance dropped by ~20-50%! Also, as
> expected, when running "CrystalDiskMark", the Dom0 "tapdisk" process takes
> 100% of one CPU (is it single-threaded?), and vmstat reports ~100000
> context switches!
> (I think there is some further optimization possible, as described at
> http://xenserver.org/discuss-virtualization/virtualization-
> blog/entry/tapdisk3.html )
> 

Tapdisk does do some speculative polling, but I'm surprised you see such high 
CPU usage. Tapdisk is XenServer-specific though, so xs-devel is where such 
issues should be raised.
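The context-switch figure can be cross-checked without vmstat. A minimal sketch that reads the cumulative counter from /proc/stat (system-wide; if the sysstat package is available, `pidstat -w -p <tapdisk-pid>` would give the per-process number instead):

```shell
#!/bin/sh
# Sketch: system-wide context switches per second, sampled over one second.
# /proc/stat's "ctxt" line is a cumulative counter since boot.
c1=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 1
c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/sec: $((c2 - c1))"
```

Running this during a CrystalDiskMark pass, and again while the guest is idle, would show how much of the ~100000/s is attributable to tapdisk.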

> ---------------------
> 
> Environments:
> - Windows7 x64
> - tested the signed winpv drivers 8.1 and, primarily, the development drivers 8.2
> - Xen 4.5.3, 4.6 and, primarily, 4.7.0
> - kernels "XenServer" - kernel-3.10.41-353.380450 (and others from XS6.5)
> and kernel-3.10.96-495.383045.x86_64 (and others from XS7)
> - blktap3 - blktap-3.0.0.xs1001-xs6.5.0 and blktap-3.2.0.xs1087-xs7.0.0.x86_64
> 
> ---------------------
> 
> Questions:
> 
> What is buggy in the "direct blkback" chain?

No idea. Possibly blkback, possibly the underlying storage. Your kernels are 
old, and blkback has undergone many changes in more recent kernels.

> Was it tested?

By XenServer? No. XenServer makes no use of blkback.

> Are there some simple observability tools or scripts for Dom0 (for example
> for SystemTap) to study blkback behaviour?
> 

I don't know without looking at the code. It's possible there are some debugfs 
nodes.
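A quick way to find out is simply to look (a sketch; the path is a guess and what, if anything, appears there depends on kernel version and CONFIG_DEBUG_FS):

```shell
#!/bin/sh
# Sketch: look for xen-blkback debugfs nodes. The path is a guess; newer
# kernels may expose per-ring state here, older ones may expose nothing.
ls /sys/kernel/debug/xen-blkback 2>/dev/null \
    || echo "no xen-blkback debugfs nodes on this kernel"
```

Failing that, generic tools such as blktrace on the backing device, or ftrace on functions in the xen_blkback module, would show blkback's request flow without needing SystemTap.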

> How can I check that I am using the "optimized" tapdisk3?
> Is installing tapdisk3 and compiling Xen with "--disable-blktap2" sufficient?
> Are the performance drop of tapdisk3 and the high load in Dom0 expected?
> 

I think you should ask this on xs-devel, but tapdisk3 is heavily tested across 
many storage types by XenServer, and XS7.0 shipped with the 8.1 PV drivers, so 
any vast performance drop should have been picked up by system test.

  Paul

> Thanks for any answers, Martin Cerveny
> 
> _______________________________________________
> win-pv-devel mailing list
> win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel
