
Re: [Xen-devel] xen-blkback unmap with network retransmission will cause a coredump



On 23/09/14 at 15:27, Chentao(Boby) wrote:
> 
> 
> On 2014/9/22 18:01, Roger Pau Monné wrote:
>> On 20/09/14 at 12:57, Chentao(Boby) wrote:
>>> Hi Konrad and Roger,
>>>
>>>     When the xen-blkback module executes an unmap operation while an skb
>>> queued for network retransmission is still using the mapped page, the
>>> host OS crashes.
>>> The crash stack looks like this:
>>> <ffffffff8041133e>{do_page_fault+0x38e}
>>> <ffffffff8040d9e8>{page_fault+0x28}
>>> <ffffffff80223cdb>{memcpy+0xb}
>>> <ffffffff802325c2>{swiotlb_tbl_map_single+0x212}
>>> <ffffffff8023274a>{swiotlb_map_page+0x17a}
>>> <ffffffffa03468e6>{tg3:tg3_start_xmit+0x656}
>>> <ffffffff80354d14>{dev_hard_start_xmit+0x334}
>>> <ffffffff803721be>{sch_direct_xmit+0x1ae}
>>>
>>>     Searching the web, I found that Citrix engineers hit this problem a
>>> long time ago, and I understand that they solved it by modifying the
>>> kernel stack.
>>> Because that modification is very large, the Linux kernel community has
>>> not accepted it so far. I have a tentative idea: in the
>>> dispatch_rw_block_io function, if the I/O is a write operation, use the
>>> grant copy hypercall instead of the grant map hypercall. I tested this
>>> modification and it solves the problem.
>>>
>>>     What is your opinion of this modification? I look forward to
>>> your reply. Any reply is appreciated.
>>
>> Hello,
>>
>> Yes, using grant-copy instead of grant-map is going to solve the
>> problem, but it also defeats the purpose of persistent grants. I'm
>> afraid it is going to introduce a noticeable performance penalty.
>>
> Roger, you are right. We measured a performance penalty of more than 20%
> for 1M, 100% sequential writes at depth 128 when the workload runs on a
> ramdisk.
> 
>> IMHO a better solution would be to use GNTTABOP_unmap_and_replace with
>> the scratch balloon page instead of GNTTABOP_unmap_grant_ref. See
>> arch/x86/xen/p2m.c m2p_remove_override for an example implementation of
>> this procedure.
>>
> You mean that replacing GNTTABOP_unmap_grant_ref with
> GNTTABOP_unmap_and_replace in the xen-blkback module will solve the
> problem. Is my understanding right?

Well, it's not a straight replacement. You will need to issue a
multicall that bundles the grant ref replacement and an MMU operation to
update the scratch page VA so that it points to the MFN again. This is
because the grant replace will remove the MFN from the scratch page VA.

You can find an example of how to do this in m2p_remove_override in
the Linux kernel file arch/x86/xen/p2m.c.
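
To make the two steps concrete, here is a toy userspace model in C. It is
only a sketch of the idea, not kernel or hypervisor code: the array
toy_pt and every toy_* function are hypothetical names invented for this
illustration, and a small array index stands in for a virtual address.
The first helper mimics the effect described above (the grant VA takes
over the scratch mapping and the scratch VA loses its MFN); the second
mimics the MMU update that restores the scratch VA.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_SLOTS    4
#define TOY_INVALID_MFN UINT64_MAX

/* Toy page table: the slot index plays the role of a virtual address. */
static uint64_t toy_pt[TOY_NR_SLOTS];

/*
 * Models the effect of the grant replace: host_va now maps what new_va
 * mapped, and new_va is left without an MFN.
 */
static void toy_unmap_and_replace(unsigned host_va, unsigned new_va)
{
    assert(host_va < TOY_NR_SLOTS && new_va < TOY_NR_SLOTS);
    toy_pt[host_va] = toy_pt[new_va];
    toy_pt[new_va] = TOY_INVALID_MFN;
}

/* Models the MMU operation that points a VA at a given MFN. */
static void toy_update_va_mapping(unsigned va, uint64_t mfn)
{
    assert(va < TOY_NR_SLOTS);
    toy_pt[va] = mfn;
}

/*
 * Both steps together, as the multicall would bundle them: replace the
 * grant mapping with the scratch page, then restore the scratch VA.
 */
static void toy_unmap_grant_with_scratch(unsigned grant_va,
                                         unsigned scratch_va)
{
    uint64_t scratch_mfn = toy_pt[scratch_va];

    toy_unmap_and_replace(grant_va, scratch_va);
    toy_update_va_mapping(scratch_va, scratch_mfn);
}
```

In this model, calling toy_unmap_and_replace alone leaves the scratch
slot invalid, which is why the second step is needed.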

Another option would be to introduce a new hypercall, as David
suggests, that does a replacement without redirecting <new_addr> to the
null entry. This way you should be able to avoid the multicall.

Roger.

