[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] abstract model of IOMMU unmaping/mapping failures



>>> On 06.04.16 at 09:38, <quan.xu@xxxxxxxxx> wrote:
> On April 01, 2016 7:57pm, <JBeulich@xxxxxxxx> wrote:
>> >>> On 31.03.16 at 11:06, <quan.xu@xxxxxxxxx> wrote:
>> > 4. __gnttab_unmap_common():rollback (no change)
>> >
>> > (Existing code)
>> >        >>...
>> >              if ( !kind )
>> >             err = iommu_unmap_page(ld, op->frame);
>> >         else if ( !(kind & MAPKIND_WRITE) )
>> >             err = iommu_map_page(ld, op->frame, op->frame,
>> > IOMMUF_readable);
>> >
>> >         double_gt_unlock(lgt, rgt);
>> >
>> >         if ( err )
>> >             rc = GNTST_general_error;
>> >        <<...
>> >
>> > Similarly no change required, as error has been correctly handled.
>> 
>> I wouldn't call this "correctly handled", but for the hardware domain it 
>> should
>> be good enough, and by crashing DomU-s simply reporting the error up the call
>> tree is sufficient. One question though is whether the loops higher up the 
>> call
>> tree in grant_table.c shouldn't be exited when noticing the domain has 
>> crashed,
>> both to avoid unnecessary work and to reduce the risk of secondary problems.
>> 
> 
> For this point, I suppose that the domain structure is not destructed (we 
> are safe to call domain's element, e.g. ld->grant_table) in do_grant_table_op 
> hypercall cycle,
> even the domain is crashed down. I am not quite sure, whether it is correct 
> or not, could you explain more?

Explain what more? Sure, struct domain stays around until the
domain gets actually cleaned up, so accesses to its grant table
(and alike) remain valid while execution didn't leave the context
of that guest (vCPU) yet.

>> > 7. set_identity_p2m_entry():rollback (no change).
>> >
>> > (Existing code)
>> >   >>...
>> >     if ( !paging_mode_translate(p2m->domain) )
>> >     {
>> >         if ( !need_iommu(d) )
>> >             return 0;
>> >         return iommu_map_page(d, gfn, gfn,
>> IOMMUF_readable|IOMMUF_writable);
>> >     }
>> >   <<...
>> >     if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
>> >         ret = iommu_map_page(d, gfn, gfn,
>> IOMMUF_readable|IOMMUF_writable);
>> >    >> ...
>> >
>> > error handling and rollback already in-place.
>> 
>> For the first portion of the quoted code you mean. I don't see any rollback 
>> in
>> the paging_mode_translate() case.
>> 
> 
> Does it refer to as the below call trees:
>  set_identity_p2m_entry()--rmrr_identity_mapping()--intel_iommu_add_device() 
> --...
>  
> set_identity_p2m_entry()--rmrr_identity_mapping()--intel_iommu_remove_device()--...
>  

No, my comment solely referred to set_identity_p2m_entry(),
but it looks like I mis-read the code and there's indeed nothing
to roll back inside that function.

>> > 8. p2m_remove_page():rollback one level(change required).
>> >
>> > (Existing code)
>> >   >>...
>> >     if ( !paging_mode_translate(p2m->domain) )
>> >     {
>> >         if ( need_iommu(p2m->domain) )
>> >             for ( i = 0; i < (1 << page_order); i++ )
>> >                 iommu_unmap_page(p2m->domain, mfn + i);
>> >         return 0;
>> >     }
>> >   <<...
>> >
>> > There is no error checking of iommu_unmap_page. We need to add.
>> > However, there are several callers of this function which don't handle
>> > error at all. I'm not sure whether hardware domain will use this function.
>> > Since it is in core p2m logic (which I don't have much knowledge), I
>> > hope we can print a warning and simply return error here (given the
>> > fact that non-hardware domains are all crashed immediately within
>> > unmap call)
>> 
>> Yes, at least error propagation needs to be added here. I don't think more is
>> required. (Be careful, btw, with adding warnings - you must not spam the log
>> with such.)
>> 
> 
> 
>  agreed. A quick question about error propagation:
> ...
> for ( i = 0; i < (1UL << page_order); i++ )
>     iommu_unmap_page(p2m->domain, gfn + i);
> ...
> 
> As you mentioned, as a special case, unmapping should perhaps continue 
> despite an error, in an attempt to do best effort cleanup.
> Then i could modify the code as:
> 
> ...
> for ( i = 0; i < (1UL << page_order); i++ )
> {
>     rc = iommu_unmap_page(p2m->domain, gfn + i);
>     if ( rc )
>       ret = rc;
> }
> ..
>    return ret;
> ...
> 
> It looks cumbersome to me, any suggestion?

What's cumbersome here?

>> > 9. p2m_pt_set_entry(): open (propose no-rollback).
>> >
>> > (Existing code)
>> >   >>...
>> >            for ( i = 0; i < (1UL << page_order); i++ )
>> >                 iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
>> >                                iommu_pte_flags);
>> >         else
>> >             for ( i = 0; i < (1UL << page_order); i++ )
>> >                 iommu_unmap_page(p2m->domain, gfn + i);
>> >   <<...
>> >
>> > Above is in the end of the func.
>> >
>> > I'm not sure whether it will be called for hardware domain ( PVH mode
>> > is one potential concern. as we know, PVH has been introduced on Xen
>> > 4.4 as a DomU, and on Xen 4.5 as a Dom0.).
>> 
>> Indeed you can get here for Dom0 only when it's non-PV.
> 
> A quick question, is non-PV the same as PVH for hardware domain?

Right now, yes. Once PVHv2 becomes usable for Dom0, that
would be another such variant.

>> > 11. __get_page_type(): open (propose no-rollback).
>> > The following is in detail:
>> >     >>...
>> >       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>> >         {
>> >             if ( (x & PGT_type_mask) == PGT_writable_page )
>> >                 iommu_unmap_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)));
>> >             else if ( type == PGT_writable_page )
>> >                 iommu_map_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)),
>> >                                page_to_mfn(page),
>> >
>> IOMMUF_readable|IOMMUF_writable);
>> >         }
>> >     <<...
>> >
>> > It's almost in the end of the func.
>> > As the limited set of cases where __get_page_type() calls
>> > iommu_{,un}map_page(), and a get_page_type() invocation that
>> > previously was known to never fail (or at least so we hope, based on
>> > the existing code) [I got it from Jan's reply.],
>> 
>> Please don't mis-quote me: I said this for a specific caller of the 
>> function, not
>> about the function itself.
>> 
> 
> Sorry for that.
> 
>> > we can classify it as normal PV domain and be treated as a fatal error.
>> > So for I am inclined to leave it as today.
>> 
> 
> I think one level propagation, only propagated in __get_page_type(), is 
> enough.

Not sure what you mean with "one level propagation" here.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.