Re: [Xen-devel] [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
On 07/04/17 15:05, Yu Zhang wrote:
>
>
> On 4/7/2017 9:56 PM, George Dunlap wrote:
>> On 07/04/17 12:31, Jan Beulich wrote:
>>>>>> On 07.04.17 at 12:55, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>>>> On 4/7/2017 6:26 PM, Jan Beulich wrote:
>>>>>>>> On 07.04.17 at 11:53, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>>>>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>>>>>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>>>>>>>      if ( is_epte_valid(ept_entry) )
>>>>>>>>      {
>>>>>>>>          if ( (recalc || ept_entry->recalc) &&
>>>>>>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>>>>>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>>>>>>> I think the distinction between these two is rather arbitrary, and I
>>>>>>> also think this is part of the problem above: Distinguishing
>>>>>>> log-dirty from ram-rw requires auxiliary data to be consulted. The
>>>>>>> same ought to apply to ioreq-server, and then there wouldn't be a
>>>>>>> need to have two p2m_*_changeable() flavors.
>>>>>> Well, I think we have also discussed this quite long ago; here is
>>>>>> the link:
>>>>>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html
>>>>> That was a different discussion, namely not considering this ...
>>>>>
>>>>>>> Of course the subsequent use of p2m_is_logdirty_range() may then
>>>>>>> need amending.
>>>>>>>
>>>>>>> In the end it looks like you have the inverse problem here compared
>>>>>>> to above: You should return ram-rw when the reset was already
>>>>>>> initiated. At least that's how I would see the logic to match up
>>>>>>> with the log-dirty handling (where the _effective_ rather than the
>>>>>>> last stored type is being returned).
>>>>> ... at all.
>>>> Sorry, I still do not get it.
>>> Leaving ioreq-server out of the picture, what the function returns
>>> to the caller is the type as it would be if a re-calculation had
>>> already been done on the entry, even if that hasn't happened yet
>>> (the function here shouldn't change any entries, after all). I think
>>> we want the same behavior here for what have been ioreq-server
>>> entries (and which would become ram-rw after the next re-calc).
>> How about something like the attached? (In-lined for convenience.)
>>
>>  -George
>>
>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>> entries need to be reset back to p2m_ram_rw. This patch does this
>> asynchronously with the current p2m_change_entry_type_global()
>> interface.
>>
>> A new field, entry_count, is introduced in struct p2m_domain, to
>> record the number of p2m_ioreq_server p2m page table entries. One
>> property of these entries is that they only point to 4K-sized page
>> frames, because all p2m_ioreq_server entries originate from
>> p2m_ram_rw ones in p2m_change_type_one(); we therefore do not need to
>> worry about counting 2M/1G-sized pages.
>>
>> This patch disallows mapping of an ioreq server while any
>> p2m_ioreq_server entries remain, in case another mapping occurs right
>> after the current one is unmapped and releases its lock, with the p2m
>> table not yet synced.
>>
>> This patch also disallows live migration while p2m_ioreq_server
>> entries remain in the p2m table. The core reason is that our current
>> implementation of p2m_change_entry_type_global() lacks the
>> information to resync p2m_ioreq_server entries correctly if
>> global_logdirty is on.
>>
>> We still need to handle other recalculations, however; which means
>> that when doing a recalculation, if the current type is
>> p2m_ioreq_server, we check whether p2m->ioreq.server is valid. If it
>> is, we leave the entry as type p2m_ioreq_server; if not, we reset it
>> to p2m_ram as appropriate.
>>
>> To avoid code duplication, lift recalc_type() out of p2m-pt.c and use
>> it for all type recalculations (both in p2m-pt.c and p2m-ept.c).
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx>
>> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxxx>
>> Reviewed-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
>> ---
>>  xen/arch/x86/hvm/ioreq.c  |  8 ++++++
>>  xen/arch/x86/mm/hap/hap.c |  9 ++++++
>>  xen/arch/x86/mm/p2m-ept.c | 73 +++++++++++++++++++++++++++++------------------
>>  xen/arch/x86/mm/p2m-pt.c  | 58 ++++++++++++++++++++++++-------------
>>  xen/arch/x86/mm/p2m.c     |  9 ++++++
>>  xen/include/asm-x86/p2m.h | 26 ++++++++++++++++-
>>  6 files changed, 135 insertions(+), 48 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
>> index 5bf3b6d..07a6c26 100644
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>
>>      spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>
>> +    if ( rc == 0 && flags == 0 )
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>> +    }
>> +
>>      return rc;
>>  }
>>
>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index a57b385..4b591fe 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -187,6 +187,15 @@ out:
>>   */
>>  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>>  {
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +    /*
>> +     * Refuse to turn on global log-dirty mode if
>> +     * there are outstanding p2m_ioreq_server pages.
>> +     */
>> +    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
>> +        return -EBUSY;
>> +
>>      /* turn on PG_log_dirty bit in paging mode */
>>      paging_lock(d);
>>      d->arch.paging.mode |= PG_log_dirty;
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index cc1eb21..9f41067 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -533,6 +533,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>      {
>>          for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
>>          {
>> +            p2m_type_t nt;
>>              e = atomic_read_ept_entry(&epte[i]);
>>              if ( e.emt == MTRR_NUM_TYPES )
>>                  e.emt = 0;
>> @@ -542,10 +543,15 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>                                                 _mfn(e.mfn), 0, &ipat,
>>                                                 e.sa_p2mt == p2m_mmio_direct);
>>              e.ipat = ipat;
>> -            if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>> -            {
>> -                e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>> -                            ? p2m_ram_logdirty : p2m_ram_rw;
>> +            nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn + i);
>> +            if ( nt != e.sa_p2mt )
>> +            {
>> +                if ( e.sa_p2mt == p2m_ioreq_server )
>> +                {
>> +                    ASSERT(p2m->ioreq.entry_count > 0);
>> +                    p2m->ioreq.entry_count--;
>> +                }
>> +
>> +                e.sa_p2mt = nt;
>>                  ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
>>              }
>>              e.recalc = 0;
>> @@ -562,23 +568,24 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>
>>              if ( recalc && p2m_is_changeable(e.sa_p2mt) )
>>              {
>> -                 unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
>> -
>> -                 switch ( p2m_is_logdirty_range(p2m, gfn & mask,
>> -                                                gfn | ~mask) )
>> -                 {
>> -                 case 0:
>> -                     e.sa_p2mt = p2m_ram_rw;
>> -                     e.recalc = 0;
>> -                     break;
>> -                 case 1:
>> -                     e.sa_p2mt = p2m_ram_logdirty;
>> -                     e.recalc = 0;
>> -                     break;
>> -                 default: /* Force split. */
>> -                     emt = -1;
>> -                     break;
>> -                 }
>> +                unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
>> +
>> +                ASSERT(e.sa_p2mt != p2m_ioreq_server);
>> +                switch ( p2m_is_logdirty_range(p2m, gfn & mask,
>> +                                               gfn | ~mask) )
>> +                {
>> +                case 0:
>> +                    e.sa_p2mt = p2m_ram_rw;
>> +                    e.recalc = 0;
>> +                    break;
>> +                case 1:
>> +                    e.sa_p2mt = p2m_ram_logdirty;
>> +                    e.recalc = 0;
>> +                    break;
>> +                default: /* Force split. */
>> +                    emt = -1;
>> +                    break;
>> +                }
>
>
> So here I guess only the "ASSERT(e.sa_p2mt != p2m_ioreq_server);" is
> new, right?

Yes, that's right -- the whole block was indented wrong; when I added
the ASSERT() my editor aligned it correctly, and I couldn't very well
leave it that way. :-)

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
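[Editor's note: the "effective type" rule Jan describes -- and which the
patch implements via p2m_recalc_type() -- can be sketched in isolation.
The code below is NOT Xen source; the struct, the enum values, and the
p2m_is_logdirty_range() stand-in are simplified assumptions kept only to
mirror the patch's naming. It shows the two cases: a stale
p2m_ioreq_server entry (server already unmapped) reads back as
p2m_ram_rw, while ram-rw/log-dirty follow the log-dirty setting.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the Xen types used by the patch. */
typedef enum {
    p2m_ram_rw,
    p2m_ram_logdirty,
    p2m_ioreq_server,
    p2m_mmio_direct,
} p2m_type_t;

struct p2m_domain {
    struct {
        void *server;          /* non-NULL while an ioreq server is mapped */
    } ioreq;
    bool global_logdirty;      /* stand-in for the log-dirty rangeset */
};

/* Stand-in for p2m_is_logdirty_range(): just the global flag here. */
static bool p2m_is_logdirty_range(const struct p2m_domain *p2m,
                                  unsigned long s, unsigned long e)
{
    (void)s; (void)e;
    return p2m->global_logdirty;
}

/*
 * The type an entry would have after recalculation, without modifying
 * the entry itself:
 *  - p2m_ioreq_server with no server mapped decays to p2m_ram_rw;
 *  - ram-rw/ram-logdirty follow the log-dirty range;
 *  - all other types are returned unchanged.
 */
static p2m_type_t p2m_recalc_type(bool recalc, p2m_type_t t,
                                  const struct p2m_domain *p2m,
                                  unsigned long gfn)
{
    if ( !recalc )
        return t;

    switch ( t )
    {
    case p2m_ioreq_server:
        return p2m->ioreq.server ? p2m_ioreq_server : p2m_ram_rw;
    case p2m_ram_rw:
    case p2m_ram_logdirty:
        return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                    : p2m_ram_rw;
    default:
        return t;
    }
}
```

Under these assumptions, resolve_misconfig() only needs to compare the
returned type with the stored one to know whether an ioreq-server entry
went away (and hence whether entry_count must be decremented).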