
Re: [Xen-devel] [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server

On 4/12/2016 12:31 AM, Jan Beulich wrote:
On 11.04.16 at 13:14, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
On 4/9/2016 6:28 AM, Jan Beulich wrote:
On 31.03.16 at 12:53, <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
@@ -168,13 +226,72 @@ static int hvmemul_do_io(
           break;
       case X86EMUL_UNHANDLEABLE:
       {
-        struct hvm_ioreq_server *s =
-            hvm_select_ioreq_server(curr->domain, &p);
+        struct hvm_ioreq_server *s;
+        p2m_type_t p2mt;
+
+        if ( is_mmio )
+        {
+            unsigned long gmfn = paddr_to_pfn(addr);
+
+            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
+
+            switch ( p2mt )
+            {
+                case p2m_ioreq_server:
+                {
+                    unsigned long flags;
+
+                    p2m_get_ioreq_server(currd, &flags, &s);

As the function apparently returns no value right now, please avoid
the indirection on both values you're after - one of the two
(presumably s) can be the function's return value.

Well, the current implementation of p2m_get_ioreq_server() has
spin_lock()/spin_unlock() surrounding the reading of the flags and s,
but I believe we can also use s as the return value.

The use of a lock inside the function has nothing to do with how it
returns values to the caller.


Agreed. I'll use s as the return value then.
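
Just to make sure we mean the same thing, a minimal sketch of the
reworked helper (field names follow the p2m->ioreq.* layout introduced
by this patch and may differ in detail):

    /*
     * Sketch: keep the lock around the reads, but return s directly and
     * hand back the flags through a single out-parameter.
     */
    struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
                                                  unsigned int *flags)
    {
        struct p2m_domain *p2m = p2m_get_hostp2m(d);
        struct hvm_ioreq_server *s;

        spin_lock(&p2m->ioreq.lock);

        s = p2m->ioreq.server;
        *flags = p2m->ioreq.flags;

        spin_unlock(&p2m->ioreq.lock);

        return s;
    }

The caller in hvmemul_do_io() would then simply do
"s = p2m_get_ioreq_server(currd, &flags);".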

           /* If there is no suitable backing DM, just ignore accesses */
           if ( !s )
           {
-            rc = hvm_process_io_intercept(&null_handler, &p);
+            switch ( p2mt )
+            {
+            case p2m_ioreq_server:
+            /*
+             * Race conditions may exist when access to a gfn with
+             * p2m_ioreq_server is intercepted by hypervisor, during
+             * which time p2m type of this gfn is recalculated back
+             * to p2m_ram_rw. mem_handler is used to handle this
+             * corner case.
+             */

Now if there is such a race condition, the race could also be with a
page changing first to ram_rw and then immediately further to e.g.
ram_ro. See the earlier comment about assuming the page to be
writable.


Thanks, Jan. After rechecking the code, I believe the race condition
will not happen. In hvmemul_do_io(), get_gfn_query_unlocked() is only
used to peek at the p2m type of the gfn, but get_gfn_type_access() is
called inside hvm_hap_nested_page_fault(), and that guarantees no p2m
change can occur during the emulation.
Is this understanding correct?

Ah, yes, I think so. So the comment is misleading.


I'll remove the comment, together with the p2m_ram_rw case. Thanks. :)

+static int hvm_map_mem_type_to_ioreq_server(struct domain *d,
+                                            ioservid_t id,
+                                            hvmmem_type_t type,
+                                            uint32_t flags)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    /* For now, only HVMMEM_ioreq_server is supported */
+    if ( type != HVMMEM_ioreq_server )
+        return -EINVAL;
+
+    if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_READ |
+                   HVMOP_IOREQ_MEM_ACCESS_WRITE) )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id == id )
+        {
+            rc = p2m_set_ioreq_server(d, flags, s);
+            if ( rc == 0 )
+                gdprintk(XENLOG_DEBUG, "%u %s type HVMMEM_ioreq_server.\n",
+                         s->id, (flags != 0) ? "mapped to" : "unmapped from");

Why gdprintk()? I don't think the current domain is of much
interest here. What would be of interest is the subject domain.


s->id is not the domain id, but the id of the ioreq server.

That's understood. But gdprintk() itself logs the current domain,
which isn't as useful as the subject one.


Oh, I see. So the correct routine here should be dprintk(), right?
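
Something like the following, with the subject domain spelled out
explicitly (the format string is illustrative only):

    /* Log the subject domain explicitly rather than the current one. */
    dprintk(XENLOG_DEBUG,
            "d%d: ioreq server %u %s type HVMMEM_ioreq_server\n",
            d->domain_id, s->id,
            (flags != 0) ? "mapped to" : "unmapped from");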

--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -132,6 +132,19 @@ static void ept_p2m_type_to_flags(struct p2m_domain
*p2m, ept_entry_t *entry,
               entry->r = entry->w = entry->x = 1;
               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
               break;
+        case p2m_ioreq_server:
+            entry->r = !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_READ_ACCESS);
+           /*
+            * write access right is disabled when entry->r is 0, but whether
+            * write accesses are emulated by hypervisor or forwarded to an
+            * ioreq server depends on the setting of p2m->ioreq.flags.
+            */
+            entry->w = (entry->r &&
+                        !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS));
+            entry->x = entry->r;

Why would we want to allow instruction execution from such pages?
And with all three bits now possibly being clear, aren't we risking the
entries to be mis-treated as not-present ones?


Hah. You got me. Thanks! :)
Now I realize it would be difficult if we wanted to emulate read
operations for HVM. According to the Intel manual, entry->r has to be
cleared, and so does entry->w if we do not want an EPT misconfiguration.
And with both read and write permissions forbidden, entry->x can only be
set on processors with the EXECUTE_ONLY capability.
To avoid any entry being mis-treated as a not-present one, we have
several solutions:
a> do not support read emulation for now - we have no such usage case;
b> add a check of the p2m type against p2m_ioreq_server in
is_epte_present() - a bit weird to me.
Which one do you prefer? Or any other suggestions?

That question would also need to be asked to others who had
suggested supporting both. I'd be fine with a, but I also don't view
b as too awkward.


According to the Intel manual, an entry is regarded as not present if
bits 0:2 are all 0. So adding a p2m type check in is_epte_present()
means we would change its semantics, if that is acceptable (with no
harm to the hypervisor). I'd prefer option b>, roughly along the lines
of the sketch below.

Does anyone else have suggestions other than b> ?
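
A minimal sketch of option b>, assuming the sa_p2mt software-available
type field already stored in ept_entry_t (illustrative only, not the
final patch):

    /*
     * Option b> sketch: treat an entry whose R/W/X bits are all clear as
     * still present when its software-available p2m type says it is an
     * ioreq server page, so such entries are not mis-read as not-present.
     */
    static inline bool_t is_epte_present(ept_entry_t *e)
    {
        return e->epte & 0x7 || e->sa_p2mt == p2m_ioreq_server;
    }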


+    /*
+     * Each time we map/unmap an ioreq server to/from p2m_ioreq_server,
+     * we mark the p2m table to be recalculated, so that gfns which were
+     * previously marked with p2m_ioreq_server can be resynced.
+     */
+    p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);

What does "resynced" here mean? I.e. I can see why this is wanted
when unmapping a server, but when mapping a server there shouldn't
be any such pages in the first place.


There shouldn't be. But if there are (due to misbehavior on the device
model side), such entries can be recalculated back to p2m_ram_rw (though
this is not as necessary as in the unmapping case).

DM misbehavior should not result in such a problem - the hypervisor
should refuse any bad requests.

OK. I can add code on the hypervisor side to guarantee that no entries
are changed to p2m_ioreq_server before the mapping happens, e.g. along
the lines of the sketch below.

B.R.
Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
