[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XenPPC] [PATCH] [RFC] Xencomm patch to fix modules



I agree with most of Hollis's comments, but have some of my own.

First, I do not think that the implementation of is_phys_contiguous() answers the question in its name and IMNSHO is bogus. Perhaps something like: mm/sparse.c vaddr_in_vmalloc_area 232 static int vaddr_in_vmalloc_area(void *addr)

And use if (!vaddr_in_vmalloc_area)

More importantly... (and this needs to be addressed before the patch is accepted)
I like the "map" name change, but not sure about "early".
There are 2 reasons why we had mini/inline/early:
1) because the allocator was not ready, I think this applies to a small number of hcalls 2) we cannot "sleep" (in_interrupt()) and the allocator sleeps, Mainly evtchn related and console.

So early covers (1) but (2) will be problematic, I noted the ones below that may reflect (2), I for one, have not been diligent in commenting why mini/inline is actually used and I think we need to do so.

So giving this some more thought, and jerking you around even more (sorry), aside from using inline to optimize this:

 - We can detect (1) with slab_is_available() and use mini
 - We can detect (2) with in_interrupt() and try GFP_ATOMIC then mini.

Not a request, but something to think about.
Mini, has always bugged me, it seems to me that this could be satisfied by a per_cpu static, or preallocated buffer to for the xen descriptor. This may not be viable because it assumes no interrupts, and though for asynchronous interrupts this may be a safe assumption, if one were to suffer a series of page faults (from a wild pointer in this path) we would have a silent hang, which is never good.


More on the patch below.

On Jan 17, 2007, at 10:20 PM, Jerone Young wrote:

I haven't had a chance to fully test this patch. As I have been having
problems with my blade today. But here is the code for review. I'll get
back to the list once it is fully tested.

Make sure you run some VIO workloads to make sure we get the in_interrupt() cases.


Signed-off-by: Jerone Young <jyoung5@xxxxxxxxxx>

diff -r e5962db17966 arch/powerpc/platforms/xen/gnttab.c
--- a/arch/powerpc/platforms/xen/gnttab.c Wed Jan 17 10:25:30 2007 -0600 +++ b/arch/powerpc/platforms/xen/gnttab.c Tue Jan 16 23:50:28 2007 -0600
@@ -263,8 +263,9 @@ int HYPERVISOR_grant_table_op(unsigned i
                memcpy(&setup, op, sizeof(setup));
                argsize = sizeof(setup);

-               frame_list = xencomm_create_inline(
-                       xen_guest_handle(setup.frame_list));
+               frame_list = xencomm_map(
+                       xen_guest_handle(setup.frame_list),
+                       (sizeof(ulong) * setup.nr_frames));

                set_xen_guest_handle(setup.frame_list, frame_list);
                memcpy(op, &setup, sizeof(setup));
@@ -286,7 +287,7 @@ int HYPERVISOR_grant_table_op(unsigned i
                return -ENOSYS;
        }

I believe this is an in_interrupt() case.

-       desc = xencomm_create_inline(op);
+       desc = xencomm_map(op, argsize);

        ret = plpar_hcall_norets(XEN_MARK(__HYPERVISOR_grant_table_op), cmd,
                                 desc, count);
diff -r e5962db17966 arch/powerpc/platforms/xen/hcall.c
--- a/arch/powerpc/platforms/xen/hcall.c Wed Jan 17 10:25:30 2007 -0600 +++ b/arch/powerpc/platforms/xen/hcall.c Wed Jan 17 00:17:17 2007 -0600
@@ -50,11 +50,16 @@
* In general, we need a xencomm descriptor to cover the top-level data
  * structure (e.g. the domctl op), plus another for every embedded
pointer to
  * another data structure (i.e. for every GUEST_HANDLE).
+ *
+ * Some hypercalls are made before the memory subsystem is up, so
instead of
+ * calling xencomm_create(), we allocate XENCOMM_MINI_AREA bytes from
the stack
+ * to hold the xencomm descriptor.
  */

 int HYPERVISOR_console_io(int cmd, int count, char *str)
 {
-       void *desc = xencomm_create_inline(str);
+
+       void *desc = xencomm_map_early(str, count);
Early and interrupt


        return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_console_io),
                                  cmd, count, desc);
@@ -63,7 +68,8 @@ EXPORT_SYMBOL(HYPERVISOR_console_io);

 int HYPERVISOR_event_channel_op(int cmd, void *op)
 {
-       void *desc = xencomm_create_inline(op);
+
+       void *desc = xencomm_map_early(op, sizeof(evtchn_op_t));

interrupt


        return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_event_channel_op),
                                cmd, desc);
@@ -120,8 +126,8 @@ EXPORT_SYMBOL(HYPERVISOR_xen_version);

 int HYPERVISOR_xen_version(int cmd, void *arg)
 {
-       if (is_kernel_addr((unsigned long)arg)) {
-               void *desc = xencomm_create_inline(arg);
+       if (is_phys_contiguous((unsigned long)arg)) {
+               void *desc = xencomm_map_early(arg, sizeof(__u64));
                return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_xen_version), 
cmd,
desc);
        }
        return HYPERVISOR_xen_version_userspace(cmd, arg);
@@ -129,7 +135,7 @@ int HYPERVISOR_xen_version(int cmd, void

 int HYPERVISOR_physdev_op(int cmd, void *op)
 {
-       void *desc = xencomm_create_inline(op);
+       void *desc = xencomm_map(op, sizeof(physdev_op_t));

interrupt


        return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_physdev_op),
                                cmd, desc);
@@ -139,6 +145,7 @@ int HYPERVISOR_sched_op(int cmd, void *a
 int HYPERVISOR_sched_op(int cmd, void *arg)
 {
        struct xencomm_desc *desc;
+       int argsize = 0;

        switch (cmd) {
        case SCHEDOP_yield:
@@ -151,24 +158,29 @@ int HYPERVISOR_sched_op(int cmd, void *a
                evtchn_port_t *ports;
                struct sched_poll sched_poll;

+               argsize = sizeof(struct sched_poll);
+
                memcpy(&sched_poll, arg, sizeof(sched_poll));

-               ports = xencomm_create_inline(
-                       xen_guest_handle(sched_poll.ports));
+               ports = xencomm_map(
+                       xen_guest_handle(sched_poll.ports),
+                       (sizeof(evtchn_port_t) * sched_poll.nr_ports));
Early
                set_xen_guest_handle(sched_poll.ports, ports);
                memcpy(arg, &sched_poll, sizeof(sched_poll));
-               
+
        }
                break;
        case SCHEDOP_shutdown:
+               argsize = sizeof(struct sched_shutdown);
        case SCHEDOP_remote_shutdown:
+               argsize = sizeof(struct sched_remote_shutdown);
                break;
        default:
                printk(KERN_ERR "%s: unknown sched op %d\n", __func__, cmd);
                return -ENOSYS;
        }

-       desc = xencomm_create_inline(arg);
+       desc = xencomm_map(arg, argsize);
not sure which case this is.


        return plpar_hcall_norets(XEN_MARK(__HYPERVISOR_sched_op),
                                cmd, desc);
diff -r e5962db17966 drivers/xen/core/xencomm.c
--- a/drivers/xen/core/xencomm.c        Wed Jan 17 10:25:30 2007 -0600
+++ b/drivers/xen/core/xencomm.c        Wed Jan 17 01:42:11 2007 -0600
@@ -123,9 +123,87 @@ void *xencomm_create_inline(void *ptr)
 {
        unsigned long paddr;

-       BUG_ON(!is_kernel_addr((unsigned long)ptr));
-
        paddr = (unsigned long)xencomm_pa(ptr);
        BUG_ON(paddr & XENCOMM_INLINE_FLAG);
        return (void *)(paddr | XENCOMM_INLINE_FLAG);
 }
+
See the comment below that should remove this function entirely, Hollis please ACK.
+/* "mini" routines, for stack-based communications: */
+static void *xencomm_alloc_mini(void *area, int arealen)
+{
+       unsigned long base = (unsigned long)area;
+       unsigned int left_in_page;
+
+       left_in_page = PAGE_SIZE - base % PAGE_SIZE;
+
+       /* we probably fit right at the front of area */
+       if (left_in_page >= sizeof(struct xencomm_mini)) {
+               return area;
+       }
+
+       /* if not, see if area is big enough to advance to the next page */
+       if ((arealen - left_in_page) >= sizeof(struct xencomm_mini))
+               return (void *)(base + left_in_page);
+
+       /* area was too small */
+       return NULL;
+}
+
+int xencomm_create_mini(void *area, int arealen, void *buffer,
+               unsigned long bytes, struct xencomm_desc **ret)
+{
+       struct xencomm_desc *desc;
+       int rc;
+       
+       desc = xencomm_alloc_mini(area, arealen);
+       if (!desc)
+               return -ENOMEM;
+       desc->nr_addrs = XENCOMM_MINI_ADDRS;
+
+       rc = xencomm_init(desc, buffer, bytes);
+       if (rc)
+               return rc;
+
+       *ret = desc;
+       return 0;
+}
+
+void *xencomm_map(void *ptr, unsigned long bytes)
+{
+       int rc;
+       struct xencomm_desc *desc;
+
+       if (is_phys_contiguous((unsigned long)ptr))
+               return xencomm_create_inline(ptr);
+       
+       rc =  xencomm_create(ptr, bytes, &desc, GFP_KERNEL);
+
+       if (rc)
+               return NULL;
+       
+       return xencomm_pa(desc);
+}
+
+void *__xencomm_map_early(void *ptr, unsigned long bytes,
+                       char *xc_area)
+{
+       int rc;
+       struct xencomm_desc *desc;
+
+       if (is_phys_contiguous((unsigned long)ptr))
+               return xencomm_create_inline(ptr);
+
+       rc = xencomm_create_mini(xc_area, XENCOMM_MINI_AREA,ptr, bytes,
+                               &desc);
+
+       if (rc)
+               return NULL;
+       
+       return (void*)__pa(desc);
+}
+
+/* check if is physically contiguous memory */
+int is_phys_contiguous(unsigned long addr)
+{
+       return (addr < VMALLOC_START) || (addr >= VMALLOC_END);
+}
diff -r e5962db17966 include/xen/xencomm.h
--- a/include/xen/xencomm.h     Wed Jan 17 10:25:30 2007 -0600
+++ b/include/xen/xencomm.h     Wed Jan 17 01:38:30 2007 -0600
@@ -23,10 +23,27 @@

 #include <xen/interface/xencomm.h>

+#define XENCOMM_MINI_ADDRS 3
+struct xencomm_mini {
+       struct xencomm_desc _desc;
+       uint64_t address[XENCOMM_MINI_ADDRS];
+};
+#define XENCOMM_MINI_AREA (sizeof(struct xencomm_mini) * 2)
+
 extern int xencomm_create(void *buffer, unsigned long bytes,
                          struct xencomm_desc **desc, gfp_t type);
 extern void xencomm_free(struct xencomm_desc *desc);
 extern void *xencomm_create_inline(void *ptr);
+extern int xencomm_create_mini(void *area, int arealen, void *buffer,
+                       unsigned long bytes, struct xencomm_desc **ret);
+extern void *xencomm_map(void *ptr, unsigned long bytes);
+extern void *__xencomm_map_early(void *ptr, unsigned long bytes,
+                               char *xc_area);
+extern int is_phys_contiguous(unsigned long addr);
+

Since we are completely automating MINI we can simplify (remove) all the alignment logic by asking gcc for some help.

+#define xencomm_map_early(ptr, bytes) \
+       ({char xc_area[XENCOMM_MINI_AREA];\
        ({char xc_area[XENCOMM_MINI_AREA] \
                __attribute__((__aligned__(sizeof(struct xencomm_mini))));\

In fact it can now be a the struct instead of a char.

+        __xencomm_map_early(ptr, bytes, xc_area);})

 /* provided by architecture code: */
 extern unsigned long xencomm_vtop(unsigned long vaddr);



_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel


_______________________________________________
Xen-ppc-devel mailing list
Xen-ppc-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ppc-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.