| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
 Re: [PATCH 5/5] xen/xsm: Address hypercall ABI problems
 
 
On 17/11/2021 23:14, Andrew Cooper wrote:
 
On 08/11/2021 09:50, Jan Beulich wrote:
 
On 05.11.2021 14:55, Andrew Cooper wrote:
 Currently, __HYPERVISOR_xsm_op enters xsm_ops.do_{xsm,compat}_op() 
which means
that if any other XSM module registers a handler, we'll break the 
hypercall
ABI totally by having the behaviour depend on the selected XSM module.
There are 2 options:
1) Rename __HYPERVISOR_xsm_op to __HYPERVISOR_flask_op.  If 
another XSM 
     module wants hypercalls, they'll have to introduce a new top-level
     hypercall.
2) Make the current flask_op() be common, and require new 
hypercalls to fit 
     compatibly with xen_flask_op_t.  This can be done fairly easily by
     subdividing the .cmd space.
In this patch, we implement option 2.
Move the stub {do,compat}_xsm_op() implementation into a new 
xsm_op.c so we
can more easily do multi-inclusion compat magic.  Also add a new 
private.h,
because flask/hook.c's local declaration of {do,compat}_flask_op() 
wasn't 
remotely safe.
The top level now dispatches into {do,compat}_flask_op() based on 
op.cmd, and 
handles the primary copy in/out.
 
I'm not convinced this is the only reasonable way of implementing 2).
I could also see number space to be separated in different ways, ...
 
Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
CC: Daniel De Graaf <dgdegra@xxxxxxxxxxxxx>
CC: Daniel Smith <dpsmith@xxxxxxxxxxxxxxxxxxxx>
Only lightly tested.  Slightly RFC.  There are several things which 
aren't 
great, but probably want addressing in due course.
1) The public headers probably want to lose the flask name (in a 
compatible 
     way), now that the hypercall is common.  This probably wants to be
     combined with no longer taking a void handle.
 
... leaving per-module public headers and perhaps merely adding an
abstracting
struct xen_xsm_op_t {
     uint32_t op;
     ... /* placeholder */
};
or (making explicit one possible variant of number space splitting)
union xen_xsm_op_t {
     uint32_t op;
     struct {
         uint16_t cmd;
         uint16_t mod;
     } u;
     ... /* placeholder */
};
in, say, a new public/xsm.h.
 
That doesn't work.  The ABI is fixed at sizeof(xen_flask_op_t) for all 
other modules, because a caller which doesn't know which module is in 
use must not have Xen over-read/write the object passed while it's 
trying to figure things out. 
 
As a result I consider this change either going too far (because of
not knowing future needs) or not far enough (by e.g. leaving
do_xsm_op() to use xen_flask_op_t.
 
Well - what it does is prevent someone from breaking the ABI with 
literally this patch 
diff --git a/xen/xsm/silo.c b/xen/xsm/silo.c
index 3550dded7b4e..36b82fd3bd3e 100644
--- a/xen/xsm/silo.c
+++ b/xen/xsm/silo.c
@@ -100,6 +100,11 @@ static int silo_argo_send(const struct domain 
*d1, const struct domain *d2) 
 #endif
+static long silo_do_xsm_op(XEN_GUEST_HANDLE_PARAM(void) op)
+{
+    /* ... */
+}
+
 static const struct xsm_ops __initconstrel silo_xsm_ops = {
     .evtchn_unbound = silo_evtchn_unbound,
     .evtchn_interdomain = silo_evtchn_interdomain,
@@ -110,6 +115,7 @@ static const struct xsm_ops __initconstrel 
silo_xsm_ops = {
     .argo_register_single_source = silo_argo_register_single_source,
     .argo_send = silo_argo_send,
 #endif
+    .do_xsm_op = silo_do_xsm_op,
 };
 const struct xsm_ops *__init silo_init(void)
This is a damage limitation patch, but without knowing how the next 
module is going to want to do hypercalls differently, it is rather 
hard to judge what is appropriate.  I certainly didn't waste much time 
thinking about it. 
 
   2) {do,compat}_xsm_op() are currently identical other than the 
dispatched-to
     functions because the size of xsm_flask_op_t is invariant with
COMPAT-ness.  We could simplfy things by only having one, and 
dispatching
     to {do,compat}_*_op() directly, but I'm not sure whether the 
complexity is 
     worth it.
 
Perhaps not, I would say, not the least because (as said above) I
think we shouldn't put in place restrictions which may get in the
way of adding some future module.
Extending struct xen_flask_op to become a generic XSM interface
structure (or even just for Flask's own purposes) also is not as
straightforward as it might seem: There's currently no provision
for sub-structs which would grow the overall size of the structure:
The copy_{to,from}_guest() invocations for existing sub-ops may not
copy more that the present worth of sizeof(struct xen_flask_op).
Yet your implementation of do_xsm_op() moves this deficiency from
Flask to XSM.
 
Deficiency yes, but necessary to avoid breaking the ABI for caller 
which doesn't know which module is in use.  This cannot be fixed 
without swapping to approach 1. 
   3) Bloat-o-meter says these changes add 16 extra bytes to dm_op() 
and I can't
     figure out what could possibly be causing this.
 
Without comparing the object files in closer detail it's guesswork,
but might this be register scheduling differences resulting from
the changed sizeof(struct xsm_ops)? I've been observing similar
seemingly unmotivated changes to generated code ...
 
The only thing it can plausibly be is a knock-on effect from the 
structure of xsm_ops changing. 
Normally, I wouldn't be to surprised to see some displacement fields 
dropping from 4 to 1 byte as xsm_ops is getting smaller, but 
everything is alt_call()'d up so should be a 7-byte `call *disp32(%rip)`. 
 
--- a/xen/xsm/flask/flask_op.c
+++ b/xen/xsm/flask/flask_op.c
@@ -22,6 +22,8 @@
  #include <objsec.h>
  #include <conditional.h>
  +#include "../private.h"
 
Kind of odd: I'd expect a file named such to not get included
across directory levels, unless a single component was split in
such a way (to me Flask and XSM core are separate, yet still
related components).
 
Its all a tangled mess because logically separating XSM and Flask was 
a task done when SILO was introduced. 
There is not an appropriately located file (under xen/xsm/ ) where the 
prototypes could reasonably live, and this felt like the lesser of the 
available evils.
 
I guess it is worth adding, so we're all on the same page.
The thing I actually need to do is fix the fact that the prototypes for 
{do,compat}_flask_op() are local in xen/xsm/flask/hooks.c and not in a 
header shared with xen/xsm/flask/flask_op.c. 
The thing I wanted to do is stop it being so trivially easy to break the 
ABI with a 5-line patch. 
Any bikeshedding beyond that can happen in due course.
~Andrew
 
 |