|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
> -----Original Message-----
[snip]
> > +
> > +#include <xen/save.h>
> > +
> > +struct domain_context {
> > + bool log;
> > + struct domain_save_descriptor desc;
> > + domain_copy_entry copy;
>
> As your new framework is technically an extension of existing one, it
> would be good to explain why we diverge in the definitions.
>
I don't follow. What is diverging? I explain in the commit comment that this is
a parallel framework. Do I need to justify why it is not a carbon copy of the
HVM one?
> > + void *priv;
> > +};
> > +
> > +static struct {
> > + const char *name;
> > + bool per_vcpu;
> > + domain_save_handler save;
> > + domain_load_handler load;
> > +} handlers[DOMAIN_SAVE_CODE_MAX + 1];
> > +
> > +void __init domain_register_save_type(unsigned int tc, const char *name,
> > + bool per_vcpu,
> > + domain_save_handler save,
> > + domain_load_handler load)
> > +{
> > + BUG_ON(tc > ARRAY_SIZE(handlers));
> > +
> > + ASSERT(!handlers[tc].save);
> > + ASSERT(!handlers[tc].load);
> > +
> > + handlers[tc].name = name;
> > + handlers[tc].per_vcpu = per_vcpu;
> > + handlers[tc].save = save;
> > + handlers[tc].load = load;
> > +}
> > +
> > +int domain_save_entry(struct domain_context *c, unsigned int tc,
> > + const char *name, const struct vcpu *v, void *src,
>
> I realize that 'src' is not const because how you define copy, however I
> would rather prefer if we rework the interface to avoid to keep the
> const in the definition. This may mean to have to define two callback
> rather than one.
That seems a bit ugly for the sake of a const but I guess I could create a
union with a copy_in and copy_out. I'll see how that looks.
>
> > + size_t src_len)
>
> On 64-bit architecture, size_t would be 64-bit. But the record is only
> storing 32-bit. Would it make sense to add some sanity check in the code?
>
True. Given this is new I think I'll just bump the length to 64-bits.
> > +{
> > + int rc;
> > +
> > + if ( c->log && tc != DOMAIN_SAVE_CODE(HEADER) &&
> > + tc != DOMAIN_SAVE_CODE(END) )
> > + gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name, src_len);
> > +
> > + if ( !IS_ALIGNED(src_len, 8) )
>
> Why not adding an implicit padding? This would avoid to chase error
> during saving because the len wasn't a multiple of 8.
>
I don't think implicit padding is worth it. This error should only happen if a
badly defined save record type is added to the code so perhaps I ought to add
an ASSERT here as well as deal with the error.
> > + return -EINVAL;
> > +
> > + BUG_ON(tc != c->desc.typecode);
> > + BUG_ON(v->vcpu_id != c->desc.instance);
>
> Does the descriptor really need to be filled by domain_save()? Can't it
> be done here, so we can avoid the two BUG_ON()s?
>
Yes it can but this serves as a sanity check that the save code is actually
doing what it should. Hence why these are BUG_ON()s and not error exits.
> > + c->desc.length = src_len;
> > +
> > + rc = c->copy(c->priv, &c->desc, sizeof(c->desc));
> > + if ( rc )
> > + return rc;
> > +
> > + return c->copy(c->priv, src, src_len);
> > +}
> > +
> > +int domain_save(struct domain *d, domain_copy_entry copy, void *priv,
> > + unsigned long mask, bool dry_run)
> > +{
> > + struct domain_context c = {
> > + .copy = copy,
> > + .priv = priv,
> > + .log = !dry_run,
> > + };
> > + struct domain_save_header h = {
> > + .magic = DOMAIN_SAVE_MAGIC,
> > + .version = DOMAIN_SAVE_VERSION,
> > + };
> > + struct domain_save_header e;
> > + unsigned int i;
> > + int rc;
> > +
> > + ASSERT(d != current->domain);
> > +
> > + if ( d->is_dying )
> > + return -EINVAL;
> > +
> > + domain_pause(d);
> > +
> > + c.desc.typecode = DOMAIN_SAVE_CODE(HEADER);
> > +
> > + rc = DOMAIN_SAVE_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h));
> > + if ( rc )
> > + goto out;
> > +
> > + for ( i = 0; i < ARRAY_SIZE(handlers); i++ )
>
> AFAIU, with this solution, if there are dependency between records, the
> dependencies will have to a lower "index". I think we may have some
> dependency with guest transparent migration such as we need to restore
> the event ABI before restoring the event channels.
>
Is that just a downstream implementation detail though? I would hope that there
are no ordering dependencies between records.
> Should we rely on the index for the dependency?
>
If we do need ordering dependencies then I guess it would need to be explicit.
> In any case, I think we want to document it.
>
I can certainly document that save handlers are invoked in code order.
> > + {
> > + domain_save_handler save = handlers[i].save;
> > +
> > + if ( (mask && !test_bit(i, &mask)) || !save )
>
> The type of mask suggests it is not possible to have more than 32
> different types of record if we wanted to be arch agnostic. Even if we
> focus on 64-bit arch, this is only 64 records.
>
> This is not very future proof, but might be ok if this is not exposed
> outside of the hypervisor (I haven't looked at the rest of the series
> yet). However, we at least need a BUILD_BUG_ON() in place. So please
> make sure DOMAIN_SAVE_CODE_MAX is always less than 64.
>
> Also what if there is a bit set in the mask and the record is not
> existing? Shouldn't we return an error?
>
TBH I think 32 will be enough... I would not expect this context space to grow
very much, if at all, once transparent migration is working. I think I'll just
drop the mask for now; it's not actually clear to me we'll make use of it...
just seemed like a nice-to-have.
> > + continue;
> > +
> > + memset(&c.desc, 0, sizeof(c.desc));
> > + c.desc.typecode = i;
> > +
> > + if ( handlers[i].per_vcpu )
> > + {
> > + struct vcpu *v;
> > +
> > + for_each_vcpu ( d, v )
> > + {
> > + c.desc.instance = v->vcpu_id;
> > +
> > + rc = save(v, &c, dry_run);
> > + if ( rc )
> > + goto out;
> > + }
> > + }
> > + else
> > + {
> > + rc = save(d->vcpu[0], &c, dry_run);
> > + if ( rc )
> > + goto out;
> > + }
> > + }
> > +
> > + memset(&c.desc, 0, sizeof(c.desc));
> > + c.desc.typecode = DOMAIN_SAVE_CODE(END);
> > +
> > + rc = DOMAIN_SAVE_ENTRY(END, &c, d->vcpu[0], &e, 0);
> > +
> > + out:
> > + domain_unpause(d);
> > +
> > + return rc;
> > +}
> > +
> > +int domain_load_entry(struct domain_context *c, unsigned int tc,
> > + const char *name, const struct vcpu *v, void *dst,
> > + size_t dst_len, bool exact)
> > +{
> > + int rc;
> > +
> > + if ( c->log && tc != DOMAIN_SAVE_CODE(HEADER) &&
> > + tc != DOMAIN_SAVE_CODE(END) )
> > + gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name, dst_len);
> > +
> > + BUG_ON(tc != c->desc.typecode);
> > + BUG_ON(v->vcpu_id != c->desc.instance);
>
> Is it really warrant to crash the host? What would happen if the values
> were wrong?
>
It would mean the code is fairly seriously buggy in that the load handler is
trying to load a record other than the type it registered for, or for a vcpu
other than the one it was passed.
> > +
> > + if ( (exact ?
> > + (dst_len != c->desc.length) : (dst_len < c->desc.length)) ||
>
> Using ternary in if is really confusing. How about:
>
> dst_len < c->desc.length || (exact && dst_len != c->desc.length) ||
>
> I understand that there would be two check for the exact case but I
> think it is better than a ternary.
I'm going to re-work this I think.
>
> However what is the purpose of the param 'exact'? If you set it to false
> how do you know the size you read?
The point of the exact parameter is that whether the caller can only consume a
record of the correct length, or whether it can handle an undersize one which
gets zero-padded. (It's the same as the zeroextend option in HVM records).
>
> > + !IS_ALIGNED(c->desc.length, 8) )
> > + return -EINVAL;
> > +
> > + rc = c->copy(c->priv, dst, c->desc.length);
> > + if ( rc )
> > + return rc;
> > +
> > + if ( !exact )
> > + {
> > + dst += c->desc.length;
> > + memset(dst, 0, dst_len - c->desc.length);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +int domain_load(struct domain *d, domain_copy_entry copy, void *priv,
> > + unsigned long mask)
> > +{
> > + struct domain_context c = {
> > + .copy = copy,
> > + .priv = priv,
> > + .log = true,
> > + };
> > + struct domain_save_header h, e;
> > + int rc;
> > +
> > + ASSERT(d != current->domain);
> > +
> > + if ( d->is_dying )
> > + return -EINVAL;
>
> What does protect you against the doing dying right after this check?
>
Nothing. It's just to avoid doing pointless work if we can.
> > +
> > + rc = c.copy(c.priv, &c.desc, sizeof(c.desc));
> > + if ( rc )
> > + return rc;
> > +
> > + if ( c.desc.typecode != DOMAIN_SAVE_CODE(HEADER) ||
> > + c.desc.instance != 0 )
> > + return -EINVAL;
> > +
> > + rc = DOMAIN_LOAD_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h), true);
> > + if ( rc )
> > + return rc;
> > +
> > + if ( h.magic != DOMAIN_SAVE_MAGIC || h.version != DOMAIN_SAVE_VERSION )
> > + return -EINVAL;
> > +
> > + domain_pause(d);
> > +
> > + for (;;)
> > + {
> > + unsigned int i;
> > + domain_load_handler load;
> > + struct vcpu *v;
> > +
> > + rc = c.copy(c.priv, &c.desc, sizeof(c.desc));
> > + if ( rc )
> > + break;
> > +
> > + if ( c.desc.typecode == DOMAIN_SAVE_CODE(END) ) {
> > + rc = DOMAIN_LOAD_ENTRY(END, &c, d->vcpu[0], &e, 0, true);
> > + break;
> > + }
> > +
> > + rc = -EINVAL;
> > + if ( c.desc.typecode >= ARRAY_SIZE(handlers) ||
> > + c.desc.instance >= d->max_vcpus )
>
> Nothing in the documention suggests that c.desc.instance should be 0
> when the record is for the domain.
>
Ok. I'll put a comment somewhere.
> > + break;
> > +
> > + i = c.desc.typecode;
> > + load = handlers[i].load;
> > + v = d->vcpu[c.desc.instance];
> > +
> > + if ( mask && !test_bit(i, &mask) )
> > + {
> > + /* Sink the data */
> > + rc = c.copy(c.priv, NULL, c.desc.length);
> > + if ( rc )
> > + break;
> > +
> > + continue;
> > + }
> > +
> > + rc = load ? load(v, &c) : -EOPNOTSUPP;
> > + if ( rc )
> > + break;
> > + }
> > +
> > + domain_unpause(d);
> > +
> > + return rc;
> > +}
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> > new file mode 100644
> > index 0000000000..84981cd0f6
> > --- /dev/null
> > +++ b/xen/include/public/save.h
> > @@ -0,0 +1,74 @@
> > +/*
> > + * save.h
> > + *
> > + * Structure definitions for common PV/HVM domain state that is held by
> > + * Xen and must be saved along with the domain's memory.
> > + *
> > + * Copyright Amazon.com Inc. or its affiliates.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > copy
> > + * of this software and associated documentation files (the "Software"), to
> > + * deal in the Software without restriction, including without limitation
> > the
> > + * rights to use, copy, modify, merge, publish, distribute, sublicense,
> > and/or
> > + * sell copies of the Software, and to permit persons to whom the Software
> > is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included
> > in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> > OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> > THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + */
> > +
> > +#ifndef __XEN_PUBLIC_SAVE_H__
> > +#define __XEN_PUBLIC_SAVE_H__
> > +
> > +#include "xen.h"
> > +
> > +/* Each entry is preceded by a descriptor */
> > +struct domain_save_descriptor {
> > + uint16_t typecode;
> > + uint16_t instance;
> > + /*
> > + * Entry length not including this descriptor. Entries must be padded
> > + * to a multiple of 8 bytes to make sure descriptors remain correctly
> > + * aligned.
> > + */
> > + uint32_t length;
> > +};
> > +
> > +/*
> > + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
> > + * binds these things together.
> > + */
> > +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
> > + struct __DOMAIN_SAVE_TYPE_##_x { _type t; char c[_code]; };
> > +
> > +#define DOMAIN_SAVE_TYPE(_x) \
> > + typeof (((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t)
> > +#define DOMAIN_SAVE_CODE(_x) \
> > + (sizeof (((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c))
> > +#define DOMAIN_SAVE_MASK(_x) (1u << DOMAIN_SAVE_CODE(_x))
> > +
> > +/* Terminating entry */
> > +struct domain_save_end {};
> > +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
> > +
> > +#define DOMAIN_SAVE_MAGIC 0x53415645
> > +#define DOMAIN_SAVE_VERSION 0x00000001
> > +
> > +/* Initial entry */
> > +struct domain_save_header {
> > + uint32_t magic; /* Must be DOMAIN_SAVE_MAGIC */
> > + uint32_t version; /* Save format version */
>
> Would it make sense to have the version of Xen in the stream?
>
Why? How would it help?
Paul
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |