
Re: [Xen-devel] [PATCH 1/4] xen: report how much memory a domain has on each NUMA node



On mer, 2014-03-05 at 15:04 +0000, Jan Beulich wrote:
> >>> On 05.03.14 at 15:36, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -574,6 +574,51 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >      }
> >      break;
> >  
> > +    case XEN_DOMCTL_numainfo:
> > +    {
> > +        uint32_t node, max_node_index, last_online_node;
> > +        xen_domctl_numainfo_t *ni = &op->u.numainfo;
> > +        uint64_t *memkb_on_node;
> > +        struct page_info *page;
> > +
> > +        /*
> > +         * We report back info about the min number of nodes between how
> > +         * much of them the caller can handle and the number of them that
> > +         * are actually online.
> > +         */
> > +        last_online_node = last_node(node_online_map);
> > +        max_node_index = min_t(uint32_t, ni->max_node_index, last_online_node);
> > +        ni->max_node_index = max_node_index;
> > +
> > +        ret = -ENOMEM;
> > +        memkb_on_node = xzalloc_array(uint64_t, max_node_index);
> 
> max_node_index + 1
> 
Ouch. Will fix, thanks.
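IOW, that line becomes:

    memkb_on_node = xzalloc_array(uint64_t, max_node_index + 1);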

> > +        if ( !memkb_on_node )
> > +            break;
> > +
> > +        spin_lock(&d->page_alloc_lock);
> > +        page_list_for_each(page, &d->page_list)
> 
> No new non-preemptable potentially long running loops like this,
> please. See XSA-77.
> 
Not super familiar with XSAs, but did you perhaps mean XSA-45
("Several long latency operations are not preemptible",
xenbits.xenproject.org/xsa/advisory-45.html)? XSA-77 seems to be about
"Hypercalls exposed to privilege rings 1 and 2 of HVM guests"...

In any case, I see what you mean, and Juergen was also pointing out
that this loop is going to take a long time. I of course agree, but,
when implementing this, I was not sure what the best solution would
be.
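
FWIW, the only in-tree mechanism I know of for this kind of thing is
checking hypercall_preempt_check() inside the loop and arranging for a
continuation. A rough and untested sketch (which glosses over the
actual hard part, i.e., how to safely resume the page_list walk after
d->page_alloc_lock has been dropped):

    page_list_for_each(page, &d->page_list)
    {
        /* ... per-page accounting, as in the patch ... */

        if ( hypercall_preempt_check() )
        {
            /*
             * We would need to record how far we got and restart
             * from there later, which is not obviously safe for a
             * page_list once the lock is dropped.
             */
            ret = hypercall_create_continuation(
                __HYPERVISOR_domctl, "h", u_domctl);
            break;
        }
    }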

I initially thought of instrumenting page allocation, as Juergen
suggests, and doing the accounting there. However, I think that would
mean keeping something very similar to a MAX_NUMNODES-sized array in
each struct domain (or are there other ways?).

I think such information may turn out to be handy for other things in
the future but, for now, we'd be adding it for the sake of this
hypercall only... Is that acceptable? Any other ideas? Am I missing
something?
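
Just to make the idea concrete, what I had in mind was roughly this
(untested sketch; the field and helper names are made up):

    /* Hypothetically, in struct domain: */
    /*     unsigned long node_pages[MAX_NUMNODES]; */

    /* Called from the alloc/free paths, under d->page_alloc_lock: */
    static inline void domain_adjust_node_pages(struct domain *d,
                                                const struct page_info *pg,
                                                long delta)
    {
        unsigned int node = phys_to_nid(page_to_maddr(pg));

        ASSERT(node_online(node));
        d->node_pages[node] += delta;
    }

With that, XEN_DOMCTL_numainfo would just copy d->node_pages out
(converted to kB) under the lock, with no long loop at all.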

> > +        {
> > +            node = phys_to_nid((paddr_t)page_to_mfn(page) << PAGE_SHIFT);
> 
> Please don't open-code pfn_to_paddr()/page_to_maddr().
> 
Ok. BTW, for this, I looked at what dump_numa() in xen/arch/x86/numa.c
does.
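
IOW, the line just becomes:

    node = phys_to_nid(page_to_maddr(page));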

Should we fix that too -- and I mean wrt both the open coding and the
non-preemptibility? Or does it not matter in this case, as it's just a
debug key handler?

> > +            /* For nodes that are offline, don't touch the counter */
> > +            if ( node <= max_node_index && node_online(node) )
> 
> How can a node a running domain has memory on be offline?
> Looks like you want an assertion here instead.
> 
Right. I did put this here to be on the safe side, and I agree that an
ASSERT() is much more correct and effective for that. Will do.
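
So, something like this (keeping the bounds check, since the caller
may well have passed a max_node_index smaller than some node the
domain has memory on):

    node = phys_to_nid(page_to_maddr(page));
    ASSERT(node_online(node));
    if ( node <= max_node_index )
        memkb_on_node[node]++;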

> > +                memkb_on_node[node]++;
> > +        }
> > +        spin_unlock(&d->page_alloc_lock);
> > +
> > +        for ( node = 0; node <= max_node_index; node++ )
> > +        {
> > +            memkb_on_node[node] <<= (PAGE_SHIFT-10);
> > +            if ( copy_to_guest_offset(ni->memkb_on_node, node,
> > +                                      &memkb_on_node[node], 1) )
> > +                break;
> 
> I'd suggest to do the copying in one go after the loop.
> 
Also ok.
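I.e., something like:

    for ( node = 0; node <= max_node_index; node++ )
        memkb_on_node[node] <<= (PAGE_SHIFT - 10);

    if ( copy_to_guest(ni->memkb_on_node, memkb_on_node,
                       max_node_index + 1) )
        ret = -EFAULT;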

> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -315,6 +315,26 @@ typedef struct xen_domctl_max_vcpus xen_domctl_max_vcpus_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_domctl_max_vcpus_t);
> >  
> >  
> > +/* XEN_DOMCTL_numainfo */
> > +struct xen_domctl_numainfo {
> > +    /*
> > +     * IN: maximum addressable entry in the caller-provided arrays.
> > +     * OUT: minimum between the maximum addressable entry in the
> > +     *      caller-provided arrays and largest online node identifier
> > +     *      in the system.
> > +     */
> > +    uint32_t max_node_index;
> 
> With that IN/OUT specification (and the matching implementation
> above), how would the caller know it passed too small a value to
> fit all information?
> 
It doesn't. I designed it to be similar to XEN_SYSCTL_numainfo, which
retrieves _system_wide_ NUMA information. Should I use a new, OUT-only
parameter for the required number of elements (or the index of the
largest), so that the caller can compare the two and figure things
out?
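
Something like this, perhaps (the new field's name is, of course, just
a placeholder):

    /* XEN_DOMCTL_numainfo */
    struct xen_domctl_numainfo {
        /* IN: maximum addressable entry in the caller-provided array. */
        uint32_t max_node_index;
        /*
         * OUT: largest online node identifier in the system. If this
         *      is bigger than the value passed in, the caller knows
         *      its array was too small to hold all the information.
         */
        uint32_t max_online_node;
        XEN_GUEST_HANDLE_64(uint64) memkb_on_node;
    };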

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
