
Re: [Xen-devel] VPMU interrupt unreliability



On Thu, Oct 19, 2017 at 11:40 AM, Andrew Cooper
<andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 19/10/17 16:09, Kyle Huey wrote:
> > On Wed, Oct 11, 2017 at 7:09 AM, Boris Ostrovsky
> > <boris.ostrovsky@xxxxxxxxxx> wrote:
> >> On 10/10/2017 12:54 PM, Kyle Huey wrote:
> >>> On Mon, Jul 24, 2017 at 9:54 AM, Kyle Huey <me@xxxxxxxxxxxx> wrote:
> >>>> On Mon, Jul 24, 2017 at 8:07 AM, Boris Ostrovsky
> >>>> <boris.ostrovsky@xxxxxxxxxx> wrote:
> >>>>>>> One thing I noticed is that the workaround doesn't appear to be
> >>>>>>> complete: it is only checking PMC0 status and not other counters (fixed
> >>>>>>> or architectural). Of course, without knowing what the actual problem
> >>>>>>> was it's hard to say whether this was intentional.
> >>>>>> handle_pmc_quirk appears to loop through all the counters ...
> >>>>> Right, I didn't notice that it is shifting MSR_CORE_PERF_GLOBAL_STATUS
> >>>>> value one by one and so it is looking at all bits.
> >>>>>
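The bit-by-bit walk described above can be sketched roughly as follows. This is only an illustration of the shifting technique, not Xen's actual handle_pmc_quirk(); the helper name collect_overflowed() and its parameters are hypothetical, and the real code may act on each set bit differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not Xen code: collect the index of every set bit
 * in a GLOBAL_STATUS-style value by shifting it right one bit at a time.
 * Returns the number of indices written to out (at most max).
 */
static size_t collect_overflowed(uint64_t status, unsigned int *out, size_t max)
{
    size_t n = 0;
    unsigned int idx = 0;

    assert(out != NULL);

    /* Stop early once no set bits remain in the shifted value. */
    while (status != 0 && n < max) {
        if (status & 1)
            out[n++] = idx;   /* bit idx was set in the original value */
        status >>= 1;
        idx++;
    }
    return n;
}
```

Because the loop keeps shifting until the value is zero, it visits every bit position, not only bit 0, so any overflow bits in the upper half of the register are seen as well.
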
> >>>>>>>> 2. Intercepting MSR loads for counters that have the workaround
> >>>>>>>> applied and giving the guest the correct counter value.
> >>>>>>> We'd have to keep track of whether the counter has been reset (by the
> >>>>>>> quirk) since the last MSR write.
> >>>>>> Yes.
> >>>>>>
> >>>>>>>> 3. Or perhaps even changing the workaround to disable the PMI on that
> >>>>>>>> counter until the guest acks via GLOBAL_OVF_CTRL, assuming that works
> >>>>>>>> on the relevant hardware.
> >>>>>>> MSR_CORE_PERF_GLOBAL_OVF_CTRL is written immediately after the quirk
> >>>>>>> runs (in core2_vpmu_do_interrupt()) so we already do this, don't we?
> >>>>>> I'm suggesting waiting until the *guest* writes to the (virtualized)
> >>>>>> GLOBAL_OVF_CTRL.
> >>>>> Wouldn't it be better to wait until the counter is reloaded?
> >>>> Maybe!  I haven't thought through it a lot.  It's still not clear to
> >>>> me whether MSR_CORE_PERF_GLOBAL_OVF_CTRL actually controls the
> >>>> interrupt in any way or whether it just resets the bits in
> >>>> MSR_CORE_PERF_GLOBAL_STATUS and acking the interrupt on the APIC is
> >>>> all that's required to reenable it.
> >>>>
> >>>> - Kyle
> >>> I wonder if it would be reasonable to just remove the workaround
> >>> entirely at some point.  The set of people using 1) several year old
> >>> hardware, 2) an up to date Xen, and 3) the off-by-default performance
> >>> counters is probably rather small.
> >> We'd probably want to only enable this for affected processors, not
> >> remove it outright. But the problem is that we still don't know for sure
> >> whether this issue affects NHM only, do we?
> >>
> >> (https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02242.html
> >> is the original message)
> > Yes, the basic problem is that we don't know where to draw the line.
>
> vPMU is disabled by default for security reasons,


Is there any documentation about possible attacks via the vPMU? The
documents I found (such as [1] and XSA-163) only briefly say that the
vPMU should be disabled due to security concerns.


[1] https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html

>
> and also broken, in a
> way which demonstrates that vPMU isn't getting much real-world use.

I also noticed that AWS seems to support part of the vPMU
functionality, which Netflix used to optimize their applications'
performance, according to
http://www.brendangregg.com/blog/2017-05-04/the-pmcs-of-ec2.html .

I guess AWS must have addressed the security issue somehow? However,
without knowing how the attack could be conducted, I'm not sure how AWS
mitigates the security concern for vPMU.

>
> As far as I'm concerned, all options (including rm -rf and start from
> scratch) are acceptable, especially if this ends up giving us a better
> overall subsystem.
>
> Do we know how other hypervisors work around this issue?

Maybe AWS's solution is an option? I'm not sure. I'm just thinking aloud. :)

Thanks,

Meng

-- 
Meng Xu
Ph.D. Candidate in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
