Re: [PATCH v1 0/7] Implement support for external IPT monitoring
----- On 17 Jun 2020 at 18:19, Andrew Cooper andrew.cooper3@xxxxxxxxxx wrote:

> On 17/06/2020 04:02, Tamas K Lengyel wrote:
>> On Tue, Jun 16, 2020 at 2:17 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> wrote:
>>> On 16/06/2020 19:47, Michał Leszczyński wrote:
>>>> ----- On 16 Jun 2020 at 20:17, Andrew Cooper andrew.cooper3@xxxxxxxxxx
>>>> wrote:
>>>>
>>>>> Are there any restrictions on EPT being enabled in the first place? I'm
>>>>> not aware of any, and in principle we could use this functionality for
>>>>> PV guests as well (using the CPL filter). Therefore, I think it would
>>>>> be helpful to not tie the functionality to HVM guests, even if that is
>>>>> the only option enabled to start with.
>>>>
>>>> I think at the moment it's not required to have EPT. This patch series
>>>> doesn't use any translation feature flags, so the output address is
>>>> always a machine physical address, regardless of context. I will check
>>>> if it could be easily used with PV.
>>>
>>> If it's trivial to add PV support then please do. If it's not, then don't
>>> feel obliged, but please do at least consider how PV support might look
>>> in the eventual feature.
>>>
>>> (Generally speaking, considering "how would I make this work in other
>>> modes where it is possible" leads to a better design.)
>>>
>>>>> The buffer mapping and creation logic is fairly problematic. Instead of
>>>>> fighting with another open-coded example, take a look at the IOREQ
>>>>> server's use of "acquire resource", a mapping interface which supports
>>>>> allocating memory on behalf of the guest, outside of the guest memory,
>>>>> for use by control tools.
>>>>>
>>>>> I think what this wants is a bit somewhere in domain_create to indicate
>>>>> that external tracing is used for this domain (and allocate whatever
>>>>> structures/buffers are necessary), acquire resource to map the buffers
>>>>> themselves, and a domctl for any necessary runtime controls.
>>>>
>>>> I will check this out; this sounds like a good option, as it would
>>>> remove lots of complexity from the existing ipt_enable domctl.
>>>
>>> Xen has traditionally opted for an "and turn this extra thing on
>>> dynamically" model, but this has caused no end of security issues and
>>> broken corner cases.
>>>
>>> You can see this still existing in the difference between
>>> XEN_DOMCTL_createdomain and XEN_DOMCTL_max_vcpus (the latter being
>>> required to choose the number of vCPUs for the domain), and we're making
>>> good progress undoing this particular wart (before 4.13, it was
>>> concerningly easy to get Xen to fall over a NULL d->vcpu[] pointer by
>>> issuing other hypercalls between these two).
>>>
>>> There are a lot of settings which should be immutable for the lifetime
>>> of the domain, and external monitoring looks like another one of these.
>>> Specifying it at createdomain time allows for far better runtime
>>> behaviour (you are no longer in a situation where the first time you try
>>> to turn tracing on, you end up with -ENOMEM because another VM booted in
>>> the meantime and used the remaining memory), and it makes for rather
>>> simpler code in Xen itself (at runtime, you can rely on it having been
>>> set up properly, because a failure setting up will have killed the
>>> domain already).
>>
>> I'm not in favor of this being a flag that gets set at domain creation
>> time. It could certainly be the case that some users would want this on
>> from start to finish, but in other cases you may want to enable it
>> intermittently, only for some time in between particular events. If it's
>> an on/off flag at domain creation you pretty much force that choice on
>> the users, and while the overhead of PT is better than, say, MTF, it's
>> certainly not nothing. If an OOM situation arises when enabling IPT
>> dynamically, the user can always just pause the VM and wait until memory
>> becomes available.
>
> There is nothing wrong with having "turn tracing on/off at runtime"
> hypercalls. It is specifically what I suggested two posts up in this
> thread, but it should be limited to the TraceEn bit in RTIT_CTL.
>
> What isn't ok is trying to allocate the buffers, write the ToPA, etc. on
> first-enable or first-map, because the runtime complexity of logic like
> this is large, and far too easy to get wrong in security-relevant ways.
>
> The domain create flag would mean "I wish to use tracing with this
> domain", and not "I want tracing enabled from the get-go".

I'm trying to implement this suggestion right now. I've already switched to
acquire_resource, and now I want to make the buffers statically allocated on
domain creation.

I think it would be reasonable to add an option like "vmtrace_ipt_size" in
xl.cfg. By default it would be 0 (meaning "disabled"), and when it's set to a
non-zero value by the user, IPT buffers of the given size would be allocated
for each vCPU upon domain creation.

Could you give some hints on how to correctly add a new option in xl.cfg in a
way that it's accessible on the hypervisor side? The configuration
parsing/processing part is what I don't understand yet.
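For reference, below is roughly how I picture the consumer side now that the
buffers are meant to be exposed through the acquire_resource path (the same
mechanism the IOREQ server pages use). This is only a sketch of my current
assumptions: XENMEM_resource_vmtrace_buf is a made-up placeholder for whatever
resource type the final patches end up defining, and I'm assuming the resource
id is simply the vCPU id.

    /* Sketch: map one vCPU's IPT buffer read-only via acquire_resource.
     * Links against libxenforeignmemory; the resource type constant below
     * is a placeholder, not something that exists upstream yet. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <xenforeignmemory.h>

    #define XENMEM_resource_vmtrace_buf 6 /* placeholder value */

    int main(int argc, char **argv)
    {
        if (argc != 4) {
            fprintf(stderr, "usage: %s <domid> <vcpu> <nr_frames>\n", argv[0]);
            return 1;
        }

        uint32_t domid = atoi(argv[1]);
        uint32_t vcpu = atoi(argv[2]);           /* resource id == vCPU id (assumed) */
        unsigned long nr_frames = atol(argv[3]); /* buffer size in 4k frames */

        xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
        if (!fmem) {
            perror("xenforeignmemory_open");
            return 1;
        }

        void *buf = NULL;
        xenforeignmemory_resource_handle *fres =
            xenforeignmemory_map_resource(fmem, domid, XENMEM_resource_vmtrace_buf,
                                          vcpu, 0 /* frame */, nr_frames,
                                          &buf, PROT_READ, 0);
        if (!fres) {
            perror("xenforeignmemory_map_resource");
            xenforeignmemory_close(fmem);
            return 1;
        }

        printf("vCPU %u IPT buffer mapped read-only at %p (%lu frames)\n",
               vcpu, buf, nr_frames);

        /* ... hand the buffer to the trace decoder here ... */

        xenforeignmemory_unmap_resource(fmem, fres);
        xenforeignmemory_close(fmem);
        return 0;
    }

Mapping read-only is just a choice made in this sketch; the point is that the
memory comes from outside the guest's own allocation, which fits the
"allocate at createdomain time" model described above.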
>>>>> What semantics do you want for the buffer becoming full? Given that
>>>>> debugging/tracing is the goal, I presume "pause vcpu on full" is the
>>>>> preferred behaviour, rather than drop packets on full?
>>>>
>>>> Right now this is a ring-style buffer, and when it becomes full it
>>>> simply wraps and overwrites the old data.
>>>
>>> How does the consumer spot that the data has wrapped? What happens if
>>> data starts getting logged, but no one is listening? What happens if the
>>> consumer exits/crashes/etc. and stops listening as a consequence?
>>>
>>> It's fine to simply state what will happen, and possibly even "don't do
>>> that then", but the corner cases do at least need thinking about.
>>
>> AFAIU the current use case is predominantly in conjunction with VMI
>> events, where you want to be able to see the trace leading up to a
>> particular vmexit. So if the buffer wraps between events and data is
>> lost, that's not really a concern.
>
> That's all fine. I imagine the output here is voluminous, and needs help
> being cut down as much as possible.
>
> On a tangent, I presume you'd like to include VM-fork eventually, which
> ought to include copying the trace buffer on fork?
>
> ~Andrew
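On the wrap question: one way a consumer could handle it is to snapshot the
current write offset at each VMI event (however the final interface ends up
exposing it) and linearise the ring from there, accepting that anything older
than one buffer length is gone. A minimal sketch of that copy-out, assuming
the mapped buffer, the write offset, and a "has it wrapped yet" indication
are already available to the consumer:

    /* Sketch: linearise a wrapped IPT ring for decoding. Assumes the
     * consumer already has the mapped buffer and the current write offset;
     * how that offset and the wrap indication get exposed is still an open
     * interface question. Once the ring has wrapped at least once, only
     * the newest buf_size bytes are recoverable. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Copy the ring into 'out' so the oldest surviving byte comes first and
     * the newest byte (just before the write offset) comes last. 'wrapped'
     * says whether the producer has gone past the end at least once; if it
     * has not, only the first wr_off bytes are valid. Returns the number of
     * valid bytes written to 'out'.
     */
    static size_t ipt_linearise(const uint8_t *ring, size_t buf_size,
                                size_t wr_off, int wrapped, uint8_t *out)
    {
        if (!wrapped) {
            memcpy(out, ring, wr_off);
            return wr_off;
        }

        memcpy(out, ring + wr_off, buf_size - wr_off);
        memcpy(out + (buf_size - wr_off), ring, wr_off);
        return buf_size;
    }

    int main(void)
    {
        /* Tiny demonstration with a fake 16-byte "ring" that has wrapped. */
        uint8_t ring[16], out[16];

        for (unsigned int i = 0; i < sizeof(ring); i++)
            ring[i] = i;

        size_t n = ipt_linearise(ring, sizeof(ring), 5 /* wr_off */, 1, out);

        /* Prints 5..15 followed by 0..4: the newest 16 bytes, oldest first. */
        for (size_t i = 0; i < n; i++)
            printf("%u ", (unsigned int)out[i]);
        printf("\n");

        return 0;
    }

Whether a wrap has happened at all is something Xen would need to tell the
consumer (a per-vCPU wrap flag or counter would do); as discussed above,
losing pre-wrap data is acceptable for the VMI-driven use case.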