
Re: [Xen-devel] Livepatching and Xen Security

On 18/05/17 17:40, George Dunlap wrote:
> There are four general areas I think there may be bugs.
> ## Unprivileged access to Livepatching hypercalls
> ## Bugs in the patch creation tools which create patches with vulnerabilities
> ## Bugs in the patch-application code such that vulnerabilities exist
> after application
> ## Bugs which allow a guest to prevent the application of a livepatch
> # Testing status and requirements
> I'm told we already test that unprivileged guests cannot access the
> Livepatch hypercalls in osstest; if so, that aspect should be
> covered.

Specifically, http://xenbits.xen.org/docs/xtf/test-livepatch-priv-check.html

(There is a docs bug I have noticed while grabbing that link, which I
have just pushed a fix for.  The live docs will be updated whenever cron
next runs.)

(I'd also like to take this opportunity to highlight an issue which
became apparent while writing that test; unstable hypercall ABIs,
wherever they reside, make this kind of testing prone to false negatives.)

> All that's needed would be for vendors to describe what kinds of
> testing they have done for Livepatching.  I think there are two
> factors which come into play:
> 1. Having tested live-patching thoroughly for at least some version of
> the codebase
> 2. Having tested live-patching for one of the Xen 4.9 RCs.
> Thoughts?

As a statement of what XenServer is doing:

XenServer 7.1 is based on Xen 4.7 and we are providing livepatches
(where applicable) with hypervisor hotfixes.  Thus far, XSAs 204, 207,
and 212-215 have been included in livepatch form, as well as a number of
other general bugfixes which were safe to livepatch.

For both of these hotfixes, we had to bugfix the livepatch creation
tools before they could generate a livepatch.  We also had to modify the
XSA 213 patch to create a livepatch.  The (pre-4.8) Xen code was buggy
and used the .fixup section where it should have used .text.unlikely,
which caused the livepatch tools to fail a cross-check of the exception
frame references when building the patch.

Thus, we are 0 for 2 on the tools being able to DTRT when given a set of
real-world fixes.
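To illustrate the kind of cross-check that tripped here, the sketch
below shows the general shape of such a validation pass.  The data
model, section whitelist, and function names are illustrative
assumptions, not the real tools' implementation:

```python
# Hypothetical sketch: a livepatch creation tool validating that every
# exception-frame fixup target lives in a section the tool knows how to
# carry into a livepatch.  Code landing in .fixup (as pre-4.8 Xen did)
# would fail this check.

EXPECTED_FIXUP_SECTIONS = {".text", ".text.unlikely"}

def check_exception_frames(entries):
    """Return the exception-table entries whose fixup target is in an
    unexpected section, causing patch generation to be refused."""
    return [e for e in entries
            if e["fixup_section"] not in EXPECTED_FIXUP_SECTIONS]

entries = [
    {"insn": "mov",      "fixup_section": ".text.unlikely"},  # fine
    {"insn": "rep movs", "fixup_section": ".fixup"},          # rejected
]

bad = check_exception_frames(entries)
print([e["fixup_section"] for e in bad])  # ['.fixup']
```

In the real tools the check operates on ELF relocations against the
exception table rather than a list of dicts, but the failure mode is
the same: a reference into an unexpected section aborts the build.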

Independent of this, the nature of what qualifies as "a correct patch"
is subjective and very context dependent.  Consider a scenario with two
users, the same version of the livepatch tools, an identical source
patch, and an identical source version of Xen.  There is a very real
possibility that these two users could get one valid and one invalid
patch based solely on something like the compiler settings used to build
the hypervisor they are patching.

From an XSA point of view, we do not want to be issuing advisories
saying "If you are on OS $A, with livepatch tools $B, Hypervisor $C
compiled with these specific build options, then trying to create a
livepatch for patch $D will appear to work properly but leave a timebomb
in your hypervisor".  ISTR an issue which hit during development where
CentOS released a minor update to GCC which caused chaos by altering how
the string literals got sorted.

What if a user creates a livepatch for a change which isn't remotely
safe to livepatch, uploads it, and their hypervisor goes bang?  This
would qualify under the definition of "correct" in so far as the patch
was correctly doing what it was told, and thus, fall within the security
criteria presented here.

There is already a very high user requirement in the first place to
evaluate whether patches are safe to livepatch.  This includes
interaction with other livepatches, interactions with patches in the
vendors patch queue, interaction with customer hardware, and there is no
way this can be decided automatically.

Therefore, I think it would be a mistake for us to include anything
pertaining to "creating a livepatch, correct or otherwise" within a
support statement.  There are many variables which we as upstream can't
control.
As for the fourth point, about what a guest can do to prevent the
application of a livepatch:

The default timeout is insufficient to quiesce Xen if a VM with a few
VCPUs is migrating.  In this scenario, I believe p2m_lock contention is
the underlying reason, but the point stands that there are plenty of
things a guest can do to prevent Xen being able to suitably quiesce.

As a host administrator attempting to apply the livepatch, you get
informed that Xen failed to quiesce and the livepatch application
failed.  Options range from upping the timeout on the next patching
attempt, to possibly even manually pausing the troublesome VM for a second.

I also think it unwise to consider any scenarios like this within the
security statement, otherwise we will have to issue an XSA stating
"Guests doing normal unprivileged things can cause Xen to be
insufficiently quiescent to apply livepatches with the deliberately
conservative defaults".  What remediation would we suggest for this?

On the points of unexpected access to the hypercalls, and Xen doing the
wrong thing when presented with a legitimate correct livepatch, I think
these are in principle fine for inclusion within a support statement.

I would ask, however, how confident we are that there are no ELF parsing
bugs in the code?  I think it would be very prudent to try and build a
userspace harness for it and let AFL have a go.
