[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 3/3] [GSOC14] FIX:- Race condition between initializing shared ring and mempaging.

On 21/07/14 15:11, Andres Lagar Cavilla wrote:
On Mon, Jul 21, 2014 at 5:01 AM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
On 21/07/14 08:31, Andres Lagar Cavilla wrote:
On Sun, Jul 20, 2014 at 5:49 PM, Dushyant Behl <myselfdushyantbehl@xxxxxxxxx> wrote:

Sorry for this extremely late response.

On Fri, Jun 27, 2014 at 4:03 PM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Mon, 2014-06-16 at 23:50 +0530, Dushyant Behl wrote:
>> This patch is part of the work done under the gsoc project -
>> Lazy Restore Using Memory Paging.
>> This patch is meant to fix a known race condition bug in mempaging
>> ring setup routines. The race conditon was between initializing
> "condition"
>> mem paging and initializing shared ring, earlier the code initialized
>> mem paging before removing the ring page from guest's physical map
>> which could enable the guest to interfere with the ring initialisation.
>> Now the code removes the page from the guest's physical map before
>> enabling mempaging so that the guest cannot clobber the ring after
>> we initialise it.
>> Signed-off-by: Dushyant Behl <myselfdushyantbehl@xxxxxxxxx>
>> Reviewed-by: Andres Lagar-Cavilla <andres@xxxxxxxxxxxxxxxx>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> ---
>> Âtools/libxc/xc_mem_paging_setup.c | 24 ++++++++++++++++++++----
>> Â1 file changed, 20 insertions(+), 4 deletions(-)
>> diff --git a/tools/libxc/xc_mem_paging_setup.c b/tools/libxc/xc_mem_paging_setup.c
>> index 679c606..a4c4798 100644
>> --- a/tools/libxc/xc_mem_paging_setup.c
>> +++ b/tools/libxc/xc_mem_paging_setup.c
>> @@ -73,6 +73,24 @@ int xc_mem_paging_ring_setup(xc_interface *xch,
>> Â Â Â Â Â}
>> Â Â Â}
>> + Â Â/*
>> + Â Â * Remove the page from guest's physical map so that the guest will not
>> + Â Â * be able to break security by pretending to be toolstack for a while.
>> + Â Â */
>> + Â Âif ( xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, ring_page) )
> ring_page here is a void * mapping of the page, isn't it? Not a
> xen_pfn_t[] array of pfns, so surely this isn't right.

Yes, you are correct and the invocation should be with ring_pfn. But
if I change this with
the invocation done below as -
if ( xc_domain_decrease_reservation_exact(xch, domain_id, 1, 0, &ring_pfn) )Â

then the code fails to enable mem paging.
Can anyone help as to why this would happen?
The call that enables paging will use the hvm param in the hypervisor to look up the page by its pfn within the guest physical address space. If you call decrease reservation, then the page is removed from the guest phys address space, and the hvm param will point to a hole. And the call will fail of course. So you need to call decrease reservation strictly after enable paging.

You cannot have a period of time where the paging/access/events are enabled and the guest can interfere with the ring page, as the Xen side of ring pages are not robust to untrusted input.
It's deja vu all over again!

Note that the XSA commmit, 6ae2df93c27, does exactly that. Enable paging/access/sharing, and only after that decrease reservation (and after that unpause). So the window is open ... methinks?

No - it does a domain pause around this set of critical operations, so the guest is guaranteed not to be running, and therefore cannot interfere.

At some point the assertion "xen side ... not robust" has to be declared false, because it's either FUD, or untenable to ship a hypervisor that can be crashed by guest input. You fixed an important bug with the vcpu run off, so I'll refrain from saying "I don't see any problems there". But I think I don't see any problems there ....

My patches have not been committed so there known (and well documented) holes in the Xen side of the interface. Therefore, the assertion is completely correct at the moment. The situation will certainly change (for the better) once my patches are taken.

I would like to hope that I got all the issues, but I did not performed an extensive analysis of the interface. I discovered the issues when reviewing the bitdefender code which copied the vcpu overrun bug, and then discovered the pause_count issue when fixing the vcpu overrun.

I got all the issues I could spot, given no specific knowledge of the mem_event stuff. However, I feel it would be naive to assume that the rest of the interface is secure. (This might well be my particularity-pessimistic attitude to security, but it does tend to be more reliable than the optimistic attitude.) A proper audit of the interface is required as part of resolving XSA-77.

How does this series interact with 6ae2df93 ?
Likely not a problem, you are referring to pause count? Looks like different things.

Let me rephrase. Does this series do anything which 6ae2df93 (which is now committed) didn't do? It would certainly appear that 6ae2df93 does the vast majority of what this series is attempting to do.

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.