Re: [Xen-devel] question about migration
On 12/24/2015 08:36 PM, Andrew Cooper wrote:
> On 24/12/15 02:29, Wen Congyang wrote:
>> Hi Andrew Cooper:
>>
>> I rebased the COLO code onto the newest upstream Xen and tested it. I found
>> a problem in the test, and I can reproduce this problem via migration.
>>
>> How to reproduce:
>> 1. xl cr -p hvm_nopv
>> 2. xl migrate hvm_nopv 192.168.3.1
>
> You are the very first person to try a usecase like this.
>
> It works as much as it does because of your changes to the uncooperative HVM
> domain logic. I have said repeatedly during review that this is not necessarily
> a safe change to make without an in-depth analysis of the knock-on effects;
> it looks as if you have found the first knock-on effect.
>
>>
>> The migration succeeds, but the VM doesn't run on the target machine.
>> You can see the reason in 'xl dmesg':
>> (XEN) HVM2 restore: VMCE_VCPU 1
>> (XEN) HVM2 restore: TSC_ADJUST 0
>> (XEN) HVM2 restore: TSC_ADJUST 1
>> (d2) HVM Loader
>> (d2) Detected Xen v4.7-unstable
>> (d2) Get guest memory maps[128] failed. (-38)
>> (d2) *** HVMLoader bug at e820.c:39
>> (d2) *** HVMLoader crashed.
>>
>> The reason is that:
>> We don't call xc_domain_set_memory_map() on the target machine.
>> When we create an HVM domain, the call chain is:
>>   libxl__domain_build()
>>     libxl__build_hvm()
>>       libxl__arch_domain_construct_memmap()
>>         xc_domain_set_memory_map()
>>
>> Should we migrate the guest memory map from the source machine to the
>> target machine?
>
> This bug specifically is because HVMLoader is expected to have run and turned
> the hypercall information into an E820 table in the guest before a migration
> occurs.
>
> Unfortunately, the current codebase is riddled with such assumptions and
> expectations (e.g. the HVM save code assumes that the FPU context is valid
> when it saves register state), which is a direct side effect of how it was
> developed.
>
>
> Having said all of the above, I agree that your example is a usecase which
> should work. It is the ultimate test of whether the migration stream
> contains enough information to faithfully reproduce the domain on the far
> side. Clearly at the moment, this is not the case.
>
> I have an upcoming project to work on the domain memory layout logic, because
> it is unsuitable for a number of XenServer usecases. Part of that will
> require moving it in the migration stream.
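As a side note on the first problem: the call that never happens on the restore
side is xc_domain_set_memory_map(). Below is a minimal sketch of the kind of
call the build path ends up issuing via libxl__arch_domain_construct_memmap();
the E820 entries are purely illustrative, not the real guest layout.

#include <xenctrl.h>   /* on x86 this also provides struct e820entry and E820_* */

/* Illustrative only: push a guest E820 map into Xen the way the build path
 * does.  Real entries come from the domain configuration (lowmem/highmem
 * split, MMIO hole, ACPI regions, ...). */
static int set_example_memmap(xc_interface *xch, uint32_t domid)
{
    struct e820entry map[] = {
        { .addr = 0x00000000ULL, .size = 0xf0000000ULL, .type = E820_RAM      },
        { .addr = 0xfc000000ULL, .size = 0x04000000ULL, .type = E820_RESERVED },
    };

    return xc_domain_set_memory_map(xch, domid, map,
                                    sizeof(map) / sizeof(map[0]));
}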
I found another migration problem in the test:
If the migration fails, we resume the guest on the source side, but the HVM
guest doesn't respond any more.
In my test environment the migration always succeeds, so I used a hack to
reproduce the failure:
1. modify the target xen tools:
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..da95606 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -767,6 +767,8 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
         goto err;
     }
 
+    rc = ERROR_FAIL;
+
  err:
     check_all_finished(egc, stream, rc);
2. xl cr hvm_nopv, and wait some time (until you can log in to the guest)
3. xl migrate hvm_nopv 192.168.3.1
The reason is that:
We create a default ioreq server when the HVM param HVM_PARAM_IOREQ_PFN is read.
This means the problem occurs only when the migration fails after the HVM param
HVM_PARAM_IOREQ_PFN has been read.
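Roughly, and paraphrased from memory rather than copied from the tree, the
HVMOP_get_param handling for these params looks like the fragment below; the
exact arguments to hvm_create_ioreq_server() may differ; the point is only
that reading the param is enough to instantiate a default server:

    case HVM_PARAM_IOREQ_PFN:
    case HVM_PARAM_BUFIOREQ_PFN:
    case HVM_PARAM_BUFIOREQ_EVTCHN:
        /* May need to create the default ioreq server on first read. */
        rc = hvm_create_ioreq_server(d, curr_d->domain_id,
                                     1 /* is_default */,
                                     HVM_IOREQSRV_BUFIOREQ_LEGACY, NULL);
        if ( rc != 0 && rc != -EEXIST )
            goto param_fail;
        break;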
In hvm_select_ioreq_server():
If the I/O will be handled by a non-default ioreq server, we return that
non-default ioreq server; in this case the I/O is handled by qemu.
If the I/O will not be handled by a non-default ioreq server, we return the
default ioreq server. Before migration that is NULL, but after the failed
migration it is not NULL.
See the caller, hvmemul_do_io():
    case X86EMUL_UNHANDLEABLE:
    {
        struct hvm_ioreq_server *s =
            hvm_select_ioreq_server(curr->domain, &p);

        /* If there is no suitable backing DM, just ignore accesses */
        if ( !s )
        {
            rc = hvm_process_io_intercept(&null_handler, &p);
            vio->io_req.state = STATE_IOREQ_NONE;
        }
        else
        {
            rc = hvm_send_ioreq(s, &p, 0);
            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
                vio->io_req.state = STATE_IOREQ_NONE;
            else if ( data_is_addr )
                rc = X86EMUL_OKAY;
        }
        break;
    }
We send the I/O request to the default ioreq server, but there is no backing
DM to handle it, so we wait for the I/O to complete forever.
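To make the failure mode concrete, here is a tiny standalone model (not Xen
code; it only mimics the decision in the snippet above: a NULL server before
migration versus a default server with no backing DM afterwards):

#include <stdbool.h>
#include <stdio.h>

struct ioreq_server { bool has_backing_dm; };

/* The default server that the failed migration leaves behind; nothing is
 * ever attached to it. */
static struct ioreq_server default_server = { .has_backing_dm = false };

/* NULL before migration, the default server after the params were read. */
static struct ioreq_server *select_server(bool default_server_exists)
{
    return default_server_exists ? &default_server : NULL;
}

static const char *emulate(bool default_server_exists)
{
    struct ioreq_server *s = select_server(default_server_exists);

    if ( !s )
        return "no server: access ignored, vcpu carries on";
    if ( !s->has_backing_dm )
        return "server with no DM: request stays pending forever";
    return "request forwarded to the DM";
}

int main(void)
{
    printf("before migration:       %s\n", emulate(false));
    printf("after failed migration: %s\n", emulate(true));
    return 0;
}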
Thanks
Wen Congyang
>
> ~Andrew