
Re: [Xen-devel] Strange interdependence between domains



On Thu, 2014-02-13 at 16:56 +0000, Simon Martin wrote:
> Hi all,
> 
Hey Simon!

First of all, since you're using the ARINC653 scheduler, I'm adding Nate,
as he is its maintainer; let's see if he can help us! ;-P

> I  am  now successfully running my little operating system inside Xen.
> It  is  fully  preemptive and working a treat, 
>
Aha, this is great! :-)

> but I have just noticed
> something  I  wasn't expecting, and will really be a problem for me if
> I can't work around it.
> 
Well, let's see...

> My configuration is as follows:
> 
> 1.- Hardware: Intel i3, 4GB RAM, 64GB SSD.
> 
> 2.- Xen: 4.4 (just pulled from repository)
> 
> 3.- Dom0: Debian Wheezy (Kernel 3.2)
> 
> 4.- 2 cpu pools:
> 
> # xl cpupool-list
> Name               CPUs   Sched     Active   Domain count
> Pool-0               3    credit       y          2
> pv499                1  arinc653       y          1
> 
Ok, I think I figured this out from the other information, but it would
be useful to know what pcpus are assigned to what cpupool. I think it's
`xl cpupool-list -c'.
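
That would give us something like the following (the exact column layout
may differ, and the pcpu assignment below is only my guess, based on your
`xl vcpu-list' further down):

 # xl cpupool-list -c
 Name               CPU list
 Pool-0             0,1,2
 pv499              3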

> 5.- 2 domU:
> 
> # xl list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0   984     3     r-----      39.7
> win7x64                                      1  2046     3     -b----     143.0
> pv499                                        3   128     1     -b----      61.2
> 
> 6.- All VCPUs are pinned:
> 
Right, although, if you use cpupools, and if I've understood what you're
up to, you should not really need pinning at all. I mean, the isolation
between the RT-ish domain and the rest of the world should already be in
place thanks to cpupools.
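
Just to be clear about what I mean by isolation via cpupools, the setup
would be something like the sketch below (which I guess is more or less
what you already did; the file name is made up, the pcpu and scheduler
come from your mail, and the exact config syntax should be double-checked
against the xl documentation):

 --- pv499-pool.cfg (hypothetical name) ---
 name  = "pv499"
 sched = "arinc653"
 cpus  = ["3"]

 # xl cpupool-cpu-remove Pool-0 3
 # xl cpupool-create pv499-pool.cfg
 # xl cpupool-migrate pv499 pv499

(or just put pool="pv499" in the domU config file, so that the domain
starts in the right pool directly).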

Actually, pinning can help, but maybe not in the exact way you're using
it...

> # xl vcpu-list
> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
> Domain-0                             0     0    0   -b-      27.5  0
> Domain-0                             0     1    1   -b-       7.2  1
> Domain-0                             0     2    2   r--       5.1  2
> win7x64                              1     0    0   -b-      71.6  0
> win7x64                              1     1    1   -b-      37.7  1
> win7x64                              1     2    2   -b-      34.5  2
> pv499                                3     0    3   -b-      62.1  3
> 
...as can be seen here.

So, if you ask me, you're restricting things too much in Pool-0, where
dom0 and the Windows VM run. In fact, is there a specific reason why you
need each of their vcpus to be statically pinned to one single pcpu? If
not, I'd leave them a bit more freedom.

What I'd try is one of these (see the example commands after this list):
 1. all dom0 and win7 vcpus free, i.e., no pinning at all in Pool-0;
 2. pinning as follows:
     * all vcpus of win7 --> pcpus 1,2
     * all vcpus of dom0 --> no pinning
    This way, win7 could suffer sometimes, if all its 3 vcpus get busy,
    but I think that is acceptable, at least up to a certain extent; is
    that the case?
    At the same time, you are making sure dom0 always has a chance to
    run, as pcpu#0 would be its exclusive playground, in case someone,
    including your pv499 domain, needs its services.
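
In xl terms, option 2 would be something like this (domain names taken
from your `xl list'; double-check the exact syntax with `xl help
vcpu-pin'):

 # xl vcpu-pin Domain-0 all all
 # xl vcpu-pin win7x64 all 1-2

The first command just clears dom0's current 1:1 pinning (affinity goes
back to "any pcpu in the pool"), the second confines all of win7's vcpus
to pcpus 1 and 2, which is what leaves pcpu#0 to dom0.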

> 7.- pv499 is the domU that I am testing. It has no disk or vif devices
> (yet). I am running a little test program in pv499 and the timing I
> see varies depending on disk activity.
> 
> My test program prints the time taken in milliseconds for a
> million cycles. With no disk activity I see 940 ms, with disk activity
> I see 1200 ms.
> 
Wow, it's very hard to tell. My first thought was that your domain might
need something from dom0, and that the (IMHO) suboptimal pinning
configuration you're using could be slowing that down. The hole in this
theory is that dom0's services are mostly the PV backends for disk and
network, which you say you don't have yet...

I still think your pinning setup is unnecessarily restrictive, so I'd
give the changes above a try, but it's probably not the root cause of
your issue.

> I can't understand this as disk activity should be running on cores 0,
> 1  and 2, but never on core 3. The only thing running on core 3 should
> by my paravirtual machine and the hypervisor stub.
> 
Right. Are you familiar with tracing what happens inside Xen with
xentrace and, perhaps, xenalyze? It takes a bit of time to get used to,
but once you master it, it is a good means of extracting really useful
info!

There is a blog post about that here:
http://blog.xen.org/index.php/2012/09/27/tracing-with-xentrace-and-xenalyze/
It should have most of the info you need, or links to where to find it.

It's going to be a lot of data, but if you trace one run without disk IO
and one run with disk IO, it should be doable to compare the two, for
instance in terms of when your domain's vcpu is active and when it gets
scheduled, and from that we can hopefully narrow down the real root cause
a bit more.
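
Concretely, I'm thinking of something along these lines (0x0002f000 is
the scheduler event class; I'm writing the flags down from memory, so
double-check them against the xentrace and xenalyze docs):

 # xentrace -D -e 0x0002f000 trace-noio.bin   <- run the test without disk IO, then Ctrl-C
 # xentrace -D -e 0x0002f000 trace-io.bin     <- same, but with disk IO going on
 # xenalyze --summary trace-noio.bin > noio.txt
 # xenalyze --summary trace-io.bin   > io.txt

Comparing the two summaries (and, if needed, the full dumps) should show
whether pv499's vcpu is being scheduled differently in the two cases.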

Let us know if you think you need help with that.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

