Re: [Xen-devel] Strange interdependace between domains

On 2/13/2014 12:36 PM, Dario Faggioli wrote:
> On gio, 2014-02-13 at 16:56 +0000, Simon Martin wrote:
>> Hi all,
> Hey Simon!
> First of all, as you're using ARINC, I'm adding Nate, as he's ARINC's
> maintainer, let's see if he can help us! ;-P
>> I am now successfully running my little operating system inside Xen.
>> It is fully preemptive and working a treat,
> Aha, this is great! :-)
>> but I have just noticed
>> something  I  wasn't expecting, and will really be a problem for me if
>> I can't work around it.
> Well, let's see...
>> My configuration is as follows:
>> 1.- Hardware: Intel i3, 4GB RAM, 64GB SSD.
>> 2.- Xen: 4.4 (just pulled from repository)
>> 3.- Dom0: Debian Wheezy (Kernel 3.2)
>> 4.- 2 cpu pools:
>> # xl cpupool-list
>> Name               CPUs   Sched     Active   Domain count
>> Pool-0               3    credit       y          2
>> pv499                1  arinc653       y          1
> Ok, I think I figured this out from the other information, but it would
> be useful to know what pcpus are assigned to what cpupool. I think it's
> `xl cpupool-list -c'.
>> 5.- 2 domU:
>> # xl list
>> Name                                        ID   Mem VCPUs      State   Time(s)
>> Domain-0                                     0   984     3     r-----      39.7
>> win7x64                                      1  2046     3     -b----     143.0
>> pv499                                        3   128     1     -b----      61.2
>> 6.- All VCPUs are pinned:
> Right, although, if you use cpupools, and if I've understood what you're
> up to, you really should not require pinning. I mean, the isolation
> between the RT-ish domain and the rest of the world should be already in
> place thanks to cpupools.
> Actually, pinning can help, but maybe not in the exact way you're using
> it...
>> # xl vcpu-list
>> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
>> Domain-0                             0     0    0   -b-      27.5  0
>> Domain-0                             0     1    1   -b-       7.2  1
>> Domain-0                             0     2    2   r--       5.1  2
>> win7x64                              1     0    0   -b-      71.6  0
>> win7x64                              1     1    1   -b-      37.7  1
>> win7x64                              1     2    2   -b-      34.5  2
>> pv499                                3     0    3   -b-      62.1  3
> ...as it can be seen here.
> So, if you ask me, you're restricting things too much in pool-0, where
> dom0 and the Windows VM run. In fact, is there a specific reason why
> you need all their vcpus to be statically pinned, each one to only one
> pcpu? If not, I'd leave them a little more freedom.
> What I'd try is:
>  1. all dom0 and win7 vcpus free, so no pinning in pool0.
>  2. pinning as follows:
>      * all vcpus of win7 --> pcpus 1,2
>      * all vcpus of dom0 --> no pinning
>    this way, what you get is the following: win7 could suffer sometimes,
>    if all 3 of its vcpus get busy, but that, I think, is acceptable, at
>    least to a certain extent. Is that the case?
>    At the same time, you
>    are making sure dom0 always has a chance to run, as pcpu#0 would be
>    its exclusive playground, in case someone, including your pv499
>    domain, needs its services.
>> 7.- pv499 is the domU that I am testing. It has no disk or vif devices
>> (yet). I am running a little test program in pv499 and the timing I
>> see varies depending on disk activity.
>> My test program prints the time taken in milliseconds for a
>> million cycles. With no disk activity I see 940 ms, with disk activity
>> I see 1200 ms.
> Wow, it's very hard to tell. What I first thought is that your domain
> may need something from dom0, and the suboptimal (IMHO) pinning
> configuration you're using could be slowing that down. The bug in this
> theory is that dom0 services are mostly PV drivers for disk and network,
> which you say you don't have...

  Any shared resource between domains could cause one domain to affect the
timing of another domain:  shared cache, shared memory controller, interrupts,
shared I/O interface, domain-0, etc...

  Given the small size and nature of your test application, the cache, which
Ian mentioned, is a likely culprit, but if your application gets more complex
some of these other sources could show up.

  What kind of variation (jitter) on this measurement can your application
tolerate?

  You can never eliminate all the jitter, but you can get rid of a lot of it by
carefully partitioning your system.  The pinning suggested by Dario looks to be
a good step towards this goal.
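
  As a rough sketch of that pinning (the second option in Dario's list),
assuming the domain names from the xl list output above.  The run and pin_plan
helpers are made up for illustration, and by default the script only prints the
xl commands so they can be reviewed first; setting RUN=1 would execute them,
which needs root in dom0.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Sketch of Dario's second option; domain names are taken from the thread.
pin_plan() {
    # Keep every win7x64 vcpu off pcpu 0, which stays free for dom0.
    run xl vcpu-pin win7x64 all 1-2
    # Remove the per-vcpu pinning of dom0 (any pcpu in its cpupool).
    run xl vcpu-pin Domain-0 all all
    # Show which pcpus belong to which cpupool, then the resulting affinities.
    run xl cpupool-list -c
    run xl vcpu-list
}

pin_plan
```

  pv499 is left alone here, since it is already the only domain in its own
cpupool.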


> I still think your pinning setup is unnecessarily restrictive, so I'd give
> it a try, but it's probably not the root cause of your issue.
>> I can't understand this as disk activity should be running on cores 0,
>> 1  and 2, but never on core 3. The only thing running on core 3 should
>> be my paravirtual machine and the hypervisor stub.
> Right. Are you familiar with tracing what happens inside Xen with
> xentrace and, perhaps, xenalyze? It takes a bit of time to get used to
> it but, once you master it, it is a good means of getting out really
> useful info!
> There is a blog post about that here:
> http://blog.xen.org/index.php/2012/09/27/tracing-with-xentrace-and-xenalyze/
> and it should have most of the info, or the links to where to find them.
> It's going to be a lot of data, but if you trace one run without disk IO
> and one run with disk IO, it should be doable to compare the
> differences, for instance, in terms of when the vcpus of your domain are
> active, as well as when they get scheduled, and from that we hopefully
> can try to narrow down a bit more the real root cause of the thing.
> Let us know if you think you need help with that.
> Regards,
> Dario
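
  For the two-run comparison Dario describes, a minimal sketch might look like
the following.  The 30 second duration, the /tmp file names and the helper
functions are assumptions for illustration, and the xentrace and xenalyze
options are worth checking against --help on your build.  By default it only
prints the commands; setting RUN=1 would execute them as root in dom0.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Capture one trace and summarise it; $1 is a label such as "nodisk" or "disk".
trace_plan() {
    label=$1
    # -D discards stale buffer contents, -e all records every event class,
    # -T 30 stops the capture after 30 seconds (duration is an assumption).
    run xentrace -D -e all -T 30 "/tmp/pv499-$label.trace"
    # Per-domain and per-vcpu summary of the captured trace.
    run xenalyze --summary "/tmp/pv499-$label.trace"
}

trace_plan nodisk   # run while the disk is idle
trace_plan disk     # run again while generating disk activity in win7x64
```

  Comparing the two summaries for pv499's vcpu should show whether it is being
descheduled or interrupted more often during the run with disk activity.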
