
Re: [Xen-devel] Strange interdependence between domains



On Thu, 2014-02-13 at 16:56 +0000, Simon Martin wrote:
> Hi all,
> 
Hey Simon!

First of all, since you're using the ARINC653 scheduler, I'm adding Nate,
as he is its maintainer; let's see if he can help us! ;-P

> I  am  now successfully running my little operating system inside Xen.
> It  is  fully  preemptive and working a treat, 
>
Aha, this is great! :-)

> but I have just noticed
> something  I  wasn't expecting, and will really be a problem for me if
> I can't work around it.
> 
Well, let's see...

> My configuration is as follows:
> 
> 1.- Hardware: Intel i3, 4GB RAM, 64GB SSD.
> 
> 2.- Xen: 4.4 (just pulled from repository)
> 
> 3.- Dom0: Debian Wheezy (Kernel 3.2)
> 
> 4.- 2 cpu pools:
> 
> # xl cpupool-list
> Name               CPUs   Sched     Active   Domain count
> Pool-0               3    credit       y          2
> pv499                1  arinc653       y          1
> 
Ok, I think I figured this out from the other information, but it would
be useful to know what pcpus are assigned to what cpupool. I think it's
`xl cpupool-list -c'.
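
That would give us something like the following (the exact column layout
may differ, and the pcpu assignment below is only my guess, based on your
`xl vcpu-list' further down):

 # xl cpupool-list -c
 Name               CPU list
 Pool-0             0,1,2
 pv499              3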

> 5.- 2 domU:
> 
> # xl list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0   984     3     r-----      39.7
> win7x64                                      1  2046     3     -b----     143.0
> pv499                                        3   128     1     -b----      61.2
> 
> 6.- All VCPUs are pinned:
> 
Right, although, if you use cpupools, and if I've understood what you're
up to, you should not really need pinning at all. I mean, the isolation
between the RT-ish domain and the rest of the world should already be in
place thanks to cpupools.
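
Just to be clear about what I mean by isolation via cpupools, the setup
would be something like the sketch below (which I guess is more or less
what you already did; the file name is made up, the pcpu and scheduler
come from your mail, and the exact config syntax should be double-checked
against the xl documentation):

 --- pv499-pool.cfg (hypothetical name) ---
 name  = "pv499"
 sched = "arinc653"
 cpus  = ["3"]

 # xl cpupool-cpu-remove Pool-0 3
 # xl cpupool-create pv499-pool.cfg
 # xl cpupool-migrate pv499 pv499

(or just put pool="pv499" in the domU config file, so that the domain
starts in the right pool directly).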

Actually, pinning can help, but maybe not in the exact way you're using
it...

> # xl vcpu-list
> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
> Domain-0                             0     0    0   -b-      27.5  0
> Domain-0                             0     1    1   -b-       7.2  1
> Domain-0                             0     2    2   r--       5.1  2
> win7x64                              1     0    0   -b-      71.6  0
> win7x64                              1     1    1   -b-      37.7  1
> win7x64                              1     2    2   -b-      34.5  2
> pv499                                3     0    3   -b-      62.1  3
> 
...as can be seen here.

So, if you ask me, you're restricting things too much in Pool-0, where
dom0 and the Windows VM run. In fact, is there a specific reason why you
need each of their vcpus to be statically pinned to one single pcpu? If
not, I'd leave them a bit more freedom.

What I'd try is one of these (see the example commands after this list):
 1. all dom0 and win7 vcpus free, i.e., no pinning at all in Pool-0;
 2. pinning as follows:
     * all vcpus of win7 --> pcpus 1,2
     * all vcpus of dom0 --> no pinning
    This way, win7 could suffer sometimes, if all its 3 vcpus get busy,
    but I think that is acceptable, at least up to a certain extent; is
    that the case?
    At the same time, you are making sure dom0 always has a chance to
    run, as pcpu#0 would be its exclusive playground, in case someone,
    including your pv499 domain, needs its services.
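
In xl terms, option 2 would be something like this (domain names taken
from your `xl list'; double-check the exact syntax with `xl help
vcpu-pin'):

 # xl vcpu-pin Domain-0 all all
 # xl vcpu-pin win7x64 all 1-2

The first command just clears dom0's current 1:1 pinning (affinity goes
back to "any pcpu in the pool"), the second confines all of win7's vcpus
to pcpus 1 and 2, which is what leaves pcpu#0 to dom0.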

> 7.- pv499 is the domU that I am testing. It has no disk or vif devices
> (yet). I am running a little test program in pv499 and the timing I
> see varies depending on disk activity.
> 
> My test program prints the time taken in milliseconds for a
> million cycles. With no disk activity I see 940 ms, with disk activity
> I see 1200 ms.
> 
Wow, it's very hard to tell. My first thought was that your domain might
need something from dom0, and that the (IMHO) suboptimal pinning
configuration you're using could be slowing that down. The hole in this
theory is that dom0's services are mostly the PV backends for disk and
network, which you say you don't have yet...

I still think your pinning setup is unnecessarily restrictive, so I'd
give the changes above a try, but it's probably not the root cause of
your issue.

> I can't understand this as disk activity should be running on cores 0,
> 1  and 2, but never on core 3. The only thing running on core 3 should
> by my paravirtual machine and the hypervisor stub.
> 
Right. Are you familiar with tracing what happens inside Xen with
xentrace and, perhaps, xenalyze? It takes a bit of time to get used to,
but once you master it, it is a good means of extracting really useful
info!

There is a blog post about that here:
http://blog.xen.org/index.php/2012/09/27/tracing-with-xentrace-and-xenalyze/
It should have most of the info you need, or links to where to find it.

It's going to be a lot of data, but if you trace one run without disk IO
and one run with disk IO, it should be doable to compare the two, for
instance in terms of when your domain's vcpu is active and when it gets
scheduled, and from that we can hopefully narrow down the real root cause
a bit more.
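
Concretely, I'm thinking of something along these lines (0x0002f000 is
the scheduler event class; I'm writing the flags down from memory, so
double-check them against the xentrace and xenalyze docs):

 # xentrace -D -e 0x0002f000 trace-noio.bin   <- run the test without disk IO, then Ctrl-C
 # xentrace -D -e 0x0002f000 trace-io.bin     <- same, but with disk IO going on
 # xenalyze --summary trace-noio.bin > noio.txt
 # xenalyze --summary trace-io.bin   > io.txt

Comparing the two summaries (and, if needed, the full dumps) should show
whether pv499's vcpu is being scheduled differently in the two cases.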

Let us know if you think you need help with that.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

