Hi folks, I have a question about the default number
of PIRQs for Domain 0 in Xen 4.x. I ran into a problem where
cciss.ko, the HP Smart Array driver, froze and hung the system
at boot time. The server is an HP ProLiant DL385G5p running
a CentOS 5.6 dom0 with Xen 4.1.1. However, everything works fine
with Xen 3.0.3, which CentOS officially ships. We
upgraded to Xen 4.x in order to use Remus.
While debugging, I guessed that an IRQ of the HP RAID
controller was going missing, but I could not figure out why.
In the end I compiled and tested revisions between 3.4.3 (r19995)
and 4.0.0 (r20789), using binary search over about 10 builds,
and found that the change between r20142 and r20143 introduced
the problem. The code changes were:
-static unsigned int extra_dom0_irqs, extra_domU_irqs = 8;
+static unsigned int extra_dom0_irqs = 256, extra_domU_irqs = 32;
 static void __init parse_extra_guest_irqs(const char *s)
 {
     if ( isdigit(*s) )
@@ -253,9 +253,11 @@
     d->is_paused_by_controller = 1;
     atomic_inc(&d->pause_count);
-    d->nr_pirqs = (nr_irqs_gsi +
-                   (domid ? extra_domU_irqs :
-                    extra_dom0_irqs ?: nr_irqs_gsi));
+    if ( domid )
+        d->nr_pirqs = nr_irqs_gsi + extra_domU_irqs;
+    else
+        d->nr_pirqs = nr_irqs_gsi + extra_dom0_irqs;
     d->pirq_to_evtchn = xmalloc_array(u16, d->nr_pirqs);
     d->pirq_mask = xmalloc_array(
         unsigned long, BITS_TO_LONGS(d->nr_pirqs));
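To make the effect of the diff above concrete, here is a minimal
standalone sketch (not Xen code) of the two nr_pirqs expressions.
The function names nr_pirqs_r20142 and nr_pirqs_r20143 are made up
for illustration, and the values passed in are assumptions; in
particular nr_irqs_gsi = 16 is simply the value I refer to in this
mail.

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the expression removed by the diff.
 * In the original, "a ?: b" is the GNU shorthand for "a ? a : b", so a
 * zero extra_dom0_irqs falls back to nr_irqs_gsi for dom0 (domid == 0).
 */
static unsigned int nr_pirqs_r20142(unsigned int domid,
                                    unsigned int nr_irqs_gsi,
                                    unsigned int extra_dom0_irqs,
                                    unsigned int extra_domU_irqs)
{
    unsigned int extra;

    if ( domid )
        extra = extra_domU_irqs;
    else
        extra = extra_dom0_irqs ? extra_dom0_irqs : nr_irqs_gsi;

    return nr_irqs_gsi + extra;
}

/* Hypothetical helper mirroring the expression added by the diff. */
static unsigned int nr_pirqs_r20143(unsigned int domid,
                                    unsigned int nr_irqs_gsi,
                                    unsigned int extra_dom0_irqs,
                                    unsigned int extra_domU_irqs)
{
    if ( domid )
        return nr_irqs_gsi + extra_domU_irqs;

    return nr_irqs_gsi + extra_dom0_irqs;
}

/* Print the dom0 totals for an assumed nr_irqs_gsi. */
static void print_dom0_totals(unsigned int nr_irqs_gsi)
{
    printf("r20142 defaults (0, 8):      %u\n",
           nr_pirqs_r20142(0, nr_irqs_gsi, 0, 8));
    printf("r20143 defaults (256, 32):   %u\n",
           nr_pirqs_r20143(0, nr_irqs_gsi, 256, 32));
    printf("r20143 with dom0 extra = 32: %u\n",
           nr_pirqs_r20143(0, nr_irqs_gsi, 32, 32));
}
```

With nr_irqs_gsi assumed to be 16, calling print_dom0_totals(16)
gives 32 for the r20142 defaults, 272 for the r20143 defaults, and
48 when the dom0 extra is lowered to 32.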
In these changes I noticed that extra_dom0_irqs, which was
0 by default in r20142 (so Dom0 got nr_irqs_gsi extra PIRQs
through the "?:" fallback), is set to 256 in r20143, which makes
Dom0's default nr_pirqs exceed 256. Maybe this prevents the HP
RAID controller from getting its IRQ; I am not sure about that.
After I set it to 32 (the same value as the new DomU default),
cciss.ko worked well. Although I can set this value with
"extra_guest_irqs=32,32" as a boot parameter, there are
still some problems:
1. The argument for dom0 extra irqs, the one after the comma,
is undocumented.
2. What is the reason for the magic numbers 256 for Dom0 and
32 for DomU as defaults in Xen 4.x? nr_irqs_gsi is only 16 on
x64, yet the total nr_pirqs comes out to more than 256. The magic
numbers still exist in the newest code. Hardcoding them like this
may cause very elusive faults for new users; maybe you can come up
with a better solution.
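Regarding point 1, this is how I understand the
"extra_guest_irqs=<domU>,<dom0>" format, written as a small
userspace sketch. It is not the Xen parser: the struct extra_irqs
and the function parse_extra_guest_irqs_sketch are hypothetical,
and the only assumption taken from above is that the value before
the comma applies to DomU and the value after it applies to Dom0.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical container for the two values; not a Xen structure. */
struct extra_irqs {
    unsigned int domU;  /* value before the comma */
    unsigned int dom0;  /* value after the comma */
};

/*
 * Userspace sketch of parsing "<domU>[,<dom0>]".
 * Fields that are absent from the string keep whatever value the
 * caller put in *out beforehand (that is, the defaults).
 */
static void parse_extra_guest_irqs_sketch(const char *s,
                                          struct extra_irqs *out)
{
    char *end;

    if ( isdigit((unsigned char)*s) )
    {
        out->domU = (unsigned int)strtoul(s, &end, 0);
        s = end;
    }

    if ( *s == ',' && isdigit((unsigned char)s[1]) )
        out->dom0 = (unsigned int)strtoul(s + 1, &end, 0);
}
```

Starting from the r20143 defaults { 32, 256 }, the string "32,32"
leaves DomU at 32 and lowers Dom0 to 32, while a string with no
comma, such as "64", changes only the DomU value.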