
Re: [Xen-devel] xenbus and the message of doom

On 16.12.2011 12:33, Olaf Hering wrote:
> On Thu, Dec 15, Konrad Rzeszutek Wilk wrote:
>> On Thu, Dec 15, 2011 at 08:20:23PM +0100, Stefan Bader wrote:
>>> I was investigating a bug report[1] about newer kernels (>3.1) not booting as
>>> HVM guests on Amazon EC2. For some reason git bisect gave me some pain, but
>>> it led me at least close, and with some crash dump data I think I figured out
>>> the problem.
>> Stefan, thanks for finding this.
>> Olaf, what are your thoughts? Should I prep a patch to revert the patch
>> below and then we can work on 3.3 and rethink this in 3.3? The clock is
>> ticking for 3.2 and there is not much runway to fix stuff.
> Sometimes guest changes expose bugs in the host. It's my understanding
> that hosts should be kept up to date so that they can serve both old and new
> guests well.

That would be the ideal world. Unfortunately, in reality, hosts stick to a
particular version and maybe get updated with what is provided as stable or
security updates.

> In my testing with Xen4 based hosts their xenstored did properly ignore
> the new command.

I can only go by the evidence I see publicly on running EC2 instances. The
closest we can get there is something Red Hat or CentOS based with a variation of
3.x based hypervisors (my test box stays on CentOS 5.6 and Xen 3.4.3 to verify
bugs). Checking against our Xen host reintroduced with 11.10, there is no
problem either.

> I proposed several ways to get rid of existing watches, but finally we
> came to the conclusion that a new xenstored command would be the
> cleanest way.
> Whether adding a timeout is a good idea has to be decided. I can imagine
> that a busy host may take some time to respond to guest commands.
> Perhaps we should figure out what exactly EC2 is using as host and why
> it only breaks with upstream kernels. So far I haven't received reports
> for SLES11 guests. SP1 got an update recently, so their HVM guests would
> have seen the hang as well. The not yet released SP2 has also been sending
> XS_RESET_WATCHES for quite some time.

As said before, it is hard to say exactly. We can even see variations of Xen 3.x
(I can at least remember having seen three versions ranging from 3.0 to 3.4.3).
So to a certain degree there are updates going on, but it is impossible to say
what and when. The problem was definitely still there with 3.4.3 (I stopped
updating the test box because I was rather glad to have a state that exhibits
most of the weirdness that happens in the cloud).

> Olaf
