
[Xen-ia64-devel] Any hint about a weird behavior about scheduler?


  • To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>
  • From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • Date: Wed, 25 Jan 2006 11:36:36 +0800
  • Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 25 Jan 2006 03:45:19 +0000
  • List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
  • Thread-index: AcYhYItPyvGm0f+vRga13WlrEENmgA==
  • Thread-topic: Any hint about a weird behavior about scheduler?

Hi, Keir,
        I'm seeing a strange phenomenon when running a VTI domain on the
latest xen-ia64-unstable.hg. It's definitely some IA64-specific bug, but
I hope you may have some hints about it. ;-)

        Sometimes when the VTI domain is accessing MMIO, the flow goes to
do_block and then __enter_scheduler, and the domain is deleted from the
runqueue. Before __enter_scheduler, everything is verified OK.

        Later, when the ioreq is serviced and the VTI domain is woken up,
the flow resumes at the point after __enter_scheduler in do_block
(Xen/IA64 doesn't reset the stack pointer, and the stack is per-vp).
However, a check at that point shows the VTI domain is still off the
runqueue, with its next pointer null. The stack, the current pointer and
the other control registers all hold the VTI domain's content, but
schedule_data[cpu].curr points to dom0.

        The next schedule then triggers an assertion, since the currently
running VTI domain is not on the runqueue. domU runs well, maybe because
more context is shared and no block happens there. The failure point is
actually random, and I do observe many ioreqs emulated successfully.
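
        For illustration, the inconsistent state described above can be
replayed in a standalone toy model. This is only a sketch, not Xen code:
toy_vcpu, toy_sched, runq_add, runq_del, sched_consistent and
replay_reported_state are hypothetical names, and the model shows the
state that was observed, not how it came about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical, not Xen's. */
struct toy_vcpu {
    struct toy_vcpu *next;  /* runqueue link */
    bool on_runq;           /* explicit flag, since the tail's next is NULL too */
};

struct toy_sched {
    struct toy_vcpu *runq_head;
    struct toy_vcpu *curr;  /* what the scheduler has recorded as running */
};

static void runq_add(struct toy_sched *s, struct toy_vcpu *v)
{
    v->next = s->runq_head;
    s->runq_head = v;
    v->on_runq = true;
}

static void runq_del(struct toy_sched *s, struct toy_vcpu *v)
{
    struct toy_vcpu **pp = &s->runq_head;

    while (*pp != NULL && *pp != v)
        pp = &(*pp)->next;
    if (*pp != NULL)
        *pp = v->next;
    v->next = NULL;
    v->on_runq = false;
}

/* True when the bookkeeping agrees with the context actually executing. */
static bool sched_consistent(const struct toy_sched *s,
                             const struct toy_vcpu *running)
{
    return s->curr == running && running->on_runq;
}

/* Replays the reported sequence and returns the final consistency check. */
bool replay_reported_state(void)
{
    struct toy_vcpu dom0 = { NULL, false };
    struct toy_vcpu vti = { NULL, false };
    struct toy_sched s = { NULL, NULL };

    runq_add(&s, &dom0);
    runq_add(&s, &vti);
    s.curr = &vti;
    assert(sched_consistent(&s, &vti));  /* healthy before blocking */

    runq_del(&s, &vti);  /* VTI blocks on MMIO and leaves the runqueue */
    s.curr = &dom0;      /* the scheduler records dom0 as current */

    /* VTI context is executing again, but the bookkeeping was not updated. */
    return sched_consistent(&s, &vti);
}
```

In this model replay_reported_state() returns false, which corresponds
to the condition the assertion in the next schedule catches.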

        The possible cause seems to be your recent change to shrink the
lock critical region in __enter_scheduler, where spin_unlock_irq was
moved up to the point before context_switch. That's obviously good, but
it leaves a small window with interrupts enabled, which may not be
handled correctly by the current ia64 code. If I move spin_unlock_irq
back to after context_switch, everything works well.
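
        The two orderings can be compared in a small toy model. It is a
sketch under assumptions, not the Xen scheduler: toy_cpu,
maybe_interrupt, schedule_unlock_early, schedule_unlock_late and
count_bad_irqs are hypothetical names, and the model only shows that an
interrupt arriving in the window can observe half-updated state. It does
not establish that this is the actual cause on ia64.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names are hypothetical and this is not Xen code. */
struct toy_cpu {
    int curr;          /* id the scheduler has recorded as current */
    int running;       /* id whose context is really loaded on the CPU */
    bool irq_enabled;  /* false while the modelled schedule lock is held */
    int bad_irqs;      /* interrupts that observed curr != running */
};

/* Models an interrupt arriving here; it is delivered only if enabled. */
static void maybe_interrupt(struct toy_cpu *c)
{
    if (c->irq_enabled && c->curr != c->running)
        c->bad_irqs++;
}

static void lock_irq(struct toy_cpu *c)   { c->irq_enabled = false; }
static void unlock_irq(struct toy_cpu *c) { c->irq_enabled = true; }

static void toy_context_switch(struct toy_cpu *c, int next)
{
    maybe_interrupt(c);  /* an interrupt could arrive before the switch is done */
    c->running = next;
}

/* Ordering A: drop the lock before switching (shorter critical region). */
static void schedule_unlock_early(struct toy_cpu *c, int next)
{
    lock_irq(c);
    c->curr = next;
    unlock_irq(c);       /* window opens: curr is updated, running is not */
    toy_context_switch(c, next);
}

/* Ordering B: switch first, then drop the lock. */
static void schedule_unlock_late(struct toy_cpu *c, int next)
{
    lock_irq(c);
    c->curr = next;
    toy_context_switch(c, next);
    unlock_irq(c);
    maybe_interrupt(c);  /* interrupts only see the fully switched state */
}

/* Runs one switch from id 0 to id 1 and counts inconsistent observations. */
int count_bad_irqs(bool unlock_early)
{
    struct toy_cpu c = { 0, 0, true, 0 };

    if (unlock_early)
        schedule_unlock_early(&c, 1);
    else
        schedule_unlock_late(&c, 1);
    assert(c.running == 1);  /* the switch itself completes in both orderings */
    return c.bad_irqs;
}
```

Here count_bad_irqs(true) returns 1 and count_bad_irqs(false) returns 0:
only the early-unlock ordering lets the modelled interrupt see curr and
the running context disagree.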

        So Keir, have you seen any similar phenomenon in your
experience? If yes, what was the cause? It may differ from the IA64
case, but it could still provide an invaluable hint to help track down
the bogus code on IA64.

P.S.
        - BVT is the default scheduler in use when this issue occurs.
        - The patch I sent out to disable interrupts in context_switch is
for another random issue, and it doesn't fix the current one.
        - I'll be on vacation for Chinese New Year from today until
Feb. 13, so I won't be checking mail to track this issue. Sorry if your
helpful answer arrives without a follow-up from me. ;-) BTW, if anyone
else can reproduce it, that would help track it down.

Thanks,
Kevin

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel


 

