Re: [Xen-devel] Xen 4.5 random freeze question



On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> Thank you for your support.
> 
> You are right - with the latest change you proposed I get continuous
> prints during the platform hang:
> 
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> 
> Looks like the issue needs further, deeper debugging.

Cool! You could simply print what irqs are in all LRs when they are
full, for example you could call gic_dump_info. That would tell us what
is taking all the LRs space we have.
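
To illustrate the kind of per-LR information such a dump gives, here is
a minimal standalone sketch that decodes a GICv2 list register value. It
assumes the GICv2 GICH_LR layout (virtual ID in bits 9:0, physical ID in
bits 19:10, state in bits 29:28, HW in bit 31). The macro and function
names are made up for this example and are not Xen's.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed GICv2 GICH_LR field layout; these names are illustrative only. */
#define LR_VIRQ(lr)   ((lr) & 0x3ffu)          /* virtual interrupt ID */
#define LR_PIRQ(lr)   (((lr) >> 10) & 0x3ffu)  /* physical ID, meaningful only if HW=1 */
#define LR_STATE(lr)  (((lr) >> 28) & 0x3u)    /* 0 invalid, 1 pending, 2 active, 3 both */
#define LR_HW(lr)     (((lr) >> 31) & 0x1u)    /* 1 => backed by a hardware irq */

/* Map the two state bits to a readable name. */
static const char *lr_state_name(uint32_t lr)
{
    static const char *const names[] = {
        "invalid", "pending", "active", "pending+active"
    };
    return names[LR_STATE(lr)];
}

/* Format one LR into buf; returns what snprintf returns. */
static int format_lr(char *buf, size_t len, unsigned idx, uint32_t lr)
{
    return snprintf(buf, len, "LR%u: virq=%u pirq=%u hw=%u state=%s",
                    idx, (unsigned)LR_VIRQ(lr), (unsigned)LR_PIRQ(lr),
                    (unsigned)LR_HW(lr), lr_state_name(lr));
}
```

An LR that stays in the "active" or "pending+active" state across many
hypervisor entries would be a good candidate for a slot that is never
being freed.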

How many LRs are available on omap5 anyway?

I doubt you have so much interrupt traffic to actually fill all the LRs,
so I am thinking that a few LRs might not be cleared properly (that
should happen on hypervisor entry, gic_update_one_lr should take care of
it).
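
To make that suspicion concrete, here is a small standalone model of the
free-slot search. Only the idea of a bitmask scanned for its first zero
bit comes from the patch quoted below. The struct and function names are
hypothetical and this is not the Xen implementation. It shows that if no
LR is ever released, every injection after the first nr_lrs fails, which
would match the continuous "LRs full" prints.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU lr_mask; not Xen's data structure. */
struct lr_state {
    uint64_t mask;     /* bit i set => list register i is in use */
    unsigned nr_lrs;   /* number of list registers, assumed <= 64 */
};

/* Return the index of the first free LR, or nr_lrs if all are in use. */
static unsigned first_free_lr(const struct lr_state *s)
{
    for (unsigned i = 0; i < s->nr_lrs; i++)
        if (!(s->mask & (UINT64_C(1) << i)))
            return i;
    return s->nr_lrs;
}

/* Try to claim an LR; returns 1 on success, 0 when the LRs are full. */
static int inject(struct lr_state *s)
{
    unsigned i = first_free_lr(s);

    if (i >= s->nr_lrs)
        return 0;
    s->mask |= UINT64_C(1) << i;
    return 1;
}

/* Release LR i, as should happen once the guest is done with the irq. */
static void clear_lr(struct lr_state *s, unsigned i)
{
    s->mask &= ~(UINT64_C(1) << i);
}

/* Count failed injections out of `attempts` when no LR is ever cleared. */
static unsigned count_dropped(unsigned nr_lrs, unsigned attempts)
{
    struct lr_state s = { 0, nr_lrs };
    unsigned dropped = 0;

    for (unsigned n = 0; n < attempts; n++)
        if (!inject(&s))
            dropped++;
    return dropped;
}
```

With 4 LRs and 10 attempts, count_dropped(4, 10) reports 6 failures: the
first 4 injections succeed and everything after that is refused until
clear_lr() frees a slot.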


> Regards,
> Andrii
> 
> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > Hello Andrii,
> > we are getting closer :-)
> >
> > It would help if you post the output with GIC_DEBUG defined but without
> > the other change that "fixes" the issue.
> >
> > I think the problem is probably due to software irqs.
> > You are getting too many
> >
> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> >
> > messages. That means you are losing virtual SGIs (guest VCPU to guest
> > VCPU). It would be best to investigate why, especially if you get many
> > more of the same messages without the MAINTENANCE_IRQ change I
> > suggested.
> >
> > This patch might also help in understanding the problem better:
> >
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index b7516c0..5eaeca2 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> >      {
> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> > -        if ( i >= nr_lrs ) return;
> > +        if ( i >= nr_lrs )
> > +        {
> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> > +                    p->irq, v->domain->domain_id, v->vcpu_id);
> > +            continue;
> > +        }
> >
> >          spin_lock_irqsave(&gic.lock, flags);
> >          gic_set_lr(i, p, GICH_LR_PENDING);
> >
> >
> >
> >
> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> Hi Stefano,
> >>
> >> No hangs with this change.
> >> Complete log is the following:
> >>
> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> >> DRA752 ES1.0
> >> <ethaddr> not set. Validating first E-fuse MAC
> >> cpsw
> >> - UART enabled -
> >> - CPU 00000000 booting -
> >> - Xen starting in Hyp mode -
> >> - Zero BSS -
> >> - Setting up control registers -
> >> - Turning on paging -
> >> - Ready -
> >> (XEN) Checking for initrd in /chosen
> >> (XEN) RAM: 0000000080000000 - 000000009fffffff
> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
> >> (XEN)
> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
> >> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
> >> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
> >> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
> >> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
> >> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
> >> (XEN)
> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
> >> (XEN) Dom heap: 344064 pages
> >> (XEN) Domain heap initialised
> >> (XEN) Looking for UART console serial0
> >>  Xen 4.5-unstable
> >> (XEN) Xen version 4.5-unstable (atseglytskyi@)
> >> (arm-linux-gnueabihf-gcc (crosstool-NG
> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
> >> 20130328 (prerelease)) debu4
> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
> >> (XEN) 32-bit Execution:
> >> (XEN)   Processor Features: 00001131:00011011
> >> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
> >> (XEN)     Extensions: GenericTimer Security
> >> (XEN)   Debug Features: 02010555
> >> (XEN)   Auxiliary Features: 00000000
> >> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
> >> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
> >> (XEN) Platform: TI DRA7
> >> (XEN) /psci method must be smc, but is: "hvc"
> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
> >> (XEN) Set AuxCoreBoot0 to 0x20
> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
> >> (XEN) Using generic timer at 6144 KHz
> >> (XEN) GIC initialization:
> >> (XEN)         gic_dist_addr=0000000048211000
> >> (XEN)         gic_cpu_addr=0000000048212000
> >> (XEN)         gic_hyp_addr=0000000048214000
> >> (XEN)         gic_vcpu_addr=0000000048216000
> >> (XEN)         gic_maintenance_irq=25
> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> >> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> >> (XEN) I/O virtualisation disabled
> >> (XEN) Allocated console ring of 16 KiB.
> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> >> (XEN) Bringing up CPU1
> >> - CPU 00000001 booting -
> >> - Xen starting in Hyp mode -
> >> - Setting up control registers -
> >> - Turning on paging -
> >> - Ready -
> >> (XEN) CPU 1 booted.
> >> (XEN) Brought up 2 CPUs
> >> (XEN) *** LOADING DOMAIN 0 ***
> >> (XEN) Loading kernel from boot module 2
> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> >> (XEN) Loading zImage from 00000000c0000040 to 
> >> 00000000cfc00000-00000000cff50c48
> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
> >> (XEN) Std. Loglevel: All
> >> (XEN) Guest Loglevel: All
> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
> >> input to Xen)
> >> (XEN) Freed 272kB init memory.
> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> >> already pending in LR0
> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> >> already pending in LR0
> >> [    0.000000] /cpus/cpu@0 missing clock-frequency property
> >> [    0.000000] /cpus/cpu@1 missing clock-frequency property
> >> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
> >> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
> >> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
> >> /ocp/i2c@48072000/camera_ov10635
> >> [    0.437500] ldo3: operation not allowed
> >> [    0.437500] omapdss HDMI error: can't set the voltage regulator
> >> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
> >> [    0.468750] ov1063x 1-0030: No deserializer node found
> >> [    0.468750] ov1063x 1-0030: No serializer node found
> >> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
> >> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
> >> [    0.578125] ahci ahci.0.auto: can't get clock
> >> [    0.898437] ldc_module_init
> >> [    1.304687] Missing dual_emac_res_vlan in DT.
> >> [    1.304687] Using 1 as Reserved VLAN for 0 slave
> >> [    1.312500] Missing dual_emac_res_vlan in DT.
> >> [    1.320312] Using 2 as Reserved VLAN for 1 slave
> >> [    1.382812] Freeing init memory: 236K
> >> sh: write error: No such device
> >> Cannot identify '/dev/camera0': 2, No such file or directory
> >> Parsing config from /xen/images/DomUAndroid.cfg
> >> XSM Disabled: seclabel not supported
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 53: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 71: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 173: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 174: Function not implemented
> >> Turning on vfb in domain 1
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> Parsing config from /xen/images/DomUQNX.cfg
> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to
> >> inject irq=2 into d0v0, when it is still lr_pending
> >>
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
> >> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link 
> >> -22
> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader
> >> found: Invalid kernel
> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image
> >> failed: No such file or directory
> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot
> >> (re-)build domain: -3
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> Turning on 'vsnd' in domain '1' (dev_id: '0')
> >> Turning on vkbd in domain 1
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >>
> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1
> >> trying to inject irq=2 into d0v0, when it is still lr_pending
> >>
> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
> >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
> >> > OK got it. Give me a few mins
> >> >
> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
> >> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
> >> >> for non-hardware irqs (desc == NULL) and keep avoiding
> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
> >> >>
> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
> >> >> other potential bugs introduced later.
> >> >>
> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> What if I try the following code on top of the current master branch:
> >> >>>
> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >> >>> index 31fb81a..6764ab7 100644
> >> >>> --- a/xen/arch/arm/gic-v2.c
> >> >>> +++ b/xen/arch/arm/gic-v2.c
> >> >>> @@ -36,6 +36,8 @@
> >> >>>  #include <asm/io.h>
> >> >>>  #include <asm/gic.h>
> >> >>>
> >> >>> +#define GIC_DEBUG 1
> >> >>> +
> >> >>>  /*
> >> >>>   * LR register definitions are GIC v2 specific.
> >> >>>   * Moved these definitions from header file to here
> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >>> index bcaded9..c03d6a6 100644
> >> >>> --- a/xen/arch/arm/gic.c
> >> >>> +++ b/xen/arch/arm/gic.c
> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> >> >>>
> >> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
> >> >>>
> >> >>> -#undef GIC_DEBUG
> >> >>> +#define GIC_DEBUG 1
> >> >>>
> >> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
> >> >>>
> >> >>> It is equivalent to what you are proposing: my code contains
> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, and as a result the following line
> >> >>> inside the gicv2_update_lr() function will be executed:
> >> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >>>
> >> >>> regards,
> >> >>> Andrii
> >> >>>
> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
> >> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and
> >> >>> >> everything works fine
> >> >>> >> The following 2 patches fix xen/master for my platform.
> >> >>> >>
> >> >>> >> Stefano, could you please take a look at these changes?
> >> >>> >>
> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@xxxxxxxxxxxxxxx>
> >> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
> >> >>> >>
> >> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> >> >>> >>
> >> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> >> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@xxxxxxxxxxxxxxx>
> >> >>> >>
> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >> >>> >> index 31fb81a..093ecdb 100644
> >> >>> >> --- a/xen/arch/arm/gic-v2.c
> >> >>> >> +++ b/xen/arch/arm/gic-v2.c
> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
> >> >>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
> >> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
> >> >>> >>
> >> >>> >> -    if ( p->desc != NULL )
> >> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >> >>> >>      {
> >> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >>> >> -        else
> >> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
> >> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >>> >> +    }
> >> >>> >> +    else if ( p->desc != NULL )
> >> >>> >> +    {
> >> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
> >> >>> >>      }
> >> >>> >>
> >> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
> >> >>> >
> >> >>> > Actually in case p->desc == NULL (the irq is not a hardware irq, it
> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not
> >> >>> > working correctly on OMAP5. This change might only be better at
> >> >>> > "hiding" the real issue.
> >> >>> >
> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
> >> >>> > maintenance interrupts doesn't work for software interrupts.
> >> >>> > The commit that should make them work correctly after the
> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
> >> >>> > see that it is going to set a software irq as PENDING if it is already
> >> >>> > ACTIVE. Maybe that doesn't work correctly on OMAP5.
> >> >>> >
> >> >>> > Could you try this patch on top of
> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121? It should help us understand
> >> >>> > if the problem is specifically with software irqs.
> >> >>> >
> >> >>> >
> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >>> > index b7516c0..d8a17c9 100644
> >> >>> > --- a/xen/arch/arm/gic.c
> >> >>> > +++ b/xen/arch/arm/gic.c
> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
> >> >>> >  /* Maximum cpu interface per GIC */
> >> >>> >  #define NR_GIC_CPU_IF 8
> >> >>> >
> >> >>> > -#undef GIC_DEBUG
> >> >>> > +#define GIC_DEBUG 1
> >> >>> >
> >> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
> >> >>> >
> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
> >> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> >> >>> >      if ( p->desc != NULL )
> >> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
> >> >>> > +    else
> >> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
> >> >>> >
> >> >>> >      GICH[GICH_LR + lr] = lr_val;
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>>
> >> >>> Andrii Tseglytskyi | Embedded Dev
> >> >>> GlobalLogic
> >> >>> www.globallogic.com
> >> >>>
