[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen 4.5 random freeze question



Hello Andrii,
we are getting closer :-)

It would help if you post the output with GIC_DEBUG defined but without
the other change that "fixes" the issue.

I think the problem is probably due to software irqs.
You are getting too many

gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending

messages. That means you are loosing virtual SGIs (guest VCPU to guest
VCPU). It would be best to investigate why, especially if you get many
more of the same messages without the MAINTENANCE_IRQ change I
suggested.

This patch might also help understading the problem more:


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..5eaeca2 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
     list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
     {
         i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
-        if ( i >= nr_lrs ) return;
+        if ( i >= nr_lrs )
+        {
+            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into 
d%dv%d\n",
+                    p->irq, v->domain->domain_id, v->vcpu_id);
+            continue;
+        }
 
         spin_lock_irqsave(&gic.lock, flags);
         gic_set_lr(i, p, GICH_LR_PENDING);




On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> No hangs with this change.
> Complete log is the following:
> 
> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> DRA752 ES1.0
> <ethaddr> not set. Validating first E-fuse MAC
> cpsw
> - UART enabled -
> - CPU 00000000 booting -
> - Xen starting in Hyp mode -
> - Zero BSS -
> - Setting up control registers -
> - Turning on paging -
> - Ready -
> (XEN) Checking for initrd in /chosen
> (XEN) RAM: 0000000080000000 - 000000009fffffff
> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
> (XEN)
> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
> (XEN)
> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
> (XEN) Dom heap: 344064 pages
> (XEN) Domain heap initialised
> (XEN) Looking for UART console serial0
>  Xen 4.5-unstable
> (XEN) Xen version 4.5-unstable (atseglytskyi@)
> (arm-linux-gnueabihf-gcc (crosstool-NG
> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
> 20130328 (prerelease)) debu4
> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
> (XEN) 32-bit Execution:
> (XEN)   Processor Features: 00001131:00011011
> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
> (XEN)     Extensions: GenericTimer Security
> (XEN)   Debug Features: 02010555
> (XEN)   Auxiliary Features: 00000000
> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
> (XEN) Platform: TI DRA7
> (XEN) /psci method must be smc, but is: "hvc"
> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
> (XEN) Set AuxCoreBoot0 to 0x20
> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
> (XEN) Using generic timer at 6144 KHz
> (XEN) GIC initialization:
> (XEN)         gic_dist_addr=0000000048211000
> (XEN)         gic_cpu_addr=0000000048212000
> (XEN)         gic_hyp_addr=0000000048214000
> (XEN)         gic_vcpu_addr=0000000048216000
> (XEN)         gic_maintenance_irq=25
> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) I/O virtualisation disabled
> (XEN) Allocated console ring of 16 KiB.
> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> (XEN) Bringing up CPU1
> - CPU 00000001 booting -
> - Xen starting in Hyp mode -
> - Setting up control registers -
> - Turning on paging -
> - Ready -
> (XEN) CPU 1 booted.
> (XEN) Brought up 2 CPUs
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN) Loading kernel from boot module 2
> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> (XEN) Loading zImage from 00000000c0000040 to 
> 00000000cfc00000-00000000cff50c48
> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
> (XEN) Std. Loglevel: All
> (XEN) Guest Loglevel: All
> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
> input to Xen)
> (XEN) Freed 272kB init memory.
> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> already pending in LR0
> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> already pending in LR0
> [    0.000000] /cpus/cpu@0 missing clock-frequency property
> [    0.000000] /cpus/cpu@1 missing clock-frequency property
> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
> /ocp/i2c@48072000/camera_ov10635
> [    0.437500] ldo3: operation not allowed
> [    0.437500] omapdss HDMI error: can't set the voltage regulator
> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
> [    0.468750] ov1063x 1-0030: No deserializer node found
> [    0.468750] ov1063x 1-0030: No serializer node found
> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
> [    0.578125] ahci ahci.0.auto: can't get clock
> [    0.898437] ldc_module_init
> [    1.304687] Missing dual_emac_res_vlan in DT.
> [    1.304687] Using 1 as Reserved VLAN for 0 slave
> [    1.312500] Missing dual_emac_res_vlan in DT.
> [    1.320312] Using 2 as Reserved VLAN for 1 slave
> [    1.382812] Freeing init memory: 236K
> sh: write error: No such device
> Cannot identify '/dev/camera0': 2, No such file or directory
> Parsing config from /xen/images/DomUAndroid.cfg
> XSM Disabled: seclabel not supported
> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> dom1 access to irq 53: Function not implemented
> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> dom1 access to irq 71: Function not implemented
> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> dom1 access to irq 173: Function not implemented
> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> dom1 access to irq 174: Function not implemented
> Turning on vfb in domain 1
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> still lr_pending
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> still lr_pending
> Parsing config from /xen/images/DomUQNX.cfg
> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to
> inject irq=2 into d0v0, when it is still lr_pending
> 
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> still lr_pending
> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader
> found: Invalid kernel
> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image
> failed: No such file or directory
> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot
> (re-)build domain: -3
> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> still lr_pending
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> still lr_pending
> Turning on 'vsnd' in domain '1' (dev_id: '0')
> Turning on vkbd in domain 1
> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> still lr_pending
> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> still lr_pending
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> still lr_pending
> 
> Please press Enter to activate this console. (XEN) gic.c:617:d0v1
> trying to inject irq=2 into d0v0, when it is still lr_pending
> 
> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
> > OK got it. Give me a few mins
> >
> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
> >> for non-hardware irqs (desc == NULL) and keep avoiding
> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
> >>
> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
> >> other potential bugs introduced later.
> >>
> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >>> What if I try on top of current master branch the following code:
> >>>
> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >>> index 31fb81a..6764ab7 100644
> >>> --- a/xen/arch/arm/gic-v2.c
> >>> +++ b/xen/arch/arm/gic-v2.c
> >>> @@ -36,6 +36,8 @@
> >>>  #include <asm/io.h>
> >>>  #include <asm/gic.h>
> >>>
> >>> +#define GIC_DEBUG 1
> >>> +
> >>>  /*
> >>>   * LR register definitions are GIC v2 specific.
> >>>   * Moved these definitions from header file to here
> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> index bcaded9..c03d6a6 100644
> >>> --- a/xen/arch/arm/gic.c
> >>> +++ b/xen/arch/arm/gic.c
> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> >>>
> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 <<
> >>> gic_hw_ops->info->nr_lrs) - 1))
> >>>
> >>> -#undef GIC_DEBUG
> >>> +#define GIC_DEBUG 1
> >>>
> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
> >>>
> >>> It is equivalent to what you proposing - my code contains
> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, as result the following lone will
> >>> be executed:
> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; inside gicv2_update_lr() function
> >>>
> >>> regards,
> >>> Andrii
> >>>
> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and
> >>> >> everything works fine
> >>> >> The following 2 patches fixes xen/master for my platform.
> >>> >>
> >>> >> Stefano, could you please take a look to these changes?
> >>> >>
> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@xxxxxxxxxxxxxxx>
> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
> >>> >>
> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> >>> >>
> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> >>> >>     Signed-off-by: Andrii Tseglytskyi 
> >>> >> <andrii.tseglytskyi@xxxxxxxxxxxxxxx>
> >>> >>
> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >>> >> index 31fb81a..093ecdb 100644
> >>> >> --- a/xen/arch/arm/gic-v2.c
> >>> >> +++ b/xen/arch/arm/gic-v2.c
> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct
> >>> >> pending_irq *p,
> >>> >>                                               << 
> >>> >> GICH_V2_LR_PRIORITY_SHIFT) |
> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) <<
> >>> >> GICH_V2_LR_VIRTUAL_SHIFT));
> >>> >>
> >>> >> -    if ( p->desc != NULL )
> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >>> >>      {
> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >>> >> -        else
> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq &
> >>> >> GICH_V2_LR_PHYSICAL_MASK )
> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >>> >> +    }
> >>> >> +    else if ( p->desc != NULL )
> >>> >> +    {
> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & 
> >>> >> GICH_V2_LR_PHYSICAL_MASK )
> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
> >>> >>      }
> >>> >>
> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
> >>> >
> >>> > Actually in case p->desc == NULL (the irq is not an hardware irq, it
> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not
> >>> > working correctly on OMAP5. This changes might only be better at
> >>> > "hiding" the real issue.
> >>> >
> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
> >>> > maintenance interrupts doesn't work for software interrupts.
> >>> > The commit that should make them work correctly after the
> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121
> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
> >>> > see that is going to set a software irq as PENDING if it is already 
> >>> > ACTIVE.
> >>> > Maybe that doesn't work correctly on OMAP5.
> >>> >
> >>> > Could you try this patch on top of
> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
> >>> > if the problem is specifically with software irqs.
> >>> >
> >>> >
> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> > index b7516c0..d8a17c9 100644
> >>> > --- a/xen/arch/arm/gic.c
> >>> > +++ b/xen/arch/arm/gic.c
> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
> >>> >  /* Maximum cpu interface per GIC */
> >>> >  #define NR_GIC_CPU_IF 8
> >>> >
> >>> > -#undef GIC_DEBUG
> >>> > +#define GIC_DEBUG 1
> >>> >
> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
> >>> >
> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct 
> >>> > pending_irq *p,
> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> >>> >      if ( p->desc != NULL )
> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << 
> >>> > GICH_LR_PHYSICAL_SHIFT);
> >>> > +    else
> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
> >>> >
> >>> >      GICH[GICH_LR + lr] = lr_val;
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Andrii Tseglytskyi | Embedded Dev
> >>> GlobalLogic
> >>> www.globallogic.com
> >>>
> >
> >
> >
> > --
> >
> > Andrii Tseglytskyi | Embedded Dev
> > GlobalLogic
> > www.globallogic.com
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.