[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] x2APIC: translate IO-APIC entries when enabling the IOMMU



On Fri, Oct 11, 2019 at 12:01:36PM +0200, Jan Beulich wrote:
> On 11.10.2019 11:28, Roger Pau Monné  wrote:
> > On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote:
> >> On 10.10.2019 17:19, Roger Pau Monné  wrote:
> >>> On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote:
> >>>> On 10.10.2019 15:12, Roger Pau Monné  wrote:
> >>>>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
> >>>>>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
> >>>>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
> >>>>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
> >>>>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
> >>>>>>>>
> >>>>>>>> Perhaps "unmasked" instead of "the"?
> >>>>>>>>
> >>>>>>>>> IO-APIC entries also need to be translated to the new format and 
> >>>>>>>>> added
> >>>>>>>>> to the interrupt remapping table.
> >>>>>>>>>
> >>>>>>>>> This prevents IOMMU interrupt remapping faults when booting on
> >>>>>>>>> hardware that has unmasked IO-APIC pins.
> >>>>>>>>
> >>>>>>>> But in the end it only papers over whatever the spurious interrupts
> >>>>>>>> result form, doesn't it? Which isn't to say this isn't an
> >>>>>>>> improvement. Calling out the ExtInt case here may be worthwhile as
> >>>>>>>> well, as would be pointing out that this case still won't work on
> >>>>>>>> AMD IOMMUs.
> >>>>>>>
> >>>>>>> But the fix for the ExtINT AMD issue should be done in
> >>>>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
> >>>>>>> ExtINT delivery mode, not to this part of the code. I will look
> >>>>>>> into it, but I think it's kind of tangential to the issue here.
> >>>>>>
> >>>>>> I'm not talking of you working on fixing this right away. I'm merely
> >>>>>> asking that you mention here (a) the ExtInt special case and (b)
> >>>>>> that this special case will (continue to) not work in the AMD case.
> >>>>>>
> >>>>>>>>> --- a/xen/arch/x86/apic.c
> >>>>>>>>> +++ b/xen/arch/x86/apic.c
> >>>>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
> >>>>>>>>>      iommu_enable_x2apic();
> >>>>>>>>>      __enable_x2apic();
> >>>>>>>>>  
> >>>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
> >>>>>>>>>      unmask_8259A();
> >>>>>>>>>  
> >>>>>>>>>  out:
> >>>>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
> >>>>>>>>>          printk("Switched to APIC driver %s\n", genapic.name);
> >>>>>>>>>  
> >>>>>>>>>  restore_out:
> >>>>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>>>> +    /*
> >>>>>>>>> +     * NB: do not use raw mode when restoring entries if the iommu 
> >>>>>>>>> has been
> >>>>>>>>> +     * enabled during the process, because the entries need to be 
> >>>>>>>>> translated
> >>>>>>>>> +     * and added to the remapping table in that case.
> >>>>>>>>> +     */
> >>>>>>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
> >>>>>>>>
> >>>>>>>> How is this different in the resume_x2apic() case? The IOMMU gets
> >>>>>>>> enabled in the course of that as well. I.e. I'd expect you want
> >>>>>>>> to pass "false" there, not "true".
> >>>>>>>
> >>>>>>> In the resume_x2apic case interrupt remapping should already be
> >>>>>>> enabled or not, but that function is not going to enable interrupt
> >>>>>>> remapping if it wasn't enabled before, hence the IO-APIC entries
> >>>>>>> should already be using the interrupt remapping table and no
> >>>>>>> translation is needed.
> >>>>>>
> >>>>>> Who / what would have enabled the IOMMU in the resume case?
> >>>>>
> >>>>> I don't think the question is who enables interrupt remapping in the
> >>>>> resume case (which is resume_x2apic when calling iommu_enable_x2apic
> >>>>> AFAICT), the point here is that on resume the entries in the IO-APIC
> >>>>> will already match the state of interrupt remapping, so they shouldn't
> >>>>> be translated. If interrupt remapping was off before suspend it will
> >>>>> still be off after resume, and there won't be any translation needed.
> >>>>> The same is true if interrupt remapping is on before suspend.
> >>>>
> >>>> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
> >>>> not prior to suspend.
> >>>
> >>> Oh, so maybe that's a misunderstanding on my side. I don't seem to be
> >>> able to find a statement about the contents of the IO-APIC registers
> >>> (and more specifically the entries) when getting back from
> >>> suspension. Are all entries cleared and masked?
> >>>
> >>> Are the values previous to suspension stored?
> >>
> >> See ioapic_suspend() / ioapic_resume(): Looks like there's some
> >> redundancy here - I don't think it makes sense for the LAPIC
> >> code to fiddle with all the RTEs if subsequently (I assume;
> >> didn't check) they'll all be overwritten anyway. I would seem
> >> more logical to me if they'd just all get masked for IOMMU
> >> enabling, deferring to ioapic_resume() for everything else.
> > 
> > Yes, it seems like resume_x2apic shouldn't need to play with the
> > entries at all TBH, just enabling interrupt remapping should be enough
> > given that the entries are already save and restored by
> > ioapic_{suspend/resume}.
> > 
> > Would you agree to leave that as-is in this patch, ie: always pass
> > true to restore_IO_APIC_setup in resume_x2apic?
> 
> Properly explained in the description, that's an option. Even better
> would of course be to do away with the unnecessary save/restore in
> a prereq patch, at which point the question disappears as to what to
> pass to the function at that point.

OK, I can create a pre-patch, but I don't think I have any hardware
that does suspend/resume since it's all server-grade. Someone would
have to test it and assert it actually works correctly, hence my
reluctance to touch any of the suspend/resume logic.

> >>> But it's simply not possible to reach the call to
> >>> restore_IO_APIC_setup with x2apic_enabled == true and interrupt
> >>> remapping disabled, regardless of the initial value of
> >>> x2apic_enabled.
> >>>
> >>> All the paths that could lead to this scenario are short-circuited
> >>> above with a panic.
> >>
> >> Hmm, true. Nevertheless it would feel better if the conditionals
> >> were using what actually matters, rather than something derived.
> > 
> > Would you be OK to using something like v1:
> > 
> > https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html
> > 
> > See the usage of iommu_enabled in x2apic_bsp_setup. I think using
> > x2apic_enabled is fine given the logic in the function, and the fact
> > that x2APIC mandates interrupt remapping, but I could live with that
> > extra local variable.
> 
> Conceptually this looks better to me. I'd prefer to avoid shadowing
> the global "iommu_enabled" though, and set the local variable to
> "true" only one we actually enable the IOMMU. Also strictly speaking
> you care about the enabling of interrupt remapping, not that of the
> IOMMU (which implicitly means DMA remapping). I wonder whether the
> code couldn't easily be made cope with IOMMU already being enabled,
> assuming that this would lead to {iommu,intremap}_enabled to already
> be true on entry. I.e. you'd request a "raw" restore unless
> intremap_enabled changed from false to true.

I'm not sure it's that easy. A raw restore wouldn't fully work, since
Xen would have replaced the interrupt remapping table with an empty
one, so Xen would have to add the entries for the unmasked IO-APIC
pins to the IRT and then likely also update the IO-APIC pin entry to
match the newly created IRTE.

I think adding a comment at the top of the function to note that this
won't work properly if there are unmasked IO-APIC pins and interrupt
remapping enabled should be enough for now, as the patch doesn't make
this case worse than it's current state.

> >>>> I realize that io_apic_write() would suitably avoid going
> >>>> the remapping path, but I think it would be more clear if the
> >>>> distinction was already made properly at the call site.
> >>>
> >>> I'm afraid I'm slightly loss, do you mean to replace the
> >>> ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup?
> >>
> >> No, because of ...
> >>
> >>> That would be the same as always passing raw == false AFAICT.
> >>
> >> ... this. I'm asking to pass in the argument for "raw" based
> >> on what you want, without relying on io_apic_write()'s
> >> behavior. That's more for code clarity than actual correctness,
> >> since - as said - io_apic_write() would invoke __io_apic_write()
> >> anyway.
> > 
> > Sorry, but I'm afraid I still don't fully understand what you mean
> > with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn
> > uses __io_apic_write or io_apic_write. I could get rid of the usage of
> > ioapic_write_entry and just call __io_apic_write or io_apic_write (or
> > even directly call iommu_update_ire_from_apic), but I'm not sure
> > what's the benefit of any of this, it's just open-coding logic that's
> > done in the helpers.
> > 
> > Would you rather mean to rename the parameter of restore_IO_APIC_setup
> > from 'raw' to 'translate' or some such in order to clearly denote
> > there's a transformation done before writing the entry?
> 
> Whether it's "raw" or "translate" doesn't really matter to me.
> What I'm after is for you to not pass raw=true when you really
> mean raw=false (or the other way around), relying on the
> intremap_enabled check down the call chain in io_apic_write().

I don't think that's the case with my proposed code (or at least not
the intention), when raw=false is passed that's because the entries
_must_ be translated, not because x2apic_bsp_setup is relying on the
value of intremap_enabled.

Let me prepare a new series with the proposed changes and we can
continue from there.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.