
Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN




> -----Original Message-----
> From: Tian, Kevin
> Sent: Tuesday, March 10, 2015 10:22 AM
> To: Wu, Feng; xen-devel@xxxxxxxxxxxxx
> Cc: Jan Beulich; Zhang, Yang Z
> Subject: RE: VT-d Posted-interrupt (PI) design for XEN
> 
> > From: Wu, Feng
> > Sent: Wednesday, March 04, 2015 9:30 PM
> >
> > VT-d Posted-interrupt (PI) design for XEN
> >
> > Background
> > ==========
> > With the development of virtualization, there are more and more device
> > assignment requirements. However, today when a VM is running with
> > assigned devices (such as a NIC), external interrupt handling for the
> > assigned devices always needs VMM intervention.
> >
> > VT-d Posted-interrupt is an enhanced method to handle interrupts
> > in the virtualization environment. Interrupt posting is the process by
> > which an interrupt request is recorded in a memory-resident
> > posted-interrupt-descriptor structure by the root-complex, followed by
> > an optional notification event issued to the CPU complex.
> >
> > With VT-d Posted-interrupt we can get the following advantages:
> > - Directly delivery of external interrupts to running vCPUs without VMM
> > intervention
> 
> "Directly" -> "Direct"
> 
> > - Decrease the interrupt migration complexity. On vCPU migration, software
> > can atomically co-migrate all interrupts targeting the migrating vCPU.
> 
> Could you elaborate on this benefit? I didn't see any discussion of migration
> in the rest of the proposal.
> 
> >
> >
> > Posted-interrupt Introduction
> > ========================
> > There are two components to the Posted-interrupt architecture:
> > Processor Support and Root-Complex Support
> >
> > - Processor Support
> > Posted-interrupt processing is a feature by which a processor processes
> > the virtual interrupts by recording them as pending on the virtual-APIC
> > page.
> >
> > Posted-interrupt processing is enabled by setting the "process posted
> > interrupts" VM-execution control. The processing is performed in response
> > to the arrival of an interrupt with the posted-interrupt notification vector.
> > In response to such an interrupt, the processor processes virtual interrupts
> > recorded in a data structure called a posted-interrupt descriptor.
> >
> > For more information about APICv and CPU-side Posted-interrupt, please refer
> > to Chapter 29 and Section 29.6 in the Intel SDM:
> >
> > http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> >
> > - Root-Complex Support
> > Interrupt posting is the process by which an interrupt request (from IOAPIC
> > or MSI/MSIx capable sources) is recorded in a memory-resident
> > posted-interrupt-descriptor structure by the root-complex, followed by
> > an optional notification event issued to the CPU complex. The interrupt
> > request arriving at the root-complex carries the identity of the interrupt
> > request source and a 'remapping-index'. The remapping-index is used to
> > look up an entry in the memory-resident interrupt-remap-table. Unlike
> > with interrupt-remapping, the interrupt-remap-table-entry for a
> > posted-interrupt specifies a virtual-vector and a pointer to the posted-interrupt
> > descriptor. The virtual-vector specifies the vector of the interrupt to be
> > recorded in the posted-interrupt descriptor. The posted-interrupt descriptor
> > hosts storage for the virtual-vectors and contains the attributes of the
> > notification event (interrupt) to be issued to the CPU complex to inform
> > CPU/software about pending interrupts recorded in the posted-interrupt
> > descriptor.
> >
> > For more information about VT-d PI, please refer to:
> >
> > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> >
> >
> > Design Overview
> > ==============
> > In this design, we will cover the following items:
> > 1. Add a variable to control whether VT-d posted-interrupt is enabled.
> > 2. VT-d PI feature detection.
> > 3. Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
> > 4. Extend IRTE structure to support VT-d PI.
> > 5. Introduce a new global vector which is used for waking up the HLT'ed vCPU.
> 
> HLT'ed -> blocked
> 
> > 6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx
> > configuration).
> > 7. Update posted-interrupt descriptor during vCPU scheduling (when the state
> > of the vCPU transitions among RUNSTATE_running / RUNSTATE_blocked /
> > RUNSTATE_runnable / RUNSTATE_offline).
> > 8. New boot command-line option for Xen, which lets the user control the
> > VT-d PI feature.
> > 9. Multicast/broadcast and lowest priority interrupts consideration.
> >
> 
> Add a step on the notification handler, as you described in another mail.
> 
> >
> > Implementation details
> > ===================
> > - New variable to control VT-d PI
> > Like the variable 'iommu_intremap' for interrupt remapping, it is very
> > straightforward to add a new one, 'iommu_intpost', for posted-interrupt.
> > 'iommu_intpost' is set only when interrupt remapping and VT-d
> > posted-interrupt are both enabled.
> >
> > - VT-d PI feature detection.
> > Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt
> > support.
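
A minimal sketch of the detection and of the rule for setting 'iommu_intpost',
assuming the 64-bit Capability Register value has already been read. The helper
names cap_intr_post() and intpost_usable() are hypothetical and are not taken
from the Xen tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 59 of the VT-d Capability Register reports posted-interrupt support. */
#define CAP_PI_SHIFT 59

/* Hypothetical helper: true when the capability value advertises VT-d PI. */
static bool cap_intr_post(uint64_t cap)
{
    return ((cap >> CAP_PI_SHIFT) & 1) != 0;
}

/*
 * Hypothetical helper: posting is usable only when interrupt remapping is
 * enabled, the user has not disabled posting, and the hardware supports it.
 */
static bool intpost_usable(bool intremap_enabled, bool user_intpost,
                           uint64_t cap)
{
    return intremap_enabled && user_intpost && cap_intr_post(cap);
}
```

The second helper only mirrors the statement above that 'iommu_intpost' is set
when interrupt remapping and VT-d posted-interrupt are both enabled.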
> >
> > - Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
> > Here is the new structure for posted-interrupt descriptor:
> >
> > struct pi_desc {
> >     DECLARE_BITMAP(pir, NR_VECTORS);
> >     union {
> >         struct {
> >             u64 on     : 1,
> >                 sn     : 1,
> >                 rsvd_1 : 13,
> >                 ndm    : 1,
> >                 nv     : 8,
> >                 rsvd_2 : 8,
> >                 ndst   : 32;
> >         };
> >         u64 control;
> >     };
> >     u32 rsvd[6];
> > } __attribute__ ((aligned (64)));
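
A standalone sketch that checks the layout of the pi_desc structure above. It
assumes NR_VECTORS is 256, replaces DECLARE_BITMAP with a plain array, and
relies on the GCC/Clang behaviour on little-endian targets of allocating
bit-fields from the least significant bit. pi_desc_control() is a hypothetical
helper used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_VECTORS 256  /* assumption: 256 vectors, so PIR is 256 bits */

struct pi_desc {
    uint64_t pir[NR_VECTORS / 64];  /* stand-in for DECLARE_BITMAP(pir, NR_VECTORS) */
    union {
        struct {
            uint64_t on     : 1,
                     sn     : 1,
                     rsvd_1 : 13,
                     ndm    : 1,
                     nv     : 8,
                     rsvd_2 : 8,
                     ndst   : 32;
        };
        uint64_t control;
    };
    uint32_t rsvd[6];
} __attribute__((aligned(64)));

/* The descriptor must occupy exactly one 64-byte, 64-byte-aligned block. */
_Static_assert(sizeof(struct pi_desc) == 64, "pi_desc must be 64 bytes");
_Static_assert(offsetof(struct pi_desc, control) == 32,
               "control word follows the 256-bit PIR");

/* Hypothetical helper: build a control word from its individual fields. */
static uint64_t pi_desc_control(unsigned on, unsigned sn, uint8_t nv,
                                uint32_t ndst)
{
    struct pi_desc d;

    memset(&d, 0, sizeof(d));
    d.on = on;
    d.sn = sn;
    d.nv = nv;
    d.ndst = ndst;
    return d.control;
}
```

With this layout, 'nv' lands in bits 23:16 and 'ndst' in bits 63:32 of the
control word.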
> >
> > - Extend IRTE structure to support VT-d PI.
> > Here is the new structure for IRTE:
> > /* interrupt remap entry */
> > struct iremap_entry {
> >   union {
> >     u64 lo_val;
> >     struct {
> >         u64 p       : 1,
> >             fpd     : 1,
> >             dm      : 1,
> >             rh      : 1,
> >             tm      : 1,
> >             dlm     : 3,
> >             avail   : 4,
> >             res_1   : 4,
> >             vector  : 8,
> >             res_2   : 8,
> >             dst     : 32;
> >     }lo;
> >     struct {
> >         u64 p       : 1,
> >             fpd     : 1,
> >             res_1   : 6,
> >             avail   : 4,
> >             res_2   : 2,
> >             urg     : 1,
> >             pst     : 1,
> >             vector  : 8,
> >             res_3   : 14,
> >             pda_l   : 26;
> >     }lo_intpost;
> >   };
> >   union {
> >     u64 hi_val;
> >     struct {
> >         u64 sid     : 16,
> >             sq      : 2,
> >             svt     : 2,
> >             res_1   : 44;
> >     }hi;
> >     struct {
> >         u64 sid     : 16,
> >             sq      : 2,
> >             svt     : 2,
> >             res_1   : 12,
> >             pda_h   : 32;
> >     }hi_intpost;
> >   };
> > };
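
The address split implied by 'pda_l' (26 bits) and 'pda_h' (32 bits) can be
sketched as below. This assumes, based on the 64-byte alignment of the
descriptor, that pda_l carries address bits 31:6 and pda_h carries bits 63:32.
The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address bits 31:6 of the posted-interrupt descriptor (26 bits). */
static uint32_t pda_low(uint64_t addr)
{
    return (uint32_t)((addr >> 6) & 0x3ffffffu);
}

/* Address bits 63:32 of the posted-interrupt descriptor (32 bits). */
static uint32_t pda_high(uint64_t addr)
{
    return (uint32_t)(addr >> 32);
}

/* Rebuild the 64-byte-aligned descriptor address from the two IRTE fields. */
static uint64_t pda_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | ((uint64_t)lo << 6);
}
```

Because the low six address bits are always zero for an aligned descriptor,
the round trip through pda_low()/pda_high() and pda_join() loses nothing.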
> >
> > - Introduce a new global vector which is used to wake up the HLT'ed vCPU.
> > Currently, there is a global vector 'posted_intr_vector', which is used as
> > the global notification vector for all vCPUs in the system. This vector is
> > stored in the VMCS, and the CPU treats it as a special vector, using it to
> > notify the related pCPU when an interrupt is recorded in the
> > posted-interrupt descriptor.
> >
> > With VT-d PI, the VT-d engine can issue a notification event when the
> > assigned devices issue interrupts. We need to add a new global vector to
> > wake up the HLT'ed vCPU. Please refer to the following scenario for the
> > usage of this new global vector:
> >
> > 1. vCPU0 is running on pCPU0
> > 2. vCPU0 is HLT'ed and vCPU1 is currently running on pCPU0
> > 3. An external interrupt from an assigned device occurs for vCPU0. If we
> > still use 'posted_intr_vector' as the notification vector for vCPU0, the
> > notification event for vCPU0 (the event will go to pCPU0) will be consumed
> > by vCPU1 incorrectly. The worst case is that vCPU0 will never be woken up
> > again, since the wakeup event for it is always consumed by other vCPUs
> > incorrectly. So we need to introduce another global vector, named
> > 'pi_wakeup_vector', to wake up the HLT'ed vCPU.
> 
> Update the above example with the design of the notification handler.
> 
> >
> > - Update IRTE when guest modifies the interrupt configuration (MSI/MSIx
> > configuration).
> > After VT-d PI is introduced, the format of IRTE is changed as follows:
> >     Descriptor Address: the address of the posted-interrupt descriptor
> >     Virtual Vector: the guest vector of the interrupt
> >     URG: indicates if the interrupt is urgent
> >     Other fields continue to have the same meaning
> >
> > 'Descriptor Address' tells the destination vCPU of this interrupt, since
> > each vCPU has a dedicated posted-interrupt descriptor.
> >
> > 'Virtual Vector' tells the guest vector of the interrupt.
> >
> > When the guest changes the configuration of the interrupts, such as the
> > CPU affinity or the vector, we need to update the associated IRTE
> > accordingly.
> >
> > - Update posted-interrupt descriptor during vCPU scheduling
> > The basic idea here is:
> > 1. When vCPU's state is RUNSTATE_running,
> >         - Set 'NV' to 'posted_intr_vector'.
> >         - Clear 'SN' to accept posted-interrupts.
> >         - Set 'NDST' to the pCPU on which the vCPU will be running.
> > 2. When vCPU's state is RUNSTATE_blocked,
> >         - Set 'NV' to 'pi_wakeup_vector', so we can wake up the
> >           related vCPU when posted-interrupt happens for it.
> >           Please refer to the above section about the new global vector.
> >         - Clear 'SN' to accept posted-interrupts
> > 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> >         - Set 'SN' to suppress non-urgent interrupts
> >           (Currently, we only support non-urgent interrupts)
> >           When the vCPU is in RUNSTATE_runnable or RUNSTATE_offline,
> >           there is no need to accept the posted-interrupt notification
> >           event: we don't change the behavior of the scheduler when the
> >           interrupt occurs, so we still need to wait for the next
> >           scheduling of the vCPU. When external interrupts from assigned
> >           devices occur, the interrupts are recorded in PIR, and will be
> >           synced to IRR before VM-Entry.
> >         - Set 'NV' to 'posted_intr_vector'.
> 
> would it be safer to use 'pi_wakeup_vector', if it's the right one to use
> in the future when we consider real-time scheduling?
>

Since we don't consider the real-time case now, is it better to set 'NV' to
'posted_intr_vector' together with other changes when supporting real-time cases?
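
For reference, the per-runstate handling listed above can be sketched as
follows. This is only an illustration under stated assumptions: the enum and
struct are local stand-ins rather than the Xen definitions, the two vector
numbers are made up, and a real implementation would have to update the
descriptor's control word atomically, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vector numbers, chosen only for this sketch. */
#define POSTED_INTR_VECTOR 0xf2
#define PI_WAKEUP_VECTOR   0xf1

/* Local stand-ins for the runstates named in the design. */
enum runstate {
    RUNSTATE_running,
    RUNSTATE_runnable,
    RUNSTATE_blocked,
    RUNSTATE_offline
};

/* Only the descriptor fields touched by the scheduling path. */
struct pi_ctl {
    unsigned sn;    /* suppress notification */
    uint8_t  nv;    /* notification vector */
    uint32_t ndst;  /* notification destination (pCPU) */
};

static void pi_update_on_runstate(struct pi_ctl *pi, enum runstate rs,
                                  uint32_t dest_pcpu)
{
    switch (rs) {
    case RUNSTATE_running:
        pi->nv = POSTED_INTR_VECTOR;
        pi->sn = 0;                 /* accept posted interrupts */
        pi->ndst = dest_pcpu;       /* pCPU the vCPU will run on */
        break;
    case RUNSTATE_blocked:
        pi->nv = PI_WAKEUP_VECTOR;  /* notification wakes the vCPU */
        pi->sn = 0;
        break;
    case RUNSTATE_runnable:
    case RUNSTATE_offline:
        pi->sn = 1;                 /* suppress non-urgent notifications */
        pi->nv = POSTED_INTR_VECTOR;
        break;
    }
}
```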


> >
> > - New boot command-line option for Xen, which lets the user control the
> > VT-d PI feature.
> > Like 'intremap' for interrupt remapping, we add a new boot command-line
> > option 'intpost' for posted-interrupts.
> >
> > - Multicast/broadcast and lowest priority interrupts consideration
> > With VT-d PI, the destination vCPU information of an external interrupt
> > from assigned devices is stored in the IRTE. This leads to the following
> > design considerations:
> > 1. Multicast/broadcast interrupts cannot be posted.
> > 2. For lowest-priority interrupts, new Intel CPU/Chipset/root-complex
> > (starting from Nehalem) ignore the TPR value, and instead support two other
> > ways (configurable by BIOS) to handle lowest priority interrupts:
> >     A) Round robin: In this method, the chipset simply delivers lowest
> > priority interrupts in a round-robin manner across all the available
> > logical CPUs. While this provides good load balancing, it was not always
> > the best thing to do, as interrupts from the same device (like a NIC) will
> > start running on all the CPUs, thrashing caches and taking locks. This led
> > to the next scheme.
> >     B) Vector hashing: In this method, hardware would apply a hash function
> > on the vector value in the interrupt request, and use that hash to pick a
> > logical CPU to route the lowest priority interrupt. This way, a given vector
> > always goes to the same logical CPU, avoiding the thrashing problem above.
> >
> > So, the gist of the above is that lowest priority interrupts have never
> > been delivered as "lowest priority" in physical hardware.
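
To illustrate scheme B, here is a minimal sketch of vector hashing done in
software. The hash used by real hardware is not given in this thread, so a
plain modulo over the candidate list is assumed here, and
pick_dest_by_vector() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick a destination from 'cpus' (n candidates) for a lowest-priority
 * interrupt. The same vector always maps to the same candidate as long as
 * the candidate list is unchanged. Returns -1 when there are no candidates.
 */
static int pick_dest_by_vector(uint8_t vector, const int *cpus, size_t n)
{
    if (n == 0)
        return -1;
    return cpus[vector % n];  /* assumption: modulo as the hash function */
}
```

The property that matters is determinism: interrupts with a given vector keep
landing on one CPU, which avoids the cache thrashing described for round robin.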
> >
> > For the KVM enabling work of VT-d PI, we divide this into two stages:
> > Stage 1: Only support single-CPU lowest-priority interrupts (configured via
> > /proc/irq or irqbalance). This is simple and clear.
> > Stage 2: After all the patches are merged, I will add the vector hashing
> > support for lowest-priority on VT-d PI.
> >
> > On Xen side, what is your opinion about supporting lowest-priority interrupts
> > for VT-d PI?
> 
> I'm not sure how important supporting vector hashing is here. We can do the
> same thing in software when setting NDST in fixed delivery mode?

I am not clear about this. Here we need to find a way to support
lowest-priority interrupts. Could you please elaborate a bit more? Thanks!

> 
> >
> > ================================
> >
> > Any comments about this design are highly appreciated!
> 
> Could you send an updated version based on all comments so far?

Sure!

Thanks,
Feng

> 
> Thanks,
> Kevin
