[Xen-devel] [Draft D] Xen on ARM vITS Handling
Draft D follows. Also at:
http://xenbits.xen.org/people/ianc/vits/draftD.{pdf,html}

In this iteration I took a step back and tried to simplify a lot of
things, in an attempt to get closer to something we can agree on as a
first cut that is achievable for 4.6, since I felt we were getting
bogged down in the complexity of trying to do everything at once.
These assumptions/shortcomings can be addressed in future iterations
of the code.

Please let me know what you think. Perhaps it would be useful to have
a short IRC meeting on #xenarm to iron out the last details? If people
could let me know their availability and timezones I can try and set
something up. If people prefer we could use a slot in the Monthly
technical call[0], which is next week, but I think IRC is probably
better suited.

[0] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00460.html

Ian.

-----8>------------

% Xen on ARM vITS Handling
% Ian Campbell <ian.campbell@xxxxxxxxxx>
% Draft D

# Changelog

## Since Draft C

* _Major_ rework, in an attempt to simplify everything into something
  more likely to be achievable for 4.6.
* Made some simplifying assumptions.
* Reduced the scope of some support.
* Command emulation is now mostly trivial.
* Expanded detail on host setup, allowing other assumptions to be made
  during emulation.
* Many other things lost in the noise of the above.

## Since Draft B

* Details of command translation (thanks to Julien and Vijay).
* Added background on LPI Translation and Pending tables.
* Added background on Collections.
* Settled on `N:N` scheme for vITS:pITS mapping.
* Rejigged section nesting a bit.
* Since we now think translation should be cheap, settled on
  translation at scheduling time.
* Lazy `INVALL` and `SYNC`.

## Since Draft A

* Added discussion of when/where command translation occurs.
* Contention on scheduler lock, suggestion to use SOFTIRQ.
* Handling of domain shutdown.
* More detailed discussion of multiple vs single vITS pros/cons.
# Introduction

ARM systems containing a GIC version 3 or later may contain one or
more ITS logical blocks. An ITS is used to route Message Signalled
Interrupts from devices into an LPI injection on the processor.

The following summarises the ITS hardware design and serves as a set
of assumptions for the vITS software design. For full details of the
ITS see the "GIC Architecture Specification".

## Locality-specific Peripheral Interrupts (`LPI`)

This is a new class of message signalled interrupts introduced in
GICv3. They occupy the interrupt ID space from `8192..(2^32)-1`.

The number of LPIs supported by an ITS is exposed via
`GITS_TYPER.IDbits` (as number of bits - 1); it may be up to 2^32.

_Note_: This field also contains the number of Event IDs supported by
the ITS.

### LPI Configuration Table

Each LPI has an associated configuration byte in the LPI Configuration
Table (managed via the GIC Redistributor and placed at
`GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:

* The LPI's priority;
* Whether the LPI is enabled or disabled.

Software updates the Configuration Table directly but must then issue
an invalidate command (per-device `INV` ITS command, global `INVALL`
ITS command or write `GICR_INVLPIR`) for the effect to be guaranteed
to become visible (possibly requiring an ITS `SYNC` command to ensure
completion of the `INV` or `INVALL`). Note that it is valid for an
implementation to reread the configuration table at any time (IOW it
is _not_ guaranteed that a change to the LPI Configuration Table won't
be visible until an invalidate is issued).

### LPI Pending Table

Each LPI also has an associated bit in the LPI Pending Table (managed
by the GIC redistributor). This bit signals whether the LPI is pending
or not.

This region may contain out of date information and the mechanism to
synchronise is `IMPLEMENTATION DEFINED`.
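The per-LPI configuration byte described above can be decoded with a
couple of trivial helpers. This is a minimal sketch; the macro and
function names are invented for illustration, and the priority field
position (bits [7:2], with bit [0] the enable bit) follows the GIC
spec:

```c
#include <stdbool.h>
#include <stdint.h>

#define LPI_CFG_ENABLE 0x1u

/* bit[0]: LPI enabled/disabled */
static bool lpi_cfg_enabled(uint8_t cfg)
{
    return cfg & LPI_CFG_ENABLE;
}

/* bits[7:2]: LPI priority (bit[1] is reserved) */
static uint8_t lpi_cfg_priority(uint8_t cfg)
{
    return cfg & 0xfcu;
}
```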
## Interrupt Translation Service (`ITS`)

### Device Identifiers

Each device using the ITS is associated with a unique "Device
Identifier". The device IDs are properties of the implementation and
are typically described via system firmware, e.g. the ACPI IORT table
or via device tree.

The number of device IDs in a system depends on the implementation and
can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS
to have up to 2^32 devices.

### Events

Each device can generate "Events" (called `ID` in the spec); these
correspond to possible interrupt sources in the device (e.g. MSI
offset).

The maximum number of interrupt sources is device specific. It is
usually discovered either from firmware tables (e.g. DT or ACPI) or
from bus specific mechanisms (e.g. PCI config space).

The maximum number of event IDs supported by an ITS is exposed via
`GITS_TYPER.IDbits` (as number of bits - 1); it may be up to 2^32.

_Note_: This field also contains the number of `LPIs` supported by the
ITS.

### Interrupt Collections

Each interrupt is a member of an "Interrupt Collection". This allows
software to manage large numbers of physical interrupts with a small
number of commands rather than issuing one command per interrupt.

On a system with N processors, the ITS must provide at least N+1
collections.

An ITS may support some number of internal collections (indicated by
`GITS_TYPER.HCC`) and external ones which require memory provisioned
by the Operating System via a `GITS_BASERn` register.

### Target Addresses

The Target Address corresponds to a specific GIC re-distributor. The
format of this field depends on the value of the `GITS_TYPER.PTA` bit:

* 1: the base address of the re-distributor target is used;
* 0: a unique processor number is used. The mapping between the
  processor affinity value (`MPIDR`) and the processor number is
  discoverable via `GICR_TYPER.ProcessorNumber`.

This value is up to the ITS implementer (`GITS_TYPER` is a read-only
register).
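The `GITS_TYPER` fields referred to above can be extracted as follows.
A minimal sketch, assuming the GICv3 bit positions (IDbits[12:8],
Devbits[17:13], PTA[19], HCC[31:24]); the helper names are ours:

```c
#include <stdint.h>

/* IDbits is encoded as (number of bits - 1) */
static unsigned int gits_typer_idbits(uint64_t typer)
{
    return ((typer >> 8) & 0x1f) + 1;
}

/* Devbits is likewise encoded as (number of bits - 1) */
static unsigned int gits_typer_devbits(uint64_t typer)
{
    return ((typer >> 13) & 0x1f) + 1;
}

/* PTA: Target Address format (1 = redistributor base address) */
static unsigned int gits_typer_pta(uint64_t typer)
{
    return (typer >> 19) & 0x1;
}

/* HCC: number of hardware (internal) collections */
static unsigned int gits_typer_hcc(uint64_t typer)
{
    return (typer >> 24) & 0xff;
}
```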
### Device Table

A Device Table is configured in each ITS which maps incoming device
identifiers into an ITS Interrupt Translation Table.

### Interrupt Translation Table (`ITT`) and Collection Table

An `Event` generated by a `Device` is translated into an `LPI` via a
per-Device Interrupt Translation Table. The structure of this table is
described in GIC Spec 4.9.12.

The ITS translation table maps the device ID of the originating device
into a physical interrupt (`LPI`) and an Interrupt Collection.

The Collection is in turn looked up in the Collection Table to produce
a Target Address, indicating a redistributor (AKA CPU) to which the
LPI is delivered.

### OS Provisioned Memory Regions

The ITS hardware design provides mechanisms for an ITS to be provided
with various blocks of memory by the OS for ITS internal use; these
include the per-device ITT (established with `MAPD`) and memory
regions for Device Tables, Virtual Processors and Interrupt
Collections. Up to 8 such regions can be requested by the ITS and
provisioned by the OS via the `GITS_BASERn` registers.

### ITS Configuration

The ITS is configured and managed, including establishing and
configuring the Translation Tables and Collection Table, via an in
memory ring shared between the CPU and the ITS controller. The ring is
managed via the `GITS_CBASER` register and indexed by the
`GITS_CWRITER` and `GITS_CREADR` registers.

A processor adds commands to the shared ring and then updates
`GITS_CWRITER` to make them visible to the ITS controller.

The ITS controller processes commands from the ring and then updates
`GITS_CREADR` to indicate to the processor that the command has been
processed.

Commands are processed sequentially.
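The ring mechanics described above can be modelled with a small
producer-side sketch: the CPU appends fixed-size commands at the write
offset and publishes the new `CWRITER`; the queue is full when
advancing the write offset would make it equal the read offset. This
is a hypothetical illustration, not Xen's driver:

```c
#include <stdint.h>
#include <string.h>

#define ITS_CMD_SIZE 32u  /* each ITS command is 32 bytes */

struct its_cmd_queue {
    uint8_t *base;     /* queue memory, a multiple of 4KB */
    uint32_t size;     /* queue size in bytes */
    uint32_t creadr;   /* ITS consumption offset (GITS_CREADR) */
    uint32_t cwriter;  /* CPU production offset (GITS_CWRITER) */
};

/* Full when one more command would make CWRITER catch CREADR. */
static int its_queue_full(const struct its_cmd_queue *q)
{
    return ((q->cwriter + ITS_CMD_SIZE) % q->size) == q->creadr;
}

static int its_enqueue(struct its_cmd_queue *q,
                       const uint8_t cmd[ITS_CMD_SIZE])
{
    if (its_queue_full(q))
        return -1;
    memcpy(q->base + q->cwriter, cmd, ITS_CMD_SIZE);
    /* publishing the updated CWRITER makes the command visible */
    q->cwriter = (q->cwriter + ITS_CMD_SIZE) % q->size;
    return 0;
}
```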
Commands sent on the ring include operational commands:

* Routing interrupts to processors;
* Generating interrupts;
* Clearing the pending state of interrupts;
* Synchronising the command queue;

and maintenance commands:

* Map device/collection/processor;
* Map virtual interrupt;
* Clean interrupts;
* Discard interrupts.

The field `GITS_CBASER.Size` encodes the number of 4KB pages minus one
comprising the command queue. This field is 8 bits which means the
maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
there is a maximum of 32768 commands in the queue.

The ITS provides no specific completion notification mechanism.
Completion is monitored by a combination of a `SYNC` command and
either polling `GITS_CREADR` or notification via an interrupt
generated via the `INT` command.

Note that the interrupt generation via `INT` requires an originating
device ID to be supplied (which is then translated via the ITS into an
LPI). No specific device ID is defined for this purpose and so the OS
software is expected to fabricate one.

Possible ways of inventing such a device ID are:

* Enumerate all device IDs in the system and pick another one;
* Use a PCI BDF associated with a non-existent device function (such
  as an unused one relating to the PCI root-bridge) and translate that
  (via firmware tables) into a suitable device ID;
* ???

# LPI Handling in Xen

## IRQ descriptors

Currently all SGI/PPI/SPI interrupts are covered by a single static
array of `struct irq_desc` with ~1024 entries (the maximum interrupt
number in that set of interrupt types).

The addition of LPIs in GICv3 means that the largest potential
interrupt specifier is much larger.

Therefore a second dynamically allocated array will be added to cover
the range `8192..nr_lpis`. The `irq_to_desc` function will determine
which array to use (static `0..1024` or dynamic `8192..end` LPI desc
array) based on the input IRQ number.
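The dispatch between the two descriptor arrays can be sketched as
below. Sizes and names are illustrative (the LPI array stands in for
the runtime allocation), not Xen's actual code:

```c
#include <stddef.h>

#define NR_STATIC_IRQS 1024  /* SGIs/PPIs/SPIs */
#define FIRST_LPI      8192

struct irq_desc { unsigned int irq; };

static struct irq_desc static_irq_desc[NR_STATIC_IRQS];
/* stand-in for the array allocated at boot once nr_lpis is known */
static struct irq_desc lpi_desc_storage[16];
static struct irq_desc *lpi_irq_desc = lpi_desc_storage;
static unsigned int nr_lpis = 16;

static struct irq_desc *irq_to_desc(unsigned int irq)
{
    if (irq < NR_STATIC_IRQS)
        return &static_irq_desc[irq];
    if (irq >= FIRST_LPI && irq - FIRST_LPI < nr_lpis)
        return &lpi_irq_desc[irq - FIRST_LPI];
    return NULL;  /* the 1024..8191 hole maps to no descriptor */
}
```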
Two arrays are used to avoid a wasteful allocation covering the
(unused/unusable) `1024..8191` range.

## Virtual LPI interrupt injection

A physical interrupt which is routed to a guest vCPU has the
`_IRQ_GUEST` flag set in the `irq_desc` status mask. Such interrupts
have an associated instance of `struct irq_guest` which contains the
target `struct domain` pointer and virtual interrupt number.

In Xen a virtual interrupt (either arising from a physical interrupt
or completely virtual) is ultimately injected to a VCPU using the
`vgic_vcpu_inject_irq` function, or `vgic_vcpu_inject_spi`.

This mechanism will likely need updating to handle the injection of
virtual LPIs. In particular, rather than `GICD_ITARGETSRn` or
`GICD_IROUTERn`, routing of LPIs is performed via the ITS collections
mechanism. This is discussed below (in _vITS_: _Virtual LPI
injection_).

# Scope

The ITS is rather complicated, especially when combined with
virtualisation. To simplify things we initially omit the following
functionality:

- Interrupt -> vCPU -> pCPU affinity. The management of physical vs
  virtual Collections is a feature of GICv4, and thus is omitted in
  this design for GICv3. Physical interrupts which occur on a pCPU
  where the target vCPU is not already resident will be forwarded (via
  IPI) to the correct pCPU for injection via the existing
  `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
  correctly).
- Clearing of the pending state of an LPI under various circumstances
  (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
  in guests seeing some perhaps spurious interrupts.
- vITS functionality will only be available on 64-bit ARM hosts,
  avoiding the need to worry about fast access to guest owned data
  structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
  hosts can be considered to have access.)

XXX Can we assume that `GITS_TYPER.Devbits` will be sane, i.e.
requiring support for the full 2^32 device IDs would require a 32GB
device table even for native, which is improbable except on systems
with RAM measured in TB. So we can probably assume that Devbits will
be appropriate to the size of the system.

_Note_: We require per guest device tables, so the size of the native
Device Table is not the only factor here.

XXX Likewise can we assume that `GITS_TYPER.IDbits` will be sane?
i.e. that the required ITT table size will be reasonable?

# Unresolved Issues

Various parts are marked with XXX. Most are minor, but there is one
more or less major one, which we may or may not be able to live with
for a first implementation:

1. When handling Virtual LPI Configuration Table writes we do not have
   a Device ID, so we cannot consult the virtual Device Table, ITT etc
   to determine if the LPI is actually mapped. This means that the
   physical LPI enable/disable is decoupled from the validity of the
   virtual ITT, possibly resulting in spurious LPIs which must be
   ignored.

This issue is discussed further in the relevant places in the text,
marked with `XXX UI1`.

# pITS

## Assumptions

It is assumed that `GITS_TYPER.IDbits` is large enough that there are
sufficient LPIs available to cover the sum of the number of possible
events generated by each device in the system (that is, the sum of the
actual events for each bit of hardware, rather than the notional
per-device maximum from `GITS_TYPER.IDbits`).

This assumption avoids the need to do memory allocations and interrupt
routing at run time, e.g. during command processing, by allowing us to
set everything up up front.

## Driver

The physical driver will provide functions for enabling, disabling,
routing etc a specified interrupt, via the usual Xen APIs for doing
such things.

This will likely involve interacting with the physical ITS command
queue etc. In this document such interactions are considered internal
to the driver (i.e. we care that the API to enable an interrupt
exists, not how it is implemented).
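The sizing assumption above (pITS Assumptions) amounts to a simple
budget check: the LPI space exposed by `GITS_TYPER.IDbits` (IDs
`8192..2^IDbits-1`) must cover the sum of every device's actual event
count. A hypothetical sketch of such a check:

```c
#include <stdint.h>

#define FIRST_LPI 8192ull

/* Returns 1 if 2^idbits leaves enough LPIs (above the 8192 non-LPI
 * IDs) for the sum of the per-device event counts, else 0. The
 * function name and interface are invented for illustration. */
static int lpi_budget_ok(unsigned int idbits,
                         const unsigned int *nr_events,
                         unsigned int nr_devices)
{
    uint64_t available = (1ull << idbits) - FIRST_LPI;
    uint64_t needed = 0;
    unsigned int i;

    for (i = 0; i < nr_devices; i++)
        needed += nr_events[i];
    return needed <= available;
}
```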
## Device Table

The `pITS` device table will be allocated and given to the pITS at
start of day.

## Collections

The `pITS` will be configured at start of day with 1 Collection mapped
to each physical processor, using the `MAPC` command on the physical
ITS.

## Per Device Information

Each physical device in the system which can be used together with an
ITS (whether using passthrough or not) will have associated with it a
data structure:

    struct its_device {
        uintNN_t phys_device_id;
        uintNN_t virt_device_id;
        unsigned int *events;
        unsigned int nr_events;
        struct page_info *pitt;
        unsigned int nr_pitt_pages;
    };

Where:

- `phys_device_id`: The physical device ID of the physical device.
- `virt_device_id`: The virtual device ID if the device is accessible
  to a domain.
- `events`: An array mapping a per-device event number into a physical
  LPI.
- `nr_events`: The number of events which this device is able to
  generate.
- `pitt`, `nr_pitt_pages`: Records allocation of pages for the
  physical ITT (not directly accessible).

During its lifetime this structure may be referenced by several
different mappings (e.g. physical and virtual device ID maps, virtual
collection device ID).

## Device Discovery/Registration and Configuration

Per device information will be discovered based on firmware tables (DT
or ACPI) and information provided by dom0 (e.g. registration via
`PHYSDEVOP_pci_device_add` or new custom hypercalls). This information
shall include at least:

- The Device ID of the device.
- The maximum number of Events which the device is capable of
  generating.

When a device is discovered/registered (i.e. when all necessary
information is available) then:

- `struct its_device` and the embedded `events` array will be
  allocated (the latter with `nr_events` elements).
- The `struct its_device` will be inserted into a mapping (possibly an
  R-B tree) from its physical Device ID to the `struct its_device`.
- `nr_events` physical LPIs will be allocated and recorded in the
  `events` array.
- An ITT table will be allocated for the device and the appropriate
  `MAPD` command will be issued to the physical ITS. The location will
  be recorded in `struct its_device.pitt`.
- Each Event which the device may generate will be mapped to the
  corresponding LPI in the `events` array and a collection, by issuing
  a series of `MAPVI` commands. Events will be assigned to physical
  collections in a round-robin fashion.

This setup must occur for a given device before any ITS interrupts may
be configured for the device and certainly before a device is passed
through to a guest. This implies that dom0 cannot use MSIs on a PCI
device before having called `PHYSDEVOP_pci_device_add`.

# Device Assignment

Each domain will have an associated mapping from virtual device IDs
into a data structure describing the physical device, including a
reference to the relevant `struct its_device`.

The number of possible device IDs may be large so a simple array or
list is likely unsuitable. A tree (e.g. Red-Black) may be a suitable
data structure. Currently we do not need to perform lookups in this
tree on any hot paths.

_Note_: In the context of virtualised device IDs (especially for domU)
it may be possible to arrange for the upper bound on the number of
device IDs to be lower, allowing a more efficient data structure to be
used. This is left for a future improvement.

When a device is assigned to a domain (including to domain 0) the
mapping for the new virtual device ID will be entered into the tree.

During assignment all LPIs associated with the device will be routed
to the guest (i.e. `route_irq_to_guest` will be called for each LPI in
the `struct its_device.events` array).

# vITS

A guest domain which is allowed to use ITS functionality (i.e. has
been assigned pass-through devices which can generate MSIs) will be
presented with a virtualised ITS.

Accesses to the vITS registers will trap to Xen and be emulated and a
virtualised Command Queue will be provided.
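The per-domain virtual device ID map described under _Device
Assignment_ needs only an ordered lookup. The text suggests a
Red-Black tree; purely for illustration, a sorted array with binary
search gives the same O(log n) lookup with simplified stand-in types:

```c
#include <stdint.h>
#include <stdlib.h>

struct its_device {
    uint32_t virt_device_id;
    /* per-device state elided in this sketch */
};

static int cmp_devid(const void *key, const void *elem)
{
    uint32_t id = *(const uint32_t *)key;
    const struct its_device *d = elem;

    return (id > d->virt_device_id) - (id < d->virt_device_id);
}

/* Look up a virtual device ID in a sorted array of devices;
 * returns NULL if the domain has no such device assigned. */
static struct its_device *find_vdevice(struct its_device *devs,
                                       size_t nr, uint32_t vdevid)
{
    return bsearch(&vdevid, devs, nr, sizeof(*devs), cmp_devid);
}
```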
Commands entered onto the virtual Command Queue will be translated
into physical commands, as described later in this document.

There are other aspects to virtualising the ITS (LPI collection
management, assignment of LPI ranges to guests, device management).
However these are only considered here to the extent needed for
describing the vITS emulation.

## Xen interaction with guest OS provisioned vITS memory

Memory which the guest provisions to the vITS (ITT via `MAPD` or other
tables via `GITS_BASERn`) needs careful handling in Xen, since Xen
cannot trust data structures contained in such memory: a guest can
trample over them at will.

Therefore Xen must either take great care when accessing data
structures stored in such memory to validate the contents, e.g. not
trust that values are within the required limits, or it must take
steps to restrict guest access to the memory when it is provisioned.
Since the data structures are simple and most accessors need to do
bounds checks anyway, it is considered sufficient to simply do the
necessary checks on access.

Most data structures stored in this shared memory are accessed on the
hot interrupt injection path and must therefore be quickly accessible
from within Xen. Since we have restricted vITS support to 64-bit hosts
only, `map_domain_page` is fast enough to be used on the fly and
therefore we do not need to be concerned about unbounded amounts of
permanently mapped memory consumed by each `MAPD` command.

Although `map_domain_page` is fast, `p2m_lookup` (translation from IPA
to PA) is not necessarily so. For now we accept this; as a future
extension a sparse mapping of the guest device table in vmap space
could be considered, with limits on the total amount of vmap space
which we allow each domain to consume.

## vITS properties

The vITS implementation shall have:

- `GITS_TYPER.HCC == nr_vcpus + 1`.
- `GITS_TYPER.PTA == 0`. Target addresses are linear processor
  numbers.
- `GITS_TYPER.Devbits == See below`.
- `GITS_TYPER.IDbits == See below`.
- `GITS_TYPER.ITT Entry Size == 7`, meaning 8 bytes, which is the size
  of `struct vitt` (defined below).

`GITS_TYPER.Devbits` and `GITS_TYPER.IDbits` will need to be chosen to
reflect the host and guest configurations (number of LPIs, maximum
device ID etc).

Other fields (not mentioned here) will be set to some sensible (or
mandated) value.

The `GITS_BASER0` will be setup to request sufficient memory for a
device table consisting of entries of:

    struct vdevice_table {
        uint64_t vitt_ipa;
        uint32_t vitt_size;
        uint32_t padding;
    };
    BUILD_BUG_ON(sizeof(struct vdevice_table) != 16);

On write to `GITS_BASER0` the relevant details of the Device Table
(IPA, size, cache attributes to use when mapping) will be recorded in
`struct domain`.

All other `GITS_BASERn.Valid == 0`.

## vITS to pITS mapping

A physical system may have multiple physical ITSs.

With the simplified vITS command model presented here only a single
`vITS` is required.

In the future a more complex arrangement may be desired. Since the
choice of model is internal to the hypervisor/tools and is
communicated to the guest via firmware tables we are not tied to this
model as an ABI if we decide to change.

## LPI Configuration Table Virtualisation

A guest's write accesses to its LPI Configuration Table (which is just
an area of guest RAM which the guest has nominated) will be trapped to
the hypervisor, using stage 2 MMU permissions, in order for changes to
be propagated into the host interrupt configuration.

On write, `bit[0]` of the written byte is the enable/disable state for
the IRQ and is handled thus:

    lpi = (addr - table_base);
    if ( byte & 1 )
        enable_irq(lpi);
    else
        disable_irq(lpi);

Note that in the context of this emulation we do not have access to a
Device ID, and therefore cannot make decisions based on whether the
LPI/event has been `MAPD`d etc. In any case we have an `lpi` in our
hand and not an `event`; IOW we would need to do a _reverse_ lookup in
the ITT.
LPI priority (the remaining bits in the written byte) is currently
ignored.

## LPI Pending Table Virtualisation

XXX Can we simply ignore this? 4.8.5 suggests it is not necessarily in
sync and the mechanism to force a sync is `IMPLEMENTATION DEFINED`.

## Device Table Virtualisation

The IPA, size and cacheability attributes of the guest device table
will be recorded in `struct domain` upon write to `GITS_BASER0`.

In order to lookup an entry for `device`:

    define {get,set}_vdevice_entry(domain, device, struct vdevice_table *entry):
        offset = device*sizeof(struct vdevice_table)
        if offset > <DT size>: error

        dt_entry = <DT base IPA> + offset
        page = p2m_lookup(domain, dt_entry, p2m_ram)
        if !page: error
        /* nb: non-RAM pages, e.g. grant mappings,
         * are rejected by this lookup */
        dt_mapping = map_domain_page(page)

        if (set)
            dt_mapping[<appropriate page offset from device>] = *entry;
        else
            *entry = dt_mapping[<appropriate page offset>];

        unmap_domain_page(dt_mapping)

Since everything is based upon IPA (guest addresses) a malicious guest
can only reference its own RAM here.

## ITT Virtualisation

The location of a vITT will have been recorded in the domain's Device
Table by a `MAPI` or `MAPVI` command and is looked up as above.

The `vitt` is a `struct vitt`:

    struct vitt {
        uint16_t valid:1;
        uint16_t pad:15;
        uint16_t collection;
        uint32_t vlpi;
    };
    BUILD_BUG_ON(sizeof(struct vitt) != 8);

A lookup occurs similarly to that for a device table; the offset is
range checked against the `vitt_size` from the device table. To lookup
`event` on `device`:

    define {get,set}_vitt_entry(domain, device, event, struct vitt *entry):
        get_vdevice_entry(domain, device, &dt)

        offset = event*sizeof(struct vitt);
        if offset > dt->vitt_size: error

        vitt_entry = dt->vitt_ipa + offset
        page = p2m_lookup(domain, vitt_entry, p2m_ram)
        if !page: error
        /* nb: non-RAM pages, e.g.
         * grant mappings, are rejected by this lookup */
        vitt_mapping = map_domain_page(page)

        if (set)
            vitt_mapping[<appropriate page offset from event>] = *entry;
        else
            *entry = vitt_mapping[<appropriate page offset>];

        unmap_domain_page(vitt_mapping)

Again, since this is IPA based, a malicious guest can only point
things at its own RAM.

## Collection Table Virtualisation

A pointer to a dynamically allocated array `its_collections` mapping
collection ID to vcpu ID will be added to `struct domain`. The array
shall have `nr_vcpus + 1` entries and resets to ~0 (or another
explicitly invalid vcpu nr).

## Virtual LPI injection

As discussed above the `vgic_vcpu_inject_irq` functionality will need
to be extended to cover this new case, most likely via a new
`vgic_vcpu_inject_lpi` frontend function.

`vgic_vcpu_inject_lpi` receives a `struct domain *` and a virtual
interrupt number (corresponding to a vLPI) and needs to figure out
which vcpu this should map to.

To do this it must look up the Collection ID associated (via the vITS)
with that LPI.

Proposal: Add a new `its_device` field to `struct irq_guest`, a
pointer to the associated `struct its_device`. The existing `struct
irq_guest.virq` field contains the event ID (perhaps use a `union` to
give a more appropriate name) and _not_ the virtual LPI. Injection
then consists of:

    d = irq_guest->domain
    virq = irq_guest->virq
    its_device = irq_guest->its_device

    get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
    vcpu = d->its_collections[vitt.collection]
    vgic_vcpu_inject_irq(&d->vcpus[vcpu], vitt.vlpi)

In the event that the ITT is not `MAPD`d, or the Event has not been
`MAPI`/`MAPVI`d or the collection is not `MAPC`d, the interrupt is
simply ignored. Note that this can happen because LPI mapping is
decoupled from LPI enablement. In particular writes to the LPI
Configuration Table do not include a Device ID and therefore we cannot
make decisions based on the ITT.
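The two-level lookup used on the injection path above can be modelled
in isolation, with guest memory replaced by a plain buffer (standing
in for `p2m_lookup`/`map_domain_page`) and the bounds check against
`vitt_size` made explicit. A sketch under those simplifying
assumptions, not Xen code:

```c
#include <stdint.h>
#include <string.h>

struct vdevice_table {
    uint64_t vitt_ipa;
    uint32_t vitt_size;
    uint32_t padding;
};

struct vitt {
    uint16_t valid:1;
    uint16_t pad:15;
    uint16_t collection;
    uint32_t vlpi;
};

/* Fetch the vitt entry for `event`; returns -1 if the entry would
 * run past the vitt_size recorded in the device table entry. */
static int get_vitt_entry(const struct vdevice_table *dt_entry,
                          const uint8_t *guest_mem, /* stand-in p2m */
                          uint32_t event, struct vitt *out)
{
    uint64_t offset = (uint64_t)event * sizeof(struct vitt);

    if (offset + sizeof(struct vitt) > dt_entry->vitt_size)
        return -1;
    memcpy(out, guest_mem + dt_entry->vitt_ipa + offset, sizeof(*out));
    return 0;
}
```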
XXX UI1: if we could find a reliable way to reenable then we could
potentially disable the LPI on error and reenable later (taking a
spurious Xen interrupt for each possible vITS misconfiguration). IOW
if the interrupt is invalid for each of these reasons we can disable
and reenable as described:

- Not `MAPD`d -- on `MAPD` enable all associated LPIs which are
  enabled in the LPI CFG Table.
- Not `MAPI`/`MAPVI`d -- on `MAPI`/`MAPVI` enable the LPI if enabled
  in the CFG Table.
- Not `MAPC`d -- tricky. Need to know lists of LPIs associated with a
  virtual collection. A `list_head` in `struct irq_guest` implies a
  fair bit of overhead; the worst-case number of LPIs per collection
  is the total number of LPIs (the guest could assign all to the same
  collection). Could walk the entire LPI CFG and reenable all set
  LPIs, but that might involve walking over several KB of memory.
  Could inject unmapped collections to vcpu0, forcing the guest to
  deal with the spurious interrupts?

XXX Only the `MAPC` issue seems problematic. Is this a critical issue
or can we get away with it?

## Command Queue Virtualisation

The command translation/emulation in this design has been arranged to
be as cheap as possible (e.g. in many cases the actions are NOPs),
avoiding previous concerns about the length of time which an emulated
write to a `CWRITER` register may block the vcpu.

The vITS will simply track its reader and writer pointers. On write to
`CWRITER` it will immediately and synchronously process all commands
in the queue and update its state accordingly.

It might be possible to implement a rudimentary form of preemption by
periodically (as determined by `hypercall_preempt_check()`) returning
to the guest without incrementing PC but with updated internal
`CREADR` state, meaning it will reexecute the write to `CWRITER` and
we can pick up where we left off for another iteration. This at least
lets us schedule other vcpus etc and prevents a monopoly.
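The preemptible `CWRITER` emulation sketched above could look like the
following: process commands until the emulated `CREADR` reaches the
written `CWRITER`, but bail out after a budget of commands so the vcpu
can be rescheduled; re-executing the trapped write then resumes from
the saved `CREADR`. The names and the fixed budget (standing in for
`hypercall_preempt_check()`) are illustrative:

```c
#include <stdint.h>

#define ITS_CMD_SIZE   32u
#define CMDS_PER_SLICE 64u  /* arbitrary preemption budget */

struct vits {
    uint32_t creadr;  /* emulated GITS_CREADR */
    uint32_t qsize;   /* virtual command queue size in bytes */
};

/* Placeholder for the per-command translation (mostly NOPs). */
static void vits_process_one(struct vits *v) { (void)v; }

/* Returns 1 if all commands were consumed, 0 if we preempted and the
 * guest will re-execute its CWRITER write to continue. */
static int vits_handle_cwriter(struct vits *v, uint32_t cwriter)
{
    unsigned int done = 0;

    while (v->creadr != cwriter) {
        vits_process_one(v);
        v->creadr = (v->creadr + ITS_CMD_SIZE) % v->qsize;
        if (++done >= CMDS_PER_SLICE)
            return 0;  /* stand-in for hypercall_preempt_check() */
    }
    return 1;
}
```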
## ITS Command Translation

This section is based on section 5.13 of the GICv3 specification
(PRD03-GENC-010745 24.0) and provides concrete ideas of how this can
be interpreted for Xen.

The ITS provides 12 commands in order to manage interrupt collections,
devices and interrupts. Possible command parameters are:

- Device ID (`Device`) (called `device` in the spec).
- Event ID (`Event`) (called `ID` in the spec). This is an index into
  a device's `ITT`.
- Collection ID (`Collection`) (called `collection` in the spec).
- LPI ID (`LPI`) (called `pID` in the spec).
- Target Address (`TA`) (called `TA` in the spec).

These parameters need to be validated and translated from Virtual (`v`
prefix) to Physical (`p` prefix).

Note, we differ from the naming in the GIC spec for clarity; in
particular we use `Event` not `ID` and `LPI` not `pID` to reduce
confusion, especially when `v` and `p` prefixes are used due to
virtualisation.

### Parameter Validation / Translation

Each command contains parameters that need to be validated before any
usage in Xen or passing to the hardware.

#### Device ID (`Device`)

The corresponding ITT is obtained by looking up as described above.
The physical `struct its_device` can be found by looking up in the
domain's device map.

If the lookup fails or the resulting device table entry is invalid
then the Device is invalid.

#### Event ID (`Event`)

Validated against emulated `GITS_TYPER.IDbits`.

It is not necessary to translate a `vEvent`.

#### LPI (`LPI`)

Validated against emulated `GITS_TYPER.IDbits`.

It is not necessary to translate a `vLPI` into a `pLPI` since the
tables all contain `vLPI`. (Translation from `pLPI` to `vLPI` happens
via `struct irq_guest` when we receive the IRQ.)

#### Interrupt Collection (`Collection`)

The `Collection` is validated against the size of the per-domain
`its_collections` array (i.e. nr_vcpus + 1) and then translated by a
simple lookup in that array:
    vcpu_nr = d->its_collections[Collection]

A result > `nr_vcpus` is invalid.

#### Target Address (`TA`)

This parameter is used in commands which manage collections. It is a
unique identifier per processor.

We have chosen to implement `GITS_TYPER.PTA` as 0, hence `vTA` simply
corresponds to the `vcpu_id`, so it only needs bounds checking against
`nr_vcpus`.

### Commands

To be read with reference to the spec for each command (which includes
error checks etc which are omitted here).

It is assumed that inputs will be bounds and validity checked as
described above, thus error handling is omitted for brevity (i.e. if
get and/or set fail then so be it). In general invalid commands are
simply ignored.

#### `MAPD`: Map a physical device to an ITT.

_Format_: `MAPD Device, Valid, ITT Address, ITT Size`

_Spec_: 5.13.11

`MAPD` is sent with the `Valid` bit set if the mapping is to be added
and reset when the mapping is removed.

The domain's device table is updated with the provided information,
with the entry's validity following the `Valid` flag in the command:

    dt_entry.vitt_ipa = ITT Address
    dt_entry.vitt_size = ITT Size
    set_vdevice_entry(current->domain, Device, &dt_entry)

XXX UI1: possibly handle dis/enabling pLPIs, if the mapping is now
completely (in)valid.

XXX Should we validate the size? It is the guest's own fault if we run
off the end of a table later. If we did start to consider permanent
mappings then we _would_ care.

#### `MAPC`: Map an interrupt collection to a target processor

_Format_: `MAPC Collection, TA`

_Spec_: 5.13.12

The updated `vTA` (a vcpu number) is recorded in the `its_collections`
array of the domain struct:

    d->its_collections[Collection] = TA

XXX UI1: possibly handle dis/enabling pLPIs, if the mapping is now
completely (in)valid.

#### `MAPI`: Map an interrupt to an interrupt collection.

_Format_: `MAPI Device, LPI, Collection`

_Spec_: 5.13.13

After validation:

    vitt.valid = True
    vitt.collection = Collection
    vitt.vlpi = LPI
    set_vitt_entry(current->domain, Device, LPI, &vitt)

XXX UI1: possibly
handle dis/enabling pLPIs, if the mapping is now completely (in)valid.

#### `MAPVI`: Map an input identifier to a physical interrupt and an interrupt collection.

_Format_: `MAPVI Device, Event, LPI, Collection`

After validation:

    vitt.valid = True
    vitt.collection = Collection
    vitt.vlpi = LPI
    set_vitt_entry(current->domain, Device, Event, &vitt)

XXX UI1: possibly handle dis/enabling pLPIs, if the mapping is now
completely (in)valid.

#### `MOVI`: Redirect interrupt to an interrupt collection

_Format_: `MOVI Device, Event, Collection`

_Spec_: 5.13.15

    get_vitt_entry(current->domain, Device, Event, &vitt)
    vitt.collection = Collection
    set_vitt_entry(current->domain, Device, Event, &vitt)

XXX consider a helper which sets the field without mapping/unmapping
twice.

This command is supposed to move any pending interrupts associated
with `Event` to the vcpu implied by the new `Collection`, which is
tricky. For now we ignore this requirement (as we do for
`GICD_IROUTERn` and `GICD_ITARGETSRn` for other interrupt types).

#### `DISCARD`: Discard interrupt requests

_Format_: `DISCARD Device, Event`

_Spec_: 5.13.16

    get_vitt_entry(current->domain, Device, Event, &vitt)
    vitt.valid = False
    set_vitt_entry(current->domain, Device, Event, &vitt)

XXX UI1: possibly handle dis/enabling pLPIs, if the mapping is now
completely (in)valid.

XXX consider a helper which sets the field without mapping/unmapping
twice.

This command is supposed to clear the pending state of any associated
interrupt. This requirement is ignored (the guest may see a spurious
interrupt).

#### `INV`: Clean any caches associated with interrupt

_Format_: `INV Device, Event`

_Spec_: 5.13.17

Since LPI Configuration Table updates are handled synchronously in the
respective trap handler there is nothing to do here.

#### `INVALL`: Clean any caches associated with an interrupt collection

_Format_: `INVALL Collection`

_Spec_: 5.13.19

Since LPI Configuration Table updates are handled synchronously there
is nothing to do here.
#### `INT`: Generate an interrupt

_Format_: `INT Device, Event`

_Spec_: 5.13.20

The `vitt` entry corresponding to `Device,Event` is looked up and
then:

    get_vitt_entry(current->domain, Device, Event, &vitt)
    vgic_vcpu_inject_lpi(current->domain, vitt.vlpi)

XXX Where (Device,Event) is real, interactions with real LPIs being
delivered may need consideration: Julien had concerns about Xen's
internal IRQ state tracking. If this is a problem then we may need
changes to IRQ state tracking, or to inject as a real IRQ and let
physical IRQ injection handle it, or to write to `GICR_SETLPIR`?

#### `CLEAR`: Clear the pending state of an interrupt

_Format_: `CLEAR Device, Event`

_Spec_: 5.13.21

Should clear the pending state of the LPI. Ignored (the guest may see
a spurious interrupt).

#### `SYNC`: Wait for completion of any outstanding ITS actions for collection

_Format_: `SYNC TA`

_Spec_: 5.13.22

This command can be ignored.

# GICv4 Direct Interrupt Injection

GICv4 will directly mark LPIs pending in the virtual pending table,
which is per-redistributor (i.e. per-vCPU).

LPIs will be received by the guest the same way as SPIs, i.e. trap in
IRQ mode then read `ICC_IAR1_EL1` (for GICv3).

Therefore GICv4 will not require one vITS per pITS.

# Event Channels

It has been proposed that it might be nice to inject event channels as
LPIs in the future. Whether or not that would involve any sort of vITS
is unclear, but if it did then it would likely be a separate emulation
to the vITS emulation used with a pITS and as such is not considered
further here.

# Glossary

* _MSI_: Message Signalled Interrupt.
* _ITS_: Interrupt Translation Service.
* _GIC_: Generic Interrupt Controller.
* _LPI_: Locality-specific Peripheral Interrupt.

# References

"GIC Architecture Specification" PRD03-GENC-010745 24.0.

"IO Remapping Table System Software on ARM Platforms" ARM DEN 0049A.