Re: [Xen-devel] [PATCH v5 02/11] vpci: introduce basic handlers to trap accesses to the PCI config space
> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@xxxxxxxxxx]
> Sent: 14 August 2017 15:29
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: boris.ostrovsky@xxxxxxxxxx; konrad.wilk@xxxxxxxxxx; Roger Pau Monne
> <roger.pau@xxxxxxxxxx>; Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Wei Liu
> <wei.liu2@xxxxxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>; Andrew Cooper
> <Andrew.Cooper3@xxxxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Subject: [PATCH v5 02/11] vpci: introduce basic handlers to trap accesses to
> the PCI config space
>
> This functionality is going to reside in vpci.c (and the corresponding
> vpci.h header), and should be arch-agnostic. The handlers introduced
> in this patch setup the basic functionality required in order to trap
> accesses to the PCI config space, and allow decoding the address and
> finding the corresponding handler that should handle the access
> (although no handlers are implemented).
>
> Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
> setup inside of a x86 HVM file, since that's not shared with other
> arches.
>
> A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
> whether a domain should use the newly introduced vPCI handlers, this
> is only enabled for PVH Dom0 at the moment.
>
> A very simple user-space test is also provided, so that the basic
> functionality of the vPCI traps can be asserted. This has been proven
> quite helpful during development, since the logic to handle partial
> accesses or accesses that expand across multiple registers is not
> trivial.
>
> The handlers for the registers are added to a linked list that's keep
> sorted at all times. Both the read and write handlers support accesses
> that expand across multiple emulated registers and contain gaps not
> emulated.
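For readers following the description above, here is a minimal user-space
sketch of how a single read can be assembled from emulated registers and
not-emulated gaps. It is an illustration under assumptions, not the code in
the patch: it walks a sorted array rather than the linked list, and
struct emu_reg, hw_read(), sketch_read(), demo_hw and demo_regs are
hypothetical names. Only merge_result() follows the helper of the same name
in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an emulated register (not the patch's struct). */
struct emu_reg {
    unsigned int offset; /* byte offset in config space */
    unsigned int size;   /* 1, 2 or 4 bytes */
    uint32_t value;      /* what the emulated read handler would return */
};

/* Replace 'size' bytes of 'data' at byte 'offset' with the low bytes of 'new'. */
static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
                             unsigned int offset)
{
    uint32_t mask;

    assert(size >= 1 && size <= 4 && offset + size <= 4);
    mask = 0xffffffffu >> (32 - 8 * size);

    return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8));
}

/* Hypothetical "hardware": little-endian read from a plain byte array. */
static uint32_t hw_read(const uint8_t *hw, unsigned int reg, unsigned int size)
{
    uint32_t v = 0;

    for (unsigned int i = 0; i < size; i++)
        v |= (uint32_t)hw[reg + i] << (8 * i);

    return v;
}

/* 'regs' must be sorted by offset and must not overlap. */
static uint32_t sketch_read(const struct emu_reg *regs, unsigned int nregs,
                            const uint8_t *hw, unsigned int reg,
                            unsigned int size)
{
    uint32_t data = ~(uint32_t)0;
    unsigned int done = 0; /* bytes of the access produced so far */
    unsigned int end = reg + size;

    assert(size == 1 || size == 2 || size == 4);

    for (unsigned int i = 0; i < nregs && done < size; i++) {
        const struct emu_reg *r = &regs[i];
        unsigned int cur = reg + done; /* first byte still missing */
        unsigned int hi;

        if (r->offset + r->size <= cur)
            continue;                  /* register entirely below the access */
        if (r->offset >= end)
            break;                     /* sorted: nothing further can overlap */

        if (cur < r->offset) {
            /* Leading gap: take those bytes from the hardware. */
            unsigned int gap = r->offset - cur;

            data = merge_result(data, hw_read(hw, cur, gap), gap, done);
            done += gap;
            cur += gap;
        }

        /* Overlap between the remaining access and this register. */
        hi = end < r->offset + r->size ? end : r->offset + r->size;
        data = merge_result(data, r->value >> ((cur - r->offset) * 8),
                            hi - cur, done);
        done += hi - cur;
    }

    if (done < size)
        /* Trailing gap: read whatever is left from the hardware. */
        data = merge_result(data, hw_read(hw, reg + done, size - done),
                            size - done, done);

    return data & (0xffffffffu >> (32 - 8 * size));
}

/* Hypothetical demo data: 8 bytes of config space, two emulated registers. */
static const uint8_t demo_hw[8] = {
    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17
};
static const struct emu_reg demo_regs[2] = {
    { .offset = 1, .size = 1, .value = 0xAA },
    { .offset = 2, .size = 2, .value = 0xBBCC },
};
```

With the demo data, sketch_read(demo_regs, 2, demo_hw, 0, 4) takes byte 0 from
the fake hardware array, byte 1 from the first emulated register and bytes 2-3
from the second, giving 0xBBCCAA10.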
>
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Cc: Paul Durrant <paul.durrant@xxxxxxxxxx>
> ---
> Changes since v4:
> * User-space test harness:
>   - Do not redirect the output of the test.
>   - Add main.c and emul.h as dependencies of the Makefile target.
>   - Use the same rule to modify the vpci and list headers.
>   - Remove underscores from local macro variables.
>   - Add _check suffix to the test harness multiread function.
>   - Change the value written by every different size in the multiwrite
>     test.
>   - Use { } to initialize the r16 and r20 arrays (instead of { 0 }).
>   - Perform some of the read checks with the local variable directly.
>   - Expand some comments.
>   - Implement a dummy rwlock.
> * Hypervisor code:
>   - Guard the linker script changes with CONFIG_HAS_PCI.
>   - Rename vpci_access_check to vpci_access_allowed and make it return
>     bool.
>   - Make hvm_pci_decode_addr return the register as return value.
>   - Use ~3 instead of 0xfffc to remove the register offset when
>     checking accesses to IO ports.
>   - s/head/prev in vpci_add_register.
>   - Add parentheses around & in vpci_add_register.
>   - Fix register removal.
>   - Change the BUGs in vpci_{read/write}_hw helpers to
>     ASSERT_UNREACHABLE.
>   - Make merge_result static and change the computation of the mask to
>     avoid using a uint64_t.
>   - Modify vpci_read to only read from hardware the not-emulated gaps.
>   - Remove the vpci_val union and use a uint32_t instead.
>   - Change handler read type to return a uint32_t instead of modifying
>     a variable passed by reference.
>   - Constify the data opaque parameter of read handlers.
>   - Change the size parameter of the vpci_{read/write} functions to
>     unsigned int.
>   - Place the array of initialization handlers in init.rodata or
>     .rodata depending on whether late-hwdom is enabled.
>   - Remove the pci_devs lock, assume the Dom0 is well behaved and won't
>     remove the device while trying to access it.
>   - Change the recursive spinlock into a rw lock for performance
>     reasons.
>
> Changes since v3:
> * User-space test harness:
>   - Fix spaces in container_of macro.
>   - Implement a dummy locking functions.
>   - Remove 'current' macro make current a pointer to the statically
>     allocated vpcu.
>   - Remove unneeded parentheses in the pci_conf_readX macros.
>   - Fix the name of the write test macro.
>   - Remove the dummy EXPORT_SYMBOL macro (this was needed by the RB
>     code only).
>   - Import the max macro.
>   - Test all possible read/write size combinations with all possible
>     emulated register sizes.
>   - Introduce a test for register removal.
> * Hypervisor code:
>   - Use a sorted list in order to store the config space handlers.
>   - Remove some unneeded 'else' branches.
>   - Make the IO port handlers always return X86EMUL_OKAY, and set the
>     data to all 1's in case of read failure (write are simply ignored).
>   - In hvm_select_ioreq_server reuse local variables when calling
>     XEN_DMOP_PCI_SBDF.
>   - Store the pointers to the initialization functions in the .rodata
>     section.
>   - Do not ignore the return value of xen_vpci_add_handlers in
>     setup_one_hwdom_device.
>   - Remove the vpci_init macro.
>   - Do not hide the pointers inside of the vpci_{read/write}_t
>     typedefs.
>   - Rename priv_data to private in vpci_register.
>   - Simplify checking for register overlap in vpci_register_cmp.
>   - Check that the offset and the length match before removing a
>     register in xen_vpci_remove_register.
>   - Make vpci_read_hw return a value rather than storing it in a
>     pointer passed by parameter.
>   - Handler dispatcher functions vpci_{read/write} no longer return an
>     error code, errors on reads/writes should be treated like hardware
>     (writes ignored, reads return all 1's or garbage).
>   - Make sure pcidevs is locked before calling pci_get_pdev_by_domain.
>   - Use a recursive spinlock for the vpci lock, so that spin_is_locked
>     checks that the current CPU is holding the lock.
>   - Make the code less error-chatty by removing some of the printk's.
>   - Pass the slot and the function as separate parameters to the
>     handler dispatchers (instead of passing devfn).
>   - Allow handlers to be registered with either a read or write
>     function only, the missing handler will be replaced by a dummy
>     handler (writes ignored, reads return 1's).
>   - Introduce PCI_CFG_SPACE_* defines from Linux.
>   - Simplify the handler dispatchers by removing the recursion, now the
>     dispatchers iterate over the list of sorted handlers and call them
>     in order.
>   - Remove the GENMASK_BYTES, SHIFT_RIGHT_BYTES and ADD_RESULT macros,
>     and instead provide a merge_result function in order to merge a
>     register output into a partial result.
>   - Rename the fields of the vpci_val union to u8/u16/u32.
>   - Remove the return values from the read/write handlers, errors
>     should be handled internally and signaled as would be done on
>     native hardware.
>   - Remove the usage of the GENMASK macro.
>
> Changes since v2:
>  - Generalize the PCI address decoding and use it for IOREQ code also.
>
> Changes since v1:
>  - Allow access to cross a word-boundary.
>  - Add locking.
>  - Add cleanup to xen_vpci_add_handlers in case of failure.
> --- > .gitignore | 3 + > tools/libxl/libxl_x86.c | 2 +- > tools/tests/Makefile | 1 + > tools/tests/vpci/Makefile | 37 ++++ > tools/tests/vpci/emul.h | 128 +++++++++++ > tools/tests/vpci/main.c | 314 +++++++++++++++++++++++++++ > xen/arch/arm/xen.lds.S | 10 + > xen/arch/x86/domain.c | 18 +- > xen/arch/x86/hvm/hvm.c | 2 + > xen/arch/x86/hvm/io.c | 118 +++++++++- > xen/arch/x86/setup.c | 3 +- > xen/arch/x86/xen.lds.S | 10 + > xen/drivers/Makefile | 2 +- > xen/drivers/passthrough/pci.c | 9 +- > xen/drivers/vpci/Makefile | 1 + > xen/drivers/vpci/vpci.c | 443 > ++++++++++++++++++++++++++++++++++++++ > xen/include/asm-x86/domain.h | 1 + > xen/include/asm-x86/hvm/domain.h | 3 + > xen/include/asm-x86/hvm/io.h | 3 + > xen/include/public/arch-x86/xen.h | 5 +- > xen/include/xen/pci.h | 3 + > xen/include/xen/pci_regs.h | 8 + > xen/include/xen/vpci.h | 80 +++++++ > 23 files changed, 1194 insertions(+), 10 deletions(-) > create mode 100644 tools/tests/vpci/Makefile > create mode 100644 tools/tests/vpci/emul.h > create mode 100644 tools/tests/vpci/main.c > create mode 100644 xen/drivers/vpci/Makefile > create mode 100644 xen/drivers/vpci/vpci.c > create mode 100644 xen/include/xen/vpci.h > [snip] > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c > index 6cb903def5..cc73df8dc7 100644 > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -36,6 +36,7 @@ > #include <xen/rangeset.h> > #include <xen/monitor.h> > #include <xen/warning.h> > +#include <xen/vpci.h> > #include <asm/shadow.h> > #include <asm/hap.h> > #include <asm/current.h> > @@ -629,6 +630,7 @@ int hvm_domain_initialise(struct domain *d, unsigned > long domcr_flags, > d->arch.hvm_domain.io_bitmap = hvm_io_bitmap; > > register_g2m_portio_handler(d); > + register_vpci_portio_handler(d); > > hvm_ioreq_init(d); > > diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c > index 074cba89da..c3b68eb257 100644 > --- a/xen/arch/x86/hvm/io.c > +++ b/xen/arch/x86/hvm/io.c > @@ -25,6 +25,7 @@ > #include 
<xen/trace.h> > #include <xen/event.h> > #include <xen/hypercall.h> > +#include <xen/vpci.h> > #include <asm/current.h> > #include <asm/cpufeature.h> > #include <asm/processor.h> > @@ -260,7 +261,7 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8, > unsigned int addr, > unsigned int *bus, unsigned int *slot, > unsigned int *func) > { > - unsigned long bdf; > + unsigned int bdf; Shouldn't this be folded into the previous patch where you introduce this function? > > ASSERT(CF8_ENABLED(cf8)); > > @@ -275,6 +276,121 @@ unsigned int hvm_pci_decode_addr(unsigned int > cf8, unsigned int addr, > return CF8_ADDR_LO(cf8) | (addr & 3); > } > > +/* Do some sanity checks. */ > +static bool vpci_access_allowed(unsigned int reg, unsigned int len) > +{ > + /* Check access size. */ > + if ( len != 1 && len != 2 && len != 4 ) > + return false; > + > + /* Check that access is size aligned. */ > + if ( (reg & (len - 1)) ) > + return false; > + > + return true; > +} > + > +/* vPCI config space IO ports handlers (0xcf8/0xcfc). 
*/ > +static bool vpci_portio_accept(const struct hvm_io_handler *handler, > + const ioreq_t *p) > +{ > + return (p->addr == 0xcf8 && p->size == 4) || (p->addr & ~3) == 0xcfc; > +} > + > +static int vpci_portio_read(const struct hvm_io_handler *handler, > + uint64_t addr, uint32_t size, uint64_t *data) > +{ > + struct domain *d = current->domain; > + unsigned int bus, slot, func, reg; > + > + *data = ~(uint64_t)0; > + > + vpci_rlock(d); > + if ( addr == 0xcf8 ) > + { > + ASSERT(size == 4); > + *data = d->arch.hvm_domain.pci_cf8; > + vpci_runlock(d); > + return X86EMUL_OKAY; > + } > + if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) ) > + { > + vpci_runlock(d); > + return X86EMUL_OKAY; > + } > + > + reg = hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, > &slot, > + &func); > + > + if ( !vpci_access_allowed(reg, size) ) > + { > + vpci_runlock(d); > + return X86EMUL_OKAY; > + } > + > + *data = vpci_read(0, bus, slot, func, reg, size); > + vpci_runlock(d); > + > + return X86EMUL_OKAY; > +} > + > +static int vpci_portio_write(const struct hvm_io_handler *handler, > + uint64_t addr, uint32_t size, uint64_t data) > +{ > + struct domain *d = current->domain; > + unsigned int bus, slot, func, reg; > + > + vpci_wlock(d); > + if ( addr == 0xcf8 ) > + { > + ASSERT(size == 4); > + d->arch.hvm_domain.pci_cf8 = data; > + vpci_wunlock(d); > + return X86EMUL_OKAY; > + } > + if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) ) > + { > + vpci_wunlock(d); > + return X86EMUL_OKAY; > + } > + > + reg = hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, > &slot, > + &func); > + > + if ( !vpci_access_allowed(reg, size) ) > + { > + vpci_wunlock(d); > + return X86EMUL_OKAY; > + } > + > + vpci_write(0, bus, slot, func, reg, size, data); > + vpci_wunlock(d); > + > + return X86EMUL_OKAY; > +} > + > +static const struct hvm_io_ops vpci_portio_ops = { > + .accept = vpci_portio_accept, > + .read = vpci_portio_read, > + .write = vpci_portio_write, > +}; > + > +void 
register_vpci_portio_handler(struct domain *d) > +{ > + struct hvm_io_handler *handler; > + > + if ( !has_vpci(d) ) > + return; > + > + handler = hvm_next_io_handler(d); > + if ( !handler ) > + return; > + > + rwlock_init(&d->arch.hvm_domain.vpci_lock); > + handler->type = IOREQ_TYPE_PIO; > + handler->ops = &vpci_portio_ops; > +} > + > /* > * Local variables: > * mode: C > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c > index db5df6956d..5b2c0e3fc3 100644 > --- a/xen/arch/x86/setup.c > +++ b/xen/arch/x86/setup.c > @@ -1566,7 +1566,8 @@ void __init noreturn __start_xen(unsigned long > mbi_p) > domcr_flags |= DOMCRF_hvm | > ((hvm_funcs.hap_supported && !opt_dom0_shadow) ? > DOMCRF_hap : 0); > - config.emulation_flags = > XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC; > + config.emulation_flags = > XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC| > + XEN_X86_EMU_VPCI; > } > > /* Create initial domain 0. */ > diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S > index ff08bbe42a..af1b30cb2b 100644 > --- a/xen/arch/x86/xen.lds.S > +++ b/xen/arch/x86/xen.lds.S > @@ -76,6 +76,11 @@ SECTIONS > > __2M_rodata_start = .; /* Start of 2M superpages, mapped RO. 
*/ > .rodata : { > +#if defined(CONFIG_HAS_PCI) && defined(CONFIG_LATE_HWDOM) > + __start_vpci_array = .; > + *(.rodata.vpci) > + __end_vpci_array = .; > +#endif > _srodata = .; > /* Bug frames table */ > __start_bug_frames = .; > @@ -167,6 +172,11 @@ SECTIONS > _einittext = .; > } :text > .init.data : { > +#if defined(CONFIG_HAS_PCI) && !defined(CONFIG_LATE_HWDOM) > + __start_vpci_array = .; > + *(.init.rodata.vpci) > + __end_vpci_array = .; > +#endif > *(.init.rodata) > *(.init.rodata.rel) > *(.init.rodata.str*) > diff --git a/xen/drivers/Makefile b/xen/drivers/Makefile > index 19391802a8..d51c766453 100644 > --- a/xen/drivers/Makefile > +++ b/xen/drivers/Makefile > @@ -1,6 +1,6 @@ > subdir-y += char > subdir-$(CONFIG_HAS_CPUFREQ) += cpufreq > -subdir-$(CONFIG_HAS_PCI) += pci > +subdir-$(CONFIG_HAS_PCI) += pci vpci > subdir-$(CONFIG_HAS_PASSTHROUGH) += passthrough > subdir-$(CONFIG_ACPI) += acpi > subdir-$(CONFIG_VIDEO) += video > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index 27bdb7163c..54326cf0b8 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -30,6 +30,7 @@ > #include <xen/radix-tree.h> > #include <xen/softirq.h> > #include <xen/tasklet.h> > +#include <xen/vpci.h> > #include <xsm/xsm.h> > #include <asm/msi.h> > #include "ats.h" > @@ -1030,9 +1031,10 @@ static void __hwdom_init > setup_one_hwdom_device(const struct setup_hwdom *ctxt, > struct pci_dev *pdev) > { > u8 devfn = pdev->devfn; > + int err; > > do { > - int err = ctxt->handler(devfn, pdev); > + err = ctxt->handler(devfn, pdev); > > if ( err ) > { > @@ -1045,6 +1047,11 @@ static void __hwdom_init > setup_one_hwdom_device(const struct setup_hwdom *ctxt, > devfn += pdev->phantom_stride; > } while ( devfn != pdev->devfn && > PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) ); > + > + err = vpci_add_handlers(pdev); > + if ( err ) > + printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n", > + ctxt->d->domain_id, err); > } > > static int 
__hwdom_init _setup_hwdom_pci_devices(struct pci_seg *pseg, > void *arg) > diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile > new file mode 100644 > index 0000000000..840a906470 > --- /dev/null > +++ b/xen/drivers/vpci/Makefile > @@ -0,0 +1 @@ > +obj-y += vpci.o > diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c > new file mode 100644 > index 0000000000..f63de97e89 > --- /dev/null > +++ b/xen/drivers/vpci/vpci.c > @@ -0,0 +1,443 @@ > +/* > + * Generic functionality for handling accesses to the PCI configuration space > + * from guests. > + * > + * Copyright (C) 2017 Citrix Systems R&D > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms and conditions of the GNU General Public > + * License, version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > GNU > + * General Public License for more details. > + * > + * You should have received a copy of the GNU General Public > + * License along with this program; If not, see > <http://www.gnu.org/licenses/>. > + */ > + > +#include <xen/sched.h> > +#include <xen/vpci.h> > + > +extern vpci_register_init_t *const __start_vpci_array[]; > +extern vpci_register_init_t *const __end_vpci_array[]; > +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array) > + > +/* Internal struct to store the emulated PCI registers. 
*/ > +struct vpci_register { > + vpci_read_t *read; > + vpci_write_t *write; > + unsigned int size; > + unsigned int offset; > + void *private; > + struct list_head node; > +}; > + > +int __hwdom_init vpci_add_handlers(struct pci_dev *pdev) > +{ > + unsigned int i; > + int rc = 0; > + > + if ( !has_vpci(pdev->domain) ) > + return 0; > + > + pdev->vpci = xzalloc(struct vpci); > + if ( !pdev->vpci ) > + return -ENOMEM; > + > + INIT_LIST_HEAD(&pdev->vpci->handlers); > + > + for ( i = 0; i < NUM_VPCI_INIT; i++ ) > + { > + rc = __start_vpci_array[i](pdev); > + if ( rc ) > + break; > + } > + > + if ( rc ) > + { > + while ( !list_empty(&pdev->vpci->handlers) ) > + { > + struct vpci_register *r = list_first_entry(&pdev->vpci->handlers, > + struct vpci_register, > + node); > + > + list_del(&r->node); > + xfree(r); > + } > + xfree(pdev->vpci); > + } > + > + return rc; > +} > + > +static int vpci_register_cmp(const struct vpci_register *r1, > + const struct vpci_register *r2) > +{ > + /* Return 0 if registers overlap. */ > + if ( r1->offset < r2->offset + r2->size && > + r2->offset < r1->offset + r1->size ) > + return 0; > + if ( r1->offset < r2->offset ) > + return -1; > + if ( r1->offset > r2->offset ) > + return 1; > + > + ASSERT_UNREACHABLE(); > + return 0; > +} > + > +/* Dummy hooks, writes are ignored, reads return 1's */ > +static uint32_t vpci_ignored_read(struct pci_dev *pdev, unsigned int reg, > + const void *data) > +{ > + return ~(uint32_t)0; > +} > + > +static void vpci_ignored_write(struct pci_dev *pdev, unsigned int reg, > + uint32_t val, void *data) > +{ > +} > + > +int vpci_add_register(const struct pci_dev *pdev, vpci_read_t > *read_handler, > + vpci_write_t *write_handler, unsigned int offset, > + unsigned int size, void *data) > +{ > + struct list_head *prev; > + struct vpci_register *r; > + > + /* Some sanity checks. 
*/ > + if ( (size != 1 && size != 2 && size != 4) || > + offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) || > + (!read_handler && !write_handler) ) > + return -EINVAL; > + > + r = xmalloc(struct vpci_register); > + if ( !r ) > + return -ENOMEM; > + > + r->read = read_handler ?: vpci_ignored_read; > + r->write = write_handler ?: vpci_ignored_write; > + r->size = size; > + r->offset = offset; > + r->private = data; > + > + vpci_wlock(pdev->domain); > + > + /* The list of handlers must be keep sorted at all times. */ > + list_for_each ( prev, &pdev->vpci->handlers ) > + { > + const struct vpci_register *this = > + list_entry(prev, const struct vpci_register, node); > + int cmp = vpci_register_cmp(r, this); > + > + if ( cmp < 0 ) > + break; > + if ( cmp == 0 ) > + { > + vpci_wunlock(pdev->domain); > + xfree(r); > + return -EEXIST; > + } > + } > + > + list_add_tail(&r->node, prev); > + vpci_wunlock(pdev->domain); > + > + return 0; > +} > + > +int vpci_remove_register(const struct pci_dev *pdev, unsigned int offset, > + unsigned int size) > +{ > + const struct vpci_register r = { .offset = offset, .size = size }; > + struct vpci_register *rm; > + > + vpci_wlock(pdev->domain); > + list_for_each_entry ( rm, &pdev->vpci->handlers, node ) > + { > + int cmp = vpci_register_cmp(&r, rm); > + > + /* > + * NB: do not use a switch so that we can use break to > + * get out of the list loop earlier if required. > + */ > + if ( !cmp && rm->offset == offset && rm->size == size ) > + { > + list_del(&rm->node); > + vpci_wunlock(pdev->domain); > + xfree(rm); > + return 0; > + } > + if ( cmp <= 0 ) > + break; > + } > + vpci_wunlock(pdev->domain); > + > + return -ENOENT; > +} > + > +/* Wrappers for performing reads/writes to the underlying hardware. 
*/ > +static uint32_t vpci_read_hw(unsigned int seg, unsigned int bus, > + unsigned int slot, unsigned int func, > + unsigned int reg, unsigned int size) > +{ > + uint32_t data; > + > + switch ( size ) > + { > + case 4: > + data = pci_conf_read32(seg, bus, slot, func, reg); > + break; > + case 3: > + /* > + * This is possible because a 4byte read can have 1byte trapped and > + * the rest passed-through. > + */ > + if ( reg & 1 ) > + { > + data = pci_conf_read8(seg, bus, slot, func, reg); > + data |= pci_conf_read16(seg, bus, slot, func, reg + 1) << 8; > + } > + else > + { > + data = pci_conf_read16(seg, bus, slot, func, reg); > + data |= pci_conf_read8(seg, bus, slot, func, reg + 2) << 16; > + } > + break; > + case 2: > + data = pci_conf_read16(seg, bus, slot, func, reg); > + break; > + case 1: > + data = pci_conf_read8(seg, bus, slot, func, reg); > + break; > + default: > + ASSERT_UNREACHABLE(); > + data = ~(uint32_t)0; > + break; > + } > + > + return data; > +} > + > +static void vpci_write_hw(unsigned int seg, unsigned int bus, > + unsigned int slot, unsigned int func, > + unsigned int reg, unsigned int size, uint32_t data) > +{ > + switch ( size ) > + { > + case 4: > + pci_conf_write32(seg, bus, slot, func, reg, data); > + break; > + case 3: > + /* > + * This is possible because a 4byte write can have 1byte trapped and > + * the rest passed-through. > + */ > + if ( reg & 1 ) > + { > + pci_conf_write8(seg, bus, slot, func, reg, data); > + pci_conf_write16(seg, bus, slot, func, reg + 1, data >> 8); > + } > + else > + { > + pci_conf_write16(seg, bus, slot, func, reg, data); > + pci_conf_write8(seg, bus, slot, func, reg + 2, data >> 16); > + } > + break; > + case 2: > + pci_conf_write16(seg, bus, slot, func, reg, data); > + break; > + case 1: > + pci_conf_write8(seg, bus, slot, func, reg, data); > + break; > + default: > + ASSERT_UNREACHABLE(); > + break; > + } > +} > + > +/* > + * Merge new data into a partial result. 
> + * > + * Zero the bytes of 'data' from [offset, offset + size), and > + * merge the value found in 'new' from [0, offset) left shifted > + * by 'offset'. > + */ > +static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size, > + unsigned int offset) > +{ > + uint32_t mask = 0xffffffff >> (32 - 8 * size); > + > + return (data & ~(mask << (offset * 8))) | ((new & mask) << (offset * 8)); > +} > + > +uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot, > + unsigned int func, unsigned int reg, unsigned int size) > +{ > + struct domain *d = current->domain; > + struct pci_dev *pdev; > + const struct vpci_register *r; > + unsigned int data_offset = 0; > + uint32_t data = ~(uint32_t)0; > + > + ASSERT(vpci_rlocked(d)); > + > + /* Find the PCI dev matching the address. */ > + pdev = pci_get_pdev_by_domain(d, seg, bus, PCI_DEVFN(slot, func)); > + if ( !pdev ) > + return vpci_read_hw(seg, bus, slot, func, reg, size); > + > + /* Read from the hardware or the emulated register handlers. */ > + list_for_each_entry ( r, &pdev->vpci->handlers, node ) > + { > + const struct vpci_register emu = { > + .offset = reg + data_offset, > + .size = size - data_offset > + }; > + int cmp = vpci_register_cmp(&emu, r); > + uint32_t val; > + unsigned int read_size; > + > + if ( cmp < 0 ) > + break; > + if ( cmp > 0 ) > + continue; > + > + if ( emu.offset < r->offset ) > + { > + /* Heading gap, read partial content from hardware. */ > + read_size = r->offset - emu.offset; > + val = vpci_read_hw(seg, bus, slot, func, emu.offset, read_size); > + data = merge_result(data, val, read_size, data_offset); > + data_offset += read_size; > + } > + > + val = r->read(pdev, r->offset, r->private); > + > + /* Check if the read is in the middle of a register. */ > + if ( r->offset < emu.offset ) > + val >>= (emu.offset - r->offset) * 8; > + > + /* Find the intersection size between the two sets. 
*/ > + read_size = min(emu.offset + emu.size, r->offset + r->size) - > + max(emu.offset, r->offset); > + /* Merge the emulated data into the native read value. */ > + data = merge_result(data, val, read_size, data_offset); > + data_offset += read_size; > + if ( data_offset == size ) > + break; > + ASSERT(data_offset < size); > + } > + > + if ( data_offset < size ) > + { > + /* Tailing gap, read the remaining. */ > + uint32_t tmp_data = vpci_read_hw(seg, bus, slot, func, > + reg + data_offset, > + size - data_offset); > + > + data = merge_result(data, tmp_data, size - data_offset, data_offset); > + } > + > + return data & (0xffffffff >> (32 - 8 * size)); > +} > + > +/* > + * Perform a maybe partial write to a register. > + * > + * Note that this will only work for simple registers, if Xen needs to > + * trap accesses to rw1c registers (like the status PCI header register) > + * the logic in vpci_write will have to be expanded in order to correctly > + * deal with them. > + */ > +static void vpci_write_helper(struct pci_dev *pdev, > + const struct vpci_register *r, unsigned int > size, > + unsigned int offset, uint32_t data) > +{ > + ASSERT(size <= r->size); > + > + if ( size != r->size ) > + { > + uint32_t val; > + > + val = r->read(pdev, r->offset, r->private); > + data = merge_result(val, data, size, offset); > + } > + > + r->write(pdev, r->offset, data & (0xffffffff >> (32 - 8 * r->size)), > + r->private); > +} > + > +void vpci_write(unsigned int seg, unsigned int bus, unsigned int slot, > + unsigned int func, unsigned int reg, unsigned int size, > + uint32_t data) > +{ > + struct domain *d = current->domain; > + struct pci_dev *pdev; > + const struct vpci_register *r; > + unsigned int data_offset = 0; > + > + ASSERT(vpci_wlocked(d)); > + > + /* > + * Find the PCI dev matching the address. > + * Passthrough everything that's not trapped. 
> + */ > + pdev = pci_get_pdev_by_domain(d, seg, bus, PCI_DEVFN(slot, func)); > + if ( !pdev ) > + { > + vpci_write_hw(seg, bus, slot, func, reg, size, data); > + return; > + } > + > + /* Write the value to the hardware or emulated registers. */ > + list_for_each_entry ( r, &pdev->vpci->handlers, node ) > + { > + const struct vpci_register emu = { > + .offset = reg + data_offset, > + .size = size - data_offset > + }; > + int cmp = vpci_register_cmp(&emu, r); > + unsigned int write_size; > + > + if ( cmp < 0 ) > + break; > + if ( cmp > 0 ) > + continue; > + > + if ( emu.offset < r->offset ) > + { > + /* Heading gap, write partial content to hardware. */ > + vpci_write_hw(seg, bus, slot, func, emu.offset, > + r->offset - emu.offset, data >> (data_offset * 8)); > + data_offset += r->offset - emu.offset; > + } > + > + /* Find the intersection size between the two sets. */ > + write_size = min(emu.offset + emu.size, r->offset + r->size) - > + max(emu.offset, r->offset); > + vpci_write_helper(pdev, r, write_size, reg + data_offset - r->offset, > + data >> (data_offset * 8)); > + data_offset += write_size; > + if ( data_offset == size ) > + break; > + ASSERT(data_offset < size); > + } > + > + if ( data_offset < size ) > + /* Tailing gap, write the remaining. 
*/ > + vpci_write_hw(seg, bus, slot, func, reg + data_offset, > + size - data_offset, data >> (data_offset * 8)); > +} > + > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * tab-width: 4 > + * indent-tabs-mode: nil > + * End: > + */ > diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h > index c10522b7f5..ec14343b27 100644 > --- a/xen/include/asm-x86/domain.h > +++ b/xen/include/asm-x86/domain.h > @@ -427,6 +427,7 @@ struct arch_domain > #define has_vpit(d) (!!((d)->arch.emulation_flags & > XEN_X86_EMU_PIT)) > #define has_pirq(d) (!!((d)->arch.emulation_flags & \ > XEN_X86_EMU_USE_PIRQ)) > +#define has_vpci(d) (!!((d)->arch.emulation_flags & > XEN_X86_EMU_VPCI)) > > #define has_arch_pdevs(d) (!list_empty(&(d)->arch.pdev_list)) > > diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm- > x86/hvm/domain.h > index d2899c9bb2..3a54d50606 100644 > --- a/xen/include/asm-x86/hvm/domain.h > +++ b/xen/include/asm-x86/hvm/domain.h > @@ -184,6 +184,9 @@ struct hvm_domain { > /* List of guest to machine IO ports mapping. */ > struct list_head g2m_ioport_list; > > + /* Lock for the PCI emulation layer (vPCI). */ > + rwlock_t vpci_lock; > + > /* List of permanently write-mapped pages. */ > struct { > spinlock_t lock; > diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h > index 51659b6c7f..01322a2e21 100644 > --- a/xen/include/asm-x86/hvm/io.h > +++ b/xen/include/asm-x86/hvm/io.h > @@ -160,6 +160,9 @@ unsigned int hvm_pci_decode_addr(unsigned int cf8, > unsigned int addr, > */ > void register_g2m_portio_handler(struct domain *d); > > +/* HVM port IO handler for PCI accesses. 
*/ > +void register_vpci_portio_handler(struct domain *d); > + > #endif /* __ASM_X86_HVM_IO_H__ */ > > > diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch- > x86/xen.h > index f21332e897..86a1a09a8d 100644 > --- a/xen/include/public/arch-x86/xen.h > +++ b/xen/include/public/arch-x86/xen.h > @@ -295,12 +295,15 @@ struct xen_arch_domainconfig { > #define XEN_X86_EMU_PIT (1U<<_XEN_X86_EMU_PIT) > #define _XEN_X86_EMU_USE_PIRQ 9 > #define XEN_X86_EMU_USE_PIRQ (1U<<_XEN_X86_EMU_USE_PIRQ) > +#define _XEN_X86_EMU_VPCI 10 > +#define XEN_X86_EMU_VPCI (1U<<_XEN_X86_EMU_VPCI) > > #define XEN_X86_EMU_ALL (XEN_X86_EMU_LAPIC | > XEN_X86_EMU_HPET | \ > XEN_X86_EMU_PM | XEN_X86_EMU_RTC | > \ > XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC | > \ > XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU | > \ > - XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ) > + XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ > |\ > + XEN_X86_EMU_VPCI) > uint32_t emulation_flags; > }; > > diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h > index ea6a66b248..ad5d3ca031 100644 > --- a/xen/include/xen/pci.h > +++ b/xen/include/xen/pci.h > @@ -88,6 +88,9 @@ struct pci_dev { > #define PT_FAULT_THRESHOLD 10 > } fault; > u64 vf_rlen[6]; > + > + /* Data for vPCI. */ > + struct vpci *vpci; > }; > > #define for_each_pdev(domain, pdev) \ > diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h > index ecd6124d91..cc4ee3b83e 100644 > --- a/xen/include/xen/pci_regs.h > +++ b/xen/include/xen/pci_regs.h > @@ -23,6 +23,14 @@ > #define LINUX_PCI_REGS_H > > /* > + * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of > + * configuration space. PCI-X Mode 2 and PCIe devices have 4096 bytes of > + * configuration space. 
> + */ > +#define PCI_CFG_SPACE_SIZE 256 > +#define PCI_CFG_SPACE_EXP_SIZE 4096 > + > +/* > * Under PCI, each device has 256 bytes of configuration address space, > * of which the first 64 bytes are standardized as follows: > */ > diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h > new file mode 100644 > index 0000000000..12f7287d7b > --- /dev/null > +++ b/xen/include/xen/vpci.h > @@ -0,0 +1,80 @@ > +#ifndef _VPCI_ > +#define _VPCI_ > + > +#include <xen/pci.h> > +#include <xen/types.h> > +#include <xen/list.h> > + > +/* > + * Helpers for locking/unlocking. > + * > + * NB: the recursive variants are used so that spin_is_locked > + * returns whether the lock is hold by the current CPU (instead > + * of just returning whether the lock is hold by any CPU). > + */ The comment doesn't seem to match the use of read-write locks below. > +#define vpci_rlock(d) read_lock(&(d)->arch.hvm_domain.vpci_lock) > +#define vpci_wlock(d) write_lock(&(d)->arch.hvm_domain.vpci_lock) > +#define vpci_runlock(d) read_unlock(&(d)->arch.hvm_domain.vpci_lock) > +#define vpci_wunlock(d) write_unlock(&(d)->arch.hvm_domain.vpci_lock) > +#define vpci_rlocked(d) rw_is_locked(&(d)->arch.hvm_domain.vpci_lock) > +#define vpci_wlocked(d) rw_is_write_locked(&(d)- > >arch.hvm_domain.vpci_lock) > + > +/* > + * The vPCI handlers will never be called concurrently for the same domain, > it > + * is guaranteed that the vpci domain lock will always be locked when calling > + * any handler. 
> + */ > +typedef uint32_t vpci_read_t(struct pci_dev *pdev, unsigned int reg, > + const void *data); > + > +typedef void vpci_write_t(struct pci_dev *pdev, unsigned int reg, > + uint32_t val, void *data); > + > +typedef int vpci_register_init_t(struct pci_dev *dev); > + > +#ifdef CONFIG_LATE_HWDOM > +#define VPCI_SECTION ".rodata.vpci" > +#else > +#define VPCI_SECTION ".init.rodata.vpci" > +#endif > + > +#define REGISTER_VPCI_INIT(x) \ > + static vpci_register_init_t *const x##_entry \ > + __used_section(VPCI_SECTION) = x > + > +/* Add vPCI handlers to device. */ > +int __must_check vpci_add_handlers(struct pci_dev *dev); > + > +/* Add/remove a register handler. */ > +int __must_check vpci_add_register(const struct pci_dev *pdev, > + vpci_read_t *read_handler, > + vpci_write_t *write_handler, > + unsigned int offset, unsigned int size, > + void *data); > +int __must_check vpci_remove_register(const struct pci_dev *pdev, > + unsigned int offset, > + unsigned int size); > + > +/* Generic read/write handlers for the PCI config space. */ > +uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot, > + unsigned int func, unsigned int reg, unsigned int size); > +void vpci_write(unsigned int seg, unsigned int bus, unsigned int slot, > + unsigned int func, unsigned int reg, unsigned int size, > + uint32_t data); > + > +struct vpci { > + /* List of vPCI handlers for a device. */ > + struct list_head handlers; > +}; > + > +#endif > + > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * tab-width: 4 > + * indent-tabs-mode: nil > + * End: > + */ > -- > 2.11.0 (Apple Git-81) _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel