Re: [Xen-devel] [PATCH v3 05/18] x86/hvm: remove multiple open coded 'chunking' loops
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: 24 June 2015 10:38
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
> Subject: Re: [PATCH v3 05/18] x86/hvm: remove multiple open coded
> 'chunking' loops
>
> >>> On 23.06.15 at 12:39, <paul.durrant@xxxxxxxxxx> wrote:
> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -540,6 +540,115 @@ static int hvmemul_virtual_to_linear(
> >          return X86EMUL_EXCEPTION;
> >      }
> >
> > +static int hvmemul_phys_mmio_access(
> > +    paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer,
> > +    unsigned int *off)
>
> Why this (buffer, off) pair? The caller can easily adjust buffer as
> necessary, avoiding the other parameter altogether. And buffer
> itself can be void * just like it is in some of the callers (and the
> others should follow suit).
>

It actually becomes necessary in a later patch, but I'll make the change
there instead. As for incrementing a void *, I know that MSVC disallows
this. I believe it is a gcc-ism which I guess clang must tolerate, but I
don't think it is standard C.

> > +{
> > +    unsigned long one_rep = 1;
> > +    unsigned int chunk;
> > +    int rc = 0;
> > +
> > +    /* Accesses must fall within a page */
> > +    if ( (gpa & (PAGE_SIZE - 1)) + size > PAGE_SIZE )
> > +        return X86EMUL_UNHANDLEABLE;
>
> As for patch 4 - this imposes a restriction that real hardware doesn't
> have, and hence this needs to be replaced by adjusting the one caller
> not currently guaranteeing this such that it caps the size.
>

Ok.

> > +    /*
> > +     * hvmemul_do_io() cannot handle non-power-of-2 accesses or
> > +     * accesses larger than sizeof(long), so choose the highest power
> > +     * of 2 not exceeding sizeof(long) as the 'chunk' size.
> > +     */
> > +    chunk = 1 << (fls(size) - 1);
> > +    if ( chunk > sizeof (long) )
> > +        chunk = sizeof (long);
>
> I suppose you intentionally generalize this; if so this should be
> mentioned in the commit message. This is particularly because it
> results in changed behavior (which isn't to say that I'm sure the
> previous way was any better in the sense of being closer to what
> real hardware does): Right now, an 8 byte access at the last
> byte of a page would get carried out as 8 1-byte accesses. Your
> change makes it a 1-, 4-, 2-, and 1-byte access in that order.
>

...which is certainly going to be quicker since it's only 4 round-trips
to an emulator rather than 8.

> Also, considering instruction characteristics (as explained in the
> original comment) I think the old way of determining the chunk
> size may have been cheaper than yours using fls().
>

I thought fls() was generally implemented using inline assembler and was
pretty fast. I didn't actually check how Xen implements it; I just
assumed it would be optimal.

> > +
> > +    while ( size != 0 )
> > +    {
> > +        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
> > +                                    &buffer[*off]);
> > +        if ( rc != X86EMUL_OKAY )
> > +            break;
> > +
> > +        /* Advance to the next chunk */
> > +        gpa += chunk;
> > +        *off += chunk;
> > +        size -= chunk;
> > +
> > +        /*
> > +         * If the chunk now exceeds the remaining size, choose the next
> > +         * lowest power of 2 that will fit.
> > +         */
> > +        while ( chunk > size )
> > +            chunk >>= 1;
>
> Please avoid this loop when size == 0. Since the function won't be
> called with size being zero, I think the loop should be a for ( ; ; )
> one with the loop exit condition put in the middle.

Sure.
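For concreteness, the reshaped loop being agreed to here might look like
the following minimal, standalone sketch. The do_mmio_stub() helper is
hypothetical, standing in for hvmemul_do_mmio_buffer(), and the Xen
types are dropped; this illustrates the loop shape, not the committed
code:

#include <stdio.h>

#define X86EMUL_OKAY 0

/* Hypothetical stub standing in for hvmemul_do_mmio_buffer(). */
static int do_mmio_stub(unsigned long gpa, unsigned int chunk)
{
    printf("mmio access at %#lx, %u byte(s)\n", gpa, chunk);
    return X86EMUL_OKAY;
}

/* The exit test sits in the middle of the body, as requested above. */
static int access_loop(unsigned long gpa, unsigned int size,
                       unsigned int chunk)
{
    int rc;

    for ( ; ; )
    {
        rc = do_mmio_stub(gpa, chunk);
        if ( rc != X86EMUL_OKAY )
            break;

        /* Advance to the next chunk. */
        gpa += chunk;
        size -= chunk;

        if ( size == 0 )
            break;

        /* Next lowest power of 2 that still fits the remainder. */
        while ( chunk > size )
            chunk >>= 1;
    }

    return rc;
}

int main(void)
{
    access_loop(0x1000, 7, 4);    /* emits 4-, 2- and 1-byte accesses */
    return 0;
}

With size never zero on entry, the chunk-shrinking loop can no longer be
reached once size has dropped to zero, which is exactly the concern
raised above.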
> > @@ -549,52 +658,26 @@ static int __hvmemul_read(
> >      struct hvm_emulate_ctxt *hvmemul_ctxt)
> >  {
> >      struct vcpu *curr = current;
> > -    unsigned long addr, reps = 1;
> > -    unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
> > +    unsigned long addr, one_rep = 1;
> >      uint32_t pfec = PFEC_page_present;
> >      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
> > -    paddr_t gpa;
> >      int rc;
> >
> >      rc = hvmemul_virtual_to_linear(
> > -        seg, offset, bytes, &reps, access_type, hvmemul_ctxt, &addr);
> > +        seg, offset, bytes, &one_rep, access_type, hvmemul_ctxt, &addr);
> >      if ( rc != X86EMUL_OKAY )
> >          return rc;
> > -    off = addr & (PAGE_SIZE - 1);
> > -    /*
> > -     * We only need to handle sizes actual instruction operands can have. All
> > -     * such sizes are either powers of 2 or the sum of two powers of 2. Thus
> > -     * picking as initial chunk size the largest power of 2 not greater than
> > -     * the total size will always result in only power-of-2 size requests
> > -     * issued to hvmemul_do_mmio() (hvmemul_do_io() rejects non-powers-of-2).
> > -     */
> > -    while ( chunk & (chunk - 1) )
> > -        chunk &= chunk - 1;
> > -    if ( off + bytes > PAGE_SIZE )
> > -        while ( off & (chunk - 1) )
> > -            chunk >>= 1;
> >
> >      if ( ((access_type != hvm_access_insn_fetch
> >             ? vio->mmio_access.read_access
> >             : vio->mmio_access.insn_fetch)) &&
> >           (vio->mmio_gva == (addr & PAGE_MASK)) )
> >      {
> > -        gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
> > -        while ( (off + chunk) <= PAGE_SIZE )
> > -        {
> > -            rc = hvmemul_do_mmio_buffer(gpa, &reps, chunk, IOREQ_READ, 0,
> > -                                        p_data);
> > -            if ( rc != X86EMUL_OKAY || bytes == chunk )
> > -                return rc;
> > -            off += chunk;
> > -            gpa += chunk;
> > -            p_data += chunk;
> > -            bytes -= chunk;
> > -            if ( bytes < chunk )
> > -                chunk = bytes;
> > -        }
> > +        unsigned int off = 0;
> > +        paddr_t gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) |
>
> If you already touch this, pfn_to_paddr() please.

Indeed. Will do.

  Paul

> Jan
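For reference, the 1-, 4-, 2-, 1-byte sequence discussed above (an
8-byte access starting at the last byte of a page, on an LP64 build) can
be reproduced with a small standalone model of the new chunking. fls()
is open-coded portably here, and the split at the page boundary (done by
the caller in the patch series) is modelled inline, so this is a sketch
of the behaviour rather than the Xen implementation itself:

#include <stdio.h>

#define PAGE_SIZE 4096ul

/* Portable stand-in for fls(): 1-based index of the highest set bit. */
static unsigned int fls_portable(unsigned int x)
{
    unsigned int r = 0;

    while ( x )
    {
        x >>= 1;
        r++;
    }

    return r;
}

static void show_chunks(unsigned long gpa, unsigned int size)
{
    printf("access at %#lx, size %u:", gpa, size);

    while ( size != 0 )
    {
        /* Bytes left before the next page boundary. */
        unsigned int space = PAGE_SIZE - (gpa & (PAGE_SIZE - 1));
        /* Highest power of 2 not exceeding size... */
        unsigned int chunk = 1u << (fls_portable(size) - 1);

        /* ...capped at sizeof(long)... */
        if ( chunk > sizeof(long) )
            chunk = sizeof(long);
        /* ...and shrunk so it does not cross the page boundary. */
        while ( chunk > space )
            chunk >>= 1;

        printf(" %u", chunk);
        gpa += chunk;
        size -= chunk;
    }

    printf("\n");
}

int main(void)
{
    show_chunks(0xfff, 8);    /* last byte of a page: 1 4 2 1 */
    show_chunks(0x1000, 8);   /* aligned: one sizeof(long) access */
    return 0;
}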
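On the fls() question above: a typical x86 implementation, of the
inline-assembler kind being alluded to, is a single bsr instruction plus
a fix-up for the zero-input case. The sketch below uses GCC/Clang asm
syntax, is x86-only, and may differ in detail from Xen's own version:

#include <stdio.h>

/* Find-last-set via bsr: 1-based bit index, 0 for a zero input. */
static inline int fls_bsr(unsigned int x)
{
    int r;

    asm ( "bsr %1, %0\n\t"    /* r = index of highest set bit... */
          "jnz 1f\n\t"        /* ...unless x was zero, in which  */
          "mov $-1, %0\n"     /* case force r to -1              */
          "1:"
          : "=&r" (r) : "rm" (x) : "cc" );

    return r + 1;
}

int main(void)
{
    printf("fls(8) = %d\n", fls_bsr(8));    /* 4 */
    printf("fls(7) = %d\n", fls_bsr(7));    /* 3 */
    printf("fls(0) = %d\n", fls_bsr(0));    /* 0 */
    return 0;
}

Either way, for the operand sizes involved here any difference between
this and the old shift-and-mask computation is likely in the noise next
to the cost of the MMIO round-trip itself.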