Re: [Xen-devel] [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to handle retry
> -----Original Message-----
> From: Paul Durrant
> Sent: 09 July 2015 14:43
> To: 'Jan Beulich'
> Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
> Subject: RE: [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to
> handle retry
>
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > Sent: 09 July 2015 14:38
> > To: Paul Durrant
> > Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
> > Subject: RE: [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to
> > handle retry
> >
> > >>> On 09.07.15 at 14:50, <Paul.Durrant@xxxxxxxxxx> wrote:
> > >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> Sent: 09 July 2015 13:04
> > >> >>> On 09.07.15 at 13:11, <Paul.Durrant@xxxxxxxxxx> wrote:
> > >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> >> Sent: 09 July 2015 11:05
> > >> >> >>> On 03.07.15 at 18:25, <paul.durrant@xxxxxxxxxx> wrote:
> > >> >> > @@ -287,17 +271,56 @@ static int hvmemul_do_io_addr(
> > >> >> >      bool_t is_mmio, paddr_t addr, unsigned long *reps,
> > >> >> >      unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa)
> > >> >> >  {
> > >> >> > -    struct page_info *ram_page;
> > >> >> > +    struct vcpu *v = current;
> > >> >> > +    unsigned long ram_gmfn = paddr_to_pfn(ram_gpa);
> > >> >> > +    unsigned int page_off = ram_gpa & (PAGE_SIZE - 1);
> > >> >> > +    struct page_info *ram_page[2];
> > >> >> > +    int nr_pages = 0;
> > >> >>
> > >> >> unsigned int
> > >> >
> > >> > No, it's intentionally signed because the unwind code at the end of the
> > >> > function is:
> > >> >
> > >> >     while ( --nr_pages >= 0 )
> > >> >         hvmemul_release_page(ram_page[nr_pages]);
> > >> >
> > >> > I.e. the loop terminates when nr_pages gets to -1.
> > >>
> > >>     while ( nr_pages-- )
> > >>         hvmemul_release_page(ram_page[nr_pages]);
> > >>
> > >
> > > But that wouldn't be correct, since the loop would try to release
> > > ram_page[1] and ram_page[0] in the case where only ram_page[0] had
> > > been acquired. It would have to be:
> > >
> > >     while ( nr_pages )
> > >         hvmemul_release_page(ram_page[--nr_pages]);
> >
> > Unless you need the value of nr_pages after the loop, I can't see
> > the difference between your and my proposal - the decrement in
> > both cases happens between determining whether to continue the
> > loop and accessing the intended array element.
>
> D'oh. Of course it does.
>
> > > which I'll do, since you so dislike signed ints.
> >
> > Sounds like you're not understanding why. Just look at the
> > generated code when using signed vs unsigned int as array
> > index.
>
> Ok, I'll take a look because I would not have thought it would make a huge
> difference.

So.... Two test programs:

test1.c:
--------
#include <stdio.h>

int main(void)
{
    int i = 10;
    int a[10];

    while ( --i >= 0 )
    {
        a[i] = i;
        printf("%d\n", i);
    }

    return 0;
}

test2.c:
--------
#include <stdio.h>

int main(void)
{
    unsigned int i = 10;
    int a[10];

    while ( i )
    {
        a[--i] = i;
        printf("%d\n", i);
    }

    return 0;
}

I compiled them just using 'make test1' and 'make test2', so using whatever
the default options are for gcc (on a debian wheezy box).
The relevant bits of the objdump are:

test1
-----
000000000040050c <main>:
  40050c:       55                      push   %rbp
  40050d:       48 89 e5                mov    %rsp,%rbp
  400510:       48 83 ec 30             sub    $0x30,%rsp
  400514:       c7 45 fc 0a 00 00 00    movl   $0xa,-0x4(%rbp)
  40051b:       eb 20                   jmp    40053d <main+0x31>
  40051d:       8b 45 fc                mov    -0x4(%rbp),%eax
  400520:       48 98                   cltq
  400522:       8b 55 fc                mov    -0x4(%rbp),%edx
  400525:       89 54 85 d0             mov    %edx,-0x30(%rbp,%rax,4)
  400529:       8b 45 fc                mov    -0x4(%rbp),%eax
  40052c:       89 c6                   mov    %eax,%esi
  40052e:       bf fc 05 40 00          mov    $0x4005fc,%edi
  400533:       b8 00 00 00 00          mov    $0x0,%eax
  400538:       e8 a3 fe ff ff          callq  4003e0 <printf@plt>
  40053d:       83 6d fc 01             subl   $0x1,-0x4(%rbp)
  400541:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  400545:       79 d6                   jns    40051d <main+0x11>
  400547:       b8 00 00 00 00          mov    $0x0,%eax
  40054c:       c9                      leaveq
  40054d:       c3                      retq
  40054e:       90                      nop
  40054f:       90                      nop

test2
-----
000000000040050c <main>:
  40050c:       55                      push   %rbp
  40050d:       48 89 e5                mov    %rsp,%rbp
  400510:       48 83 ec 30             sub    $0x30,%rsp
  400514:       c7 45 fc 0a 00 00 00    movl   $0xa,-0x4(%rbp)
  40051b:       eb 22                   jmp    40053f <main+0x33>
  40051d:       83 6d fc 01             subl   $0x1,-0x4(%rbp)
  400521:       8b 55 fc                mov    -0x4(%rbp),%edx
  400524:       8b 45 fc                mov    -0x4(%rbp),%eax
  400527:       89 54 85 d0             mov    %edx,-0x30(%rbp,%rax,4)
  40052b:       8b 45 fc                mov    -0x4(%rbp),%eax
  40052e:       89 c6                   mov    %eax,%esi
  400530:       bf fc 05 40 00          mov    $0x4005fc,%edi
  400535:       b8 00 00 00 00          mov    $0x0,%eax
  40053a:       e8 a1 fe ff ff          callq  4003e0 <printf@plt>
  40053f:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  400543:       75 d8                   jne    40051d <main+0x11>
  400545:       b8 00 00 00 00          mov    $0x0,%eax
  40054a:       c9                      leaveq
  40054b:       c3                      retq
  40054c:       90                      nop
  40054d:       90                      nop
  40054e:       90                      nop
  40054f:       90                      nop

So, the only significant difference I can see is that using a signed value
causes an extra cltq instruction to be used. Or maybe I need some specific
level of optimization for there to be a big difference?

  Paul

> Paul
>
> > Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel