[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: Proposal for Porting Xen to Armv8-R64 - DraftA
Hi Julien, > -----Original Message----- > From: Julien Grall <julien@xxxxxxx> > Sent: 2022年3月2日 18:25 > To: Wei Chen <Wei.Chen@xxxxxxx>; Stefano Stabellini > <sstabellini@xxxxxxxxxx> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis > <Bertrand.Marquis@xxxxxxx>; Penny Zheng <Penny.Zheng@xxxxxxx>; Henry Wang > <Henry.Wang@xxxxxxx>; nd <nd@xxxxxxx> > Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA > > Hi Wei, > > On 02/03/2022 06:43, Wei Chen wrote: > >> -----Original Message----- > >> From: Julien Grall <julien@xxxxxxx> > >> Sent: 2022年3月1日 21:17 > >> To: Wei Chen <Wei.Chen@xxxxxxx>; Stefano Stabellini > >> <sstabellini@xxxxxxxxxx> > >> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis > >> <Bertrand.Marquis@xxxxxxx>; Penny Zheng <Penny.Zheng@xxxxxxx>; Henry > Wang > >> <Henry.Wang@xxxxxxx>; nd <nd@xxxxxxx> > >> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA > >> > >> On 01/03/2022 06:29, Wei Chen wrote: > >>> Hi Julien, > >> > >> Hi, > >> > >>>> -----Original Message----- > >>>> From: Julien Grall <julien@xxxxxxx> > >>>> Sent: 2022年2月26日 4:12 > >>>> To: Wei Chen <Wei.Chen@xxxxxxx>; Stefano Stabellini > >>>> <sstabellini@xxxxxxxxxx> > >>>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis > >>>> <Bertrand.Marquis@xxxxxxx>; Penny Zheng <Penny.Zheng@xxxxxxx>; Henry > >> Wang > >>>> <Henry.Wang@xxxxxxx>; nd <nd@xxxxxxx> > >>>> Subject: Re: Proposal for Porting Xen to Armv8-R64 - DraftA > >>>> > >>>> Hi Wei, > >>>> > >>>> On 25/02/2022 10:48, Wei Chen wrote: > >>>>>>> Armv8-R64 can support max to 256 MPU regions. But that's > just > >>>>>> theoretical. > >>>>>>> So we don't want to define `pr_t mpu_regions[256]`, this is > a > >>>> memory > >>>>>> waste > >>>>>>> in most of time. So we decided to let the user specify > through > >> a > >>>>>> Kconfig > >>>>>>> option. `CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS` default > value > >> can > >>>> be > >>>>>> `32`, > >>>>>>> it's a typical implementation on Armv8-R64. Users will > >> recompile > >>>> Xen > >>>>>> when > >>>>>>> their platform changes. So when the MPU changes, > respecifying > >> the > >>>>>> MPU > >>>>>>> protection regions number will not cause additional > problems. > >>>>>> > >>>>>> I wonder if we could probe the number of MPU regions at runtime and > >>>>>> dynamically allocate the memory needed to store them in arch_vcpu. > >>>>>> > >>>>> > >>>>> We have considered to used a pr_t mpu_regions[0] in arch_vcpu. But > it > >>>> seems > >>>>> we will encounter some static allocated arch_vcpu problems and > sizeof > >>>> issue. > >>>> > >>>> Does it need to be embedded in arch_vcpu? If not, then we could > >> allocate > >>>> memory outside and add a pointer in arch_vcpu. > >>>> > >>> > >>> We had thought to use a pointer in arch_vcpu instead of embedding > >> mpu_regions > >>> into arch_vcpu. But we noticed that arch_vcpu has a > __cacheline_aligned > >>> attribute, this may be because of arch_vcpu will be used very > frequently > >>> in some critical path. So if we use the pointer for mpu_regions, may > >> cause > >>> some cache miss in these critical path, for example, in context_swtich. > >> > >> From my understanding, the idea behind ``cacheline_aligned`` is to > >> avoid the struct vcpu to be shared with other datastructure. Otherwise > >> you may end up to have two pCPUs to frequently write the same cacheline > >> which is not ideal. > >> > >> arch_vcpu should embbed anything that will be accessed often (e.g. > >> entry/exit) to certain point. For instance, not everything related to > >> the vGIC are embbed in the vCPU/Domain structure. > >> > >> I am a bit split regarding the mpu_regions. If they are mainly used in > >> the context_switch() then I would argue this is a premature > optimization > >> because the scheduling decision is probably going to take a lot more > >> time than the context switch itself. > > > > mpu_regions in arch_vcpu are used to save guest's EL1 MPU context. So, > yes, > > they are mainly used in context_switch. In terms of the number of > registers, > > it will save/restore more work than the original V8A. And on V8R we also > need > > to keep most of the original V8A save/restore work. So it will take > longer > > than the original V8A context_switch. And I think this is due to > architecture's > > difference. So it's impossible for us not to save/restore EL1 MPU region > > registers in context_switch. And we have done some optimization for EL1 > MPU > > save/restore: > > 1. Assembly code for EL1 MPU context_switch > > This discussion reminds me when KVM decided to rewrite their context > switch from assembly to C. The outcome was the compiler is able to do a > better job than us when it comes to optimizing. > > With a C version, we could also share the save/restore code with 32-bit > and it is easier to read/maintain. > > So I would suggest to run some numbers to check if it really worth > implementing the MPU save/restore in assembly. > It's interesting to hear KVM guys have similar discussion. Yes, if the gains of assembly code is not very obvious, then reusing the code for 32-bit would be more important. As our current platform (FVP) could not do very precise performance measurement. I want to keep current assembly code there, when we have a platform that can do such measurement we can have a thread to discuss it. > > 2. Use real MPU regions number instead of > CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS > > in context_switch. CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS is defined > the Max > > supported EL1 MPU regions for this Xen image. All platforms that > implement > > EL1 MPU regions in this range can work well with this Xen Image. But > if the > > implemented EL1 MPU region number exceeds > CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS, > > this Xen image could not work well on this platform. > > This sounds similar to the GICv3. The number of LRs depends on the > hardware. See how we dealt with it in gicv3_save_lrs(). > This is a good suggestion, we will check the GIC code. > > > >> > >> Note that for the P2M we already have that indirection because it is > >> embbed in the struct domain. > > > > It's different with V8A P2M case. In V8A context_switch we just need to > > save/restore VTTBR, we don't need to do P2M table walk. But on V8R, we > > need to access valid mpu_regions for save/restore. > > The save/restore for the P2M is a bit more complicated than simply > save/restore the VTTBR. But yes, I agree the code for the MPU will > likely be more complicated. > > > > >> > >> This raises one question, why is the MPUs regions will be per-vCPU > >> rather per domain? > >> > > > > Because there is a EL1 MPU component for each pCPU. We can't assume > guest > > to use the same EL1 MPU configuration for all vCPU. > > Ah. Sorry, I thought you were referring to whatever Xen will use to > prevent the guest accessing outside of its designated region. > NP : ) Thanks, Wei Chen > Cheers, > > -- > Julien Grall
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |