[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [V10 PATCH 23/23] PVH xen: introduce vmexit handler for PVH



At 19:30 -0700 on 25 Jul (1374780657), Mukesh Rathor wrote:
> On Thu, 25 Jul 2013 17:28:40 +0100
> Tim Deegan <tim@xxxxxxx> wrote:
> 
> > At 18:59 -0700 on 23 Jul (1374605971), Mukesh Rathor wrote:
> > > +/* Just like HVM, PVH should be using "cpuid" from the kernel
> > > mode. */ +static int vmxit_invalid_op(struct cpu_user_regs *regs)
> > > +{
> > > +    if ( guest_kernel_mode(current, regs)
> > > || !emulate_forced_invalid_op(regs) )
> > > +        hvm_inject_hw_exception(TRAP_invalid_op,
> > > HVM_DELIVER_NO_ERROR_CODE);
> > 
> > Was this discussed before?  It seems harsh to stop kernel-mode code
> > from using the pv cpuid operation if it wants to.  In particular,
> > what about loadable kernel modules?
> 
> Yes, few times on the xen mailing list. The only PVH guest, linux
> as of now, the pv ops got rewired to use native cpuid, which is
> how hvm does it.

Yes, but presumably you want to make it easy for other PV guests to port
to PVH too?

> So, couldn't come up with any real reason to support it.

Seems like there's no reason not to -- wouldn't just removing the check
for kernel-mode DTRT?  Or is there some other complication?

> The kernel modules in pv ops will go thru native_cpuid
> too, which will do hvm cpuid too.
> 
> 
> > If you do go with this restriction, please document it in
> > include/public/arch-x86/xen.h beside the XEN_CPUID definition.
> 
> Ok, I'll add it there.

Thanks.

> > > +/* Returns: rc == 0: success. */
> > > +static int access_cr0(struct cpu_user_regs *regs, uint acc_typ,
> > > uint64_t *regp) +{
> > > +    struct vcpu *vp = current;
> > > +
> > > +    if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR )
> > > +    {
> > > +        unsigned long new_cr0 = *regp;
> > > +        unsigned long old_cr0 = __vmread(GUEST_CR0);
> > > +
> > > +        dbgp1("PVH:writing to CR0. RIP:%lx val:0x%lx\n",
> > > regs->rip, *regp);
> > > +        if ( (u32)new_cr0 != new_cr0 )
> > > +        {
> > > +            printk(XENLOG_G_WARNING
> > > +                   "Guest setting upper 32 bits in CR0: %lx",
> > > new_cr0);
> > > +            return -EPERM;
> > 
> > AFAICS returning non-zero here crashes the guest.  Shouldn't this
> > inject #GP instead?
> 
> Right, GPF it is.

Ta.

> > > +
> > > +        if ( new & HVM_CR4_GUEST_RESERVED_BITS(vp) )
> > > +        {
> > > +            printk(XENLOG_G_WARNING
> > > +                   "PVH guest attempts to set reserved bit in CR4:
> > > %lx", new);
> > > +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> > > +            return 0;
> > > +        }
> > > +
> > > +        if ( !(new & X86_CR4_PAE) && hvm_long_mode_enabled(vp) )
> > > +        {
> > > +            printk(XENLOG_G_WARNING "Guest cleared CR4.PAE while "
> > > +                   "EFER.LMA is set");
> > > +            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> > > +            return 0;
> > > +        }
> > > +
> > > +        vp->arch.hvm_vcpu.guest_cr[4] = new;
> > > +
> > > +        if ( (old_val ^ new) & (X86_CR4_PSE | X86_CR4_PGE |
> > > X86_CR4_PAE) )
> > > +            vpid_sync_all();
> > > +
> > > +        __vmwrite(CR4_READ_SHADOW, new);
> > > +
> > > +        new &= ~X86_CR4_PAE;         /* PVH always runs with hap
> > > enabled. */
> > 
> > The equivalent mask in vmx_update_guest_cr() is masking out a default
> > setting of CR4.PAE _before_ the guest's requested bits get ORred in.
> > This is masking out the PAE bit that we just insisted on.  I'm
> > surprised that VMENTER doesn't choke on this -- I guess it uses
> > VM_ENTRY_IA32E_MODE rather than looking at these bits at all.
> 
> Ah, I see. what a mess! Hmm... I can't find much in the VMX sections
> of the SDM on this. Not sure what I should do here. How about just
> not clearing the X86_CR4_PAE, so the GUEST_CR4 will have it
> if it's set in new, ie, the guest wants it set? ie, :
> ...
>         __vmwrite(CR4_READ_SHADOW, new);
> 
>         new |= X86_CR4_VMXE | X86_CR4_MCE;
>         __vmwrite(GUEST_CR4, new);

Yep, that looks correct to me. 

> > > +    case VMX_CONTROL_REG_ACCESS_TYPE_CLTS:
> > > +    {
> > > +        struct vcpu *vp = current;
> > > +        unsigned long cr0 = vp->arch.hvm_vcpu.guest_cr[0] &
> > > ~X86_CR0_TS;
> > > +        vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0]
> > > = cr0; +
> > > +        vmx_fpu_enter(vp);
> > > +        __vmwrite(GUEST_CR0, cr0);
> > > +        __vmwrite(CR0_READ_SHADOW, cr0);
> > > +        vmx_update_guest_eip();
> > > +        rc = 0;
> > > +    }
> > > +    }
> > 
> > No "case VMX_CONTROL_REG_ACCESS_TYPE_LMSW"?
> 
> Well, PVH is such a new concept, do you really think we need it?

Yes!  I've seen it used (in an implementation of stts() IIRC) quite
recently.  Besides, it's only about three lines of code to lift from the
normal VMX handler.

> Lets
> not hold the series just for this, we can always add in future :).

Well, I assume with Keir's ack the series will go in anyway and any
changes from this review would be fixed up afterwards.

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.