[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions



On Tue, 30 Nov 2021 19:13:41 +0000
Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx> wrote:

Hi Ayan,

> Thanks for your comments. They are useful.
> 
> On 30/11/2021 09:49, Andre Przywara wrote:
> > On Mon, 29 Nov 2021 19:16:38 +0000
> > Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx> wrote:
> > 
> > Hi,
> >   
> >> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> >> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> >> could use to access MMIO regions.
> >>
> >> For instance, Xilinx baremetal OS will use:
> >>
> >>          volatile u32 *LocalAddr = (volatile u32 *)Addr;
> >>          *LocalAddr = Value;
> >>
> >> This leave the compiler to decide which store instructions to use.  
> > 
> > As mentioned in the other email, this is wrong, if this points to MMIO:
> > don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
> > an MMIO area, you should not see traps that you cannot handle already.
> > 
> > So I don't think it's a good idea to use that as an example. And since
> > this patch only seems to address this use case, I would doubt its
> > usefulness in general.  
> Yes, I should have fixed the comment.
> 
> Currently, I am testing with baremetal app which uses inline assembly 
> code with post indexing instructions, to access the MMIO.
> 
> ATM, I am testing with 32 bit MMIO only.
> 
> On the usefulness, I am kind of torn as it is legitimate for post 
> indexing instructions to be used in an inline-assembly code for 
> accessing MMIO. However, that may not be something commonly seen.

It is legitimate, but I question the usefulness: for a start it wouldn't
work under today's Xen (or KVM), and I doubt this would be backported. So
you would always require users to use the newest version of the hypervisor.
Also MMIO accesses are slow anyway, so while using post-indexing might be
convenient from an assembly writer's point of view, it doesn't give you
much performance-wise, probably. So if you are interested in running under
Xen (or other virtualisation solutions), just don't use them for MMIO.

And I am not sure that Xen aims to virtualise every piece of code on the
planet: certainly we have certain expectations about the platform, so you
would probably consider running under a hypervisor from the very beginning.

> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can 
> you comment if we should have decoding logic or not ?
> 
> >   
> >> This
> >> may be a post-index store instruction where the HW will not provide a
> >> valid syndrome.
> >>
> >> In order to handle post-indexing store/load instructions, Xen will need
> >> to fetch and decode the instruction.
> >>
> >> This patch only cover post-index store/load instructions from AArch64
> >> mode. For now, this is left unimplemented for trap from AArch32 mode.
> >>
> >> Signed-off-by: Ayan Kumar Halder <ayankuma@xxxxxxxxxx>
> >> ---
> >>
> >> Changelog :-
> >>
> >> v2 :- 1. Updated the rn register after reading from it. (Pointed by
> >> Julien, Stefano)
> >> 2. Used a union to represent the instruction opcode (Suggestd by
> >> Bertrand) 3. Fixed coding style issues (Pointed by Julien)
> >> 4. In the previous patch, I was updating dabt->sign based on the
> >> signedness of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI
> >> 0487G.b, Page 3221, SSE indicates the signedness of the data item
> >> loaded. In our case, the data item loaded is always unsigned.
> >>
> >> This has been tested for the following cases :-
> >> ldr x2, [x1], #4  
> > 
> > As Jan already mentioned: this is a bad example. First, this is a 64-bit
> > access, which you don't emulate below. And second, you want to keep the
> > pointer aligned. Unaligned accesses to device memory always trap, as per
> > the architecture, even on bare metal.
> >   
> >>
> >> ldr w2, [x1], #-4
> >>
> >> str x2, [x1], #4  
> > 
> > Same as above.
> >   
> >> str w2, [x1], #-4
> >>
> >> The reason being  I am testing on 32bit MMIO registers only. I don't see
> >> a 8bit or 16bit MMIO register.  
> > 
> > Where did you look? There are plenty of examples out there, even the GIC
> > allows 8-bit accesses to certain registers (grep for "VGIC_ACCESS_"), and
> > the Linux GIC driver is using them (but with proper accessors, of course).
> > Also GICv3 supports 64-bit accesses to some registers. Some PL011 UARTs use
> > 16-bit MMIO accesses.  
> Yes, sorry I see them now. GICD_IPRIORITYR can be accessed with 8 bits.
> Unfortunately, I have GIC-v2 on my hardware system. So, probably I can't 
> test 64 bit access.
> 
> >   
> >>   xen/arch/arm/decode.c     | 68 ++++++++++++++++++++++++++++++++++++++-
> >>   xen/arch/arm/decode.h     |  3 +-
> >>   xen/arch/arm/io.c         | 40 +++++++++++++++++++----
> >>   xen/include/asm-arm/hsr.h | 26 +++++++++++++++
> >>   4 files changed, 129 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> >> index 792c2e92a7..0b3e8fcbc6 100644
> >> --- a/xen/arch/arm/decode.c
> >> +++ b/xen/arch/arm/decode.c
> >> @@ -84,6 +84,66 @@ bad_thumb2:
> >>       return 1;
> >>   }
> >>   
> >> +static int decode_32bit_loadstore_postindexing(register_t pc,
> >> +                                               struct hsr_dabt *dabt,
> >> +                                               union ldr_str_instr_class 
> >> *instr)
> >> +{
> >> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof 
> >> (instr)) )
> >> +        return -EFAULT;
> >> +
> >> +    /* First, let's check for the fixed values */
> >> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
> >> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
> >> +    {
> >> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions 
> >> other than"
> >> +            " ldr/str post indexing\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    if ( instr->code.size != 2 )  
> > 
> > I don't see a good reason for this limitation. If you are going to dissect
> > the instruction, why not just support at least all access widths, so
> > 64-bits, but also {ldr,str}{b,w}? I think the framework does the heavy
> > lifting for you already?  
> 
> I see your point. My intention was to test first with the restricted 
> instruction set (ie ldr/str - 32 bit access with post indexing only) and 
> get an opinion from the community. If the patch looks sane, then this 
> can be extended with other variants as well.
> 
> > Same for the restriction to post-index above, supporting pre-index as well
> > should be easy.  
> For Pre-indexing instruction, the ISS is valid. So I am not sure what is 
> to be done here?

Where did you find this? The ARM ARM says that ISV=0 when the instruction
is using writeback, which would include pre-index.
I tested "ldr w0, [x1, #4]!" and it failed on KVM, as expected.

> AFAIU, if the ISS is valid, there is no need to explicitly decode the 
> instructions.
> > 
> > To me this has the bitter taste for being a one trick pony to work around
> > your particular (broken!) use case.
> >   
> >> +    {
> >> +        gprintk(XENLOG_ERR,
> >> +            "ldr/str post indexing is supported for 32 bit variant 
> >> only\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    if ( instr->code.v != 0 )
> >> +    {
> >> +        gprintk(XENLOG_ERR,
> >> +            "ldr/str post indexing for vector types are not supported\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    /* Check for STR (immediate) - 32 bit variant */
> >> +    if ( instr->code.opc == 0 )
> >> +    {
> >> +        dabt->write = 1;
> >> +    }
> >> +    /* Check for LDR (immediate) - 32 bit variant */
> >> +    else if ( instr->code.opc == 1 )
> >> +    {
> >> +        dabt->write = 0;
> >> +    }
> >> +    else
> >> +    {
> >> +        gprintk(XENLOG_ERR,
> >> +            "Decoding ldr/str post indexing is not supported for this 
> >> variant\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    gprintk(XENLOG_INFO,
> >> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 
> >> = %d\n",
> >> +        instr->code.rt, instr->code.size, instr->code.imm9);
> >> +
> >> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> >> +    dabt->valid = 1;
> >> +
> >> +    return 0;
> >> +bad_32bit_loadstore:
> >> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", 
> >> instr->value);
> >> +    return 1;
> >> +}
> >> +
> >>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
> >>   {
> >>       uint16_t instr;
> >> @@ -150,11 +210,17 @@ bad_thumb:
> >>       return 1;
> >>   }
> >>   
> >> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt 
> >> *dabt)
> >> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt 
> >> *dabt,
> >> +                       union ldr_str_instr_class *instr)
> >>   {
> >>       if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
> >>           return decode_thumb(regs->pc, dabt);
> >>   
> >> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> >> +    {
> >> +        return decode_32bit_loadstore_postindexing(regs->pc, dabt, instr);
> >> +    }
> >> +
> >>       /* TODO: Handle ARM instruction */
> >>       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
> >>   
> >> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> >> index 4613763bdb..d82fc4a0f6 100644
> >> --- a/xen/arch/arm/decode.h
> >> +++ b/xen/arch/arm/decode.h
> >> @@ -35,7 +35,8 @@
> >>    */
> >>   
> >>   int decode_instruction(const struct cpu_user_regs *regs,
> >> -                       struct hsr_dabt *dabt);
> >> +                       struct hsr_dabt *dabt,
> >> +                       union ldr_str_instr_class *instr);
> >>   
> >>   #endif /* __ARCH_ARM_DECODE_H_ */
> >>   
> >> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> >> index 729287e37c..0d60754bc4 100644
> >> --- a/xen/arch/arm/io.c
> >> +++ b/xen/arch/arm/io.c
> >> @@ -65,6 +65,16 @@ static enum io_state handle_write(const struct
> >> mmio_handler *handler, return ret ? IO_HANDLED : IO_ABORT;
> >>   }
> >>   
> >> +static void post_incremenet_register(union ldr_str_instr_class *instr)
> >> +{
> >> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> >> +    unsigned int val;
> >> +
> >> +    val = get_user_reg(regs, instr->code.rn);
> >> +    val += instr->code.imm9;
> >> +    set_user_reg(regs, instr->code.rn, val);
> >> +}
> >> +
> >>   /* This function assumes that mmio regions are not overlapped */
> >>   static int cmp_mmio_handler(const void *key, const void *elem)
> >>   {
> >> @@ -106,14 +116,26 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >> *regs, .gpa = gpa,
> >>           .dabt = dabt
> >>       };
> >> +    int rc;
> >> +    union ldr_str_instr_class instr = {0};
> >>   
> >>       ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
> >>   
> >> +    /*
> >> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> >> +     * ldr/str instructions. So in order to process these instructions,
> >> +     * Xen must decode them.
> >> +     */
> >> +    if ( !info.dabt.valid )
> >> +    {
> >> +        rc = decode_instruction(regs, &info.dabt, &instr);
> >> +        if ( rc )
> >> +            return IO_ABORT;
> >> +    }
> >> +
> >>       handler = find_mmio_handler(v->domain, info.gpa);
> >>       if ( !handler )
> >>       {
> >> -        int rc;
> >> -
> >>           rc = try_fwd_ioserv(regs, v, &info);
> >>           if ( rc == IO_HANDLED )
> >>               return handle_ioserv(regs, v);
> >> @@ -122,7 +144,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >> *regs, }
> >>   
> >>       /* All the instructions used on emulated MMIO region should be
> >> valid */
> >> -    if ( !dabt.valid )
> >> +    if ( !info.dabt.valid )
> >>           return IO_ABORT;
> >>   
> >>       /*
> >> @@ -134,7 +156,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >> *regs, {
> >>           int rc;
> >>   
> >> -        rc = decode_instruction(regs, &info.dabt);
> >> +        rc = decode_instruction(regs, &info.dabt, NULL);
> >>           if ( rc )
> >>           {
> >>               gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> >> @@ -143,9 +165,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >> *regs, }
> >>   
> >>       if ( info.dabt.write )
> >> -        return handle_write(handler, v, &info);
> >> +        rc = handle_write(handler, v, &info);
> >>       else
> >> -        return handle_read(handler, v, &info);
> >> +        rc = handle_read(handler, v, &info);
> >> +
> >> +    if ( instr.value != 0 )
> >> +    {
> >> +        post_incremenet_register(&instr);
> >> +    }
> >> +    return rc;
> >>   }
> >>   
> >>   void register_mmio_handler(struct domain *d,
> >> diff --git a/xen/include/asm-arm/hsr.h b/xen/include/asm-arm/hsr.h
> >> index 9b91b28c48..72d67d2801 100644
> >> --- a/xen/include/asm-arm/hsr.h
> >> +++ b/xen/include/asm-arm/hsr.h
> >> @@ -15,6 +15,32 @@ enum dabt_size {
> >>       DABT_DOUBLE_WORD = 3,
> >>   };
> >>   
> >> +/*
> >> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> >> + * Page 318 specifies the following bit pattern for
> >> + * "load/store register (immediate post-indexed)".
> >> + *
> >> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> >> + * ___________________________________________________________________
> >> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> >> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> >> + */
> >> +union ldr_str_instr_class {
> >> +    uint32_t value;
> >> +    struct ldr_str {
> >> +        unsigned int rt:5;     /* Rt register */  
> > 
> > I don't think it's a particular good idea to use a bit-field here, if that
> > is expected to mimic a certain hardware provided bit pattern.
> > It works in practise (TM), but the C standard does not guarantee the order
> > the bits are allocated (ISO/IEC 9899:201x §6.7.2.1, stanza 11).
> > Since you are *reading* only from the instruction word, you should get away
> > with accessor macros to extract the bits you need. For instance for
> > filtering the opcode, you could use: ((insn & 0x3fe00c00) == 0x38400400)  
> 
> Yes, this is a very good point. I will use bitmasks to access the bits 
> from the register.
> 
> I saw the same logic (ie using bitfields) is used for some other 
> registers as well. For eg hsr_dabt, hsr_iabt in 
> xen/include/asm-arm/hsr.h. May be that needs fixing as well for some 
> other time. :)

(erroneously put that part already in the mail to Bertrand, for the
sake of completeness here it is again):

Well, there is no easy and clear answer as to whether to use bit-fields in
those occasions or not. From a practical point of view, it works (TM), and
has some advantages, like saving you from fiddling around with a 9-bit
sign extension, for instance.
But the Linux kernel discourages those works-for-me solutions, one killer
problem here is endianess, which is not a problem for Xen, IIRC.
I personally prefer robust code: by not relying on certain implementation
specifics (and be they very obvious or wide-spread), there will be less
surprises in the future.

So I'd leave this up to the maintainers to decide, IIUC the original
suggestion came from Bertrand?

Cheers,
Andre

> >> +        unsigned int rn:5;     /* Rn register */
> >> +        unsigned int fixed1:2; /* value == 01b */
> >> +        int imm9:9;            /* imm9 */
> >> +        unsigned int fixed2:1; /* value == 0b */
> >> +        unsigned int opc:2;    /* opc */
> >> +        unsigned int fixed3:2; /* value == 00b */
> >> +        unsigned int v:1;      /* vector */
> >> +        unsigned int fixed4:3; /* value == 111b */
> >> +        unsigned int size:2;   /* size */
> >> +    } code;
> >> +};
> >> +
> >>   union hsr {
> >>       register_t bits;
> >>       struct {  
> >   




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.