[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
On Tue, 30 Nov 2021 19:13:41 +0000 Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx> wrote: Hi Ayan, > Thanks for your comments. They are useful. > > On 30/11/2021 09:49, Andre Przywara wrote: > > On Mon, 29 Nov 2021 19:16:38 +0000 > > Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx> wrote: > > > > Hi, > > > >> At the moment, Xen is only handling data abort with valid syndrome (i.e. > >> ISV=0). Unfortunately, this doesn't cover all the instructions a domain > >> could use to access MMIO regions. > >> > >> For instance, Xilinx baremetal OS will use: > >> > >> volatile u32 *LocalAddr = (volatile u32 *)Addr; > >> *LocalAddr = Value; > >> > >> This leave the compiler to decide which store instructions to use. > > > > As mentioned in the other email, this is wrong, if this points to MMIO: > > don't let the compiler do MMIO accesses. If a stage 2 fault isn't in > > an MMIO area, you should not see traps that you cannot handle already. > > > > So I don't think it's a good idea to use that as an example. And since > > this patch only seems to address this use case, I would doubt its > > usefulness in general. > Yes, I should have fixed the comment. > > Currently, I am testing with baremetal app which uses inline assembly > code with post indexing instructions, to access the MMIO. > > ATM, I am testing with 32 bit MMIO only. > > On the usefulness, I am kind of torn as it is legitimate for post > indexing instructions to be used in an inline-assembly code for > accessing MMIO. However, that may not be something commonly seen. It is legitimate, but I question the usefulness: for a start it wouldn't work under today's Xen (or KVM), and I doubt this would be backported. So you would always require users to use the newest version of the hypervisor. Also MMIO accesses are slow anyway, so while using post-indexing might be convenient from an assembly writer's point of view, it doesn't give you much performance-wise, probably. So if you are interested in running under Xen (or other virtualisation solutions), just don't use them for MMIO. And I am not sure that Xen aims to virtualise every piece of code on the planet: certainly we have certain expectations about the platform, so you would probably consider running under a hypervisor from the very beginning. > @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can > you comment if we should have decoding logic or not ? > > > > >> This > >> may be a post-index store instruction where the HW will not provide a > >> valid syndrome. > >> > >> In order to handle post-indexing store/load instructions, Xen will need > >> to fetch and decode the instruction. > >> > >> This patch only cover post-index store/load instructions from AArch64 > >> mode. For now, this is left unimplemented for trap from AArch32 mode. > >> > >> Signed-off-by: Ayan Kumar Halder <ayankuma@xxxxxxxxxx> > >> --- > >> > >> Changelog :- > >> > >> v2 :- 1. Updated the rn register after reading from it. (Pointed by > >> Julien, Stefano) > >> 2. Used a union to represent the instruction opcode (Suggestd by > >> Bertrand) 3. Fixed coding style issues (Pointed by Julien) > >> 4. In the previous patch, I was updating dabt->sign based on the > >> signedness of imm9. This was incorrect. As mentioned in ARMv8 ARM DDI > >> 0487G.b, Page 3221, SSE indicates the signedness of the data item > >> loaded. In our case, the data item loaded is always unsigned. > >> > >> This has been tested for the following cases :- > >> ldr x2, [x1], #4 > > > > As Jan already mentioned: this is a bad example. First, this is a 64-bit > > access, which you don't emulate below. And second, you want to keep the > > pointer aligned. Unaligned accesses to device memory always trap, as per > > the architecture, even on bare metal. > > > >> > >> ldr w2, [x1], #-4 > >> > >> str x2, [x1], #4 > > > > Same as above. > > > >> str w2, [x1], #-4 > >> > >> The reason being I am testing on 32bit MMIO registers only. I don't see > >> a 8bit or 16bit MMIO register. > > > > Where did you look? There are plenty of examples out there, even the GIC > > allows 8-bit accesses to certain registers (grep for "VGIC_ACCESS_"), and > > the Linux GIC driver is using them (but with proper accessors, of course). > > Also GICv3 supports 64-bit accesses to some registers. Some PL011 UARTs use > > 16-bit MMIO accesses. > Yes, sorry I see them now. GICD_IPRIORITYR can be accessed with 8 bits. > Unfortunately, I have GIC-v2 on my hardware system. So, probably I can't > test 64 bit access. > > > > >> xen/arch/arm/decode.c | 68 ++++++++++++++++++++++++++++++++++++++- > >> xen/arch/arm/decode.h | 3 +- > >> xen/arch/arm/io.c | 40 +++++++++++++++++++---- > >> xen/include/asm-arm/hsr.h | 26 +++++++++++++++ > >> 4 files changed, 129 insertions(+), 8 deletions(-) > >> > >> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c > >> index 792c2e92a7..0b3e8fcbc6 100644 > >> --- a/xen/arch/arm/decode.c > >> +++ b/xen/arch/arm/decode.c > >> @@ -84,6 +84,66 @@ bad_thumb2: > >> return 1; > >> } > >> > >> +static int decode_32bit_loadstore_postindexing(register_t pc, > >> + struct hsr_dabt *dabt, > >> + union ldr_str_instr_class > >> *instr) > >> +{ > >> + if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof > >> (instr)) ) > >> + return -EFAULT; > >> + > >> + /* First, let's check for the fixed values */ > >> + if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) && > >> + (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) ) > >> + { > >> + gprintk(XENLOG_ERR, "Decoding not supported for instructions > >> other than" > >> + " ldr/str post indexing\n"); > >> + goto bad_32bit_loadstore; > >> + } > >> + > >> + if ( instr->code.size != 2 ) > > > > I don't see a good reason for this limitation. If you are going to dissect > > the instruction, why not just support at least all access widths, so > > 64-bits, but also {ldr,str}{b,w}? I think the framework does the heavy > > lifting for you already? > > I see your point. My intention was to test first with the restricted > instruction set (ie ldr/str - 32 bit access with post indexing only) and > get an opinion from the community. If the patch looks sane, then this > can be extended with other variants as well. > > > Same for the restriction to post-index above, supporting pre-index as well > > should be easy. > For Pre-indexing instruction, the ISS is valid. So I am not sure what is > to be done here? Where did you find this? The ARM ARM says that ISV=0 when the instruction is using writeback, which would include pre-index. I tested "ldr w0, [x1, #4]!" and it failed on KVM, as expected. > AFAIU, if the ISS is valid, there is no need to explicitly decode the > instructions. > > > > To me this has the bitter taste for being a one trick pony to work around > > your particular (broken!) use case. > > > >> + { > >> + gprintk(XENLOG_ERR, > >> + "ldr/str post indexing is supported for 32 bit variant > >> only\n"); > >> + goto bad_32bit_loadstore; > >> + } > >> + > >> + if ( instr->code.v != 0 ) > >> + { > >> + gprintk(XENLOG_ERR, > >> + "ldr/str post indexing for vector types are not supported\n"); > >> + goto bad_32bit_loadstore; > >> + } > >> + > >> + /* Check for STR (immediate) - 32 bit variant */ > >> + if ( instr->code.opc == 0 ) > >> + { > >> + dabt->write = 1; > >> + } > >> + /* Check for LDR (immediate) - 32 bit variant */ > >> + else if ( instr->code.opc == 1 ) > >> + { > >> + dabt->write = 0; > >> + } > >> + else > >> + { > >> + gprintk(XENLOG_ERR, > >> + "Decoding ldr/str post indexing is not supported for this > >> variant\n"); > >> + goto bad_32bit_loadstore; > >> + } > >> + > >> + gprintk(XENLOG_INFO, > >> + "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 > >> = %d\n", > >> + instr->code.rt, instr->code.size, instr->code.imm9); > >> + > >> + update_dabt(dabt, instr->code.rt, instr->code.size, false); > >> + dabt->valid = 1; > >> + > >> + return 0; > >> +bad_32bit_loadstore: > >> + gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", > >> instr->value); > >> + return 1; > >> +} > >> + > >> static int decode_thumb(register_t pc, struct hsr_dabt *dabt) > >> { > >> uint16_t instr; > >> @@ -150,11 +210,17 @@ bad_thumb: > >> return 1; > >> } > >> > >> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt > >> *dabt) > >> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt > >> *dabt, > >> + union ldr_str_instr_class *instr) > >> { > >> if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB ) > >> return decode_thumb(regs->pc, dabt); > >> > >> + if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) ) > >> + { > >> + return decode_32bit_loadstore_postindexing(regs->pc, dabt, instr); > >> + } > >> + > >> /* TODO: Handle ARM instruction */ > >> gprintk(XENLOG_ERR, "unhandled ARM instruction\n"); > >> > >> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h > >> index 4613763bdb..d82fc4a0f6 100644 > >> --- a/xen/arch/arm/decode.h > >> +++ b/xen/arch/arm/decode.h > >> @@ -35,7 +35,8 @@ > >> */ > >> > >> int decode_instruction(const struct cpu_user_regs *regs, > >> - struct hsr_dabt *dabt); > >> + struct hsr_dabt *dabt, > >> + union ldr_str_instr_class *instr); > >> > >> #endif /* __ARCH_ARM_DECODE_H_ */ > >> > >> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c > >> index 729287e37c..0d60754bc4 100644 > >> --- a/xen/arch/arm/io.c > >> +++ b/xen/arch/arm/io.c > >> @@ -65,6 +65,16 @@ static enum io_state handle_write(const struct > >> mmio_handler *handler, return ret ? IO_HANDLED : IO_ABORT; > >> } > >> > >> +static void post_incremenet_register(union ldr_str_instr_class *instr) > >> +{ > >> + struct cpu_user_regs *regs = guest_cpu_user_regs(); > >> + unsigned int val; > >> + > >> + val = get_user_reg(regs, instr->code.rn); > >> + val += instr->code.imm9; > >> + set_user_reg(regs, instr->code.rn, val); > >> +} > >> + > >> /* This function assumes that mmio regions are not overlapped */ > >> static int cmp_mmio_handler(const void *key, const void *elem) > >> { > >> @@ -106,14 +116,26 @@ enum io_state try_handle_mmio(struct cpu_user_regs > >> *regs, .gpa = gpa, > >> .dabt = dabt > >> }; > >> + int rc; > >> + union ldr_str_instr_class instr = {0}; > >> > >> ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL); > >> > >> + /* > >> + * Armv8 processor does not provide a valid syndrome for post-indexing > >> + * ldr/str instructions. So in order to process these instructions, > >> + * Xen must decode them. > >> + */ > >> + if ( !info.dabt.valid ) > >> + { > >> + rc = decode_instruction(regs, &info.dabt, &instr); > >> + if ( rc ) > >> + return IO_ABORT; > >> + } > >> + > >> handler = find_mmio_handler(v->domain, info.gpa); > >> if ( !handler ) > >> { > >> - int rc; > >> - > >> rc = try_fwd_ioserv(regs, v, &info); > >> if ( rc == IO_HANDLED ) > >> return handle_ioserv(regs, v); > >> @@ -122,7 +144,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs > >> *regs, } > >> > >> /* All the instructions used on emulated MMIO region should be > >> valid */ > >> - if ( !dabt.valid ) > >> + if ( !info.dabt.valid ) > >> return IO_ABORT; > >> > >> /* > >> @@ -134,7 +156,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs > >> *regs, { > >> int rc; > >> > >> - rc = decode_instruction(regs, &info.dabt); > >> + rc = decode_instruction(regs, &info.dabt, NULL); > >> if ( rc ) > >> { > >> gprintk(XENLOG_DEBUG, "Unable to decode instruction\n"); > >> @@ -143,9 +165,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs > >> *regs, } > >> > >> if ( info.dabt.write ) > >> - return handle_write(handler, v, &info); > >> + rc = handle_write(handler, v, &info); > >> else > >> - return handle_read(handler, v, &info); > >> + rc = handle_read(handler, v, &info); > >> + > >> + if ( instr.value != 0 ) > >> + { > >> + post_incremenet_register(&instr); > >> + } > >> + return rc; > >> } > >> > >> void register_mmio_handler(struct domain *d, > >> diff --git a/xen/include/asm-arm/hsr.h b/xen/include/asm-arm/hsr.h > >> index 9b91b28c48..72d67d2801 100644 > >> --- a/xen/include/asm-arm/hsr.h > >> +++ b/xen/include/asm-arm/hsr.h > >> @@ -15,6 +15,32 @@ enum dabt_size { > >> DABT_DOUBLE_WORD = 3, > >> }; > >> > >> +/* > >> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores > >> + * Page 318 specifies the following bit pattern for > >> + * "load/store register (immediate post-indexed)". > >> + * > >> + * 31 30 29 27 26 25 23 21 20 11 9 4 0 > >> + * ___________________________________________________________________ > >> + * |size|1 1 1 |V |0 0 |opc |0 | imm9 |0 1 | Rn | Rt | > >> + * |____|______|__|____|____|__|_______________|____|_________|_______| > >> + */ > >> +union ldr_str_instr_class { > >> + uint32_t value; > >> + struct ldr_str { > >> + unsigned int rt:5; /* Rt register */ > > > > I don't think it's a particular good idea to use a bit-field here, if that > > is expected to mimic a certain hardware provided bit pattern. > > It works in practise (TM), but the C standard does not guarantee the order > > the bits are allocated (ISO/IEC 9899:201x §6.7.2.1, stanza 11). > > Since you are *reading* only from the instruction word, you should get away > > with accessor macros to extract the bits you need. For instance for > > filtering the opcode, you could use: ((insn & 0x3fe00c00) == 0x38400400) > > Yes, this is a very good point. I will use bitmasks to access the bits > from the register. > > I saw the same logic (ie using bitfields) is used for some other > registers as well. For eg hsr_dabt, hsr_iabt in > xen/include/asm-arm/hsr.h. May be that needs fixing as well for some > other time. :) (erroneously put that part already in the mail to Bertrand, for the sake of completeness here it is again): Well, there is no easy and clear answer as to whether to use bit-fields in those occasions or not. From a practical point of view, it works (TM), and has some advantages, like saving you from fiddling around with a 9-bit sign extension, for instance. But the Linux kernel discourages those works-for-me solutions, one killer problem here is endianess, which is not a problem for Xen, IIRC. I personally prefer robust code: by not relying on certain implementation specifics (and be they very obvious or wide-spread), there will be less surprises in the future. So I'd leave this up to the maintainers to decide, IIUC the original suggestion came from Bertrand? Cheers, Andre > >> + unsigned int rn:5; /* Rn register */ > >> + unsigned int fixed1:2; /* value == 01b */ > >> + int imm9:9; /* imm9 */ > >> + unsigned int fixed2:1; /* value == 0b */ > >> + unsigned int opc:2; /* opc */ > >> + unsigned int fixed3:2; /* value == 00b */ > >> + unsigned int v:1; /* vector */ > >> + unsigned int fixed4:3; /* value == 111b */ > >> + unsigned int size:2; /* size */ > >> + } code; > >> +}; > >> + > >> union hsr { > >> register_t bits; > >> struct { > >
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |