
Re: [Bug] Bring up Dom0 on Arm board



Hi,

On 10/01/2023 02:35, 蔡力刚 wrote:
On 06/01/2023 06:41, 蔡力刚 wrote:
I tried to run Xen on a Rockchip RK3588 board and encountered some problems.
The commands I used:
load mmc 1:1 0xC400000 dom0-Image;
load mmc 1:1 0x47C00000 xen4.14.5;

We have made a lot of improvements since Xen 4.14. That release has also
been out of general support since January 2022. It is still security
supported, but not for long (until July 2023).

Would you be able to try Xen 4.17 (this was released a month ago)?
I also tried Xen 4.17.0, but the xl command fails in dom0 with the errors
below:
root@RK3588:~# xl list
libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the 
maximum number of cpus
libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the 
maximum number of cpus
libxl: error: libxl_utils.c:820:libxl_cpu_bitmap_alloc: failed to retrieve the 
maximum number of cpus
libxl: error: libxl_domain.c:334:libxl_list_domain: getting domain info list: 
Permission denied
libxl_list_domain failed.
In the rootfs, the Xen tools version is 4.14.3.
I suspect that a version mismatch between the Xen tools and the Xen hypervisor
causes this problem; is that right?

Part of the ABI used between the tools and the hypervisor is not stable. So you will need to rebuild the tools for every new major release (for minor releases it is usually not necessary).
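
As a rough illustration of that rule, here is a minimal shell sketch that compares only the X.Y part of two Xen version strings. The helper names xen_release and same_xen_release are made up for this example and are not part of the Xen tools; the sketch assumes both versions are already known (here 4.17.0 for the hypervisor and 4.14.3 for the tools in the rootfs, as reported in this thread).

```shell
#!/bin/sh
# Hypothetical helpers, not part of Xen: reduce "X.Y.Z" to "X.Y" and compare.

# Print the X.Y part of a Xen version string, e.g. "4.17.0" -> "4.17".
xen_release() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed when hypervisor ($1) and tools ($2) share the same X.Y release.
same_xen_release() {
    [ "$(xen_release "$1")" = "$(xen_release "$2")" ]
}

# Versions reported in this thread: hypervisor 4.17.0, tools 4.14.3.
if same_xen_release 4.17.0 4.14.3; then
    echo "same release: the existing tools will usually work"
else
    echo "different release: rebuild the tools against the new Xen"
fi
```

With the versions from this thread the second branch is taken, which is consistent with the xl errors quoted above.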

Also, even with Xen 4.17.0, the problems I mentioned are still there:
the "Device tree generation failed" error remains, and the mali0 and mmcblk2
devices still fail to come up.

I will reply to this below.

load mmc 1:1 0x47E00000 rk3588-evb7-lp4-v10-linux.dtb
fdt addr 0x47E00000
fdt resize 1024
fdt set /chosen \#address-cells <0x2>
fdt set /chosen \#size-cells <0x2>
fdt set /chosen xen,xen-bootargs "console=dtuart dtuart=serial2 dom0_mem=4G 
dom0_max_vcpus=4 vwfi=native sched=null"
fdt mknod /chosen dom0
fdt set /chosen/dom0 compatible "xen,linux-zimage" "xen,multiboot-module" 
"multiboot,module"
fdt set /chosen/dom0 reg <0x0 0xC400000 0x0 0x2000000>
fdt set /chosen xen,dom0-bootargs "console=hvc0 earlycon=xen earlyprintk=xen 
clk_ignore_unused root=/dev/mmcblk0p6 rw rootwait"
setenv fdt_high 0xffffffffffffffff
booti 0x47C00000 - 0x47E00000
1. Device tree generation failed errors.
When I used the default dtb to run Xen, Xen panicked.
Log:
(XEN) Unable to get irq 0 for /pcie@fe180000/legacy-interrupt-controller
(XEN) Device tree generation failed (-1).
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not set up DOM0 guest OS
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
the dtb:
pcie2x1l1_intc: legacy-interrupt-controller {
  interrupt-controller;
  #address-cells = <0>;
  #interrupt-cells = <1>;
  interrupt-parent = <&gic>;
  interrupts = <GIC_SPI 245 IRQ_TYPE_EDGE_RISING>;
};
I modified the interrupts property of the legacy-interrupt-controller from
IRQ_TYPE_EDGE_RISING to IRQ_TYPE_LEVEL_HIGH.

Based on this change, I would say the call to irq_set_spi_type() (called
from platform_get_irq()) will return -1. The function will validate the
type and will throw an error if there is a problem.

Can you confirm whether the interrupt is shared with another device? Is
it described twice in the DT?

If the answer to either question is yes, is the type different?

You could also print the old and new type in irq_set_spi_type() to
confirm the difference.
It may be caused by interrupt-controller@fe600000:
the interrupt is first set to IRQ_TYPE_LEVEL_HIGH according to
interrupt-controller@fe600000,
then irq_set_spi_type() tries to set it to IRQ_TYPE_EDGE_RISING according to
pcie2x1l1_intc: legacy-interrupt-controller, and returns -1.
The gic: interrupt-controller@fe600000 node is below:
gic: interrupt-controller@fe600000 {
  compatible = "arm,gic-v3";
  #interrupt-cells = <3>;
  #address-cells = <2>;
  #size-cells = <2>;
  ranges;
  interrupt-controller;
  reg = <0x0 0xfe600000 0 0x10000>, /* GICD */
        <0x0 0xfe680000 0 0x100000>; /* GICR */
  interrupts = <GIC_PPI 9 IRQ_TYPE_LEVEL_HIGH>;
  its0: msi-controller@fe640000 {
    compatible = "arm,gic-v3-its";
    msi-controller;
    #msi-cells = <1>;
    reg = <0x0 0xfe640000 0x0 0x20000>;
  };
  its1: msi-controller@fe660000 {
    compatible = "arm,gic-v3-its";
    msi-controller;
    #msi-cells = <1>;
    reg = <0x0 0xfe660000 0x0 0x20000>;
  };
};

I am a bit confused. Reading the binding, it looks like the GIC and PCI interrupt controller don't share an interrupt. Can you confirm the IRQ number you saw in Xen?
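
One way to answer the "described twice" question is to search the device tree sources for every node that references the same SPI. Below is a minimal sketch, assuming the board's .dts/.dtsi sources use the GIC_SPI macro as in the snippet quoted above. spi_users is a made-up helper name, and the file names in the usage comment are only examples.

```shell
#!/bin/sh
# Hypothetical helper: list every line in the given DT source files that
# references GIC SPI number $1, so that a second description with a
# different trigger type stands out.
spi_users() {
    spi="$1"
    shift
    # -H: always print the file name, -n: print the line number.
    # The trailing [[:space:]] keeps 245 from matching 2450, 2451, ...
    grep -HnE "GIC_SPI[[:space:]]+${spi}[[:space:]]" "$@"
}

# Example (file names are assumptions):
#   spi_users 245 rk3588.dtsi rk3588s.dtsi
```

If more than one node shows up for the same SPI with different IRQ_TYPE_* values, that would be consistent with irq_set_spi_type() rejecting the second one.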

With that change Xen came up successfully, though I am not sure the modification is correct.
2. After boot, I tried to type in the console but input did not work.
I added some logging in do_trap_guest_sync and try_handle_mmio, as below:
In function do_trap_guest_sync:
static unsigned long ec = 0;
if ( hsr.ec != ec )
{
    gprintk(XENLOG_INFO, "do_trap_guest_sync hsr.ec=%x \n", hsr.ec);
    ec = hsr.ec;
}
In function try_handle_mmio:
gprintk(XENLOG_INFO, "handler->addr: %lx\n", handler->addr);
Then every time I press Enter in the console, it shows the log below:
(XEN) d0v0 do_trap_guest_sync hsr.ec=24
(XEN) d0v0 handler->addr: fe600000
(XEN) d0v0 handler->addr: fe600000
(XEN) d0v0 do_trap_guest_sync hsr.ec=18
Is something wrong with the GIC interrupt?
A few questions:
  * What is the corresponding device in the host physical address space
for 0xfe600000?
  * What is the UART on your board? Is there any specific workaround
required?
0xfe600000 is: gic: interrupt-controller@fe600000, full content above.

Thanks. So the trap is expected because the GICD exposed to the domains is emulated.

The UART is an 8250. In menuconfig, under Debugging Options, I set the following:
[*] Early printk (Early printk via 8250 UART) --->
(0Xfeb50000) Early printk, physical base address of debug UART
(2) Early printk, left-shift to apply to the register offsets within the 8250 
UART
I found that if I configure early printk in Xen, I no longer need
xen,dom0-bootargs=
"console=hvc0 earlycon=xen earlyprintk=xen"; is that right?

I don't know the exact configuration of the 8250. So I can't tell whether this is correct.

That said, as you see some output, it would indicate that the configuration might be right.

This could indicate that Xen is still using early printk and therefore would not be able to read characters. From your previous email, I see that you are requesting serial2. I am assuming this is an alias to the same UART as the one you configured for the early printk?

Can you paste the content of the related Device-Tree node? Also, I would suggest checking whether there are any errors in the Xen logs.
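
The serial2 alias can be resolved from a running Linux through the device tree that the kernel exposes as a directory. A minimal sketch, assuming the kernel was booted with the same dtb that is passed to Xen; dt_alias is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the node path an alias points to, reading the
# flattened device tree that Linux exposes as a directory tree.
# $1: device tree root (normally /proc/device-tree), $2: alias name.
dt_alias() {
    # Property values are NUL-terminated strings; strip the NUL.
    tr -d '\000' < "$1/aliases/$2"
    echo
}

# Usage on the board:
#   dt_alias /proc/device-tree serial2
```

If serial2 really is the early printk UART, the unit address of the printed node would be expected to match the 0xfeb50000 base address configured above.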

3. In Dom0, the mali0 and mmcblk2 devices are missing, and weston fails to run.
Do you have any log in the kernel indicating why the mali and/or the mmc
driver didn't load?

Also, can you confirm that the same kernel image works without Xen?
Booting without Xen, the mali0 log is as below:
root@RK3588:/# dmesg | grep mali
[ 4.192093] mali fb000000.gpu: Kernel DDK version g12p0-01eac0
[ 4.192148] mali fb000000.gpu: Looking up mali-supply from device tree
[ 4.194569] mali fb000000.gpu: Looking up mem-supply from device tree
[ 4.194747] mali fb000000.gpu: Looking up mali-supply from device tree
[ 4.194792] mali fb000000.gpu: Looking up mem-supply from device tree
[ 4.195383] mali fb000000.gpu: leakage=16
[ 4.195457] mali fb000000.gpu: Looking up mali-supply from device tree
[ 4.197004] mali fb000000.gpu: pvtm=858
[ 4.197099] mali fb000000.gpu: pvtm-volt-sel=2
[ 4.198437] mali fb000000.gpu: avs=0
[ 4.201271] W : [File] : 
drivers/gpu/arm/bifrost/platform/rk/mali_kbase_config_rk.c; [Line] : 136; 
[Func] :
kbase_platform_rk_init(); power-off-delay-ms not available.
[ 4.206668] mali fb000000.gpu: GPU hardware issue table may need updating:
[ 4.206683] mali fb000000.gpu: GPU identified as 0x7 arch 10.8.6 r0p0 status 0
[ 4.206810] mali fb000000.gpu: No priority control manager is configured
[ 4.206823] mali fb000000.gpu: No memory group manager is configured
[ 4.206852] mali fb000000.gpu: Protected memory allocator not available
[ 4.208342] mali fb000000.gpu: Couldn't find power_model DT node matching 
'arm,mali-simple-power-model'
[ 4.208356] mali fb000000.gpu: Error -22, no DT entry: 
mali-simple-power-model.static-coefficient = 1*[0]
[ 4.208572] mali fb000000.gpu: Error -22, no DT entry: 
mali-simple-power-model.dynamic-coefficient = 1*[0]
[ 4.208766] mali fb000000.gpu: Error -22, no DT entry: 
mali-simple-power-model.ts = 4*[0]
[ 4.208958] mali fb000000.gpu: Error -22, no DT entry: 
mali-simple-power-model.thermal-zone = ''
[ 4.212287] mali fb000000.gpu: Using configured power model 
mali-lodx-power-model, and fallback mali-simple-power-model
[ 4.212539] mali fb000000.gpu: l=10000 h=85000 hyst=5000 l_limit=0 
h_limit=800000000 h_table=0
[ 4.214528] mali fb000000.gpu: Probed as mali0
[ 4.318492] I : [File] : 
drivers/gpu/arm/mali400/mali/linux/mali_kernel_linux.c; [Line] : 405; [Func] : 
mali_module_init();
svn_rev_string_from_arm of this mali_ko is '', rk_ko_ver is '5', built at 
'10:04:19', on 'Dec 12 2022'.
[ 6.959913] mali fb000000.gpu: Loading Mali firmware 0x1010000
[ 6.960491] mali fb000000.gpu: Protected memory allocator not found, Firmware 
protected mode entry will not be supported
[ 6.960498] mali fb000000.gpu: Protected memory allocator not found, Firmware 
protected mode entry will not be supported
[ 6.960503] mali fb000000.gpu: Protected memory allocator not found, Firmware 
protected mode entry will not be supported
Booting with Xen, the mali0 log is as below:
[ 2.969638] I : [File] : 
drivers/gpu/arm/mali400/mali/linux/mali_kernel_linux.c; [Line] : 405; [Func] : 
mali_module_init();
  svn_rev_string_from_arm of this mali_ko is '', rk_ko_ver is '5', built at 
'14:06:00', on 'Dec 16 2022'.
So there is no error at all afterwards? Interestingly, this line is not shown in your output above. So I would suggest checking the code to understand whether we are somehow taking a different path.

Booting without Xen, the mmcblk2 log is as below:
root@RK3588:/# dmesg |grep sdmmc
root@RK3588:/# dmesg |grep mmc
[ 1.842460] Kernel command line: storagemedia=emmc 
androidboot.storagemedia=emmc androidboot.mode=normal
androidboot.verifiedbootstate=orange rw rootwait 
earlycon=uart8250,mmio32,0xfeb50000
console=ttyFIQ0 irqchip.gicv3_pseudo_nmi=0 root=PARTUUID=614e0000-0000
[ 3.981216] dwmmc_rockchip fe2c0000.mmc: IDMAC supports 32-bit address mode.
[ 3.981321] dwmmc_rockchip fe2c0000.mmc: Using internal DMA controller.
[ 3.981349] dwmmc_rockchip fe2c0000.mmc: Version ID is 270a
[ 3.981435] dwmmc_rockchip fe2c0000.mmc: DW MMC controller at irq 77,32 bit 
host data width,256 deep fifo
[ 3.981588] dwmmc_rockchip fe2c0000.mmc: Looking up vmmc-supply from device tree
[ 3.982932] dwmmc_rockchip fe2c0000.mmc: Looking up vqmmc-supply from device 
tree
[ 3.983121] sdhci-dwcmshc fe2e0000.mmc: Looking up vmmc-supply from device tree
[ 3.983135] sdhci-dwcmshc fe2e0000.mmc: Looking up vmmc-supply property in node 
/mmc@fe2e0000 failed
[ 3.983168] sdhci-dwcmshc fe2e0000.mmc: Looking up vqmmc-supply from device tree
[ 3.983177] sdhci-dwcmshc fe2e0000.mmc: Looking up vqmmc-supply property in 
node /mmc@fe2e0000 failed
[ 3.983294] dwmmc_rockchip fe2c0000.mmc: Failed getting OCR mask: -22
[ 3.983461] dwmmc_rockchip fe2c0000.mmc: could not set regulator OCR (-22)
[ 3.983473] dwmmc_rockchip fe2c0000.mmc: failed to enable vmmc regulator
[ 3.995539] mmc_host mmc2: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, 
actual 400000HZ div = 0)
[ 4.012246] mmc0: SDHCI controller on fe2e0000.mmc [fe2e0000.mmc] using ADMA
[ 4.043129] mmc_host mmc2: Bus speed (slot 0) = 49500000Hz (slot req 
50000000Hz, actual 49500000HZ div = 0)
[ 4.043689] mmc2: new high speed SDHC card at address 0007
[ 4.044681] mmcblk2: mmc2:0007 SD8GB 7.21 GiB
[ 4.047294] mmcblk2: p1
[ 4.060614] mmc0: new HS400 Enhanced strobe MMC card at address 0001
[ 4.061406] mmcblk0: mmc0:0001 BJTD4R 29.1 GiB
[ 4.061539] mmcblk0boot0: mmc0:0001 BJTD4R partition 1 4.00 MiB
[ 4.061663] mmcblk0boot1: mmc0:0001 BJTD4R partition 2 4.00 MiB
[ 4.062273] mmcblk0rpmb: mmc0:0001 BJTD4R partition 3 4.00 MiB, chardev (236:0)
[ 4.068960] mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8
[ 5.835901] EXT4-fs (mmcblk0p6): recovery complete
[ 5.836462] EXT4-fs (mmcblk0p6): mounted filesystem with ordered data mode. 
Opts: (null)
[ 5.839971] storagemedia=emmc
[ 5.867859] EXT4-fs (mmcblk0p6): re-mounted. Opts: (null)
[ 6.409008] FAT-fs (mmcblk2p1): utf8 is not a recommended IO charset for FAT 
filesystems, filesystem will be case sensitive!
[ 6.414043] FAT-fs (mmcblk2p1): Volume was not properly unmounted. Some data 
may be corrupt. Please run fsck.
[ 7.039362] EXT4-fs (mmcblk0p7): mounting ext2 file system using the ext4 
subsystem
[ 7.040313] EXT4-fs (mmcblk0p7): warning: mounting unchecked fs, running e2fsck 
is recommended
[ 7.041225] EXT4-fs (mmcblk0p7): mounted filesystem without journal. Opts: 
(null)
[ 7.162903] EXT4-fs (mmcblk0p8): mounting ext2 file system using the ext4 
subsystem
[ 7.165824] EXT4-fs (mmcblk0p8): warning: mounting unchecked fs, running e2fsck 
is recommended
[ 7.172777] EXT4-fs (mmcblk0p8): mounted filesystem without journal. Opts: 
(null)
Booting with Xen, the mmcblk2 (sdmmc) log is as below:
root@RK3588:/sys/firmware# dmesg |grep sdmmc
[ 69.563072] rockchip-pm-domain fd8d8000.power-management:power-controller:

It looks like the kernel command line differs between Xen and baremetal. When running under Xen, the command line should mostly be the same (aside from clk_* and console=hvc0). Otherwise you are not comparing like with like, and the difference may only be due to your command line options.

Looking up sdmmc-supply from device tree
[ 69.563112] rockchip-pm-domain fd8d8000.power-management:power-controller:
Looking up sdmmc-supply property in node 
/power-management@fd8d8000/power-controller failed

Can you check why this is failing?

Since I can't type in the console, I tried logging in via ssh instead.
In /dev, I can't find mali0 or mmcblk2 (the SD card).
In U-Boot, mmcblk2 is recognized: I loaded dom0-Image, xen, and the dtb
from mmcblk2.
When booting without Xen, mali0 and mmcblk2 are both recognized.
Is something going wrong in Xen while the drivers initialize?
4. The xl command does not complete and seems to hang.

xl requires the initscript (or systemd service) to be executed. The fact
it hangs usually means this didn't happen.

Just in case, can you also check that your kernel has been built with
Xen support?
Is the initscript xendriverdomain? I tried the commands below in dom0:
(the xl list command hung)
root@RK3588:/# ./etc/init.d/xendriverdomain restart

You would want to use xencommons rather than xendriverdomain.

root@RK3588:/#
root@RK3588:/# ps aux |grep xen
root 59 0.0 0.0 0 0 ? S 00:00 0:00 [xenbus]
root 60 0.0 0.0 0 0 ? S 00:00 0:00 [xenwatch]
root 165 0.0 0.0 0 0 ? D 00:00 0:00 [xenbus_probe]
root 5993 0.0 0.0 3044 380 pts/0 S+ 00:09 0:00 grep xen
root@RK3588:/# xl list
Name ID Mem VCPUs State Time(s)
I configured the kernel according to the manual:
https://wiki.xenproject.org/wiki/Mainline_Linux_Kernel_Configs#Configuring_the_Kernel_for_dom0_Support
And I used kernel/arch/arm64/boot/Image as the dom0-Image.
How can I check that the kernel has been built with Xen support?

You can grep for XEN in your kernel config. You should see some options enabled.
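
For example, here is a small sketch that lists the enabled Xen options from a kernel config. xen_options is a made-up helper name. It assumes the config is either the .config from the kernel build tree or, if the kernel was built with CONFIG_IKCONFIG_PROC, /proc/config.gz on the running system.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel config on stdin and print the Xen
# options that are built in (=y) or built as modules (=m).
xen_options() {
    grep -E '^CONFIG_XEN[A-Z0-9_]*=(y|m)$'
}

# Usage, from the kernel build tree:
#   xen_options < .config
# or on the running system, when /proc/config.gz exists:
#   zcat /proc/config.gz | xen_options
```

For a dom0 kernel the output should include at least CONFIG_XEN=y. An empty result means the kernel was built without Xen support.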

But looking at the output above, you don't have xenstored running. So the most probable cause is that you didn't run xencommons.
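
A minimal sketch of that check, looking for xenstored in a process listing. has_xenstored is a made-up helper name. The /etc/init.d/xencommons path is an assumption based on the xendriverdomain path used above and may differ on your rootfs.

```shell
#!/bin/sh
# Hypothetical helper: read a process listing (e.g. from "ps aux") on
# stdin and succeed if a xenstored (or oxenstored) process is in it.
has_xenstored() {
    # The [x] bracket stops the pattern from matching the grep process's
    # own command line, which may appear in the listing.
    grep -q '[x]enstored'
}

# Usage in dom0:
#   ps aux | has_xenstored || /etc/init.d/xencommons start
#   xl list
```

Run against the ps output quoted above, the check fails, since only the xenbus, xenwatch and xenbus_probe kernel threads are listed.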

Cheers,

--
Julien Grall



 

