Re: [Xen-devel] kernel bootup slow issue on ovm3.1.1
>>> On 10.08.12 at 06:40, "zhenzhong.duan" <zhenzhong.duan@xxxxxxxxxx> wrote:
> On 2012-08-09 18:35, Jan Beulich wrote:
>>>>> On 09.08.12 at 11:42, "zhenzhong.duan" <zhenzhong.duan@xxxxxxxxxx> wrote:
>>> On 2012-08-08 23:01, Jan Beulich wrote:
>>>>>>> On 08.08.12 at 11:48, "zhenzhong.duan" <zhenzhong.duan@xxxxxxxxxx> wrote:
>>>>> On 2012-08-07 16:37, Jan Beulich wrote:
>>>>> Some spin at stop_machine after finishing their job.
>>>> And here you'd need to find out what they're waiting for,
>>>> and what those CPUs are doing.
>>> They are waiting for the vCPU calling generic_set_all; the rest spin at
>>> set_atomicity_lock.
>>> In fact, all are waiting for generic_set_all.
>> I think we're moving in circles - what is the vCPU currently in
>> generic_set_all() then doing?
> I added some debug prints; generic_set_all->prepare_set->write_cr0 takes
> most of the time, everything else is quick. set_atomicity_lock serializes
> this process between CPUs, making it worse.
> One iteration:
> MTRR: CPU 2
> prepare_set: before read_cr0
> prepare_set: before write_cr0    ------ *blocks here*

Yeah, that CR0 write disables the caches, and that's pretty expensive on
EPT (not sure why NPT doesn't use/need the same hook) when the guest has
any active MMIO regions: vmx_set_uc_mode(), when HAP is enabled, calls
ept_change_entry_emt_with_range(), which is a walk through the entire
guest page tables (i.e. scales with guest size, or, to be precise, with
the highest populated GFN).

Going back to your original mail, I wonder however why this gets done at
all. You said it got there via

    mtrr_aps_init() -> set_mtrr() -> mtrr_work_handler()

yet this isn't done unconditionally - see the comment before the
mtrr_aps_delayed_init check. Can you find out where the obviously
necessary call(s) to set_mtrr_aps_delayed_init() come(s) from?

> prepare_set: before wbinvd
> prepare_set: before read_cr4
> prepare_set: before write_cr4
> prepare_set: before __flush_tlb
> prepare_set: before rdmsr
> prepare_set: before wrmsr
> generic_set_all: before set_mtrr_state
> generic_set_all: before pat_init
> post_set: before wbinvd
> post_set: before wrmsr
> post_set: before write_cr0
> post_set: before write_cr4
>
>>>> There's not that much being done in generic_set_all(), so the
>>>> code should finish reasonably quickly. Are you perhaps having
>>>> more vCPU-s in the guest than pCPU-s they can run on?
>>> The system is an Exalogic node with 24 cores + 100 GB memory (2 sockets,
>>> 6 cores per socket, 2 HT threads per core).
>>> We boot a PVHVM guest with 12 (or 24) vCPUs + 90 GB + a PCI
>>> passthrough device.
>> So you're indeed over-committing the system. How many vCPU-s
>> does your Dom0 have? Are there any other VMs? Is there any
>> vCPU pinning in effect?
> Dom0 boots with 24 vCPUs (same result with dom0_max_vcpus=4). No other VM
> except Dom0. All 24 vCPUs spin according to xentop. Below is a xentop clip.

Yes, this way you do overcommit - 24 guest vCPU-s spinning, plus anything
Dom0 may need to do.

>>>> Does your hardware support Pause-Loop-Exiting (or the AMD
>>>> equivalent, don't recall their term right now)?
>>> I have no access to the serial line; could I get the info by a command?
>> "xl dmesg" run early enough (i.e. before the log buffer wraps).
> Below is the xl dmesg result for your reference. Thanks.
>...
> (XEN) VMX: Supported advanced features:
> (XEN)  - APIC MMIO access virtualisation
> (XEN)  - APIC TPR shadow
> (XEN)  - Extended Page Tables (EPT)
> (XEN)  - Virtual-Processor Identifiers (VPID)
> (XEN)  - Virtual NMI
> (XEN)  - MSR direct-access bitmap
> (XEN)  - Unrestricted Guest

I'm sorry, I had expected this to be printed here, but it isn't. Hence I
can't tell for sure whether PLE is implemented there, but given how long
it has been available it ought to be present when "Unrestricted Guest" is
there (which iirc got introduced much later).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel