[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [REPOST] Re: [PATCH] GSoC 2010 - Memory hotplug support for Xen guests - third fully working version
Hello, Just wondering.. what's the status on this feature? Is it ready to be merged? -- Pasi On Wed, Aug 25, 2010 at 04:00:27PM +0200, Daniel Kiper wrote: > Hi, > > On Thu, Aug 12, 2010 at 05:43:33PM -0700, Jeremy Fitzhardinge wrote: > > On 08/11/2010 06:22 PM, Daniel Kiper wrote: > > >>Overall, this looks much better. The next step is to split this into at > > >>least two patches: one for the core code, and one for the Xen bits. > > >>Each patch should do just one logical operation, so if you have several > > >>distinct changes to the core code, put them in separate patches. > > >I will do that if this patch will be accepted. > > > > First step is to post it to lkml for discussion, cc:ing the relevant > > maintainers. (I'm not really sure who that is at the moment. It will > > take some digging around in the history.) > > I took all relevant addresses (sorry if I missed somebody) from MAINTAINERS > file and they are in To in most of e-mails from me. > > > >>Can you find a clean way to prevent/disable ARCH_MEMORY_PROBE at runtime > > >>when in a Xen context? > > >There is no simple way to do that. It requiers to do some > > >changes in drivers/base/memory.c code. I think it should > > >be done as kernel boot option (on by default to not break > > >things using this interface now). If it be useful for maintainers > > >of mm/memory_hotplug.c and drivers/base/memory.c code then > > >I could do that. Currently original arch/x86/Kconfig version > > >is restored. > > > > I think adding a global flag which the Xen balloon driver can disable > > should be sufficient. There's no need to make an separate user-settable > > control. > > OK. > > > >>>+/* we are OK calling __meminit stuff here - we have > > >>>CONFIG_MEMORY_HOTPLUG > > >>>*/ > > >>>+static int __ref xen_add_memory(int nid, u64 start, u64 size) > > >>Could this be __meminit too then? > > >Good question. I looked throught the code and could > > >not find any simple explanation why mm/memory_hotplug.c > > >authors used __ref instead __meminit. Could you (mm/memory_hotplug.c > > >authors/maintainers) tell us why ??? > > > > Quite possibly a left-over from something else. You could just try > > making it __meminit, then compile with, erm, the option which shows you > > section conflicts (it shows the number of conflicts at the end of the > > kernel build by default, and tells you how to explicitly list them). > > Small reminder: make CONFIG_DEBUG_SECTION_MISMATCH=y > > I reviewed kernel source code once again. It is OK. Normaly it is > not allowed to reference code/data tagged as .init.* because > that sections are freed at the end of kernel boot sequence and > they do not exists any more in memory. However it is sometimes > required to use code/data marked .init.*. To allow that __ref > tag is used and then referenced objects are not removed from > memory (and no warnings are displayed during kernel compilation). > > > >>What's this for? I see all its other users are in the memory hotplug > > >>code, but presumably they're concerned about a real S3 suspend. Do we > > >>care about that here? > > >Yes, because as I know S3 state is supported by Xen guests. > > > > Yes, but I'm assuming the interaction between S3 and ACPI hotplug memory > > isn't something that concerns a Xen guest; our hotplug mechanism is > > completely different. > > Suspend/Hibernation code in Linux Kernel is platform independent > to some extent and it does not require ACPI. It means that > lock_system_sleep/unlock_system_sleep is required in that > place to have memory state intact during suspend/hibernation. > > > >>>+ r->name = "System RAM"; > > >>How about making it clear its Xen hotplug RAM? Or do things care about > > >>the "System RAM" name? > > >As I know no however as I saw anybody do not differentiate between > > >normal and hotplugged memory. I thought about that ealier however > > >stated that this soultion does not give us any real gain. That is why > > >I decided to use standard name for hotplugged memory. > > > > Its cosmetic, but it would be useful to see what's going on. > > If you wish I will do that, however then it should be changed > as well add_registered_memory() function syntax. It should > contain pointer to name published through /sys/firmware/memmap > interface. I am not sure it is good solution to change > add_registered_memory() function syntax which I think should be > same as add_memory() function syntax. > > > >+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG + unsigned long > > >boot_max_pfn; + u64 hotplug_start_paddr; + u64 hotplug_size; > > > > So does this mean you only support adding a single hotplug region? What > > happens if your initial increase wasn't enough and you want to add > > more? Would you make this a list of hot-added memory or something? > > > > But I'm not even quite sure why you need to keep this as global data. > > No. It supports multiple allocations. This variables are > used mostly for communication between allocate_additional_memory > and hotplug_allocated_memory functions. > > > >PFN_PHYS(section_nr_to_pfn(pfn_to_section_nr(balloon_stats.boot_max_pfn) > > >+ 1)); + r_size = (balloon_stats.target_pages - > > >balloon_stats.current_pages) << PAGE_SHIFT; > > > > So this just reserves enough resource to satisfy the current outstanding > > requirement? That's OK if we can repeat it, but it looks like it will > > only do this once? > > For full description of current algorithm > please look at the end of this e-mail. > > > >+ + rc = allocate_resource(&iomem_resource, r, r_size, r_min, + > > >ULONG_MAX, PAGE_SIZE, NULL, NULL); > > > > Does this need to be section aligned, with a section size? Or is any old > > place OK? > > It is always PAGE_SIZE aligned and not below than > <max_address_of_section_allocated_at_boot> + 1. > > > >reservation.nr_extents = nr_pages; + + rc = > > >HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation); > > > > Allocating all the memory here seems sub-optimal. > > Here is allocated min(nr_pages, ARRAY_SIZE(frame_list)) of pages. > > > >0 ? rc : rc != nr_pages; +} + +static void hotplug_allocated_memory(void) > > > > Why is this done separately from the reservation/allocation from xen? > > First memory is allocated in batches of min(nr_pages, ARRAY_SIZE(frame_list)) > of pages and then whole allocated memory is hotplugged. > > > >{ - unsigned long target = balloon_stats.target_pages; + int nid, ret; > > >+ struct memory_block *mem; + unsigned long pfn, pfn_limit; + + nid = > > >memory_add_physaddr_to_nid(balloon_stats.hotplug_start_paddr); > > > > Is the entire reserved memory range guaranteed to be within one node? > > > > I see that this function has multiple definitions depending on a number > > of config settings. Do we care about what definition it has? > > As I know (maybe I have missed something) currently Xen does not support > NUMA in guests and nid is always 0. However maybe it will be good to create > Xen specific version of memory_add_physaddr_to_nid function. > > > >= online_pages(PFN_DOWN(balloon_stats.hotplug_start_paddr), + > > >balloon_stats.hotplug_size >> PAGE_SHIFT); > > > > Are the pages available for allocation by the rest of the kernel from > > this point on? > > Yes. > > > Which function allocates the actual page structures? > > add_registered_memory() > > > > > >+ + if (ret) { + pr_err("%s: online_pages: Failed: %i\n", __func__, > > >ret); + goto error; + } + + pfn = > > >PFN_DOWN(balloon_stats.hotplug_start_paddr); + pfn_limit = pfn + > > >(balloon_stats.hotplug_size >> PAGE_SHIFT); - return target; + for (; > > >pfn < pfn_limit; pfn += PAGES_PER_SECTION) { + mem = > > >find_memory_block(__pfn_to_section(pfn)); + BUG_ON(!mem); + > > >BUG_ON(!present_section_nr(mem->phys_index)); + > > >mutex_lock(&mem->state_mutex); + mem->state = MEM_ONLINE; + > > >mutex_unlock(&mem->state_mutex); + } > > > > What does this do? How is it different from what online_pages() does? > > This updates /sys/devices/system/memory/memory*/state files > which contain information about states of sections. > > > >+ else if (is_memory_resource_reserved()) + hotplug_allocated_memory(); > > > > Why can't this be done in allocate_additional_memory()? > > Because memory is allocated in relatively small > batches and then whole memory is hotplugged. > > > In your earlier, patch I think you made the firmware_map_add_hotplug() > > be specific to add_memory, but now you have it in __add_memory. Does it > > make a difference either way? > > It was not available in Linux Kernel Ver. 2.6.32.* > on which based first versions of this patch. > It updates /sys/firmware/memmap. > > > As before, this all looks reasonably good. I think the next steps > > should be: > > > > 1. identify how to incrementally allocate the memory from Xen, rather > > than doing it at hotplug time > > 2. identify how to disable the sysfs online interface for Xen > > hotplugged memory > > > > For 1., I think the code should be something like: > > > > increase_address_space(unsigned long pages) > > { > > - reserve resource for memory section > > - online section > > for each page in section { > > online page > > mark page structure allocated > > add page to ballooned_pages list > > balloon_stats.balloon_(low|high)++; > > } > > } > > Here is current algorithm: > - allocate_resource() with size requested by user, > - allocate memory in relatively small batches, > - add_registered_memory(), > - online_pages(), > - update /sys/devices/system/memory/memory*/state files. > > > The tricky part is making sure that the memory for the page structures > > has been populated so it can be used. Aside from that, there should be > > no need to have another call to > > HYPERVISOR_memory_op(XENMEM_populate_physmap, ...) aside from the > > existing one. > > Currently it is. > > > 2 requires a deeper understanding of the existing hotplug code. It > > needs to be refactored so that you can use the core hotplug machinery > > without enabling the sysfs page-onlining mechanism, while still leaving > > it available for physical hotplug. In the short term, having a boolean > > to disable the onlining mechanism is probably the pragmatic solution, so > > the balloon code can simply disable it. > > I think that sysfs should stay intact because it contains some > useful information for admins. We should reconsider avaibilty > of /sys/devices/system/memory/probe. In physical systems it > is available however usage without real hotplug support > lead to big crash. I am not sure we should disable probe in Xen. > Maybe it is better to stay in sync with standard behavior. > Second solution is to prepare an interface (kernel option > or only some enable/disable functions) which give possibilty > to enable/disable probe interface when it is required. > > Daniel > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxx > http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |