
Re: [Xen-devel] Kernel 3.11 / 3.12 OOM killer and Xen ballooning



On 01/16/2014 12:35 AM, James Dingwall wrote:
> Bob Liu wrote:
>> On 01/15/2014 04:49 PM, James Dingwall wrote:
>>> Bob Liu wrote:
>>>> On 01/07/2014 05:21 PM, James Dingwall wrote:
>>>>> Bob Liu wrote:
>>>>>> Could you confirm that this problem doesn't occur if tmem is loaded
>>>>>> with selfshrinking=0 while compiling gcc? It seems that you are
>>>>>> compiling different packages during your testing.
>>>>>> This will help to figure out whether selfshrinking is the root cause.
>>>>> Got an oom with selfshrinking=0, again during a gcc compile.
>>>>> Unfortunately I don't have a single test case which demonstrates the
>>>>> problem but as I mentioned before it will generally show up under
>>>>> compiles of large packages such as glibc, kdelibs, gcc etc.
>>>>>
>>>> So the root cause is not that selfshrinking is enabled.
>>>> What I can think of, then, is that the xen-selfballoon driver was too
>>>> aggressive: too many pages were ballooned out, which caused heavy
>>>> memory pressure on the guest OS.
>>>> kswapd then started to reclaim pages until most pages were
>>>> unreclaimable (all_unreclaimable=yes for all zones), and then the OOM
>>>> killer was triggered.
>>>> In theory the balloon driver should give ballooned-out pages back to
>>>> the guest OS, but I'm afraid this procedure is not fast enough.
>>>>
>>>> My suggestion is to reserve a minimum amount of memory for your guest
>>>> OS so that xen-selfballoon won't be so aggressive.
>>>> You can do this through the parameters selfballoon_reserved_mb or
>>>> selfballoon_min_usable_mb.
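A rough model may help show how these two parameters interact. The C sketch below is loosely modeled on the self-balloon idea discussed in this thread: a goal computed from committed memory plus a reserve (the role of selfballoon_reserved_mb), a floor the target never drops below (the role of selfballoon_min_usable_mb), and hysteresis that moves only part of the way toward the goal on each interval, which is one reason handing pages back can lag behind demand. It is a hypothetical simplification, not the xen-selfballoon.c source; the struct, field, and function names are illustrative, and all sizes are assumed to be 4 KiB pages.

```c
/*
 * Hypothetical, simplified model of a self-balloon target calculation.
 * Names are illustrative; this is NOT the xen-selfballoon.c source.
 * All quantities are in (assumed) 4 KiB pages.
 */
#include <assert.h>
#include <stdint.h>

#define PAGE_KB 4UL
#define MB_TO_PAGES(mb) ((uint64_t)(mb) * 1024 / PAGE_KB)

struct selfballoon_params {
    uint64_t reserved_mb;     /* plays the role of selfballoon_reserved_mb */
    uint64_t min_usable_mb;   /* plays the role of selfballoon_min_usable_mb */
    uint64_t down_hysteresis; /* shrink by 1/N of the gap per interval */
    uint64_t up_hysteresis;   /* grow by 1/N of the gap per interval */
};

/* Return the next memory target (pages) from current RAM and committed memory. */
static uint64_t next_target_pages(uint64_t cur_pages, uint64_t committed_pages,
                                  const struct selfballoon_params *p)
{
    assert(p->down_hysteresis > 0 && p->up_hysteresis > 0);

    /* Goal: what the guest appears to need, plus an explicit reserve. */
    uint64_t goal = committed_pages + MB_TO_PAGES(p->reserved_mb);
    uint64_t tgt = cur_pages;

    /* Move only part of the way toward the goal each interval (hysteresis). */
    if (cur_pages > goal)
        tgt = cur_pages - (cur_pages - goal) / p->down_hysteresis;
    else if (cur_pages < goal)
        tgt = cur_pages + (goal - cur_pages) / p->up_hysteresis;

    /* Never balloon below the configured usable-memory floor. */
    uint64_t floor_pages = MB_TO_PAGES(p->min_usable_mb);
    if (tgt < floor_pages)
        tgt = floor_pages;
    return tgt;
}
```

For example, with a 64 MB reserve, a 64 MB floor and a down-hysteresis divisor of 8, a 512 MB guest with 100 MB committed is trimmed only to 119936 pages (about 468.5 MB) in the first step; raising the reserve or the floor keeps more headroom regardless of how quickly pages are returned.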
>>> I am still getting OOM errors with both of these set to 32, so I'll
>>> try another bump to 64.  I think that if I do find values which prevent
>>> it, though, it is more of a workaround than a fix, because it still
>>> suggests that swap is not being used when ballooning is no longer
>> Yes, it's a workaround, but I don't think there is a better way.
>>
>> From the recent OOM logs you reported:
>> [ 8212.940769] Free swap  = 1925576kB
>> [ 8212.940770] Total swap = 2097148kB
>>
>> [504638.442136] Free swap  = 1868108kB
>> [504638.442137] Total swap = 2097148kB
>>
>> That means 171572 kB and 229040 kB of data had been swapped out to the
>> swap disk. I think those are already significant amounts for a guest OS
>> that has only ~300M of usable memory.
>> After so many pages have been swapped out, the guest OS can't find any
>> more pages suitable for swapping, even though the swap device still has
>> enough space at that time.
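The swapped-out amounts above are simply Total swap minus Free swap from each OOM report. A small, hypothetical C helper that performs the same subtraction on the two log lines might look like this; the function names are illustrative, and the line format is assumed to match the excerpts quoted above (a value in kB after an "=" sign).

```c
/*
 * Hypothetical helper: compute swap in use (kB) from the "Free swap" and
 * "Total swap" lines of an OOM report. Line format is assumed to be
 * "... = <number>kB", as in the excerpts quoted in this thread.
 */
#include <stdio.h>
#include <string.h>

/* Parse the kB value after '=' in a line such as "Free swap  = 1925576kB". */
static long parse_kb(const char *line)
{
    long kb = -1;
    const char *eq = strchr(line, '=');
    if (eq == NULL || sscanf(eq + 1, " %ldkB", &kb) != 1)
        return -1;
    return kb;
}

/* Swap in use = total - free; returns -1 if either line cannot be parsed. */
static long swap_used_kb(const char *free_line, const char *total_line)
{
    long free_kb = parse_kb(free_line);
    long total_kb = parse_kb(total_line);
    if (free_kb < 0 || total_kb < 0 || free_kb > total_kb)
        return -1;
    return total_kb - free_kb;
}
```

Applied to the two excerpts, it gives 171572 kB and 229040 kB, that is roughly 168 MiB and 224 MiB.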
>>
>> The OOM might not be triggered if the balloon driver could give memory
>> back to the guest OS fast enough, but I think that's unrealistic.
>> So the best way is to reserve more memory for the guest OS.
>>
>>> capable of satisfying the request.  I've also got an Ubuntu Saucy (3.11
>>> kernel) guest running on the dom0 with tmem activated so I'm going to
>>> see if I can find a comparable workload to see if I get the same issue
>>> with a different kernel configuration.
>>>
> I've done a bit more testing and seem to have found an extra condition
> which is affecting the OOM behaviour in my guests.  All my Gentoo guests
> have swap space backed by a dm-crypt volume and if I remove this layer
> then things seem to be behaving much more reliably.  In my Ubuntu guests
> I have plain swap space and so far I haven't been able to trigger an OOM
> condition.  Is it possible that it is the dm-crypt layer failing to get
> working memory when swapping something in/out and causing the error?
> 

One possible reason is that the dm layer and the related dm target driver
occupy a significant amount of memory, and there is no way for
xen-selfballoon to know this. So the selfballoon driver ballooned out more
memory than the system could really spare.

I have made a patch that reserves an extra 10% of the original total
memory; this way I think we can make the system much more reliable in all
cases. Could you please test it? You don't need to set
selfballoon_reserved_mb yourself any more.
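The attached patch is not reproduced here, so the following is only a hypothetical sketch of the idea as described: add 10% of the guest's original (boot-time) total memory as extra headroom on top of the usual goal, so that memory the goal calculation cannot see (assumed here to include dm/dm-crypt working memory) is less likely to be ballooned out. The function name is illustrative and all values are in pages.

```c
/*
 * Hypothetical illustration of "reserve an extra 10% of the original total
 * memory" in a self-balloon goal. This is NOT the attached patch; names are
 * illustrative. All values are in pages.
 */
#include <stdint.h>

static uint64_t goal_with_extra_reserve(uint64_t committed_pages,
                                        uint64_t reserved_pages,
                                        uint64_t boot_total_pages)
{
    /* Extra headroom for memory the goal calculation cannot account for
     * (assumed, e.g. buffers used by the dm/dm-crypt swap layer). */
    uint64_t extra = boot_total_pages / 10;
    uint64_t goal = committed_pages + reserved_pages + extra;

    /* Assumption: the guest cannot grow beyond its boot-time total. */
    if (goal > boot_total_pages)
        goal = boot_total_pages;
    return goal;
}
```

Because the extra reserve scales with the guest's original size rather than being a fixed number of megabytes, it does not need per-guest tuning, which matches the point that selfballoon_reserved_mb would no longer have to be set by hand.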

-- 
Regards,
-Bob

Attachment: xen_selfballoon_deaggressive.patch
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

