Re: [Xen-devel] [RFC PATCH] xen: free_domheap_pages: delay page scrub to tasklet

On 05/20/2014 03:26 PM, Jan Beulich wrote:
>>>> On 20.05.14 at 09:11, <bob.liu@xxxxxxxxxx> wrote:
>> On 05/20/2014 02:27 PM, Jan Beulich wrote:
>>> So if you have the system scrub 1Tb at boot (via suitable
>>> dom0_mem=), how long does that take?
>> I only have a 32G machine, the 1Tb bug was reported by our testing engineer.
>> On 32G machine, if set dom0_mem=2G the scrub time in "(XEN) Scrubbing
>> Free RAM:" is around 12s at boot.
>> The xl destroy time for a 30G guest is always around 15s even decreased
>> the rate of calling hypercall_preempt_check().
> Okay, so these numbers at least appear to correlate. And in fact I
> think 3Gb/s (approximated) isn't that unreasonable a number; at
> least it's not orders of magnitude away from theoretical bandwidth.
> Which means yes, better dealing with the load resulting from the
> post-guest-death scrubbing would be desirable, but otoh it's also
> not really unexpected for this taking minutes for huge guests. Any
> change here clearly need proper judgment between latency and
> the effect on other guests it has: As said previously, impacting all
> other guests just so that the scrubbing would get done quickly
> doesn't seem right either.

Yes, so I have sent out a new version, mainly based on your suggestions,
titled "[RFC PATCH v2] xen: free_domheap_pages: delay page scrub to
idle loop".

Pages are added to a percpu scrub list in free_domheap_pages(), and the
real scrub work is done in idle_loop(). This way, no scrub work is
assigned to an unrelated cpu that never executes free_domheap_pages().
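
To illustrate the idea (this is not the code from the v2 patch), here is a
minimal user-space sketch. It assumes a fixed number of cpus and uses
hypothetical names (struct scrub_page, queue_for_scrub(), scrub_from_idle());
the real patch works on Xen's own page structures and percpu helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS   4      /* assumption: small fixed cpu count for the sketch */
#define PAGE_SIZE 4096

/* Hypothetical stand-in for a page awaiting scrub; not Xen's page_info. */
struct scrub_page {
    struct scrub_page *next;
    unsigned char data[PAGE_SIZE];
};

/* One list head per cpu. Only the owning cpu touches its list, so the
 * sketch needs no lock and no atomic operation. */
static struct scrub_page *scrub_list[NR_CPUS];
static unsigned long scrubbed[NR_CPUS];

/* Free path: queue the page on the freeing cpu's list instead of
 * scrubbing it right away. */
static void queue_for_scrub(unsigned int cpu, struct scrub_page *pg)
{
    assert(cpu < NR_CPUS);
    pg->next = scrub_list[cpu];
    scrub_list[cpu] = pg;
}

/* Idle path: scrub at most 'budget' pages from this cpu's own list and
 * return how many were scrubbed. A cpu that never freed pages has an
 * empty list and returns 0 immediately. */
static unsigned int scrub_from_idle(unsigned int cpu, unsigned int budget)
{
    unsigned int done = 0;

    assert(cpu < NR_CPUS);
    while (done < budget && scrub_list[cpu] != NULL) {
        struct scrub_page *pg = scrub_list[cpu];

        scrub_list[cpu] = pg->next;
        memset(pg->data, 0, PAGE_SIZE);  /* the actual scrub */
        free(pg);                        /* stands in for returning it to the heap */
        scrubbed[cpu]++;
        done++;
    }
    return done;
}
```

The block has no main(); a caller would queue pages from its free path and
call scrub_from_idle() from its idle loop. The budget argument is only there
to show that the idle path can stop early when other work arrives.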

The trade-off is that we can't use all cpu resources to do the scrub job in
parallel. But at least we achieved:
1. xl destroy returns faster, ~3s for a 30G guest.
2. Doing the scrub job in idle_loop() is still faster than in
relinquish_memory(), because, e.g., relinquish_memory() executes some
atomic instructions on every loop iteration.

Please take a look and review.


