
[Xen-devel] [linux-4.1 bisection] complete test-amd64-i386-xl-raw



branch xen-unstable
xenbranch xen-unstable
job test-amd64-i386-xl-raw
testid debian-di-install

Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  c5ad33184354260be6d05de57e46a5498692f6d6
  Bug not present: c5bcec6cbcbf520f088dc7939934bbf10c20c5a5
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/99823/


  commit c5ad33184354260be6d05de57e46a5498692f6d6
  Author: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
  Date:   Fri Jun 24 14:50:01 2016 -0700
  
      mm/swap.c: flush lru pvecs on compound page arrival
      
      [ Upstream commit 8f182270dfec432e93fae14f9208a6b9af01009f ]
      
      Currently we can have compound pages held on per-cpu pagevecs, which
      leads to a lot of memory being unavailable for reclaim when needed.  On
      systems with hundreds of processors this can amount to GBs of memory.
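
      (For scale: a pagevec holds 14 pages, so with 4kB base pages one
      per-cpu pagevec caches at most 14 * 4kB = 56kB, but with THP a single
      entry can be a 2MB huge page, allowing up to 14 * 2MB = 28MB per CPU
      -- the figures quoted below.)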
      
      One way of reproducing the problem is to not call munmap explicitly
      on all mapped regions (e.g. after receiving SIGTERM).  After that,
      some pages (with THP enabled, also huge pages) may end up on
      lru_add_pvec; example below.
      
        #include <string.h>
        #include <sys/mman.h>

        int main(void) {
        #pragma omp parallel
        {
                size_t size = 55 * 1000 * 1000; // smaller than MEM/CPUS
                void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p != MAP_FAILED)
                        memset(p, 0, size);
                // munmap(p, size); // uncomment to make the problem go away
        }
                return 0;
        }
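
      (The #pragma omp parallel requires OpenMP at build time; with GCC
      that is something like gcc -fopenmp repro.c, which produces the
      ./a.out run in the loop below.  The name repro.c is only
      illustrative.)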
      
      When we run it with THP enabled, it will leave a significant amount
      of memory on lru_add_pvec.  This memory will not be reclaimed if we
      hit OOM, so when we run the above program in a loop:
      
        for i in `seq 100`; do ./a.out; done
      
      many processes (95% in my case) will be killed by the OOM killer.
      
      The primary point of the LRU add cache is to reduce zone lru_lock
      contention, in the hope that more pages will belong to the same zone
      and so their addition can be batched.  A huge page is already a form
      of batched addition (it adds 512 base pages' worth of memory in one
      go), so skipping the batching seems the safer option compared to a
      potential excess in the caching, which can be quite large and much
      harder to fix, because lru_add_drain_all is way too expensive and it
      is not really clear when would be a good moment to call it.
      
      Similarly, we can reproduce the problem on lru_deactivate_pvec by
      adding madvise(p, size, MADV_FREE); after the memset.
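
      That is, in the example above (MADV_FREE requires a 4.5 or later
      kernel):

        if (p != MAP_FAILED) {
                memset(p, 0, size);
                madvise(p, size, MADV_FREE);
        }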
      
      This patch flushes lru pvecs on compound page arrival, making the
      problem less severe: after applying it, the kill rate of the above
      example drops to 0%, because the maximum amount of memory held on a
      pvec falls from 28MB (with THP) to 56kB per CPU.
      
      Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
      Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@xxxxxxxxx
      Signed-off-by: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
      Acked-by: Michal Hocko <mhocko@xxxxxxxx>
      Cc: Kirill Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
      Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
      Cc: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
      Cc: Ming Li <mingli199x@xxxxxx>
      Cc: Minchan Kim <minchan@xxxxxxxxxx>
      Cc: <stable@xxxxxxxxxxxxxxx>
      Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
      Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
      Signed-off-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
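
As the commit message describes, the fix boils down to draining the
per-cpu pagevec as soon as a compound (huge) page is added to it, rather
than only when the pagevec fills up.  A minimal sketch of that shape of
change in mm/swap.c, reconstructed here from the description (the exact
upstream diff in 8f182270dfec may differ):

    static void __lru_cache_add(struct page *page)
    {
            struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

            get_page(page);
            /* Drain when the pvec is full -- and also immediately on
             * compound pages, so huge pages never linger on the pvec. */
            if (!pagevec_add(pvec, page) || PageCompound(page))
                    __pagevec_lru_add(pvec);
            put_cpu_var(lru_add_pvec);
    }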


For bisection revision-tuple graph see:
   http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-4.1/test-amd64-i386-xl-raw.debian-di-install.html
Revision IDs in each graph node refer, respectively, to the Trees above.

----------------------------------------
Running cs-bisection-step --graph-out=/home/logs/results/bisect/linux-4.1/test-amd64-i386-xl-raw.debian-di-install --summary-out=tmp/99823.bisection-summary --basis-template=96211 --blessings=real,real-bisect linux-4.1 test-amd64-i386-xl-raw debian-di-install
Searching for failure / basis pass:
 99741 fail [host=baroque0] / 96211 [host=fiano0] 96183 [host=rimava1]
 96160 [host=baroque1] 95848 [host=italia0] 95818 [host=chardonnay0]
 95591 [host=fiano1] 95517 [host=pinot1] 95455 [host=rimava0] 95408 ok.
Failure / basis pass flights: 99741 / 95408
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
e763268781d341fef05d461f3057e6ced5e033f2
Basis pass 888172862fa78505c4e4674c205a06586443d83f 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
df553c056104e3dd8a2bd2e72539a57c4c085bae 
44a072f0de0d57c95c2212bbce02888832b7b74f 
c2a17869d5dcd845d646bf4db122cad73596a2be
Generating revisions with ./adhoc-revtuple-generator
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#888172862fa78505c4e4674c205a06586443d83f-5880876e94699ce010554f483ccf0009997955ca
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
git://xenbits.xen.org/qemu-xen-traditional.git#df553c056104e3dd8a2bd2e72539a57c4c085bae-6e20809727261599e8527c456eb078c0e89139a1
git://xenbits.xen.org/qemu-xen.git#44a072f0de0d57c95c2212bbce02888832b7b74f-44a072f0de0d57c95c2212bbce02888832b7b74f
git://xenbits.xen.org/xen.git#c2a17869d5dcd845d646bf4db122cad73596a2be-e763268781d341fef05d461f3057e6ced5e033f2
Loaded 3004 nodes in revision graph
Searching for test results:
 95408 pass 888172862fa78505c4e4674c205a06586443d83f 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
df553c056104e3dd8a2bd2e72539a57c4c085bae 
44a072f0de0d57c95c2212bbce02888832b7b74f 
c2a17869d5dcd845d646bf4db122cad73596a2be
 95455 [host=rimava0]
 95517 [host=pinot1]
 95591 [host=fiano1]
 95848 [host=italia0]
 95818 [host=chardonnay0]
 96211 [host=fiano0]
 96160 [host=baroque1]
 96183 [host=rimava1]
 97279 fail irrelevant
 97434 fail irrelevant
 97394 fail irrelevant
 97496 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
b48be35ac86cd6369124cf06ca3006d086095297
 97558 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
b48be35ac86cd6369124cf06ca3006d086095297
 97613 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
b48be35ac86cd6369124cf06ca3006d086095297
 97644 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
b48be35ac86cd6369124cf06ca3006d086095297
 97692 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
e763268781d341fef05d461f3057e6ced5e033f2
 97730 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
e763268781d341fef05d461f3057e6ced5e033f2
 99604 []
 99664 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
e763268781d341fef05d461f3057e6ced5e033f2
 99701 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
e763268781d341fef05d461f3057e6ced5e033f2
 99714 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
e763268781d341fef05d461f3057e6ced5e033f2
 99741 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
e763268781d341fef05d461f3057e6ced5e033f2
 99804 pass 691c507ec01fa0cab2a9cfb5bd4398ddd5480a8a 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99809 pass 7f3724b8951735ef1d5ae4f2846b8af98a665d73 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99779 pass 888172862fa78505c4e4674c205a06586443d83f 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
df553c056104e3dd8a2bd2e72539a57c4c085bae 
44a072f0de0d57c95c2212bbce02888832b7b74f 
c2a17869d5dcd845d646bf4db122cad73596a2be
 99810 fail c5ad33184354260be6d05de57e46a5498692f6d6 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99787 fail 5880876e94699ce010554f483ccf0009997955ca 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
e763268781d341fef05d461f3057e6ced5e033f2
 99791 pass ec3e73223a8e5a5ec66ca6a2b12d37ddaf2530dd 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99812 pass c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99793 fail ea0b24134918a838f1ff94ac4707d3dcc637630c 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99814 fail c5ad33184354260be6d05de57e46a5498692f6d6 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99796 fail fda740f68c700b1da33a512ec8005d86f275265a 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99800 fail cc6fd729b8a04fbb4b88e45209c1241dd89a3fbe 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99816 pass c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99802 pass 97e2a92930008f6087b6d59f761277f0839b30be 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99818 fail c5ad33184354260be6d05de57e46a5498692f6d6 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99820 pass c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
 99823 fail c5ad33184354260be6d05de57e46a5498692f6d6 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
Searching for interesting versions
 Result found: flight 95408 (pass), for basis pass
 Result found: flight 97692 (fail), for basis failure
 Repro found: flight 99779 (pass), for basis pass
 Repro found: flight 99787 (fail), for basis failure
 0 revisions at c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6e20809727261599e8527c456eb078c0e89139a1 
44a072f0de0d57c95c2212bbce02888832b7b74f 
7da483b0236d8974cc97f81780dcf8e559a63175
No revisions left to test, checking graph state.
 Result found: flight 99812 (pass), for last pass
 Result found: flight 99814 (fail), for first failure
 Repro found: flight 99816 (pass), for last pass
 Repro found: flight 99818 (fail), for first failure
 Repro found: flight 99820 (pass), for last pass
 Repro found: flight 99823 (fail), for first failure

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  Bug introduced:  c5ad33184354260be6d05de57e46a5498692f6d6
  Bug not present: c5bcec6cbcbf520f088dc7939934bbf10c20c5a5
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/99823/


  commit c5ad33184354260be6d05de57e46a5498692f6d6
  ("mm/swap.c: flush lru pvecs on compound page arrival" -- commit
  message quoted in full above)

dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.928138 to fit
pnmtopng: 79 colors found
Revision graph left in /home/logs/results/bisect/linux-4.1/test-amd64-i386-xl-raw.debian-di-install.{dot,ps,png,html,svg}.
----------------------------------------
99823: tolerable ALL FAIL

flight 99823 linux-4.1 real-bisect [real]
http://logs.test-lab.xenproject.org/osstest/logs/99823/

Failures :-/ but no regressions.

Tests which did not succeed, including tests which could not be run:
 test-amd64-i386-xl-raw        9 debian-di-install       fail baseline untested


jobs:
 test-amd64-i386-xl-raw                                       fail    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel