
Re: [Xen-devel] weird memory access problem running on dom0



No ballooning; the command line has dom0_mem=

There is nothing useful in dmesg or xm dmesg.

The kernel is not configured with DEBUG,

and if we enable DEBUG, the problem may no longer be reproducible.

Does anyone have any ideas about pte_flags?

Some info about the MTRRs:

reg00: base=0x0ffc00000 ( 4092MB), size=    4MB, count=1: write-protect
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size=  512MB, count=1: uncachable
reg03: base=0x0e0000000 ( 3584MB), size=  256MB, count=1: uncachable
reg04: base=0x0f0000000 ( 3840MB), size=  128MB, count=1: uncachable
reg05: base=0x0f8000000 ( 3968MB), size=   64MB, count=1: uncachable
reg06: base=0x0fc000000 ( 4032MB), size=   32MB, count=1: uncachable
reg07: base=0x0fec00000 ( 4076MB), size=    4MB, count=1: uncachable

regards,
wanjia


2013/10/23 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
On 22/10/13 16:41, Alice Wan wrote:
Hi all,

    Recently we hit a very strange memory problem when running in dom0. The test case is very simple; the code is as follows:

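#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE        4096
#define IO_PATTERN      0xab

int main(int argc, char *argv[])
{
        void *buf;
        char cmp_buf[BUF_SIZE];

        /* heap buffer; cmp_buf above lives on the stack */
        buf = malloc(BUF_SIZE);
        if (!buf) {
                /* report errno (the original passed -err, which was always 0) */
                fprintf(stderr, "error %s during %s\n",
                        strerror(errno),
                        "malloc");
                return 1;
        }
        /* fill both buffers with the same byte pattern */
        memset(buf, IO_PATTERN, BUF_SIZE);
        memset(cmp_buf, IO_PATTERN, BUF_SIZE);

        /* the buffers must be identical; on mismatch, dump the heap buffer */
        if (memcmp(buf, cmp_buf, BUF_SIZE)) {
                unsigned long long *ubuf = (unsigned long long *)buf;
                size_t i;

                for (i = 0; i < BUF_SIZE / sizeof(unsigned long long); i++)
                        printf("%zu: 0x%llx\n", i, ubuf[i]);

                return 2;
        }

        return 0;
}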

    The memcmp failure occurs when the test is run on 500 machines with Xen, a billion times on each.
    The error log shows two kinds of results: in one the dump is 0x0, which shows that buf is all zeroes; in the other it is 0xabababa...ababa, which shows that cmp_buf is not 0xabab..ab.

    In both cases, the error log shows that either buf or cmp_buf is entirely incorrect.
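Because the test only dumps buf, a corrupted cmp_buf has to be inferred from a clean dump. A minimal sketch of a more direct check follows, assuming it is called from the memcmp failure branch; the helper names first_mismatch and report_corruption are hypothetical and not part of the original test. It checks each buffer against the pattern independently and reports the first bad offset in each.

```c
#include <stddef.h>
#include <stdio.h>

/* Return the offset of the first byte in buf that differs from pattern,
 * or -1 if all len bytes match. */
static long first_mismatch(const void *buf, size_t len, unsigned char pattern)
{
        const unsigned char *p = buf;
        size_t i;

        for (i = 0; i < len; i++)
                if (p[i] != pattern)
                        return (long)i;
        return -1;
}

/* Hypothetical helper: check the heap and stack buffers separately and
 * log which one is wrong. Returns the number of bad buffers (0, 1 or 2). */
static int report_corruption(const void *buf, const void *cmp_buf,
                             size_t len, unsigned char pattern)
{
        long off_heap = first_mismatch(buf, len, pattern);
        long off_stack = first_mismatch(cmp_buf, len, pattern);
        int bad = 0;

        if (off_heap >= 0) {
                fprintf(stderr, "buf (heap) at %p: first bad byte at offset %ld\n",
                        (void *)buf, off_heap);
                bad++;
        }
        if (off_stack >= 0) {
                fprintf(stderr, "cmp_buf (stack) at %p: first bad byte at offset %ld\n",
                        (void *)cmp_buf, off_stack);
                bad++;
        }
        return bad;
}
```

In the test above this could be called as report_corruption(buf, cmp_buf, BUF_SIZE, IO_PATTERN) before the dump loop. Logging the virtual addresses as well may help correlate failures with particular pages, though that is only a guess at what would be useful here.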

    However, the test passes when we run it on a native Linux kernel (2.6.32) without Xen.

    We suspect it may be related to the pvops behaviour of dom0.

    We are not sure whether this is a bug that has been fixed in newer versions of the kernel and Xen, so we tried different versions of Xen and dom0, including Xen 4.0.1 + kernel 2.6.32/3.0/3.11 and Xen 4.2 + kernel 2.6.32. Unfortunately, all of them failed.

    We found that PAT behaves differently between Linux and Xen, so we tried adding nopat to the command line of kernel 3.11, and it also failed.

    Now we are blocked and really need some help.

    Any advice will be appreciated.

    Thanks in advance.



regards,
wanjia

Picking randomly at some ideas:

Do you have ballooning enabled?

At the time of a failure, is there anything interesting in the Linux or Xen dmesg?

Are you running a debug version of Linux or Xen?

~Andrew


 

