Hi all,
We recently hit a very strange memory problem when running a
simple test case in dom0. The code is as follows:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096
#define IO_PATTERN 0xab

int main(int argc, char *argv[])
{
        void *buf;
        char cmp_buf[BUF_SIZE];

        buf = malloc(BUF_SIZE);
        if (!buf) {
                /* report the errno value left by the failed malloc() */
                fprintf(stderr, "error %s during %s\n",
                        strerror(errno), "malloc");
                return 1;
        }

        /* fill the heap buffer and the stack buffer with the same pattern */
        memset(buf, IO_PATTERN, BUF_SIZE);
        memset(cmp_buf, IO_PATTERN, BUF_SIZE);

        if (memcmp(buf, cmp_buf, BUF_SIZE)) {
                /* mismatch: dump the heap buffer as 64-bit words */
                unsigned long long *ubuf = (unsigned long long *)buf;
                int i;

                for (i = 0; i < BUF_SIZE / sizeof(unsigned long long); i++)
                        printf("%d: 0x%llx\n", i, ubuf[i]);
                return 2;
        }
        return 0;
}
The memcmp() check fails when the test is run on 500 machines
with Xen, a billion times on each.
The error logs show two kinds of result. In one, the dump is
all 0x0, which shows that buf is zero. In the other, the dump is
0xabababa...ababa, which shows that buf is correct and cmp_buf
is not 0xabab..ab.
In both kinds of error log, either buf or cmp_buf appears to be
wrong in its entirety.
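Since the test above only dumps buf, the state of cmp_buf has to be
inferred. A minimal sketch of a per-buffer report that could be called
for both buffers in the failure branch is below. The helper names
(first_bad_offset, count_bad_bytes, report_buffer) are hypothetical and
not part of the original test; this is only one possible way to collect
more detail.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: offset of the first byte in buf that differs
 * from pattern, or -1 if all len bytes match. */
long first_bad_offset(const unsigned char *buf, size_t len,
                      unsigned char pattern)
{
        size_t i;

        for (i = 0; i < len; i++)
                if (buf[i] != pattern)
                        return (long)i;
        return -1;
}

/* Hypothetical helper: number of bytes in buf that differ from pattern. */
size_t count_bad_bytes(const unsigned char *buf, size_t len,
                       unsigned char pattern)
{
        size_t i, bad = 0;

        for (i = 0; i < len; i++)
                if (buf[i] != pattern)
                        bad++;
        return bad;
}

/* Hypothetical helper: print the address of one buffer and how far it
 * deviates from the expected pattern. */
void report_buffer(const char *name, const void *buf, size_t len,
                   unsigned char pattern)
{
        const unsigned char *p = buf;
        long off = first_bad_offset(p, len, pattern);

        if (off < 0) {
                printf("%s (%p): all %zu bytes match 0x%02x\n",
                       name, buf, len, (unsigned)pattern);
                return;
        }
        printf("%s (%p): %zu of %zu bytes differ from 0x%02x, "
               "first at offset %ld (value 0x%02x)\n",
               name, buf, count_bad_bytes(p, len, pattern), len,
               (unsigned)pattern, off, (unsigned)p[off]);
}
```

Calling report_buffer("buf", buf, BUF_SIZE, IO_PATTERN) and
report_buffer("cmp_buf", cmp_buf, BUF_SIZE, IO_PATTERN) before the
"return 2" would log both addresses and show directly whether the heap
buffer or the stack buffer is the one that is wrong.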
However, the test passes when we run it on a native Linux
kernel (2.6.32) without Xen.
We suspect it may be related to the pvops behavior of dom0.
We are not sure whether this is a bug that has already been fixed
in newer versions of the kernel and Xen, so we tried different
versions of Xen and dom0, including Xen 4.0.1 + kernel 2.6.32/3.0/3.11
and Xen 4.2 + kernel 2.6.32. Unfortunately, all of them failed.
We found that PAT behaves differently between native Linux and Xen,
so we tried adding nopat to the kernel 3.11 command line, but the
test still failed.
We are now blocked and really need some help.
Any advice will be appreciated.
Thanks in advance.
regards,
wanjia