System hangs when NVMe is under load
Hello,

I apologize in advance if I am sending this to the wrong people. We have a strange situation with a couple of our servers: we have been experiencing issues with the combination of Debian+XEN+Samsung NVMe.

Problem: It all began with
https://serverfault.com/questions/1006366/samsung-nvme-disappears-when-server-on-average-to-high-load
The current situation is close to the one described there, with some differences. It can now be reproduced.
We've gathered some more information: it happens only when XEN
is loaded. The command that breaks everything is the following, and it
breaks it fast. In this setup it needs only approx.
20 seconds to hang the whole system. I am attaching the call trace
that occurs during the hang.

date; echo; fio --filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --runtime=345600 --numjobs=10 --time_based --group_reporting --name=iops-test-job --readonly --output=fio_log.randread4k.log; date

I have currently run
the test on one of the nodes where I have booted without
xen. Have in mind that all servers are provisioned with
Ansible and are the same.

What has been tried so far:
- Setting kernel option nvme_core.default_ps_max_latency_us to 5500/200, as described at https://wiki.archlinux.org/index.php/Solid_state_drive/NVMe#Samsung_drive_errors_on_Linux_4.10 and https://askubuntu.com/questions/905710/ext4-fs-error-after-ubuntu-17-04-upgrade
- Setting kernel option nvme_core.force_apst=1, thus trying to force APST since (nvme id-ctrl /dev/nvme0n1 | grep apst
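
When testing kernel options like the two above, it can help to confirm after a reboot that they actually took effect. The following is only a minimal sketch, not something taken from the original report: the helper names cmdline_value and report_param are made up for illustration, and it assumes the nvme_core module parameters are exposed under /sys/module/nvme_core/parameters/ and that the boot command line is readable from /proc/cmdline.

```shell
#!/bin/sh
# Sketch: compare the boot-time and runtime values of nvme_core options.
# Assumption: parameters are visible in /sys/module/nvme_core/parameters/.

# cmdline_value CMDLINE NAME
# Print the value of the last NAME=VALUE word in a kernel command line
# string; print an empty line if NAME does not appear.
cmdline_value() {
    value=
    for word in $1; do
        case "$word" in
            "$2"=*) value=${word#*=} ;;   # keep the text after the first '='
        esac
    done
    printf '%s\n' "$value"
}

# report_param NAME
# Show what was passed at boot and what the running kernel reports
# for nvme_core.NAME. Missing files are reported, not treated as errors.
report_param() {
    boot=$(cmdline_value "$(cat /proc/cmdline 2>/dev/null)" "nvme_core.$1")
    live=$(cat "/sys/module/nvme_core/parameters/$1" 2>/dev/null)
    printf '%s: cmdline=%s runtime=%s\n' "$1" "${boot:-unset}" "${live:-unavailable}"
}

report_param default_ps_max_latency_us
report_param force_apst
```

If the runtime value differs from what was put on the command line, the option was probably not picked up (for example, because the bootloader configuration was not regenerated), and the test results would then say nothing about APST.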

I have kind of "overheated" on the subject right now and could possibly be missing something important. Let me know if you need any more information.

NB: We began testing this cluster because it was showing really slow disk-related operations (on the NVMe). For comparison, the other cluster (mentioned on Server Fault) never showed any performance issues.

Best Regards,
--
Stanislav Ivanov
System Administrator
-------------------------
Abilix Soft LTD.
Varna, "Studentska" St. No. 1A, Office 24B
Support: +359 700 911 44
https://abscloud.eu

Attachment:
CallTraceXenNvmeProblem.txt