[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Lots of connections led oxenstored stuck

Hi Joe,

On 27/08/2014 02:59, Joe Jin wrote:
No this patch does not intend to fix NR_OPEN(this is from system side, need to
unlimited before start the daemon) and select limitations.
We met the bug not because of the limitations, the original issue is when more
connect(i.e 2000) request coming at the same time, accept() failed because of
open fds > SYSCONF.OPEN_MAX, this is as expected. The thing is when client 
oxenstored should close the sockets as well, but during our test, it did not, 
oxenstored keeping reported accept failed, any new request hang as well.
So my changes let oxenstored check and delete closed fds, then oxenstored able 
ack new requests.
During our testing, when issue happened also xenstored log enabled, 
full of below error and xenstored.log keeping be rotated:
[20140827T15:48:25.399Z|error|xenstored] caught exception Unix.Unix_error(15, "accept", 

Thanks for the explanation, I now understand the intention of your patch 
better. Still, I think the change can only mitigate/delay the issue to certain 
extend. If I'm wrong on this, please help me to understand it better.

Here is an example walking through: Suppose your xenstored process have the 
default max open fd setting of 1024 and it currently has 1024 fds open already. 
Now 2000 more connections are coming:

  * Without your change, all the 2000 connection failed with EMFILE one by one 
(by Unix.accept in process_special_fds), you'll see 2000 lines of errors of 
Unix_error(15) in xenstored.log. But since there are always finite number of 
connections coming, eventually xenstored should recover and continue the normal 
logic after all these go through. And the recovery can happen even before that, 
if there are small time gaps among all your incoming connections (which 
basically give the process_connection_fd a chance to run).

  * With your change, xenstored will handle existing fds first. So if there is 
any of the fds being closed already, it basically gives its slot away to the 
coming connections, so the issue won't happen at the very beginning. But IIUIC, 
this all depends on how many persistent connections currently exist. Suppose 
among the 1024 fds, 800 of them are from relatively persistent connections 
(e.g. qemu connections for long running VMs) , then only the first small batch 
in your 2000 incoming connections will succeed and the rest will face exactly 
the same situation.

So, IMHO, the change might help a great deal in a artificial testcase where a 
lot of temporary connections come and go, but less so in real life system where 
a large number of the connections are persistent.

Please let me know if my understanding is correct. I have no objection to the 
change itself, mitigation is positive change too. Hence,

Acked-by: Zheng Li <dev@xxxxxxxx>

A better way is probably wrapping try-with logic on each of process_special_fds 
and process_connections_fds and process_domains instead of the single top-level 
try-with, so that the failure on one aspect will not block the others. We can 
leave this as future work.

   * When select is given 1024 fds (which can still happen even with your 
patch), the behavior is _undefined_. IIRC, some bits in the bitmap might be 
reused (wrongly), so that the output (fds reported as ready for read/write) 
might be wrong for some fds, so that the following read/write might be blocked 
on them.

The max open fd setting  of a process normally defaults to 1024 for a reason. 
If, as you said, your tests raise the ulimit setting beforehand but still use 
select internally, then there are potential issues here.

   * Also, we generally prefer to handle special fds first, as the eventchn fd 
represents all the domain connections.

Remove closed firstly may reduce system resource usage?

My initial guess was the original authors of oxenstored might be handling 
special fds first for performance consideration. But after more thinking about 
it, I don't think that could make a great difference. We can also separate the 
eventchn fd from the other two special sockets and give it higher priority if 
it turns out to be a problem afterwards.

I previously mentioned I've got patches for these. I'm currently testing with 
1,000 Windows 7 VMs on a single host (each consume at least 2 persistent 
xenstored socket connections). Besides the two limits just mentioned, I've also 
fixed several bugs and bottlenecks along the way.

I'm going to upstream these patches very soon, just a bit clean up and 
documentation are needed. However if you (or anyone) need them urgently or 
eager to have a test, please send me an private email separately. I'm happy to 
send you the patch in its current form --- a single non-disaggregated patch for 
multiple issues, not very well commented, but should just work.

Can you please send a copy of your patch? I'd like to test when connections 
more than @nfds of
poll, what happened.

Some logic in the patch changes the xenstored process's fd limit to the NR_OPEN 
of system max (usually 1024x1024), though you obviously won't be able to reach 
that exact number as other processes will have open fds too. But I guess that's 
a big enough number for real life cases.

I'll send out the patch to those on the threads (except xen-devel) late today.


On 26/08/2014 09:15, Joe Jin wrote:
This bug caused by oxenstored handle incoming requests, when lots of
connections came at same time it has not chance to delete closed sockets.

I created a patch for this, please review:


[PATCH] oxenstored: check and delete closed socket before accept incoming 

When more than SYSCONF.OPEN_MAX connections came at the same time and
connecitons been closed later, oxenstored has not change to delete closed
socket, this led oxenstored stuck and unable to handle any incoming
requests any more. This patch let oxenstored check and process closed
socket before handle incoming connections to avoid the stuck.

Cc: David Scott <dave.scott@xxxxxxxxxxxxx>
Cc: Zheng Li <dev@xxxxxxxx>
Cc: Luis R. Rodriguez <mcgrof@xxxxxxxx>
Cc: Ian Jackson <Ian.Jackson@xxxxxxxxxx>
Signed-off-by: Joe Jin <joe.jin@xxxxxxxxxx>
   tools/ocaml/xenstored/xenstored.ml |    4 ++--
   1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/ocaml/xenstored/xenstored.ml 
index 1c02f2f..b142952 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -373,10 +373,10 @@ let _ =
               [], [], [] in
           let sfds, cfds =
               List.partition (fun fd -> List.mem fd spec_fds) rset in
-        if List.length sfds > 0 then
-            process_special_fds sfds;
           if List.length cfds > 0 || List.length wset > 0 then
               process_connection_fds store cons domains cfds wset;
+        if List.length sfds > 0 then
+            process_special_fds sfds;
           process_domains store cons domains

Xen-devel mailing list

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.