Re: [Xen-devel] Lots of connections led oxenstored stuck

Hi Joe,

I read your patch and understand the basic idea behind it. It can mitigate the
situation when bad things happen, but it doesn't remove the two limits
imposed by select and NR_OPEN. For example:

  * When the number of fds goes beyond NR_OPEN, is there any strict order in
which fds are chosen to be closed? If not, the special fds might get closed
as well, in which case xenstored might still get stuck.

  * When select is given fds numbered 1024 or above (which can still happen
even with your patch), the behavior is _undefined_. IIRC, some bits in the
bitmap might be wrongly reused, so the output (the fds reported as ready for
read/write) might be wrong for some fds, and the subsequent read/write on
them might block.

  * Also, we generally prefer to handle special fds first, as the eventchn fd 
represents all the domain connections.
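
To make the last two points concrete, here is a minimal sketch of the idea,
not the actual patch. It models fds as plain ints so it runs without the Unix
library, and the names select_limit, safe_for_select, split_ready and
processing_order are made up for illustration. The value 1024 is an assumed
FD_SETSIZE, as is typical on Linux.

```ocaml
(* Hypothetical helpers, not taken from the oxenstored tree.
   fds are modelled as ints so the sketch needs no Unix library. *)

(* Assumed capacity of select's bitmap (FD_SETSIZE on typical Linux). *)
let select_limit = 1024

(* True only if every fd number fits in select's bitmap. *)
let safe_for_select fds =
  List.for_all (fun fd -> fd >= 0 && fd < select_limit) fds

(* Split the ready set into (special, connection) fds, the same way the
   List.partition call in xenstored.ml does. *)
let split_ready spec_fds rset =
  List.partition (fun fd -> List.mem fd spec_fds) rset

(* Ready fds in the order they would be handled: special fds first. *)
let processing_order spec_fds rset =
  let sfds, cfds = split_ready spec_fds rset in
  sfds @ cfds
```

With spec_fds = [3; 4] and a ready set of [7; 3; 9; 4], processing_order
puts 3 and 4 ahead of 7 and 9, and safe_for_select rejects any set that
contains an fd numbered 1024 or above.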

I previously mentioned that I've got patches for these. I'm currently testing
with 1,000 Windows 7 VMs on a single host (each consuming at least 2
persistent xenstored socket connections). Besides the two limits just
mentioned, I've also fixed several bugs and bottlenecks along the way.
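
As a rough back-of-the-envelope check of why this test setup hits the select
limit, the arithmetic can be written out as below. The 1024 figure is an
assumed FD_SETSIZE, and the variable names are only illustrative.

```ocaml
(* Figures from the test setup described above. *)
let vms = 1000
let sockets_per_vm = 2      (* lower bound: "at least 2" per VM *)

(* Assumed select capacity (FD_SETSIZE on typical Linux). *)
let select_limit = 1024

(* Minimum number of socket fds xenstored must hold open. *)
let min_socket_fds = vms * sockets_per_vm

(* Whether that minimum already exceeds what select can track. *)
let exceeds_select = min_socket_fds > select_limit

let () =
  Printf.printf "at least %d socket fds, select limit %d, exceeds: %b\n"
    min_socket_fds select_limit exceeds_select
```

So even before counting the special fds, the socket connections alone come to
at least 2000, roughly twice what select can track.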

I'm going to upstream these patches very soon; they just need a bit of
clean-up and documentation. However, if you (or anyone) need them urgently or
are eager to test them, please send me a private email separately. I'm happy
to send you the patch in its current form: a single, non-disaggregated patch
covering multiple issues, not very well commented, but it should just work.


On 26/08/2014 09:15, Joe Jin wrote:
This bug is caused by the way oxenstored handles incoming requests: when lots
of connections come in at the same time, it has no chance to delete closed
sockets.

I created a patch for this, please review:


[PATCH] oxenstored: check and delete closed socket before accept incoming 

When more than SYSCONF.OPEN_MAX connections come in at the same time and
the connections are closed later, oxenstored has no chance to delete the
closed sockets. This leaves oxenstored stuck and unable to handle any
further incoming requests. This patch lets oxenstored check and process
closed sockets before handling incoming connections, to avoid getting stuck.

  tools/ocaml/xenstored/xenstored.ml |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/ocaml/xenstored/xenstored.ml 
index 1c02f2f..b142952 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -373,10 +373,10 @@ let _ =
                        [], [], [] in
                let sfds, cfds =
                        List.partition (fun fd -> List.mem fd spec_fds) rset in
-               if List.length sfds > 0 then
-                       process_special_fds sfds;
                if List.length cfds > 0 || List.length wset > 0 then
                        process_connection_fds store cons domains cfds wset;
+               if List.length sfds > 0 then
+                       process_special_fds sfds;
                process_domains store cons domains

