Эх сурвалжийг харах

Delay closing until the next loop tick when gracefully shutting down (#38)

Motivation:

The server cancellation test fails every now and then because the client
sees the stream close unexpectedly. This happens when the server is
gracefully shutting down. Writing the final frame with end stream set
triggers the HTTP/2 handler to emit a stream closed event which the
connection management handler picks up and concludes that the connection
is no longer needed (the server is shutting down and now no streams are
open) so closes it.

However because this close happens during the write, it ends up being
ordered before the flush. This means that the frame isn't actually
written to the network before the connection is closed.

Modifications:

- Delay the close until the next event loop tick when it happens during
graceful shutdown

Result:

Fewer bugs
George Barnett 1 жил өмнө
parent
commit
4ed582d34a

+ 17 - 3
Sources/GRPCNIOTransportCore/Client/Connection/ClientConnectionHandler.swift

@@ -397,9 +397,23 @@ extension ClientConnectionHandler {
       }
 
     case .close:
-      // Connection was closing but waiting for all streams to close. They must all be closed
-      // now so close the connection.
-      context.close(promise: nil)
+      // Defer closing until the next tick of the event loop.
+      //
+      // This point is reached because the server is shutting down gracefully and the stream count
+      // has dropped to zero, meaning the connection is no longer required and can be closed.
+      // However, the stream would've been closed by writing and flushing a frame with end stream
+      // set. These are two distinct events in the channel pipeline. The HTTP/2 handler updates the
+      // state machine when a frame is written, which in this case results in the stream closed
+      // event which we're reacting to here.
+      //
+      // Importantly the HTTP/2 handler hasn't yet seen the flush event, so the bytes of the frame
+      // with end-stream set - and potentially some other frames - are sitting in a buffer in the
+      // HTTP/2 handler. If we close on this event loop tick then those frames will be dropped.
+      // Delaying the close by a loop tick will allow the flush to happen before the close.
+      let loopBound = NIOLoopBound(context, eventLoop: context.eventLoop)
+      context.eventLoop.execute {
+        loopBound.value.close(mode: .all, promise: nil)
+      }
 
     case .none:
       ()

+ 17 - 1
Sources/GRPCNIOTransportCore/Server/Connection/ServerConnectionManagementHandler.swift

@@ -521,7 +521,23 @@ extension ServerConnectionManagementHandler {
     case .startIdleTimer:
       self.maxIdleTimerHandler?.start()
     case .close:
-      context.close(mode: .all, promise: nil)
+      // Defer closing until the next tick of the event loop.
+      //
+      // This point is reached because the server is shutting down gracefully and the stream count
+      // has dropped to zero, meaning the connection is no longer required and can be closed.
+      // However, the stream would've been closed by writing and flushing a frame with end stream
+      // set. These are two distinct events in the channel pipeline. The HTTP/2 handler updates the
+      // state machine when a frame is written, which in this case results in the stream closed
+      // event which we're reacting to here.
+      //
+      // Importantly the HTTP/2 handler hasn't yet seen the flush event, so the bytes of the frame
+      // with end-stream set - and potentially some other frames - are sitting in a buffer in the
+      // HTTP/2 handler. If we close on this event loop tick then those frames will be dropped.
+      // Delaying the close by a loop tick will allow the flush to happen before the close.
+      let loopBound = NIOLoopBound(context, eventLoop: context.eventLoop)
+      context.eventLoop.execute {
+        loopBound.value.close(mode: .all, promise: nil)
+      }
 
     case .none:
       ()