瀏覽代碼

Ignore errors in idle state. (#1133)

Motivation:

When a connection is in the idle state it is unlikely, but possible, to
see channel errors. This is most likely to happen when a channel has
just gone idle, and will shortly become entirely unavailable, but some
other I/O operation sees an error in the meantime. However, it could
arguably happen before any activity has been seen at all.

Right now we handle this by crashing. That's not great, mostly because
it's entirely possible to see weird reordering of events or delayed
error delivery from other handlers. It would be better to treat these
similarly to the way we treat errors in transientFailure or shutdown: as
fundamentally not adding drastically new information.

After all, consider the two options: either this error is channel fatal,
or it is not. In either case, being idle is immaterial. If the error is
fatal to the channel, we'll see the channel go inactive shortly and be
happy. Alternatively, the channel may be just fine, in which case we can
safely attempt to re-use it later. If the channel is in a weird broken
state we'll likely hit an immediate error on reconnection and move on
with our lives.

The one minor difference I've proposed here is that it's probably worth
us notifying about errors of this kind. While they aren't likely to
cause a bug in grpc, they could be a source of all kinds of weird
issues, and at the very least they probably represent a bug somewhere
else in the code. We'd like to know about them if we can.

Modifications:

- Remove the hard-crash on error in idle, replace it with a log.

Result:

Fewer crashes.

Fixes #1132.
Cory Benfield 4 年之前
父節點
當前提交
897a5bd2a1
共有 2 個文件被更改,包括 20 次插入4 次删除
  1. 6 4
      Sources/GRPC/ConnectionManager.swift
  2. 14 0
      Tests/GRPCTests/ConnectionManagerTests.swift

+ 6 - 4
Sources/GRPC/ConnectionManager.swift

@@ -463,11 +463,13 @@ internal class ConnectionManager {
     self.eventLoop.preconditionInEventLoop()
 
     switch self.state {
-    // These cases are purposefully separated: some crash reporting services provide stack traces
-    // which don't include the precondition failure message (which contain the invalid state we were
-    // in). Keeping the cases separate allows us work out the state from the line number.
+    // Hitting an error in idle is a surprise, but not really something we do anything about. Either the
+    // error is channel fatal, in which case we'll see channelInactive soon (acceptable), or it's not,
+    // and future I/O will either fail fast or work. In either case, all we do is log this and move on.
     case .idle:
-      self.invalidState()
+      self.logger.warning("ignoring unexpected error in idle", metadata: [
+        MetadataKey.error: "\(error)",
+      ])
 
     case .connecting:
       self.invalidState()

+ 14 - 0
Tests/GRPCTests/ConnectionManagerTests.swift

@@ -1011,6 +1011,20 @@ extension ConnectionManagerTests {
       channel.pipeline.fireChannelInactive()
     }
   }
+
+  func testIdleErrorDoesNothing() throws {
+    let manager = ConnectionManager(configuration: self.defaultConfiguration, logger: self.logger)
+
+    // Dropping an error on this manager should be fine.
+    manager.channelError(DoomedChannelError())
+
+    // Shutting down is then safe.
+    try self.waitForStateChange(from: .idle, to: .shutdown) {
+      let shutdown = manager.shutdown()
+      self.loop.run()
+      XCTAssertNoThrow(try shutdown.wait())
+    }
+  }
 }
 
 internal struct Change: Hashable, CustomStringConvertible {