This problem surfaced in our API gateway service, which we built using Spring Gateway. The service handles exceptions by implementing the ErrorWebExceptionHandler
interface, returning processed exception details to the client and adding some header information. Everything was functioning well until one day, after a code merge, issues began to sporadically appear. Here's what we observed:
Whenever a request processing exception arose, it would be addressed by the GatewayErrorWebExceptionHandler
. However, oddly enough, clients weren't receiving any responses. Our server-side logs looked fine, but it appeared as if the connection was somehow stalled, preventing the client-side from getting any feedback. Further analysis, specifically packet captures, confirmed our suspicion: no data packets were being transmitted. A deeper dive into the GatewayErrorWebExceptionHandler
revealed an inconsistency with the traceID. In certain cases, this traceID might be absent. Notably, with the gateway rooted in WebFlux and Netty as its foundational network library, it's important to highlight that Netty performs null checks when headers are added. Encountering a null value triggers an exception, which seemed counterintuitive to us.
public class GatewayErrorWebExceptionHandler implements ErrorWebExceptionHandler {
public Mono<Void> handle(ServerWebExchange exchange, Throwable ex) {
...
String traceID = exchange.getRequest().getHeaders().getFirst("traceID");
exchange.getResponse().getHeaders().add("traceID", traceID);
...
}
}
Using the arthas tool, the call stack was traced:
[arthas@3423878]$ stack *.GatewayErrorWebExceptionHandler *
It was observed that the underlying servlet had changed to Tomcat instead of the expected Netty:
[arthas@3423878]$ stack *.GatewayErrorWebExceptionHandler *
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 11) cost in 212 ms, listenerId: 1
ts=2023-07-29 19:01:14;thread_name=http-nio-8080-exec-1;id=20;is_daemon=true;priority=5;TCCL=org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader@6cc8c13c
@com.xxx.server.gateway.filter.GatewayErrorWebExceptionHandler.handle()
at org.springframework.web.server.handler.ExceptionHandlingWebHandler.lambda$handle$0(ExceptionHandlingWebHandler.java:77)
...
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172)
at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56)
at reactor.core.publisher.Mono.subscribe(Mono.java:4210)
at org.springframework.http.server.reactive.ServletHttpHandlerAdapter.service(ServletHttpHandlerAdapter.java:192)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1594)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:829)
Upon further investigation, it was found that the tomcat-embed-core
package had been added to the project's dependencies, which changed the underlying Servlet container of Spring Gateway:
<dependency>
<groupId>org.apache.tomcat.embed</groupId>
<artifactId>tomcat-embed-core</artifactId>
</dependency>
Subsequently, I speculated whether Tomcat might have swallowed the null pointer exception during its processing. Time to delve into the Tomcat source code: Http11Processor.java
/**
* When committing the response, we have to validate the set of headers, as
* well as setup the response filters.
*/
@Override
protected final void prepareResponse() throws IOException {
...
// Build the response header
try {
outputBuffer.sendStatus();
int size = headers.size();
for (int i = 0; i < size; i++) {
outputBuffer.sendHeader(headers.getName(i), headers.getValue(i));
}
outputBuffer.endHeaders();
} catch (Throwable t) {
...
outputBuffer.resetHeaderBuffer();
throw t;
}
outputBuffer.commit();
}
As expected, an exception could potentially arise at headers.getValue(i)
. Moreover, when this exception is thrown outward, only I/O exceptions are handled externally:
public final void action(ActionCode actionCode, Object param) {
switch (actionCode) {
// 'Normal' servlet support
case COMMIT: {
if (!response.isCommitted()) {
try {
// Validate and write response headers
prepareResponse();
} catch (IOException e) {
handleIOException(e);
}
}
break;
}
...
}
private void handleIOException (IOException ioe) {
if (ioe instanceof CloseNowException) {
// Close the channel but keep the connection open
setErrorState(ErrorState.CLOSE_NOW, ioe);
} else {
// Close the connection and all channels within that connection
setErrorState(ErrorState.CLOSE_CONNECTION_NOW, ioe);
}
}
public static void handleThrowable(Throwable t) {
if (t instanceof ThreadDeath) {
throw (ThreadDeath) t;
}
...
// All other instances of Throwable will be silently swallowed
}
No logs are printed or exceptions thrown outward. Only when an IOException
occurs is the connection closed. In this case, it's a null pointer exception, which explains why the client remains unresponsive. When the OutputBuffer
flushes the cache, it merely sets doFlush
to false, effectively swallowing the exception.
@Override
public void flush() throws IOException {
doFlush(true);
}
Furthermore, Spring Reactor's implementation class for Tomcat's AbstractListenerWriteFlushProcessor
doesn't handle the event when it receives a send signal. Neither of the two libraries does anything or prints any logs, leading the client to continuously wait for the request response. This part of the code can be found in: AbstractListenerWriteFlushProcessor.java
try {
processor.flush();
}
catch (Throwable ex) {
processor.flushingFailed(ex);
return;
}
public abstract class AbstractListenerWriteFlushProcessor<T> implements Processor<Publisher<? extends T>, Void> {
/**
* Invoked when an error happens while flushing. Sub-classes may choose
* to ignore this if they know the underlying API will provide an error
* notification in a container thread.
* <p>Defaults to no-op.
*/
protected void flushingFailed(Throwable t) {
}
}
Compared to WebMvc, when using Tomcat as the servlet container, the strategy for handling headers is: headers with null values are directly filtered out during the filter process: Response.java
private void addHeader(String name, String value, Charset charset) {
if (name == null || name.length() == 0 || value == null) {
return;
}
...
}
However, this filtering operation only exists in the Spring MVC processing chain. In this case, the gateway project relies on WebFlux
. So, when the traceID
header with a null value is ignored, Tomcat encounters a null pointer exception while preparing to write buffered data to the socket. This exception is internally swallowed, causing the connection to remain open indefinitely without writing any data.
tomcat-embed-core
dependency from the common module and only introduce the java-servlet-api
, allowing Spring Gateway to revert to using Netty as the underlying server.GatewayErrorWebExceptionHandler
, perform a null check on traceID
to prevent adding empty headers.public class GatewayErrorWebExceptionHandler implements ErrorWebExceptionHandler {
...
public Mono<Void> handle(ServerWebExchange exchange, Throwable ex) {
...
String traceID = exchange.getRequest().getHeaders().getFirst("traceID");
if(traceID != null){
exchange.getResponse().getHeaders().add("traceID", traceID);
}
}
}
This case highlights the importance of understanding the technical stack and being clear about the project's dependencies. Only by doing so can we quickly locate and solve problems when they arise. Additionally, testing confirmed that undertow
doesn't exhibit this issue. This is because the version of Spring in question has overloaded flushingFailed
for undertow
's AbstractListenerWriteFlushProcessor
implementation class. Only Tomcat handles internal null pointers in this manner.
However, two years ago, someone had already fixed this issue. But the version in our project was outdated.
This actually reflects a problem: