How a 2 GB CLOB crashed our app

Our latest production issue: one morning our app kept crashing. We restarted it, and a few minutes later it crashed again with an out-of-memory error.

It turned out to be a CLOB of almost 2 GB that was read whenever a user triggered a certain action. The CLOB ended up in a char[] of almost 4 GB (because a char is 16 bits in Java), and that was too much even for 8 GB of heap space.
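To make the arithmetic concrete and show one possible guard, here is a minimal sketch. It is not the fix we applied. The class name, the method names, and the character limit are made up for illustration. In real JDBC code the reader would typically come from `Clob.getCharacterStream()`, and `Clob.length()` could be checked before reading at all.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
class BoundedClobRead {

    // Heap bytes needed for the payload of a char[] holding charCount characters
    // (2 bytes per char; the small array header is ignored).
    static long charArrayBytes(long charCount) {
        return charCount * Character.BYTES;
    }

    // Reads the whole stream into a String, but fails fast as soon as
    // more than maxChars characters have arrived.
    static String readBounded(Reader reader, int maxChars) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                if (sb.length() + n > maxChars) {
                    throw new IllegalStateException(
                            "CLOB exceeds limit of " + maxChars + " characters");
                }
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With this, `BoundedClobRead.charArrayBytes(2_000_000_000L)` gives 4,000,000,000 bytes, which matches the roughly 4 GB we saw. A call such as `readBounded(clob.getCharacterStream(), 10_000_000)` would reject an oversized CLOB with an exception instead of taking the whole JVM down.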

Of course that CLOB was not supposed to be that big!

It took some time to identify the root cause.

WLS “Context propagation” forced us to restart our app server

Our app runs on a WebLogic Server (WLS) and talks to another WLS-hosted app via a SOAP web service.

When the other app was patched to a new version, some web service requests failed with an error indicating that the web service in the other app had tried to look up the previous version of something via JNDI.

After we restarted our app, everything worked fine. Still, you don’t want to have to restart a client just because the server was upgraded. Moreover, a Spring Boot client did not have these problems.

It turned out that WLS uses a feature called “Context Propagation” that inserts an additional SOAP header into the request as well as the response. This header contains a serialized object. Its content indicated that our app transmits the version of the other app, and apparently the other app somehow uses that information in the JNDI lookup.
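If you want to check what your own stack adds to a message, a small diagnostic like the following can list the header entries of a captured SOAP message. This is only a sketch. It assumes a SOAP 1.1 envelope, the class and method names are made up, and it does not know anything about the WLS header in particular.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical diagnostic helper, for illustration only.
class SoapHeaderLister {

    // Namespace of the SOAP 1.1 envelope (assumption: the captured message uses SOAP 1.1).
    static final String SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the direct child elements of every soap:Header as "{namespace}localName".
    static List<String> headerNames(String soapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapXml)));
            List<String> names = new ArrayList<>();
            NodeList headers = doc.getElementsByTagNameNS(SOAP11_NS, "Header");
            for (int i = 0; i < headers.getLength(); i++) {
                NodeList children = headers.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        names.add("{" + child.getNamespaceURI() + "}" + child.getLocalName());
                    }
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable SOAP message", e);
        }
    }
}
```

Running this on a request captured from the WLS client and on one captured from the Spring Boot client should make any extra header entry stand out.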

How does our app know the version number of the other app? Probably because the other app sends that information in the response. This explains why it worked after we restarted our app: at first it didn’t have that information at all, and when it got it, it was about the new version.

What I still can’t explain is why some requests were successful before the restart.

The solution is to disable “context propagation” by using a system parameter: