A Netty ByteBuf Memory Leak Story and the Lessons Learned

By: Asaf Mesika

A while ago, while refactoring our log receiver at Logz.io, I found myself chasing a memory leak. We were using Netty, and after a major refactoring we noticed that free memory on the machine was gradually decreasing.

Our first action was to run garbage collection to see whether this was an on-heap or an off-heap (ByteBuf) memory issue. We quickly found that it was an off-heap issue and started reading through the code to see where we had forgotten to call the release() method on a ByteBuf. We could not find anything obvious, but that is usually the case with memory leaks.
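For readers new to Netty's buffer model: a ByteBuf is reference counted. A new buffer starts with a reference count of 1, retain() increments it, and release() decrements it, returning the memory to its allocator once the count reaches 0. Below is a toy, single-threaded model of that contract in plain Java. The classes ToyAllocator, ToyBuf and ToyHandler are hypothetical and are not Netty's implementation. The model shows how one code path that skips release() leaves memory allocated.

```java
// Toy model of reference-counted buffers (hypothetical classes, NOT Netty's code).
// Single-threaded sketch: no synchronization.

// Stands in for an off-heap allocator: it only counts bytes still allocated.
final class ToyAllocator {
    private long liveBytes;

    ToyBuf allocate(int capacity) {
        liveBytes += capacity;
        return new ToyBuf(this, capacity);
    }

    void free(int capacity) {
        liveBytes -= capacity;
    }

    long liveBytes() {
        return liveBytes;
    }
}

final class ToyBuf {
    private final ToyAllocator allocator;
    private final int capacity;
    private int refCnt = 1; // a new buffer starts with one reference

    ToyBuf(ToyAllocator allocator, int capacity) {
        this.allocator = allocator;
        this.capacity = capacity;
    }

    int refCnt() {
        return refCnt;
    }

    ToyBuf retain() {
        ensureAccessible();
        refCnt++;
        return this;
    }

    // Returns true only when this call dropped the count to 0 and freed the memory.
    boolean release() {
        ensureAccessible();
        if (--refCnt == 0) {
            allocator.free(capacity);
            return true;
        }
        return false;
    }

    private void ensureAccessible() {
        if (refCnt == 0) {
            throw new IllegalStateException("buffer already freed");
        }
    }
}

final class ToyHandler {
    // Correct: the last user releases the buffer, in finally so an exception cannot skip it.
    static int consume(ToyBuf buf) {
        try {
            return buf.refCnt(); // stand-in for real processing
        } finally {
            buf.release();
        }
    }

    // Leaky: the buffer is processed but never released, so its bytes stay allocated.
    static int consumeLeaky(ToyBuf buf) {
        return buf.refCnt();
    }
}
```

Releasing in a finally block, as consume() does, keeps an exception during processing from skipping the release. In real Netty code, ReferenceCountUtil.release(msg) is a common helper for releasing a message whose type is not known to be reference counted.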

Then I noticed an error message that appeared only once when we started the application:

ERROR i.n.u.ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetectionLevel=advanced' or call ResourceLeakDetector.setLevel()

At first, I did not pay much attention to the message because it appeared only once. I figured it was a single ByteBuf I had forgotten to release and that I would fix it the following week. After a couple of days, we noticed that the host's free memory was still decreasing, so I realized I needed to understand this error better.

Netty's documentation on reference-counted objects includes a detailed section entitled “Troubleshooting buffer leaks.” I did not fully understand it at first; it only made sense once I understood the following:

Netty adds a hook to the ByteBuf code so that when a buffer is garbage-collected, it checks whether release() was called on it; if it was not, it prints the error message above. One important detail is that Netty performs this check only for a fraction of the buffers (sampling). So if you see this error message even once, the leak is probably happening many more times than that.
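The sampling mechanism can be sketched with the JDK's own PhantomReference and ReferenceQueue. The sketch attaches a tracker to only every Nth object and marks the tracker closed when the object is released properly. Any tracker that the garbage collector enqueues while it is still open counts as a leak. This is a simplified, single-threaded model under stated assumptions (a fixed sampling interval, hypothetical class names). It is not Netty's ResourceLeakDetector, whose sampling and reporting logic differ in detail. Netty's reference-counting guide describes the default level as checking about 1% of buffers, which is why a single report usually stands for many leaked buffers.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Simplified sampling leak detector (hypothetical; NOT Netty's implementation).
// Single-threaded sketch: no synchronization.
final class SampledLeakDetector {
    // Attached to a sampled object; close() must be called when the object is released.
    final class LeakTracker extends PhantomReference<Object> {
        private boolean closed;

        private LeakTracker(Object referent, ReferenceQueue<Object> q) {
            super(referent, q);
        }

        void close() {
            closed = true;
            live.remove(this);
        }
    }

    private final int samplingInterval;
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Open trackers must stay strongly reachable, or the GC could discard them unqueued.
    private final Set<LeakTracker> live = Collections.newSetFromMap(new IdentityHashMap<>());
    private long allocations;

    SampledLeakDetector(int samplingInterval) {
        if (samplingInterval < 1) {
            throw new IllegalArgumentException("samplingInterval must be >= 1");
        }
        this.samplingInterval = samplingInterval;
    }

    // Tracks every samplingInterval-th object; returns null for objects that are not sampled.
    LeakTracker track(Object obj) {
        if (allocations++ % samplingInterval != 0) {
            return null;
        }
        LeakTracker tracker = new LeakTracker(obj, queue);
        live.add(tracker);
        return tracker;
    }

    int openTrackers() {
        return live.size();
    }

    // Drains trackers enqueued by the GC; each one that was never closed is a leak.
    int reportLeaks() {
        int leaks = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            LeakTracker tracker = (LeakTracker) ref;
            live.remove(tracker);
            if (!tracker.closed) {
                leaks++;
                System.err.println("LEAK: object was garbage-collected before it was released");
            }
        }
        return leaks;
    }
}
```

In use, a buffer wrapper would call track() when the buffer is allocated and close() on its tracker when the reference count drops to 0. Calling reportLeaks() periodically, for example on each allocation, surfaces the leaks once the GC has run. With a sampling interval of 100, each report stands for roughly 100 leaked buffers.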

Once I understood that, I added the JVM option

-Dio.netty.leakDetectionLevel=advanced

as recommended. When the application started, I now saw two error messages instead of one, and the log message contained one more important detail: the location in the code where I had created the specific ByteBuf that had not been released. This showed me where I was causing the leak.

The first takeaway: Do not ignore memory leak messages. Immediately switch the leak detection level to advanced mode with the JVM command-line argument to find the origin of the leak.
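The extra detail in the advanced report, the place where the leaked buffer was created, comes from bookkeeping done at allocation time. A minimal way to capture a creation site in plain Java is to record a stack trace when the tracked object is constructed. CreationRecord and DemoHandler are hypothetical names for illustration. Netty's advanced mode also records later accesses to the buffer, which this sketch does not attempt.

```java
// Hypothetical sketch: remember where an object was created so that a later
// leak report can point at the allocating code. NOT Netty's implementation.
final class CreationRecord {
    // Captured once per tracked object; filling in a stack trace is relatively costly.
    private final StackTraceElement[] stack = new Throwable().getStackTrace();

    // Returns the first frame outside this class, i.e. the code that created the record.
    String creationSite() {
        for (StackTraceElement frame : stack) {
            if (!frame.getClassName().equals(CreationRecord.class.getName())) {
                return frame.getClassName() + "." + frame.getMethodName()
                        + "(" + frame.getFileName() + ":" + frame.getLineNumber() + ")";
            }
        }
        return "<unknown>";
    }
}

final class DemoHandler {
    // Stand-in for handler code that allocates a buffer and records where it happened.
    static CreationRecord allocateInHandler() {
        return new CreationRecord();
    }

    public static void main(String[] args) {
        CreationRecord record = allocateInHandler();
        System.out.println("LEAK: buffer created at " + record.creationSite());
    }
}
```

Because the stack trace is taken at construction, the report points at the allocating method (allocateInHandler here) rather than at the code that eventually dropped the buffer.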

The second takeaway: When hunting down ByteBuf memory leaks, run a “find usages” on the class and trace your code upward through the call hierarchy until you reach the code that actually created the buffer. Do this even if the cause seems obvious, and especially if third-party code is involved.

The third takeaway came from a side effect of switching the leak detection level to advanced mode. When I ran my performance load test, I noticed that the receiver barely reached 25 MB/sec, while the same machine usually handles about 200 MB/sec. I had added more code to the build I was testing, so I was not sure what caused the slowdown.

I started commenting out code until my handler simply did nothing; it practically looked like a copy-paste of the Discard Server example from Netty's documentation. When I removed the

-Dio.netty.leakDetectionLevel=advanced

JVM option, the speed returned to normal. I was amazed! So, to boil this article down to a single point to remember: the leak detection level's advanced mode may slow Netty down by roughly an order of magnitude (in my test, throughput dropped from about 200 MB/sec to 25 MB/sec, a factor of 8).
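Because of that cost, it can help to make a detailed leak detection level hard to miss when it is left on by accident, for example before a load test. The following is a hypothetical startup check that reads JVM system properties. It looks at io.netty.leakDetectionLevel, the option used in this article, and at io.netty.leakDetection.level, the newer name in Netty 4.1, treating the newer name as taking precedence (an assumption of this sketch). It only sees values passed as -D flags; a level set in code through ResourceLeakDetector.setLevel() would have to be checked with ResourceLeakDetector.getLevel() instead. LeakDetectionGuard and its methods are illustrative names, not part of Netty.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical startup guard (not part of Netty): warns when a detailed
// leak detection level is configured through JVM system properties.
final class LeakDetectionGuard {
    // Newer Netty 4.1 property name first (assumed to take precedence), then the older one.
    private static final String[] LEVEL_PROPERTIES = {
        "io.netty.leakDetection.level",
        "io.netty.leakDetectionLevel"
    };

    // Returns the configured level in upper case, or null when neither property is set.
    static String configuredLevel(Properties props) {
        for (String name : LEVEL_PROPERTIES) {
            String value = props.getProperty(name);
            if (value != null && !value.trim().isEmpty()) {
                return value.trim().toUpperCase(Locale.ROOT);
            }
        }
        return null;
    }

    // ADVANCED and PARANOID are the levels that record where buffers were accessed.
    static boolean isDetailedLevel(Properties props) {
        String level = configuredLevel(props);
        return "ADVANCED".equals(level) || "PARANOID".equals(level);
    }

    public static void main(String[] args) {
        Properties props = System.getProperties();
        if (isDetailedLevel(props)) {
            System.err.println("WARNING: Netty leak detection level is " + configuredLevel(props)
                    + "; expect much lower throughput in load tests.");
        }
    }
}
```

Calling LeakDetectionGuard.main (or isDetailedLevel) early in the receiver's startup makes a leftover -Dio.netty.leakDetectionLevel=advanced visible in the logs before a benchmark is misread.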

Have you had experiences with memory leaks in Netty and learned some lessons as a result? If so, I'd love to hear your stories in the comments below!
