A Netty ByteBuf Memory Leak Story and the Lessons Learned

A while ago, I was chasing a memory leak at Logz.io while refactoring our log receiver. We were using Netty, and after a major refactoring we noticed a gradual decrease in free memory on the machine.

Our first action was to try to run garbage collection to see if this was an on-heap or off-heap (utilizing ByteBuf) memory issue. We quickly found that it was an off-heap issue and started to read through the code to see where we forgot to call the release() method on the ByteBuf type. We could not find anything obvious — but that is usually the case when it comes to memory leaks.
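As a minimal sketch of the release() discipline mentioned above: the code that consumes a ByteBuf last releases it in a finally block, so the reference count drops to zero even if processing throws. The class name ReleaseSketch and the helper consume are hypothetical and are not taken from our receiver; the Netty calls used (Unpooled.directBuffer, ByteBuf.toString, ReferenceCountUtil.release, refCnt) are part of Netty's public API.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.util.ReferenceCountUtil;
import java.nio.charset.StandardCharsets;

class ReleaseSketch {
    // Hypothetical helper: decodes the readable bytes and always releases the buffer.
    static String consume(ByteBuf buf) {
        try {
            return buf.toString(StandardCharsets.UTF_8);
        } finally {
            // Decrements the reference count; at zero the memory is freed or returned to its pool.
            ReferenceCountUtil.release(buf);
        }
    }

    public static void main(String[] args) {
        ByteBuf buf = Unpooled.directBuffer(16); // off-heap buffer, reference count starts at 1
        buf.writeBytes("log line".getBytes(StandardCharsets.UTF_8));
        String text = consume(buf);
        System.out.println(text + " refCnt=" + buf.refCnt());
    }
}
```

Touching the buffer again after consume returns would throw IllegalReferenceCountException, so in this sketch the release belongs to whichever code uses the buffer last.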

Then, I noticed that there was a message that appeared only once when we started the application:

ERROR i.n.u.ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetectionLevel=advanced' or call ResourceLeakDetector.setLevel()

At first, I did not pay much attention to the message because it only appeared once. So, I figured that it was a single ByteBuf that I forgot to release and that I would fix it the following week. After a couple of days, we noticed that the host’s free memory was still decreasing. So, I realized that I needed to understand more about this error.

In the reference counted objects section in Netty’s documentation, there was a detailed section entitled “Troubleshooting buffer leaks.” When I read that part of the documentation, I did not understand it completely until I read the following:

Netty adds a hook to the ByteBuf code so that when a buffer is garbage-collected, it checks whether release() was called on it; if not, it prints the error message above. One important detail is that Netty performs this check for only a fraction of the buffers (sampling), so seeing the error message only once probably means the leak happens far more often than once.
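To make the sampling point concrete, here is a toy simulation using only the Java standard library. It is not Netty's implementation: it simply assumes each leaked buffer is tracked independently with probability 1/samplingInterval, and the interval of 128 is an illustrative assumption rather than a number from this story. The class and method names are hypothetical.

```java
import java.util.Random;

class LeakSamplingSketch {
    // Counts how many of `leaks` leaked buffers a sampling detector would report
    // if each buffer is tracked independently with probability 1/samplingInterval.
    static int reportedLeaks(int leaks, int samplingInterval, long seed) {
        Random random = new Random(seed);
        int reported = 0;
        for (int i = 0; i < leaks; i++) {
            if (random.nextInt(samplingInterval) == 0) {
                reported++;
            }
        }
        return reported;
    }

    // Rough estimate of how many real leaks stand behind the observed reports.
    static long estimatedLeaks(int reported, int samplingInterval) {
        return (long) reported * samplingInterval;
    }

    public static void main(String[] args) {
        int interval = 128; // assumed sampling interval, for illustration only
        int reported = reportedLeaks(100_000, interval, 42L);
        System.out.println("reported " + reported + " of 100000 simulated leaks");
        System.out.println("one report suggests roughly " + estimatedLeaks(1, interval) + " leaks");
    }
}
```

Under these assumptions, a single report stands for something on the order of a hundred leaked buffers, which is why one log line should not be read as one leak.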

Once I understood that, I added the JVM option

-Dio.netty.leakDetectionLevel=advanced

as recommended. When the application started, I now saw two error messages instead of one. The log message also carried one more important detail: the location in the code where the unreleased ByteBuf had been created. This showed me where I was causing the leak. The first takeaway: Do not ignore memory leak messages. Immediately switch the leak detection level to advanced with the JVM command-line option to find the origin of the leak.

The second takeaway: When hunting down ByteBuf memory leaks, run a “find usage” on the class and trace your code upwards through the calling hierarchy until you reach the code that actually created the buffer, even if the cause seems obvious and especially if third-party code is causing the problem.

The third takeaway was a side effect of changing the leak detection level to advanced mode. When I ran my performance load test, I noticed that the receiver barely reached 25 MB/sec, whereas the same machine usually handles 200 MB/sec. I had also added more code to the build I was testing, so I was not sure what had caused the slowdown.

I started commenting out code until I had reached a point where my handler simply did nothing — the handler practically looked like a copy-paste of the Discard Server example from Netty’s documentation. When I removed the

-Dio.netty.leakDetectionLevel=advanced

JVM option, the speed returned to normal. I was amazed! So, to boil this article down to a single point to remember: The leak detection level’s advanced mode may slow Netty down by close to a factor of 10 (from about 200 MB/sec to 25 MB/sec in my load test).
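Given that cost, one option is to turn advanced mode on only while actively hunting a leak. The sketch below uses only the Java standard library and sets the same system property that the JVM option sets. It assumes this code runs before any Netty class is loaded, since Netty, as far as I know, reads the property when its leak detector class initializes. The class name, the DEBUG_NETTY_LEAKS environment variable, and the policy itself are hypothetical.

```java
class LeakDetectionConfig {
    // Property name taken from the JVM option shown in the error message.
    static final String PROPERTY = "io.netty.leakDetectionLevel";

    // Hypothetical policy: "advanced" only while debugging a leak, "simple" otherwise.
    static String levelFor(boolean debuggingLeak) {
        return debuggingLeak ? "advanced" : "simple";
    }

    // Sets the property unless it was already given explicitly, e.g. with -D on the command line.
    static void apply(boolean debuggingLeak) {
        if (System.getProperty(PROPERTY) == null) {
            System.setProperty(PROPERTY, levelFor(debuggingLeak));
        }
    }

    public static void main(String[] args) {
        // DEBUG_NETTY_LEAKS is an assumed environment variable, not a Netty setting.
        boolean debuggingLeak = "true".equalsIgnoreCase(System.getenv("DEBUG_NETTY_LEAKS"));
        apply(debuggingLeak);
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

Keeping the switch outside the code path that handles traffic means load tests and production runs stay on the cheaper level unless someone opts in.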

Have you had any experiences with memory leaks in Netty and learned some lessons as a result? If so, I’d love to hear your stories in the comments below!

3 responses to “A Netty ByteBuf Memory Leak Story and the Lessons Learned”

  1. Dmitry Bundin says:

    Very interesting article. Thanks. The only question that is still bothering me is what is the point of releasing unpooled Heap byte buffers? The operations that do the actual de-allocations are NO-OP anyway (as far as I can find in Netty sources). For Pooled ones sounds reasonable since they can be returned to an object pool.

  2. Abhijit Sarkar says:

    I have the opposite problem where I don’t retain anything and keep getting IllegalReferenceCountException.
    https://stackoverflow.com/q/48074943/839733
