Breeding scalable web zombies using HBase
In our current project we use Hadoop and HBase to store hundreds of thousands of documents and content elements for every user we have. We decided to move all persistence to HBase. The advantages are clear: a much simpler programming model means faster results and fewer bugs, and we have unlimited room to grow in production; simply add servers. Put a sessionless web application in front as a GUI and you have a solution that is efficient for small crowds and large ones alike. Lovely. Now let's skip the 4 months of building the application to spec, because that is where our story really starts: zombie hunting.
Help! We got a zombie!
Our development setup runs 2 web containers behind one load balancer, persisting into a small 6-machine Hadoop/HBase cluster. We controlled resources inside the web application by instantiating one HTable per thread (stored in a ThreadLocal) and making sure the HConnection was shared among all threads. We were breeding zombies, but we did not know it yet.
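A minimal sketch of that pattern, with a made-up class and table name: in the 0.90 client, every HTable created from the same Configuration shares one HConnection behind the scenes, and the ThreadLocal keeps the non-thread-safe HTable instances from being shared between request threads.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class PerThreadTable {

    // One Configuration for the whole webapp; in HBase 0.90 every HTable created
    // from it shares the same underlying HConnection.
    private static final Configuration CONF = HBaseConfiguration.create();

    // One HTable per request thread, because HTable itself is not thread-safe.
    private static final ThreadLocal<HTable> TABLE = new ThreadLocal<HTable>() {
        @Override
        protected HTable initialValue() {
            try {
                return new HTable(CONF, "documents"); // "documents" is a made-up table name
            } catch (IOException e) {
                throw new RuntimeException("could not create HTable", e);
            }
        }
    };

    public static HTable get() {
        return TABLE.get();
    }
}
```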
We found out when we ran our first load test: a meager 32-thread continuous load, sustained for 20 minutes. It needed only 3 minutes to make our webapp go zombie: out of native threads. And yes, we did RTFM and did the ulimit and nproc thing and all that. Still: a webapp that reacts to nothing and just sits there hogging resources.
Why?
An HBase client uses a central object to manage its connections to the range of servers it communicates with. This HConnection object is basically a queue that serializes access to a number of TCP/IP sockets, and it keeps adding threads and sockets whenever it sees fit; in our limited setup, every thread that accessed the HConnection would spawn between 2 and 5 sockets and between 2 and 10 threads.
This works fine for controlled applications where work is processed from a bounded queue by a limited set of worker threads. A web container, however, typically spawns threads based on the number of incoming requests. Right after our web application initialized with a 10-thread threadpool to serve requests, it already had 300 threads running and 75 active sockets open. After 3 minutes of uploading we had around 800 open sockets and 4500 threads. Hmm... interesting.
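For what it's worth, you do not need special tooling to watch this happen; a throwaway logger like the sketch below (not part of our original setup), started once inside the webapp, is enough to see the thread count climb.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Background logger that prints the JVM's live and peak thread counts every
// few seconds; start it once inside the webapp you want to observe.
public class ThreadCountLogger implements Runnable {

    public void run() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (!Thread.currentThread().isInterrupted()) {
            System.out.printf("live threads: %d (peak: %d)%n",
                    threads.getThreadCount(), threads.getPeakThreadCount());
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // e.g. new Thread(new ThreadCountLogger(), "thread-count-logger").start();
}
```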
So... why can I not tell HBase which threadpool to use? I want one application-wide threadpool of, say, 200 threads, and have the HConnection use that. Why is there a hard-coded, unbounded threadpool in the HConnectionImpl constructor instead? That smells like a shortcut.
There is a hacky solution called HTablePool. It gives you some, but incomplete, control over resources. You can limit the number of HTable objects that are instantiated, which by itself does not help us at all. But because callers have to wait until one of them becomes available, in the end it does limit the number of threads actually in use. The catch is that you can never really predict how many connections and threads you will end up with, because the number of region servers and ZooKeeper servers the HConnection needs to talk to is essentially undetermined.
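To make that concrete, here is a minimal usage sketch of the stock 0.90 HTablePool API; the table name and row key are made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PooledRead {

    // Keep at most 50 HTable instances around per table name.
    private static final HTablePool POOL =
            new HTablePool(HBaseConfiguration.create(), 50);

    public static Result readRow(String rowKey) throws IOException {
        HTableInterface table = POOL.getTable("documents"); // made-up table name
        try {
            return table.get(new Get(Bytes.toBytes(rowKey)));
        } finally {
            POOL.putTable(table); // hand the instance back instead of closing it
        }
    }
}
```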
No more zombies
Actually, the HBase guys know. And they fixed it: HBASE-4805 introduces a constructor that lets you set up a bounded executor service yourself and have your HConnection use it. As we are bound by our client's infrastructure to Cloudera HBase 0.90, we are missing out on that fix; no way around it. We ended up implementing an HTablePool-based solution, and we managed to do it without rewriting all our existing table-accessing code. We will probably hit limits again when the number of region servers per table increases, but for now we have a solution. To be continued.
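For anyone on a release that does carry HBASE-4805 (0.92 and up), the fix boils down to something like the sketch below. The exact names and signatures are from memory of the newer client API rather than from our 0.90 code, so treat them as assumptions and check them against your version.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class BoundedHBaseClient {

    public static void main(String[] args) throws IOException {
        // One application-wide, bounded pool instead of HBase's own unbounded one.
        ExecutorService appPool = Executors.newFixedThreadPool(200);

        Configuration conf = HBaseConfiguration.create();
        // A connection we own and share across the whole webapp.
        HConnection connection = HConnectionManager.createConnection(conf);

        // Every HTable built this way reuses the shared connection and the bounded
        // pool; "documents" is a made-up table name.
        HTable table = new HTable(Bytes.toBytes("documents"), connection, appPool);

        // ... use the table, then clean up on shutdown ...
        table.close();
        connection.close();
        appPool.shutdown();
    }
}
```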