Hadoop moving forward: a sneak peek at security & performance
As Hadoop steadily makes its way into the enterprise, the bar is being raised in many ways. Since two of the most obvious requirements for making it there are security and performance, I cherry-picked some related announcements made recently – shortly before and after Hadoop Summit, the biggest Hadoop event of the industry – to help shed some light on where we may be headed.
There was a session on Hadoop security at Hadoop Summit, and it was packed. It’s easy to see why. For starters, Hadoop security – or the lack thereof – is a hot topic. At the moment there are a number of issues with it, which leads some to go as far as to say that no security is better than minimal security. It’s not easy to set up and configure, it lacks features, and it does not work uniformly across the stack. So naturally people would like to see some improvement there.
Right now the vendor that organized this session – Cloudera – is among the leaders in trying to get security right. Sentry, the project they started to provide more fine-grained security capabilities for Hadoop, is currently undergoing incubation at the Apache Software Foundation.
They are not the only ones though: Hortonworks recently announced the acquisition of XASecure, a startup working on a security framework for Hadoop that promises to deal with many of its shortcomings in that area. As pointed out by Andrew Brust in his recent Gigaom blog post, this is definitely a domain to keep an eye on.
In fact, we are just finishing off a research note on the topic to be published shortly as a syndicated Gigaom report, so if you want more details stay tuned.
The other thing that caught my eye was this blog post by Jethro Data. We recently covered Jethro Data and what they do, and one of the things we talked about was their use of benchmarks. It seems they’ve now decided to go public with some of their early results, which could not be disclosed at the time.
Jethro Data took the benchmark specification published by Cloudera in their recent Impala benchmark and used it to evaluate their own system, and the results look pretty impressive. In Cloudera’s benchmark, Impala 1.3.0 clearly outperformed Hive 0.13 on Tez, Shark 0.9.2 and Presto 0.6.0. In Jethro Data’s run of the same benchmark, their system clearly outperformed Impala.
Of course, both benchmarks are vendor-driven and should be taken with a pinch of salt. If anything, though, they give some hints as to what we can expect at this point. Interestingly enough, this coincides with the kick-off of Jethro Data’s private beta program, and will obviously provide quite a boost for them. My feeling? Jethro Data is a prime candidate for acquisition.
Bonus stage: since benchmarks and competition are bound to leave someone unhappy in the end, let’s finish off this post in a cheerful way: I would not go as far as to call this a “Holy Moment”, but yes, it sure was nice 🙂