The Strata Conference always brings a lot of press releases and big news about new distributions of software. Among the most talked-about, if unsurprising, announcements today are that Greenplum and Intel are both releasing their own Hadoop distributions. I think these are only the first of what will soon be many additional distributions of Hadoop; everyone is trying to get their own slice of the Big Data pie. The important question is: What do these new distributions bring, and how do they play with Cloudera, MapR and Hortonworks?
The Intel Distribution for Apache Hadoop is most notable for the additional security capabilities it provides. Security is the not-so-secret gaping hole in Hadoop. Intel completely rethinks security, providing fine-grained file access control and, more importantly, encryption support. Previously in Hadoop, you could perform authentication through Kerberos, but file permissions were limited, and there was no support for encryption. This announcement from Intel should put pressure on the other distributions to improve security support within Hadoop.
Greenplum’s announcement of Pivotal HD does not, on its own, appear to provide anything new. The new factor in Greenplum’s offering is the add-on Pivotal Advanced Database Services, which provides improved SQL support for Hadoop-based data. I am hoping Greenplum will use a true ANSI SQL dialect, allowing it to run machine-generated queries. Hive, with its SQL-like HiveQL, is generally not accessible to applications that generate queries. Concurrent’s newly announced Lingual, which sits on top of Cascading and Greenplum, could challenge Hive in this space.
Cloudera is still the elephant in the room, pardon the pun. With a several-year head start on all of its competitors, Cloudera has had longer to turn Hadoop into a commercially viable product and is still the market leader. Not content to rest on its laurels, Cloudera announced two major additions to its enterprise offering this week. Cloudera Navigator will ease the challenges of permission control, data lineage and file monitoring within Hadoop. In addition, Cloudera BDR (Backup & Disaster Recovery) will make many system administrators smile by simplifying the configuration and operation of backup and disaster recovery in a Hadoop cluster. Both are important elements in making Hadoop a true enterprise-class solution.
Hortonworks is still new to the market and is where I expect Intel and Greenplum to be in a year. Eventually, Hortonworks will release its much-anticipated version 2.0, giving it a stable foothold. Hortonworks’ ability to use Yahoo as a test bed means that the software it releases is stable enough for Yahoo and, therefore, is probably stable enough for your organization. Hortonworks is also opening Hadoop up to companies that run Windows with its recent release of Hortonworks Data Platform for Windows.
MapR, some would argue, is Cloudera’s biggest competitor at this time. With its proprietary mountable file system and high-availability name nodes and job trackers, MapR is bringing Hadoop up to the level of enterprise-class hardware. With disaster recovery and performance improvements over Apache Hadoop, MapR continues to push the Hadoop envelope in its quest to become a best-in-class application.
I expect all of these distributions to continue to innovate and challenge each other for the next few years. The new challengers should accelerate Hadoop’s path to becoming best-in-class enterprise software; however, the rule of three suggests that there can only be three big competitors in any industry. If this rule holds true, at least two of these five distributions will either merge or be left playing only a niche role in the market.