Out of the Hadoop box: SQL everywhere and AtScale
AtScale has made a name for itself by providing an access layer on top of Hadoop that enables it to be used directly as a data warehouse. AtScale is now announcing support for Teradata DW, Google Dataproc, and BigQuery, offering what it calls a Unified Analytics Platform. Why this move now, how does it work, and what does it mean?
You may not realize it, but Hadoop has already been around for 10 years. Even now, with most organizations having adopted it in one way or another, not everything about it is clear. When it first came out of Yahoo in 2006, Dave Mariani, AtScale’s co-founder and CEO, was one of the first to use it and realize its potential.
He was in the right place at the right time: Mariani was doing analytics at Yahoo, delivering data to drive business insights and advertising on the company’s assets. Data warehouses (DW) and cubes were pretty much the only game in town for analytics then, and a big game too. Mariani, a data cube veteran with numerous implementations under his belt, mentioned that “a single one of these cubes at Yahoo could drive revenue in the area of 50 million dollars.”
Mariani, like most industry experts today, realized that Hadoop could revolutionize the data industry because of three properties: a shared-nothing architecture that lets it scale out in a seamless, cost-effective way; a framework on which ETL and processing jobs can run; and late binding, also known as schema on read. He realized that earlier than most, or at least he acted upon it earlier.
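For readers unfamiliar with the term, schema on read can be illustrated with a minimal sketch. This is not Hadoop's API or anything from AtScale: the sample data, the `read_with_schema` helper, and the two schemas are hypothetical, made up purely to show the idea that data is stored raw and each consumer applies its own structure only when reading it.

```python
import csv
import io

# Raw records kept exactly as they arrived; no table schema was declared at write time.
# (Hypothetical sample data, for illustration only.)
RAW_EVENTS = """user_id,ts,revenue
u1,2016-01-05,12.50
u2,2016-01-06,7.25
u1,2016-01-07,3.00
"""


def read_with_schema(raw_text, schema):
    """Apply a schema (column name -> converter) only at read time."""
    reader = csv.DictReader(io.StringIO(raw_text))
    for row in reader:
        # Keep only the columns this consumer asked for, converted to its types.
        yield {col: convert(row[col]) for col, convert in schema.items()}


# Two consumers bind different schemas to the same stored data.
revenue_schema = {"user_id": str, "revenue": float}
activity_schema = {"user_id": str, "ts": str}

revenue_rows = list(read_with_schema(RAW_EVENTS, revenue_schema))
activity_rows = list(read_with_schema(RAW_EVENTS, activity_schema))

total_revenue = sum(r["revenue"] for r in revenue_rows)
```

In a traditional data warehouse, by contrast, the schema is fixed before loading (schema on write), so a new question about the data often means a new ETL job and a new table.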
At Yahoo, as well as at Klout, which Mariani joined after Yahoo, Hadoop was heavily used, but the BI landscape was what it had always been: fragmented, with a plethora of tools ranging from Excel to MicroStrategy. At that time, the only way for those tools to use the data stored in Hadoop was to take it out of Hadoop and store it in a DW. Then SQL-on-Hadoop came along, Cloudera set out to release Impala, Mariani was recruited, and the rest is history.