Towards a unifying data theory and practice: Combining operations, analytics, and streaming
We’ve heard the one-data-platform-to-rule-them-all story before. Could it actually be true this time? SnappyData, a new Pivotal-backed, Spark-compatible open source solution, promises as much, and we take the opportunity to look inside and around it for the whys, hows, and options.
Hadoop disrupted the data landscape, and in some ways became synonymous with big data, by offering a framework for cheap storage and scale-out processing. Parallel to Hadoop came a flurry of NoSQL solutions that also addressed the need for massive storage and processing of data that is not necessarily structured.
Over time, Hadoop evolved into an ecosystem built on HDFS and MapReduce, its storage and processing foundations, including pieces such as a key-value store (HBase) and various SQL-on-Hadoop implementations. NoSQL solutions have also been gradually adding SQL to their arsenal, as SQL is a point of convergence and a de facto industry standard.
While Hadoop started out geared towards analytics, NoSQL solutions come in many flavors and often support both operational applications and analytics. A third type of processing that has become part of the equation is streaming.
Ingesting and processing unbounded streams of data in real time is becoming part of everyday operations for many organizations, and solutions have emerged in this space as well. Evolution is now moving towards unifying these hitherto disparate modes — transactional operations, analytics, and stream processing — into a common framework.
The evolution of Hadoop has brought on Spark, a framework and API that builds on Hadoop’s ecosystem but adds, among other things, in-memory processing, SQL, and streaming support. Spark is now becoming the foundation for the convergence of transactional (OLTP), analytical (OLAP), and streaming data processing.