Data lakes going the way of the visual spreadsheet?

If you’re a spreadsheet kind of person with a ton of data sitting in a data lake, Datameer’s new visual exploration feature may be your thing.

Self-service analytics comes in different shapes and sizes, and so do data lakes. Both are widely popular concepts that have been shaping the big data world, so it’s no wonder that a flurry of approaches and tools exists there.

There is also a fair amount of overlap between the two. Hadoop-based data lakes are rather common these days, but that does not make them easy to work with for people who are not data scientists. So self-service analytics tools make a point of supporting them as data sources their users can connect to.

This happens through a layer of mediation, typically SQL-based. There are various SQL-on-Hadoop engines around, ranging from proprietary to open source, and each distribution comes with its own.

So, depending on how fast your SQL-on-Hadoop engine is and how big your data lake is, your mileage on the self-service tool side will vary. Typically, such tools also try to make things easier on their side by supporting as many engines as possible, applying smart connection techniques, and so on.

In any case, the whole point of self-service analytics, as opposed to traditional data warehouses, is to skip the data mediation process. That process involves things such as dimension definition and data cube preparation, and therefore requires a team of people whose job is to work on it.

Read the full article on ZDNet