Multi-cloud data catalogs the easy way: Waterline Data uses metadata and machine learning

Data governance is drudgery, but in the post-GDPR world, it’s more than foundational. It’s essential. Waterline Data wants to help organizations do it the easy way, by automating as much of it as possible.
Keeping track of all your data — where it’s been, where it’s going, who accesses it, and what they do with it — is neither fun nor exciting. But it is a necessary substrate for holistic data management, and in the age of GDPR and CCPA, it’s also a legal requirement. This is what data governance is about.
Data catalogs are the unsung heroes of data governance. A data catalog is loosely defined as a metadata management tool designed to help organizations find and manage large amounts of data. Today, Waterline Data, one of the key players in the data catalog space, is announcing updates to its product, and ZDNet took the opportunity to talk with founder and CTO Alex Gorelik.
Waterline Data is a single-product company. Every solution it offers, from Metadata Management and Data Lineage to Sensitive Data Discovery and Data Rationalization, is built on its data catalog. Today’s release centers on a new DataOps Dashboard, which Waterline says can serve as a regulatory hub where companies can understand the macro risk of their data estate.
The DataOps Dashboard lets users locate and view specific files that contain regulated sensitive data, and it helps expedite the identification, remediation, and documentation processes needed to meet GDPR and CCPA requirements. Gorelik, however, pointed out another big improvement: a new agent architecture that enables hybrid multi-cloud support.
“Waterline can now catalog and automatically tag data in multiple clouds like AWS, Azure and Google Cloud Platform; on-premise big data systems like Cloudera and MapR; cloud databases like Snowflake and Redshift; and on-premise relational databases. The agents can run natively on Apache Spark or in a container for environments that do not have a Spark cluster,” said Gorelik.
Another new feature is support for data residency laws that restrict sending data out of the country. An agent can be configured to do all processing and discovery locally and send only non-sensitive metadata to the central catalog. Finally, there are improvements in usability, personalization, and collaboration.
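Waterline has not described how its agents work internally, so the following is only a hypothetical Python sketch of the general data-residency pattern: an agent profiles columns where the data lives, tags likely sensitive ones, and builds a payload for the central catalog that carries metadata only, never raw values. The regex-based tagger, the pattern names, the 0.8 match threshold, and the functions `profile_column` and `build_catalog_payload` are all illustrative assumptions, not Waterline's classifiers or API.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately simple patterns; real sensitive-data discovery
# would use far richer classifiers than these two regexes.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


@dataclass
class ColumnMetadata:
    name: str
    row_count: int
    tags: list[str] = field(default_factory=list)


def profile_column(name, values, threshold=0.8):
    """Tag a column locally; the raw values never leave this function."""
    non_empty = [v for v in values if v]
    tags = []
    for tag, pattern in SENSITIVE_PATTERNS.items():
        if non_empty:
            hits = sum(1 for v in non_empty if pattern.match(v))
            # Tag the column if enough of its values look like this data type.
            if hits / len(non_empty) >= threshold:
                tags.append(tag)
    return ColumnMetadata(name=name, row_count=len(values), tags=tags)


def build_catalog_payload(table_name, columns):
    """Build what would be sent to the central catalog: metadata only."""
    profiles = [profile_column(n, v) for n, v in columns.items()]
    return {
        "table": table_name,
        "columns": [
            {"name": m.name, "rows": m.row_count, "tags": m.tags}
            for m in profiles
        ],
    }


# Usage: sample data stays in this process; only names, counts, and tags go out.
payload = build_catalog_payload("customers", {
    "email": ["alice@example.com", "bob@corp.co.uk"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "city": ["Paris", "Berlin"],
})
print(payload)
```

The design choice this illustrates is that classification happens next to the data, so only derived facts (the column is an email column, with two rows) cross the border, which is what a residency-constrained deployment needs.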