Solving Data Discovery with Modern Metadata Platform

Watch the Data-Centric AI Summit 2022 talk on solving data discovery challenges with a modern metadata platform that enhances visibility and collaboration

Co-Founder & CEO
5 min. read
September 29, 2022
Hi, I am Pardhu Gunnam, CEO of Metaphor Data. Welcome to my talk on solving data discovery with modern metadata platforms. In today's talk, I will cover why data discovery is difficult, what a modern metadata platform (MMP) is, why you need one, and how to build it.

A quick introduction: I am CEO and co-founder of Metaphor Data. Before that, I worked on the metadata team at LinkedIn, where my teammates and I built something similar, DataHub, and we went on to start a company around that work as well.

Why is data discovery difficult? You've been hearing this term quite a lot these days, and it's been around for a while, but why has it become such a challenge recently? We should blame that on the modern data stack. The modern data stack has helped democratize many aspects of data processing, especially on the creation side and the query side. There is a wide variety of tools, from Snowflake and Databricks to pipeline tools and more. These have made it extremely easy to create data assets, move them, and transform them across the company. What used to be just a core data team's job has been democratized across multiple personas. This has also brought a much wider variety of data assets and use cases, enabling your company to be data-driven.

One of the challenges this brought is data discovery: finding the right data to solve your use case the right way. This used to be a big-company problem, but it is now a small-company problem too, with even small companies creating thousands of data assets to drive the business.

Data discovery problems are not just the data team's problem. All the personas involved, from data engineers to business users, face challenges. Data engineers spend time supporting and maintaining observability. Producers like analysts and data scientists spend time discovering and creating trusted data artifacts. Business users care about trusting the data to make the right decisions. Heads of data platforms worry about the ROI on their data investments. All these challenges are common, especially with the modern data stack.

Why don't existing tools solve the problem? Data discovery is traditionally solved with data catalogs, which have existed for over two decades. These catalogs, from Informatica to Alation and Collibra, focus on technical metadata aimed at core data teams. However, this approach alone is not enough.

Metaphor's approach combines technical metadata with social and business aspects. We look at what business use cases the data solves, who is using it, and how it is being used. This combination provides not just discoverability but also visibility across the company, enabling collaboration and reuse.

How did we do this? We created a knowledge- and context-driven platform. Traditional catalogs looked at dashboards, metrics, and business terms, but we went further by including social and behavioral context. We looked at Slack conversations, incident reports, and usage patterns to provide a 360-degree view of the data.

Combining technical, business, and behavioral metadata creates a modern metadata platform (MMP). We ingest all this data and serve it through various interfaces like web apps, Slack, Teams, and browser plugins, providing a seamless discovery experience.
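To make the combination concrete, here is a minimal sketch of what a single asset record might look like when the three kinds of metadata sit side by side. Every class and field name below is a hypothetical illustration, not Metaphor's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    # Schema-level facts, the kind a traditional catalog collects.
    columns: dict[str, str] = field(default_factory=dict)  # column name -> type
    upstream: list[str] = field(default_factory=list)      # lineage parents


@dataclass
class BusinessMetadata:
    # Business context: glossary terms and the use case the asset serves.
    glossary_terms: list[str] = field(default_factory=list)
    use_case: str = ""


@dataclass
class BehavioralMetadata:
    # Social and usage signals: who uses the asset and where it is discussed.
    query_count_30d: int = 0
    top_users: list[str] = field(default_factory=list)
    discussion_links: list[str] = field(default_factory=list)


@dataclass
class AssetRecord:
    # Hypothetical record that joins all three views of one data asset.
    name: str
    technical: TechnicalMetadata = field(default_factory=TechnicalMetadata)
    business: BusinessMetadata = field(default_factory=BusinessMetadata)
    behavioral: BehavioralMetadata = field(default_factory=BehavioralMetadata)

    def summary(self) -> str:
        terms = ", ".join(self.business.glossary_terms) or "no glossary terms"
        return (
            f"{self.name}: {len(self.technical.columns)} columns, "
            f"{terms}, {self.behavioral.query_count_30d} queries in 30 days"
        )


orders = AssetRecord(
    name="analytics.orders",
    technical=TechnicalMetadata(columns={"order_id": "int", "amount": "float"}),
    business=BusinessMetadata(glossary_terms=["Revenue"], use_case="weekly revenue report"),
    behavioral=BehavioralMetadata(query_count_30d=120, top_users=["ana"]),
)
print(orders.summary())
```

Keeping the three facets as separate objects means each can be ingested from a different source (warehouse, glossary, chat or query logs) and still be served as one record.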

What is an MMP? It started with our journey at LinkedIn, evolving from a traditional data catalog to a push-based real-time metadata platform. In 2020, we open-sourced it as DataHub, now one of the most popular open-source projects in this domain.
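The difference between a crawl-based catalog and a push-based platform can be sketched in a few lines. In the toy example below, sources push metadata changes as they happen and subscribers are notified immediately, instead of a crawler polling on a schedule. The `MetadataHub` class and its methods are hypothetical and are not DataHub's API.

```python
from typing import Any, Callable

Listener = Callable[[str, str, Any], None]


class MetadataHub:
    """Toy in-memory hub: producers push changes, consumers react at once."""

    def __init__(self) -> None:
        self._latest: dict[tuple[str, str], Any] = {}
        self._listeners: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        # Consumers (search index, lineage graph, alerts) register once.
        self._listeners.append(listener)

    def push(self, asset: str, aspect: str, value: Any) -> None:
        # Called by the source system itself whenever its metadata changes.
        self._latest[(asset, aspect)] = value
        for listener in self._listeners:
            listener(asset, aspect, value)

    def get(self, asset: str, aspect: str) -> Any:
        return self._latest.get((asset, aspect))


hub = MetadataHub()
seen: list[tuple[str, str]] = []
hub.subscribe(lambda asset, aspect, value: seen.append((asset, aspect)))

hub.push("analytics.orders", "owner", "ana")
hub.push("analytics.orders", "schema", {"order_id": "int"})
print(seen)
```

A production system would put a durable log or message queue between producers and consumers; the in-process list of listeners here only illustrates the direction of data flow.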

Metadata can solve various use cases beyond data discovery, like lineage, observability, governance, and privacy. New paradigms like data mesh and active metadata management push data management through metadata.

Why do you need an MMP? Each independent system keeps its own metadata, but an MMP brings it together and handles the resulting complexity and scale. Metadata has become as complex as data itself, requiring scalability, reliability, extensibility, and ease of integration.

Solving data discovery with an MMP involves rich filtering, documentation integration, and collaboration. At Metaphor, we've built a platform that addresses these needs, combining technical, business, and behavioral metadata.
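As a rough illustration of rich filtering, the sketch below filters assets by name, owner, and tags, then ranks the matches by recent usage so that the most-used (and presumably most-trusted) assets surface first. The `Asset` fields, the `discover` function, and the sample catalog are assumptions made for the example, not Metaphor's search implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    owner: str
    tags: frozenset
    query_count_30d: int  # behavioral signal used for ranking


def discover(assets, text="", owner=None, required_tags=(), limit=10):
    """Filter on technical/business fields, then rank by usage."""
    needle = text.lower()
    wanted = set(required_tags)
    matches = [
        a for a in assets
        if needle in a.name.lower()
        and (owner is None or a.owner == owner)
        and wanted <= a.tags  # asset must carry every required tag
    ]
    return sorted(matches, key=lambda a: a.query_count_30d, reverse=True)[:limit]


CATALOG = [
    Asset("analytics.orders", "ana", frozenset({"certified", "finance"}), 120),
    Asset("staging.orders_raw", "ben", frozenset({"finance"}), 15),
    Asset("analytics.customers", "ana", frozenset({"certified"}), 80),
]

for asset in discover(CATALOG, text="orders"):
    print(asset.name, asset.query_count_30d)
```

Ranking by usage is only one possible choice; a real platform would likely blend it with text relevance, certification status, and freshness.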

To conclude, data discovery has become challenging because the modern data stack democratized how data is created and used. An MMP solves a range of business and organizational challenges. Building a great MMP involves scalability, reliability, extensibility, rich APIs, and ease of integration. With Metaphor, we've demonstrated how an MMP can effectively solve data discovery and other use cases.

Thank you for attending this talk. If you want to give it a try yourself, go to Metaphor. Thank you.

About Metaphor

The Metaphor Metadata Platform represents the next evolution of the data catalog: it combines best-in-class technical metadata (learned from building DataHub at LinkedIn) with behavioral and social metadata. It supercharges an organization's ability to democratize data with state-of-the-art capabilities for data governance, data literacy, and data enablement, and provides an intuitive user interface that turns even the most non-technical user into a fan of the catalog. See Metaphor in action today!