Data Documentation Neglect: The Hidden AI Killer Lurking in Your Organization
Skip the painful guesswork in data documentation—Metaphor’s AI-powered platform keeps you compliant, collaborative, and innovative.
Dive into the world of data catalogs with insights from our CEO Pardhu Gunnam and CTO Mars Lan in an exclusive chat with @Amplitude_HQ.
Speakers: Mars Lan, Pardhu Gunnam
Hey Mars, how’s it going?
Good, good. Thanks for having us.
No problem, excited to have you all here.
We’ll skip most of the introductions on our side, but at a high level, we have a lot of folks here who work on our data initiatives, primarily within the product development organization: engineers, designers, and PMs. The call is being recorded, and I’ll share it with the go-to-market team as well. I’m really excited to have you both here today to give the team a quick overview of data catalogs. Imagine we’re all beginners here. I want to get your perspective as well. So without further ado, I’ll hand it over to you to tell us a bit about yourselves.
Thanks for having us. I’m Pardhu Gunnam, co-founder of Metaphor. I’ve been working in data platforms for quite a while. My last gig before Metaphor was working with Mars and the team at LinkedIn, building various metadata platforms, data tools, and products. Happy to be here.
Yeah, I’m Mars, co-founder and CTO of Metaphor. Like Pardhu mentioned, we worked at LinkedIn together before Metaphor. We’ll probably get into that story a bit later. Before LinkedIn, I was at Google, working on Google Cloud in the early days, specifically on Google App Engine, which is like the grandfather of AWS Lambda. So, I didn’t come from a pure data background but brought a different perspective when I joined LinkedIn and worked on the data team.
That’s exciting! On the Google Cloud team, were you working on the data catalog product there?
No, this was early Google Cloud, working on Google App Engine.
Well, excited to have you. We’ve known each other for a while, and I’ve seen your journey from LinkedIn to now. Can you describe how you ended up building DataHub and how it emerged at LinkedIn during your time there?
Sure, happy to talk about it. When I joined LinkedIn, I was promised a brand new metadata team to lead, doing great things together. But literally the second week after I joined, the order came from the top to drop everything and focus on GDPR. So we were still working on metadata, but now it was to power LinkedIn’s GDPR initiative. The data catalog was a centerpiece of that, bringing visibility into the data at LinkedIn’s scale. By the time we left LinkedIn, we had 40 use cases built on top of DataHub.
Got it. How do you see tools like Amundsen or Metacat from Netflix? Were they focused on data discovery, whereas at LinkedIn, you had a broader set of use cases?
Yes, many tools focus on enabling discovery first. We approached it by getting all the metadata together to build a platform, on top of which you can build various applications. Compliance was one use case, but there were many others like data ops.
When you think about tools like Alation, Informatica, or Collibra, were they just not solving the problem the right way?
LinkedIn had an Alation installation used primarily by analysts. The major gap was that these tools were built for a top-down approach to data management. LinkedIn was always pioneering data democratization and self-service approaches, which existing tools weren’t compatible with.
So, existing tools focused on centralized governance, whereas LinkedIn focused on data democratization. Your goal was to make data accessible to everyone and open for self-service?
Yes, and also automating and streamlining workflows for data management, like data deletion and access control. Existing tools weren’t designed as platforms but as applications with APIs.
When you see the workflow around data management, how does DataHub manage these workflows?
DataHub isn’t just about having APIs. It’s about making integrations easy and supporting workflows. For example, using webhooks to trigger downstream processes or integrating with Jira for ticketing. Also, analyzing metadata should be as easy as analyzing data, exposing data APIs for this purpose.
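To make the webhook idea above a little more concrete, here is a minimal Python sketch of a metadata change event triggering a downstream step such as opening a ticket. Everything in it is an assumption for illustration: `MetadataEvent`, `MetadataEventBus`, `open_ticket`, and the event names are hypothetical and are not DataHub's or Jira's actual APIs. The bus runs in-process and stands in for a real webhook dispatcher.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetadataEvent:
    """Hypothetical notification that some piece of metadata changed."""
    dataset: str
    change: str  # e.g. "owner_removed" or "schema_changed" (made-up names)
    details: Dict[str, str] = field(default_factory=dict)


class MetadataEventBus:
    """Hypothetical in-process stand-in for a webhook dispatcher."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[MetadataEvent], None]]] = defaultdict(list)

    def subscribe(self, change: str, handler: Callable[[MetadataEvent], None]) -> None:
        # Register a downstream process for one kind of metadata change.
        self._handlers[change].append(handler)

    def publish(self, event: MetadataEvent) -> int:
        # Call every handler registered for this change type and
        # return how many handlers were notified.
        handlers = self._handlers.get(event.change, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


# Stand-in for a ticketing integration: a real one would call the
# ticketing system's API instead of appending to a list.
tickets: List[str] = []


def open_ticket(event: MetadataEvent) -> None:
    tickets.append(f"[{event.dataset}] {event.change}: review required")


bus = MetadataEventBus()
bus.subscribe("owner_removed", open_ticket)
bus.publish(MetadataEvent(dataset="sales.orders", change="owner_removed"))
```

In this sketch the catalog only needs to know that a change happened; what to do about it lives in the subscribed handlers, which is one way to read the "platform, not application" point made in the conversation.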
Have you seen similar requests for metadata access on the Amplitude side?
Yes, metadata is a hot topic. What led to spinning out DataHub as a startup?
LinkedIn has a great open-source culture. We initially focused internally, but when we open-sourced DataHub, we saw a flood of interest and realized it wasn’t just a big company problem. Data democratization was important for companies of all sizes.
What needs did smaller organizations have related to data democratization? What tools were they using before DataHub?
Surprisingly, Excel was the biggest challenge to replace. Smaller companies struggled with scaling Excel sheets and needed something more efficient. Existing tools required long engagements and were expensive, so our approach helped small groups leverage data better.
In different company sizes, do you start with a subset of data and expand over time?
Yes, it happens organically. The more metadata you bring together, the more powerful the connectivity becomes, providing insights across teams and projects.
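One way to picture the connectivity described above is as a graph of assets, where each new source of metadata adds edges that can be followed across teams. The following Python sketch assumes a hypothetical lineage map (`LINEAGE`) with made-up asset names, and uses a plain breadth-first traversal to list everything downstream of a given asset. It is an illustration of the idea, not how DataHub or Metaphor store or query lineage.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: each key feeds the assets listed as its value.
LINEAGE: Dict[str, List[str]] = {
    "raw.events": ["staging.sessions"],
    "staging.sessions": ["marts.retention", "marts.funnels"],
    "marts.retention": ["dashboard.weekly_retention"],
    "marts.funnels": [],
    "dashboard.weekly_retention": [],
}


def downstream(asset: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


affected = downstream("raw.events", LINEAGE)
```

With only the first edge loaded, a change to `raw.events` appears to affect one table; once the other teams' metadata is connected, the same query also surfaces the marts and the dashboard that depend on it.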
Have you run into challenges with data literacy or education around tooling in this space?
Yes, especially with business teams. They often work in silos and don’t understand data governance well. Educating these users is a challenge.
How do companies justify the ROI of data catalogs, both initially and ongoing?
One indication is the drop in time spent answering data-related questions on Slack, showing that a bit of investment in documentation goes a long way. Also, ongoing maintenance of documentation and lifecycle management is crucial to ensure it stays relevant.
What advice do you have for folks looking to start their own company?
Starting a company is rewarding. Despite market fears, data is still a hot area, and these times are often the best to start a company.
Any questions from the team?
Don’t be shy.
Maybe we should talk less next time to get more questions.
Thanks for your time, Pardhu and Mars. If we can ever help, let us know.
Thanks for having us.
The Metaphor Metadata Platform represents the next evolution of the Data Catalog. It combines best-in-class Technical Metadata (learned from building DataHub at LinkedIn) with Behavioral and Social Metadata. It supercharges an organization’s ability to democratize data with state-of-the-art capabilities for Data Governance, Data Literacy, and Data Enablement, and provides an intuitive user interface that turns even the most non-technical user into a fan of the catalog. See Metaphor in action today!