
Modern Data Governance and Trust for Winning in Manufacturing

Learn about modern data governance and trust in manufacturing, focusing on data accuracy, compliance, and agile processes that drive data-driven decisions.

Co-Founder & CEO
March 21, 2023

All right, so we're on to the last technical talk for today, and for the whole symposium, before the final keynote, which will also take place here. I'm very honored to introduce Pardhu, an expert in data governance for developing trustworthy AI systems. He is both an open source thought leader and an entrepreneur, and he created the Metaphor data platform that he will be telling us about today. So let's welcome Pardhu.

Thank you, and thanks for having me here. Before I start my talk on data governance, I want to take a quick pulse check. How many of you have worked with a data governance platform or are familiar with one? Has anyone worked on anything related to data compliance or data warehouses? Not many? Okay, a few, thank you. So I will give a short intro to what data governance means before I talk about the most practical aspects.

Why is this relevant, especially for this kind of conference? As you must have heard, data is everywhere. Every company is trying to be data-driven and to base every business decision on data rather than on gut feeling or domain knowledge alone. But is everyone succeeding? What does it really take? What challenges are companies running into, and why is data governance important to winning? I will use an example. The topic is not specific to the manufacturing industry, but we have a customer in manufacturing, and I will explain how it applies in that domain.

As mentioned in the introduction, before starting my company, Metaphor, to bring this solution to every company out there, I worked on the metadata team at LinkedIn. I also built multiple data platforms at Amazon, as well as metric stores and other systems at LinkedIn. The theme is constant: everyone wants to democratize data and get many more people inside the company using it to make decisions. How do you scale that? It's not just a technology problem; it's an organizational problem and a process problem. Solving it at LinkedIn inspired us to create this company and bring the solution to every company out there.

What is data governance? If you Google it, you will see a wide variety of explanations. Unfortunately, there is no concise one- or two-line definition; people, including Wikipedia's editors, have written long descriptions. Taking all of those into account, the simplest explanation I like to give is this: how do you ensure that the data used inside your organization is accurate enough to base business decisions on, and that it is compliant? Compliance has been discussed for a long time, but it's much more relevant now with GDPR, CCPA, and new laws on the way, including laws on AI model compliance and explainability. It's not just about making decisions; you need to make sure you make them the right way and can explain them to people. Most of you are probably familiar with this.

Why are people, processes, and technologies all important? Data governance splits into multiple facets. Who owns this data? Who is responsible for it? Data keeps moving across your systems. It starts at sensors or IoT devices, gets transformed by analysts or real-time systems, and is processed into multiple layers. Eventually, models and features are built from it. Who owns each stage of the data, and who has access to it? There are plenty of stories about data being misused, such as AI models trained on all kinds of data and then used for personalized pricing. Governance has to step in to prevent such abuses. Data must also be secure; a single leak can bring down your company. Finally, ensuring data quality and capturing knowledge about the data's domain and context are essential.
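To make the ownership and access questions above concrete, here is a minimal Python sketch. It assumes a simple in-memory catalog in which each pipeline stage is a record with an accountable owner, an explicit reader list, and its upstream sources. `DataAsset`, `can_read`, and all asset and team names are hypothetical illustrations, not Metaphor's data model.

```python
from dataclasses import dataclass, field

# Hypothetical catalog entry for one stage of a pipeline; the fields are
# illustrative assumptions, not a real governance platform's schema.
@dataclass
class DataAsset:
    name: str
    stage: str                                         # e.g. "raw", "transformed", "feature"
    owner: str                                         # team accountable for this stage
    readers: set[str] = field(default_factory=set)     # principals granted read access
    upstream: list[str] = field(default_factory=list)  # assets this one is derived from


def can_read(asset: DataAsset, principal: str) -> bool:
    """The owning team can always read; anyone else needs an explicit grant."""
    return principal == asset.owner or principal in asset.readers


# Hypothetical sensor -> transformation -> feature flow.
catalog = {
    "sensor_readings": DataAsset("sensor_readings", "raw", "iot-platform"),
    "line_throughput": DataAsset(
        "line_throughput", "transformed", "analytics",
        readers={"plant-ops"}, upstream=["sensor_readings"],
    ),
    "failure_features": DataAsset(
        "failure_features", "feature", "ml-team", upstream=["line_throughput"],
    ),
}

print(can_read(catalog["line_throughput"], "plant-ops"))  # True: explicit grant
print(can_read(catalog["sensor_readings"], "plant-ops"))  # False: no grant on raw data
```

Here `plant-ops` may read the transformed throughput table but not the raw sensor data it came from. Each stage can therefore have its own owner and its own access answer.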

For the last two decades, the most common solution has been manual data governance, which is very top-down and restrictive. A chief information security officer or data governance officer makes decisions based on laws and organizational interests. The approach is highly centralized and run by data steward committees. This leads to a disconnect from the actual data systems and to a slow process in which getting access to data can take up to six months.

Companies like LinkedIn and Facebook have tried fully AI-based detection and enforcement, translating governance laws into models that manage data automatically. However, this approach lacks human oversight and domain context, so it struggles in practice. For example, it can fail when a domain-specific exception comes up that the models were never built to handle.

Metaphor aims to strike the right balance by combining technical metadata with social and business context to build trust in data. We provide insight into what is happening with your data across the organization and enable collaboration between data producers and consumers. The result is a 360-degree view of data governance: who is involved, which data assets are involved, and why.

One example is Sub-Zero, an appliance manufacturer. They adopted a modern data stack and Metaphor to bridge the gap between manufacturing and data analytics. The integration took less than a month, and within four months, usage of Metaphor had surpassed that of their previous platform. We provide insight into business impact and data trust, enabling agile data governance across the company.

In summary, agile data governance enables data democratization and data-driven decisions across companies. Any questions?

Fantastic. Let's start the Q&A. Different regions of the world have different data governance laws. How do you handle that on the company side, for example between US and EU laws?

Good question. We translate those laws into policies regarding access, storage, ownership, and reporting. We implement these policies across data warehouses, BI reports, and models, ensuring enforcement and auditability.
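Translating laws into enforceable, auditable policies can be sketched as "policy as code." This minimal Python sketch assumes each legal requirement has already been translated into a rule keyed by region and data classification. `Policy`, `check_access`, `AUDIT_LOG`, and the regions, tags, and allowed purposes are all hypothetical. They are not a summary of what GDPR or CCPA actually require. The sketch covers only the access dimension, not storage, ownership, or reporting, and it writes every decision to an in-memory audit log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical rule derived from a regional law. Regions, tags, and purposes
# are illustrative only, NOT a statement of any law's actual requirements.
@dataclass(frozen=True)
class Policy:
    name: str
    region: str                        # jurisdiction the rule came from, e.g. "EU"
    tag: str                           # data classification it applies to, e.g. "personal"
    allowed_purposes: frozenset[str]   # purposes for which access is permitted


POLICIES = [
    Policy("eu-personal-data", "EU", "personal", frozenset({"support", "billing"})),
    Policy("us-personal-data", "US", "personal", frozenset({"support", "billing", "analytics"})),
]

AUDIT_LOG: list[dict] = []  # every decision is recorded so it can be reviewed later


def check_access(dataset_tags: set[str], region: str, purpose: str, user: str) -> bool:
    """Allow access only if every applicable policy permits the purpose; log the decision."""
    applicable = [p for p in POLICIES if p.region == region and p.tag in dataset_tags]
    # all() over an empty list is True: data with no regulated tag is not restricted here.
    allowed = all(purpose in p.allowed_purposes for p in applicable)
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "region": region,
        "purpose": purpose,
        "policies": [p.name for p in applicable],
        "allowed": allowed,
    })
    return allowed
```

With these illustrative rules, an analytics request on personal data is denied under the EU rule but allowed under the US rule, and both decisions end up in `AUDIT_LOG`. A stricter design could deny any data that carries no classification at all.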

The appeal of automated, model-based governance is the automation itself. How tunable is your model if people aren't contributing knowledge to it? How much automation can I apply?

Great question. We believe in leveraging automation as much as possible but also allowing human oversight. You can customize your controls depending on the criticality of your data systems. For example, finance-related data is more restrictive than R&D data.
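Criticality-dependent controls can be pictured as a routing rule that decides how much automation an access request gets. In this minimal Python sketch, the three tiers, the `supply_chain` domain, `DOMAIN_TIER`, and `route_request` are hypothetical. The talk only states that finance data is more restrictive than R&D data.

```python
from enum import Enum


class Tier(Enum):
    LOW = 1      # rely on automation alone
    MEDIUM = 2   # automate, but keep the data owner informed
    HIGH = 3     # a human data owner must approve


# Hypothetical mapping from business domain to criticality tier.
DOMAIN_TIER = {"finance": Tier.HIGH, "supply_chain": Tier.MEDIUM, "r_and_d": Tier.LOW}


def route_request(domain: str, requester_team: str, owner_team: str) -> str:
    """Return how an access request for data in `domain` should be handled."""
    if requester_team == owner_team:
        return "auto-approve"  # the owning team always reads its own data
    tier = DOMAIN_TIER.get(domain, Tier.HIGH)  # unknown domains take the strict path
    if tier is Tier.LOW:
        return "auto-approve"
    if tier is Tier.MEDIUM:
        return "auto-approve-and-notify-owner"
    return "human-review"
```

Tuning the amount of automation then means editing `DOMAIN_TIER` rather than rewriting the approval logic.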

Regarding data discovery, how much do you depend on user contributions?

Not much. We leverage out-of-the-box automation for about 60% of the work, providing value before expecting user engagement. We tap into historical data and conversations to bring relevant insights automatically.

Two questions: how long does it take to get started with a client compared to a fully automated approach, and how do you deal with mature architectures and cultural issues?

Excellent points. Fully automated systems require high integration costs and often break down with domain-based exceptions. We balance automation with human oversight and operate with lightweight metadata integrations, making us compatible with both modern and legacy systems.

What about unstructured data?

We face more challenges with unstructured data but leverage metadata and profiling tools to handle it. Continuous progress is being made to improve support for unstructured data.

Regarding trustworthy AI and auditability: take a self-driving car incident, for example. How do you achieve that level of auditability?

Data governance solutions provide auditability into each system. They show where the data originated, how it was transformed, and how it led to the point where a decision was made. This requires understanding the entire data stack and every modification along the way.
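Tracing how data led to a decision can be sketched as an upstream walk over a lineage graph. In this minimal Python sketch, the graph and its asset names (`braking_decision`, `sensor_logs`, and so on) are hypothetical and loosely follow the self-driving example from the question. A real platform would also record how each step transformed the data, which this sketch omits.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets it was derived from.
LINEAGE = {
    "braking_decision": ["perception_model"],
    "perception_model": ["training_features"],
    "training_features": ["labeled_frames", "sensor_logs"],
    "labeled_frames": ["sensor_logs"],
    "sensor_logs": [],
}


def upstream_trace(asset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every asset feeding `asset`, breadth-first, each listed once."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached through another path
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


print(upstream_trace("braking_decision", LINEAGE))
# ['perception_model', 'training_features', 'labeled_frames', 'sensor_logs']
```

The trace lists the model, its training features, the labeled frames, and the raw sensor logs. Each appears once, even though `sensor_logs` is reachable by two paths.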

How much extra effort is required from developers to ensure auditability?

It's a shared responsibility between developers and data systems. Platforms should support developers by making these processes easier while developers need to follow best practices.

About Metaphor

The Metaphor Metadata Platform represents the next evolution of the Data Catalog: it combines best-in-class Technical Metadata (learnt from building DataHub at LinkedIn) with Behavioral and Social Metadata. It supercharges an organization’s ability to democratize data with state-of-the-art capabilities for Data Governance, Data Literacy, and Data Enablement, and provides an extremely intuitive user interface that turns even the most non-technical user into a fan of the catalog.