Explore the evolution of the modern data stack at Zendesk, with insights on tackling the challenges of a multi-cloud environment and the shift towards a data mesh.
Experts from Zendesk discuss the benefits and challenges of using a modern data stack, the evolution towards a data mesh architecture, and the importance of data governance, literacy, and an API-first approach for effective data management and collaboration.
Speakers: Kirit Basu, Mahdi Karabiben
Really good to talk to you. Can you tell us a little bit about your background and what you do?
Yeah, sure. Well, first of all, thank you for having me. I'm really glad to be here and looking forward to our discussion. My name is Mahdi. I'm an engineer and business analyst based in Paris, France. I have a background in software engineering with a degree in the field, and I started working with data as a student. In 2017, I worked with an NGO on a data visualization project, mainly building dashboards and interactive charts for different types of websites. That's when I realized I really enjoyed working with data, even the messy parts like data cleanup and wrangling.
When I came to Paris for my end-of-studies internship, I joined an ed-tech company called Nomalys, where I worked on revamping the data architecture for one of our products. I got into the Hadoop ecosystem, using Spark, Airflow, and more. After that, I worked in finance for a few years, first at an investment bank on a Hadoop cluster for about a year and a half, and then at FactSet, a financial data provider. At FactSet, I built a data lake on AWS and set up an Apache Superset environment, which was a great experience working with various stakeholders and use cases.
In 2021, I joined Zendesk, working with the Enterprise Data and Analytics team, mainly delivering data products to internal teams.
Nice, very cool. I'd love to understand more about the technologies you work with at Zendesk. How has the architecture evolved?
At Zendesk, we currently use a modern data stack. Our data warehouse is BigQuery on GCP, and we manage data orchestration with Airflow and data transformation with dbt. We also have a data lake on AWS, with S3 for storage alongside other components of the stack. It's a pretty standard architecture but with the added complexity of a multi-cloud environment, so we have to manage data movement between AWS and GCP, as well as from our SaaS applications and vendor data.
Got it. I know multi-cloud is a big thing. Have you found it ultimately beneficial, or would you prefer a single-cloud environment?
Honestly, there is no definitive answer because it depends on your use case and product. For Zendesk, we started using BigQuery a few years ago when it was the only data warehouse offering serverless storage and compute separation. Today, the ecosystem has evolved with options like Snowflake that work across multiple cloud environments. If we were making the decision today, we might not choose a multi-cloud setup, but at the time, BigQuery was the best option.
Could you tell us a bit about the business challenges you're trying to solve at Zendesk?
We focus on data domains, dividing the assets and data products we deliver into different domains like product data, finance data, and customer data. This helps us address various business needs efficiently. For example, product data helps product teams understand feature usage and make roadmap decisions. Each data domain caters to different business needs, allowing us to be more scalable and responsive.
How is the data team structured at Zendesk? Are you centralized or decentralized?
It's a mix. Zendesk has a distributed architecture with different data engineering teams. The Enterprise Data and Analytics (EDA) team, which I'm part of, maintains the data warehouse and internal data products, serving various use cases. We also have foundational engineering teams managing product data on AWS and teams like Explore, which manages data for specific products. This setup naturally evolves towards a data mesh architecture, with distributed ownership of data, though we're not fully there yet.
How do your customers like this setup? Does it affect their experience?
The closer we get to a data mesh implementation, the better it is for our customers. Moving data ownership to the teams producing the data improves efficiency and workflow, making us more agile and better at delivering data products. Centralized teams can't scale as well because they lack the business knowledge of all the data assets. A data mesh approach helps streamline processes and deliver better outcomes.
Did the shift towards a data mesh happen organically, or was it a directive from the top?
It happened organically. As the company grew and data assets became more complex, specific teams naturally developed in-depth knowledge of certain data domains. This led to a distributed ownership model, aligning with the principles of a data mesh.
Have you encountered any pushback against this approach?
It depends on how easy we make it for teams to deliver data products and prioritize data quality. The platform team has a responsibility to simplify the process for other teams. It's also about changing the mentality to see data as a product that needs to be managed and maintained with clear standards.
How does governance work at Zendesk?
Governance is crucial for managing and documenting data assets, ensuring compliance and data quality. Metadata management is key to unlocking governance at scale. If you tag and document your data properly, it becomes much easier to implement governance practices and comply with regulations.
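As an illustration of how tagging and documentation can drive governance, here is a minimal sketch. It assumes a hypothetical tag vocabulary (`pii`, `financial`) and a hypothetical tag-to-policy mapping. None of these names come from the interview or from Zendesk's actual setup.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a metadata tag to the governance action it implies.
POLICY_BY_TAG = {
    "pii": "mask",
    "financial": "restrict",
}


@dataclass
class Column:
    """Minimal metadata record for one column (illustrative only)."""
    name: str
    description: str = ""
    tags: set = field(default_factory=set)


def governance_report(columns):
    """Derive policies from tags and list columns that lack documentation."""
    policies = {}
    undocumented = []
    for col in columns:
        # Keep only the tags that map to a known policy action.
        actions = sorted(POLICY_BY_TAG[t] for t in col.tags if t in POLICY_BY_TAG)
        if actions:
            policies[col.name] = actions
        if not col.description.strip():
            undocumented.append(col.name)
    return policies, undocumented


columns = [
    Column("email", "Customer contact email", {"pii"}),
    Column("invoice_total", "", {"financial"}),
    Column("plan_name", "Subscription plan"),
]
policies, undocumented = governance_report(columns)
```

Once the tags exist, both the masking rules and the documentation gaps fall out of the metadata itself, so nobody has to maintain them separately for each table.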
How important is data literacy, and how do you approach it?
Data literacy is very important. Having a well-documented data catalog is essential for democratizing data access. Users need to understand what the data represents, its quality, and how to use it effectively. Without proper documentation and metadata, even accessible data can be confusing and underutilized.
How close are you to the end-users, the citizen analysts, in your daily work?
It varies, but having frequent interactions with end-users can indicate issues with the data architecture. We aim for a self-service pattern where users can build on top of foundational data domains. This reduces the need for constant back-and-forth and allows us to focus on core data management while enabling users to create specific reports and analyses independently.
What are your thoughts on an API-first approach for data catalogs?
I think it's a missed opportunity not to leverage metadata through APIs. An API-first approach allows metadata to be integrated into various tools and workflows, adding value beyond a standalone catalog. For example, it can inform BI tools of data quality issues or help analysts access data directly from their notebooks.
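The BI example can be sketched roughly as follows. This assumes a hypothetical catalog client with a `get_dataset` method and a hypothetical `quality_checks` field. A real catalog would expose something similar over HTTP, and an in-memory stand-in is used here so the sketch runs on its own.

```python
from typing import Optional


class InMemoryCatalog:
    """Stand-in for a catalog's metadata API (hypothetical interface)."""

    def __init__(self, records: dict):
        self._records = records

    def get_dataset(self, name: str) -> Optional[dict]:
        # A real client would issue an HTTP request here.
        return self._records.get(name)


def quality_banner(catalog, dataset: str) -> Optional[str]:
    """Return a warning a BI tool could display, or None if nothing is wrong."""
    meta = catalog.get_dataset(dataset)
    if meta is None:
        return f"{dataset}: not registered in the catalog"
    failed = [
        check["name"]
        for check in meta.get("quality_checks", [])
        if check["status"] != "passed"
    ]
    if failed:
        return f"{dataset}: failing quality checks: {', '.join(failed)}"
    return None


catalog = InMemoryCatalog({
    "finance.revenue": {
        "quality_checks": [
            {"name": "freshness", "status": "failed"},
            {"name": "row_count", "status": "passed"},
        ]
    },
    "product.usage": {
        "quality_checks": [{"name": "freshness", "status": "passed"}]
    },
})
```

A dashboard could call `quality_banner(catalog, "finance.revenue")` before rendering and show the returned message next to the chart, and the same call works from an analyst's notebook.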
If you had unlimited resources, what would you change in the industry to improve governance and data management?
There needs to be a shift in mindset towards pushing data ownership to the teams producing it. Data mesh is a concept for this, but it's challenging to implement fully. Ensuring that data is well-managed at the source and minimizing transformations can simplify data management and open up more opportunities for automation and advanced features.
Thanks so much for chatting with me. I really enjoyed our conversation and hope we can keep it going.
Sure, it was really fun, and I'm always up for discussing metadata and the state of the modern data stack.
Super, thank you.