Join us for a discussion on data governance and engineering at companies of all sizes, exploring the differences, challenges, and more!
Unlock the secrets of data governance and engineering across different company sizes with insights from Hyde Park Digital’s Ariane Hoffenberg and Metaphor’s Kirit Basu in our latest conversation!
Speakers: Ariane Hoffenberg, Kirit Basu
I'm from Paris, France, where I studied engineering, mathematics, and business. Since then, I've worked in data. I started at Food Panda and Delivery Hero, a large food delivery company operating in many countries, mostly in Southeast Asia and the Middle East. I began in BI, then moved into data science, where I led the work of adapting our models for all the different countries from our headquarters.
Then I moved to a company called Sides, which builds an HR recruiting application, especially for short-term jobs, where I led the whole data team. It was a smaller startup, so I had to build the team from scratch. It was a mix of data engineering, data science, analytics, and BI. Then I moved to NuBank, where I spent nearly four years leading the analytics engineers, across a number of teams focused on a mix of data engineering, data governance, and data quality. Most recently, about seven months ago, I started my own company, Hyde Park Digital, where I help large corporations leverage their data and set up data governance.
Cool, that sounds really exciting, all over the place too. I'm curious, I'd love to get your take on how you see the difference between smaller, more agile places that you've worked at versus some of these really large companies that you're working with now. Obviously, there's a huge cultural difference, but what about from a data perspective? Are they getting to the next level? Are they aspiring to get there?
Yeah, there are a lot of differences I find between small startups, large tech companies, and large non-tech companies. In small tech companies or small startups, the data team is not so big, so everybody does a little bit of everything. Everybody starts with the modern data stack from day one with a centralized team. When the tech company starts becoming a bigger tech company, it starts having multiple teams working with data, and then you need better processes, better data governance, better data quality, and more structures in place. Different teams start working in different business units and groups, so it's important to have a connection between all of them.
In much larger corporations that didn't grow up as tech companies, it's usually more siloed, technologies are older, and people have made different technology decisions and ways of using data that they haven't updated yet. But in sectors like finance and energy, people are very enthusiastic about becoming more data-driven, using modern tools, and making sure everyone uses the data in the company.
Do you find these organizations are easier to move towards modern data stacks, or is it a big, long, drawn-out process?
It depends on the processes and use cases. Some things can start with proofs of concept and gradually move into production, which is important. But data sharing between business units is always a topic. How much do you share? What are the commitments of each data sharer to the data consumer? Especially in regulated markets like finance, it's not just a technical discussion; it's a legal and organizational one. Some use cases can go quickly, but some things require more time and discussion.
For use cases that can go faster, the technology is there, but do you run into political or people issues when you try to break down silos that people want to keep?
Yes. For technology and digital transformation or data usage to be successful, there usually needs to be a senior, C-suite-level person pushing for it. More and more large corporations have a chief data officer, a chief digital transformation officer, or a CIO. This neutral force can make political decisions and align everyone without being tied to a specific business unit, which helps move things forward faster.
In your day-to-day, do you interact at that level for the long-term vision, or is it more about architectural and process work?
It's a bit of both. There's the transition itself and ensuring it lasts for the future. Technical decisions on architecture are often not the blocker; it's the processes and alignment between people. I focus on making sure technical teams and business teams work together, follow good data governance practices, and have a plan for quick results without rethinking a new plan every time.
You mentioned governance. What is the full spectrum of governance that companies should think about?
Data governance includes definitions, tags, data stewards, quality, observability, access, and GDPR concerns. It's also organizational: who decides on data usage, and who is responsible for maintaining and sharing data. In large organizations, tracking data lineage and the interdependencies between teams is crucial. Everyone must be aligned on core concepts, definitions, and data processes to avoid miscommunication and ensure quality and structure. Data governance is about managing data effectively, ensuring quality, and getting the most value out of it.
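To make the point about lineage and interdependencies between teams concrete, here is a minimal sketch of a lineage lookup. The dataset names, team names, and the `downstream` and `affected_teams` helpers are all hypothetical, invented for illustration; they do not describe any particular catalog or tool mentioned in this conversation.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets built directly from it.
LINEAGE = {
    "raw.orders": ["analytics.orders_clean"],
    "analytics.orders_clean": ["finance.revenue", "marketing.campaign_roi"],
    "finance.revenue": ["exec.dashboard"],
    "marketing.campaign_roi": [],
    "exec.dashboard": [],
}

# Hypothetical ownership: which team is responsible for each dataset.
OWNERS = {
    "raw.orders": "platform",
    "analytics.orders_clean": "analytics",
    "finance.revenue": "finance",
    "marketing.campaign_roi": "marketing",
    "exec.dashboard": "finance",
}


def downstream(dataset, lineage):
    """Breadth-first walk returning every dataset that depends on `dataset`."""
    seen = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


def affected_teams(dataset, lineage, owners):
    """Teams that own at least one dataset downstream of `dataset`."""
    return sorted({owners[name] for name in downstream(dataset, lineage)})


print(downstream("raw.orders", LINEAGE))
print(affected_teams("raw.orders", LINEAGE, OWNERS))
```

A lookup like this answers the organizational question behind lineage: before a team changes a dataset, it can see which other teams would need to be told.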
You're referring to data literacy and enablement, right? Making sure everyone understands the value and parameters for getting value out of data?
Yes, and it's challenging, especially as organizations grow and the people who created datasets leave. Managers and executives often struggle to understand the extent of their data assets. I've seen large corporations unsure what data they hold when they need to comply with new regulations. It's hard to inventory all data assets when there are many parallel systems and siloed data.
We talked about data contracts. Can you share your experience with them?
Data contracts are a big trend, and in modern tech companies they are often programmatic. In traditional companies, they are legal contracts that define data sharing, SLAs, and responsibilities. That approach is less modern but still critical. There's much to learn from both modern tech companies and traditional corporate practices.
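As a rough illustration of what a programmatic data contract can look like, the sketch below encodes an owner, a schema, and a freshness SLA, then checks a batch of rows against them. The `DataContract` class, the `validate` function, and the `orders` example are assumptions made up for this illustration, not a standard or any specific vendor's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract between a data producer and a data consumer."""
    dataset: str
    owner: str                 # team responsible for maintaining the data
    required_columns: dict     # column name -> expected Python type
    max_staleness_hours: int   # freshness SLA agreed with the consumer


def validate(contract, rows, age_hours):
    """Return a list of human-readable contract violations (empty if none)."""
    violations = []
    if age_hours > contract.max_staleness_hours:
        violations.append(
            f"{contract.dataset}: data is {age_hours}h old, "
            f"SLA is {contract.max_staleness_hours}h"
        )
    for i, row in enumerate(rows):
        for column, expected_type in contract.required_columns.items():
            if column not in row:
                violations.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                violations.append(
                    f"row {i}: '{column}' should be {expected_type.__name__}"
                )
    return violations


orders_contract = DataContract(
    dataset="orders",
    owner="checkout-team",
    required_columns={"order_id": str, "amount": float},
    max_staleness_hours=24,
)

good_rows = [{"order_id": "A1", "amount": 12.5}]
bad_rows = [{"order_id": "A2"}, {"order_id": 3, "amount": 4.0}]

print(validate(orders_contract, good_rows, age_hours=2))
for problem in validate(orders_contract, bad_rows, age_hours=30):
    print(problem)
```

The same three ingredients (who owns the data, what shape it has, and how fresh it must be) are what a legal data-sharing agreement spells out in prose; the programmatic version simply makes them checkable on every delivery.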
What are things that don't work in these scenarios? Things we aren't thinking about properly?
One common mistake is not making decisions based on actual needs. Some companies try to replicate practices from others without considering their specific context. There's no one-size-fits-all solution in the data world. It's essential to find the right balance of complexity and necessity. For example, a startup shouldn't adopt practices meant for a large company and vice versa. The solution should fit the problem, not just be the latest trend.
That's insightful because our industry often promotes the latest trends without considering if they fit every situation. Companies should tailor solutions to their specific needs.
Exactly. Companies should understand their maturity level and find the right solution for their phase. It's normal to have different approaches for different stages of growth. Being innovative and disruptive is important, but it should serve the company's needs.
The data industry is dynamic, and data professionals are passionate about new technologies. Bridging the gap between technical and non-technical people is exciting, especially with tools like data catalogs and LLMs. These tools make data more accessible and spark innovation.
If you had an unlimited budget to solve big problems, what would you focus on?
I'd focus on managing data at scale in large organizations, helping executives monitor and coordinate data use cases, and ensuring alignment between teams. There's a lot of potential for tools that help manage large data assets and scale use cases effectively.
From a product management perspective, helping people coordinate and manage data is crucial. It's an evergreen field for innovation.
Thank you so much. It was wonderful talking to you.
The Metaphor Metadata Platform represents the next evolution of the Data Catalog. It combines best-in-class Technical Metadata (learnt from building DataHub at LinkedIn) with Behavioral and Social Metadata. It supercharges an organization’s ability to democratize data with state-of-the-art capabilities for Data Governance, Data Literacy, and Data Enablement, and provides an extremely intuitive user interface that turns even the most non-technical user into a fan of the catalog. See Metaphor in action today!