Documentation is a job that no one wants to do but everyone needs to do.
This post is part 3 of our 'Data Documentation series.' In part 1, we wrote about how to make your company's data easy to understand and use, and in part 2 we went over how to approach your data using modern methods so that everyone in your company can use data to achieve business goals. In this post, we discuss how best to engage your team and the broader organization in collecting and democratizing knowledge.
Some executives believe that making the entire company’s data understandable and usable is impossible. They feel that if someone had a solution that could do that, that person would be making billions of dollars. They are right that, if pulled off, harnessing the full power of a company’s data would be a remarkable feat. A lot of work goes into understanding, processing, and using raw data to produce the insights that power your business.
Documentation is important, but not urgent. Pulling employees away from executing on their goals isn’t wise when you have to build and ship products rapidly. Ensuring employees share a common understanding of that documentation is even harder. Unless documentation is your thing, the only thing that stinks more than doing documentation is reading documentation.
Before the mid-2000s, the world got used to collecting and managing data for specific purposes, such as online inventory management for e-commerce, country profiles of malaria case numbers for public health, or fraud detection in banking. When data began to drive value and measurable business impact, we started taking once-subjective fields and encoding human sentiment and behavior as data. We turned subjectivity into a science. Then the world found that data becomes more powerful when connected to other sources of data. Big data became hot for the insights it could bring, and then deep data brought specialized (and lucrative) decision-making power. The business landscape embraced automation, interoperability, and observability, and potentially every industry could now be considered part of the tech industry. If you don’t have your business metrics and underlying data down today, you risk failing to reach your market. In modern times, even with better tech and bigger data, the question of how best to manage data about your business, for your business, persists.
Startups all face the same set of classic problems as they cross “the chasm” (as Geoffrey Moore would say) on their way to becoming real disruptors in their space. Say you’re the CEO of a hypothetical startup that is starting to gain traction and momentum in its market. As quickly as you land big deals, your existing customers threaten to churn. As your pipeline and bookings grow, so does the list of requirements, demands, and deadlines. Pressure builds, so you scale out your customer success department, maybe even bringing in sales or enablement professionals to retain business, and those hidden costs start to get... pricey. Maybe you even shift your best engineers to address issues like latency for your key accounts. Are your employees armed with the data to see customer pain points, new market opportunities, or innovative engineering solutions? Somehow, in the midst of all this, you’re expected to have killer positioning to enter new markets, build out your partnership ecosystem, and ship new products and features. Data documentation seems trivial when threats to your business keep you up at night, but without interoperable, clearly documented data to paint the bigger picture, it becomes difficult to monitor issues and solve problems at scale.
As you scale, every part of your business becomes foreign, complicated, and in the hands of someone new. Even though you have hired all the right professionals, the problems are all exacerbated by growth! How do you drive the adoption and stickiness of your product when the professionals you hired don’t understand your average user inside and out? The business doesn’t have the data to both make long-term, strategic decisions and do day-to-day jobs quickly and accurately enough (whether that means understanding the market, understanding customers and prospects, understanding your operations, or accelerating development). By this point, it is too onerous to do all of your data documentation and socialization at once. Right? Wrong.
Old approaches to data management are stuck. As the business scales, the gap between those who understand the data in their own functions and those who need to use data from other functions grows wider and harder to bridge. Connecting information makes it more powerful, but universal understanding of that information doesn’t fall into place naturally. Leaving data users and decision-makers to interpret data on their own, without context, can have ugly consequences for the rest of the business.
Modern data management is focused on data democratization: company-wide discoverable, understandable data, tailored to the information the specific data user needs to do their job. Modern data management should drive effectiveness, efficiency, and security across the whole business. It begins with empowering every data user to be an owner and communicating clear accountability over data assets. It is fostered by strong processes and consistency and is reinforced with rewards.
Data users aren’t just your data team anymore. Every data user should be empowered and expected to own the company’s data, or any data catalog approach will be next to useless. Documentation standards determine your data’s quality, freshness, and usability. It’s important to remember that there is no single omniscient knower of all your data.
People understand best the data they use most often, so as your data expands, a crowd-sourced effort will always scale better than relying on designated stewards alone. Traditional data stewards understand how the data sources they own are collected, collated, and combined, but data users understand the nuances of which decisions are driven by the combination of datasets and insights. Your documentation approach should encompass both traditional, technical documentation and all of your unique business context.
The metadata management platform you choose should be easy and fun to use for executives, engineers, and analysts alike. It should capture all of the relevant information about your data that currently lives only in legacy knowledge. Every data asset should communicate relevant information about itself and give readers an incentive to keep the whole data ecosystem up to date.
Your data team often knows what it is looking for, but there really should be a packaged experience for less experienced users.
My advice for companies who want to unlock their data superpowers:
1. Don’t throw your data users in the deep end of the pool.
2. Don’t start from scratch with every data asset.
Data Analysts and Data Engineers are builders and value drivers. They didn’t sign up to be data support teams. Metadata platforms give engineers and analysts a seamless way to curate and communicate their knowledge without the time and headache of maintaining technical and business context piecemeal. The gap between the consumer of data insights and the translator of raw data is like a game of telephone, except the stakes are potentially your revenue.
Metadata is not meant to be managed as a snapshot. Knowing what data you had years, months, weeks, or even days ago tells you little about the data you have now. Everyone who should be able to access the data should also be able to understand and trust it. Who has the time to play information scavenger hunt? We are past relying on data lore; metadata platforms should allow you to own and use your data, not let your data own you.
Most data users have a job title other than data steward, such as Data Scientist, Business Development Representative, or CEO. If documentation is not easy for them, their core duties will, and should, come first. A clearly communicated process for how and where the organization manages metadata is necessary for good documentation, so that datasets are trustworthy and data users can draw accurate conclusions from insights.
Good documentation behavior should be encouraged through rewards, not mandates. Yes, making documentation easier for yourself and others is technically an incentive, but what if we could make it more fun? Data bingo could be a cool addition to onboarding new employees. A scavenger hunt with a small prize at the end would be a fun kickoff to get anyone acquainted with their data landscape.
You could even offer a knowledge reward system, where every addition of information to your metadata management platform accrues points. Points earned from activities such as adding clarity to a metric could be spent on company merch or donated to the charity of the employee’s choice. Whatever enablement plan you choose, adding a fun spin to it helps drive adoption.
When metadata exists in silos across the business, aggregating important datasets, dashboards, views, and tables should be as easy as pressing a single button.
Modern metadata platforms empower people at every level of the business to be data-enabled leaders.
[Comparison: leaders who lead with data vs. leaders who don’t lead with data]
Solving data documentation used to be expensive, time-consuming, and impossible to do correctly. That isn’t the case anymore. Everyone at every level of your organization should be empowered to lead with data, because scaling a business sustainably in modern times requires modern solutions.
Metadata management enables better decision-making at scale. Metadata management is not easy, but it is also not rocket science. It is consistent, careful attention to your curated data experiences.
The Metaphor Metadata Platform represents the next evolution of the Data Catalog: it combines best-in-class Technical Metadata (learnt from building DataHub at LinkedIn) with Behavioral and Social Metadata. It supercharges an organization’s ability to democratize data with state-of-the-art capabilities for Data Governance, Data Literacy, and Data Enablement, and provides an extremely intuitive user interface that turns even the most non-technical user into a fan of the catalog. See Metaphor in action today!