What is the Best Modern Way to Approach My Data?

A good metadata management platform is a collaborative, living map of your data and data context.

4 min. read
June 7, 2022

This post is part 2 of our 'Data Documentation' series. In part 1, we wrote about how to make your company's data easy to understand and use. In this post, we'll discuss the modern approach to your data and how best to set it up so that everyone in your company can use data to achieve business goals.

Go beyond a data catalog to a collaborative, living space for data discoverability and understandability.

Transition to something that will serve your modern needs

A good metadata management platform is a collaborative, living map of your data and data context.

Maps have been around for a long time. Heuristics improve efficiency and generally make human life easier. Who would want to reinvent the wheel every time they travel, anyway? No paper map experience matches that of an all-inclusive, satellite-navigated, community-driven platform for travelers like Waze. Waze doesn't just answer the question of how to get from A to B; it provides the whole picture of the traveling experience and lets you tell other drivers about obstacles you encountered along the way. No one would tell their newly licensed, 15-and-a-half-year-old daughter to use a paper map, a compass, and local lore when Waze exists. Just as Waze simplifies navigation for travelers of any experience level, your metadata platform should be simple enough for even less technical users to operate with ease.

Data users should be able to follow a data asset's dependencies, tag and rate assets, and see how widely an asset is used, all in real time. Removing the friction from your data users' experience means not only that your data will be used more often and to its fullest potential, but also that users are empowered to mitigate data-issue risks on their own. Not all data can be accurate and up to date, but every data user should understand how old, how correct, how important, and how widely used a dataset is before building an understanding from it.

Every data user should play the part of a data steward. When drivers see a car broken down on the side of the road, they use Waze to alert other drivers. The people who know the most about the data are the people who use it. Data users need to know:

  1. What data should they care about?
  2. Why should they care about it?
  3. How have others created assets from this data?

Answering these questions efficiently is what all data catalogs aspire to do. Unfortunately, not all catalogs are built the same, and too many focus on outdated ways to document data or make it useful for users. Good documentation exposes a combination of highly relevant technical context, business context, and the social context of how trusted colleagues are using the data.

Good technical documentation consists of information about the data project and the data warehouse. Specifics like lineage, model code, tests, data types, schema, and table sizes give data teams the technical specs they are looking for. In tools like dbt or Snowflake, technical documentation lives close to the data, in documentation (doc) blocks or comments on the code.
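If your models live in dbt, for example, much of this technical context is already compiled into the project's manifest. The sketch below is only an illustration, assuming a standard dbt project with a built target/manifest.json (the exact manifest layout varies by dbt version); it pulls model and column descriptions out so they can be surfaced next to the data.

```python
import json
from pathlib import Path

# Minimal sketch: read the technical documentation dbt compiles into its
# manifest and print the description recorded for each model and column.
# Assumes a standard dbt project; key layout may differ across dbt versions.
manifest = json.loads(Path("target/manifest.json").read_text())

for unique_id, node in manifest["nodes"].items():
    if node["resource_type"] != "model":
        continue
    print(f"{node['name']}: {node.get('description') or '(no description)'}")
    for column, details in node.get("columns", {}).items():
        print(f"  {column}: {details.get('description') or '(undocumented)'}")
```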

Helpful Context Checklist that extends beyond technical documentation

Anyone and everyone using the data for your business should understand:

Metaphor Data Context Checklist
What
  1. What does this metric mean? What does each column of this dataset mean?
  2. What goes into the calculation of this metric? What other sources of data does this dataset pull from?
Why
  1. Why was this dataset collected originally? (Bonus points: What business questions is it meant to answer?)
  2. Why should I trust this metric or dataset?
When
  1. When is this metric or dataset refreshed?
  2. When was the last time this dataset or metric was validated?
Who
  1. Who is using this metric or dataset, and how widely is it used and trusted?
Where
  1. Where is this data generated? Where does its source of truth live?
How
  1. How do others use this metric or dataset?

These questions are nearly impossible to answer without clear, current documentation, and yet keeping documentation in a constantly consumable state can be difficult, time-consuming, and expensive. Managing, modeling, merging, and analyzing data have grown increasingly complicated and costly as data has gotten, and will continue to get, bigger and bigger.

Previous static approaches to business glossaries, such as data dictionaries in Google Sheets, waste resources creating something that is most likely either obsolete the moment it is created or costly to maintain. Data catalogs have more real-time information about the data but still aren't actionable for data teams. Making data actionable means empowering everyone to create and contribute to a living collection of information about data, allowing companies to manage data at scale.

Context is key

Metadata

Metadata is, by definition, data about data. Metadata platforms should capture all the data your business has about its data. Slack conversations about datasets are metadata. Use cases for datasets are metadata. Incident reports are metadata. Deprecation notices are metadata. A metadata platform worth your money will let data users organize, explore, and contribute to metadata flexibly. Enabling data engineers to create the hierarchy and roadmap of where information about your data lives makes life blissful for your data users instead of overwhelming. Good ways to organize your documentation are tailored to your business and should serve data users at every level of understanding.
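As a rough illustration (the record types and fields below are hypothetical, not a specific platform's schema), every one of those artifacts can be modeled as one more metadata record attached to the same data asset:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical sketch: every kind of context (a Slack thread, an incident
# report, a deprecation notice) becomes one more metadata record attached
# to the same data asset.
@dataclass
class MetadataRecord:
    kind: str           # e.g. "slack_thread", "use_case", "incident", "deprecation"
    summary: str
    author: str
    created_at: datetime

@dataclass
class DataAsset:
    name: str
    records: list[MetadataRecord] = field(default_factory=list)

    def add(self, record: MetadataRecord) -> None:
        self.records.append(record)

orders = DataAsset("analytics.orders")
orders.add(MetadataRecord("incident", "Late-arriving rows on 2022-06-01", "dana", datetime.now()))
orders.add(MetadataRecord("deprecation", "Superseded by analytics.orders_v2", "sam", datetime.now()))
```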

Data “Wikis” / Explainers

Data wikis are a one-stop-shop presentation of the metadata for specific data assets. Data wikis, also called data explainers, are editable by data users so that quality can be monitored in real time by the consumers themselves. Data wikis capture everything in the context checklist above. They should be easy to find and share. Some data assets may even have accompanying explainer videos if they are important, ambiguous, or difficult to understand. Data wikis and explainer content ensure a seamless onboarding experience for data users so that they can be confident in taking action on data-driven insights. Regardless of how in-depth your data wikis are, integrating them with the day-to-day work of data users saves pain and increases productivity.

Governed Tags

Tags activate your metadata. Tags make data discoverable and understandable. Tags both complement in-depth explainers and do the heavy lifting, in terms of functionality, if you don't have data wikis. How your company chooses to tag will be unique to your business. One example could be setting definitions for gold, silver, and bronze standards: your finance team could query and collate all gold-standard data and cross-reference data tagged by domain to automate important metrics like rates of return on investments by department. Your privacy and security team could have their own set of tags for saving queries or communicating flags to data consumers and users. Collaborating on the same platform, rather than sending presentation slides and hoping the message (or warning) gets across, simplifies data governance.
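As a rough sketch of how that might look (the tag names, datasets, and plain dictionaries below are hypothetical, not any particular product's API), each dataset carries a quality tier and a domain, and any team can filter on the tags that matter to them:

```python
# Hypothetical sketch of governed tags: each dataset carries a quality tier
# and a domain, and teams filter on the tags that matter to them.
datasets = [
    {"name": "finance.investments", "tier": "gold", "domain": "finance"},
    {"name": "marketing.campaign_touches", "tier": "silver", "domain": "marketing"},
    {"name": "finance.raw_ledger_dump", "tier": "bronze", "domain": "finance"},
]

# The finance team collates every gold-standard dataset in its domain.
gold_finance = [
    d["name"]
    for d in datasets
    if d["tier"] == "gold" and d["domain"] == "finance"
]
print(gold_finance)  # ['finance.investments']
```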

Creating data wikis may not be the best fit for you, but tagging assets can fill in understanding gaps at a glance. Tag management should be as flexible and lightweight as possible so that it naturally embeds within your data team’s roles and daily rituals, rather than being a scrambled afterthought. It is also crucial to embed tag functionality within your workflows. Even with the most fantastic metadata management platform, your source of truth needs to be part of your development process from the get-go.

Tags give your data engineers an "easy" button. If data quality fails, and it is a serious failure, data engineers can change the status of the tag, and your metadata management platform will propagate that change to all dependencies with ease. While your data engineers do the work of mitigating the quality issue, tags serve as an autopilot that informs all business users of the data's current quality state, like a construction warning on the map. That way, your decision-makers never miss a beat on which data and insights are reliable. A good metadata management tool provides the discoverability and understanding of your data landscape to aid in both impact analysis and change management.
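A toy sketch of that propagation, using a made-up lineage graph rather than any vendor's implementation: flipping one upstream tag can walk the dependency graph and flag everything built downstream of the failure.

```python
from collections import deque

# Toy lineage graph: upstream asset -> assets built directly from it.
downstream = {
    "raw.payments": ["staging.payments"],
    "staging.payments": ["marts.revenue", "marts.refunds"],
    "marts.revenue": ["dashboards.exec_kpis"],
}

def propagate_quality_flag(failed_asset: str) -> set[str]:
    """Mark the failed asset and everything downstream of it as degraded."""
    flagged, queue = set(), deque([failed_asset])
    while queue:
        asset = queue.popleft()
        if asset in flagged:
            continue
        flagged.add(asset)
        queue.extend(downstream.get(asset, []))
    return flagged

# A serious quality failure on the raw table flags every dependent asset.
print(propagate_quality_flag("raw.payments"))
```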

Context around your data saves data users time and reduces decision-making errors. Context should be easy to consume and share with others. Everyone from your engineers and analysts to your decision-makers and leaders should be empowered to trust their data.

Helping your business wield your data with power gives you a sharp competitive advantage.

About Metaphor

The Metaphor Metadata Platform represents the next evolution of the Data Catalog: it combines best-in-class Technical Metadata (lessons learned from building DataHub at LinkedIn) with Behavioral and Social Metadata. It supercharges an organization's ability to democratize data with state-of-the-art capabilities for Data Governance, Data Literacy, and Data Enablement, and provides an extremely intuitive user interface that turns even the most non-technical user into a fan of the catalog. See Metaphor in action today!