The Data Ownership Stalemate in Enterprise Data Orgs Today

This article describes the data ownership challenges that large data orgs face today and offers a perspective on what an ideal data ownership model could look like.

Co-Founder & CEO
Founding Engineer
9 min. read
November 17, 2022

Imagine this scenario. 

You lead HR strategy and operations for a tech agency of 400-500 employees. It's time to work on employee promotions. Your team has been tasked with benchmarking compensation data with the market and synchronizing the compa ratio to employee ratings. 

  • Your HR Ops team owns the benchmarking baseline data by employee level. 
  • Your employee engagement team owns the data policies and access to employee payroll data. 
  • The internal IT team owns access to the data pipeline implementation changes. 
  • A designated HR data product manager owns the backlog of the data product, allowing you to consume and report current employee data to the leadership team.
  • Your strategy team owns the overall HR data model implemented in your org.

Pop quiz: Who owns the compensation data in this scenario?

If your answer is 'It's complicated', you win this quiz. Not because it is the correct answer, but because it describes the predicament well. And you are not alone. Data organizations worldwide, especially large enterprise orgs, are divided on data ownership.

But what does data ownership mean to you and in the context of your organization? Let's find out.

What is data ownership?

In his seminal book 'Enterprise Knowledge Management', David Loshin describes data ownership as a management issue. Org-wide decision-making usually takes either a centralized or a decentralized form. And since most organizations today depend on data for critical decisions, the principles of running an org carry over to the principles of data ownership.

Let's look at the data ownership paradigms described in this ISACA report. We see that data organizations typically use the concept of single or multiple data owners to determine what their data ownership would look like within a data catalog. 

But describing multiple data owners and holding each of them accountable is easier said than done. As a result, most data organizations appoint a single data owner who represents the business function and assign them the responsibility of chaperoning access and policies and of acting as a resolution hub for all data quality issues. The buck for maintaining and upholding data quality within the data catalog therefore stops with this ownership team, while the other stakeholders in the system face no accountability for deviant data-producing or data-consuming practices.

What does an ideal data ownership model mean for your org? 

If we were to start defining an ideal state of data ownership with multiple owners, we should first clearly articulate what it would enable as a future state for the org. For example, some of the objectives that an ideal ownership model would enable could look like the following:

Discover the right org-wide data sets

An ideal data ownership model would enable your data consumers to discover data sets across the entire data catalog, not just the data sets that have secured the proper funding and leadership visibility. In addition, such a model would allow the business and tech teams to unlock value for the enterprise by leveraging suitable data sources.

Request access with the right approval controls

Once your data consumers have discovered the right data sets, they should be able to request access with the correct permissions and controls in place. Such a provisioning system would mean that your data ownership model provides a set of workflows to route these requests to the right business or technical data owners. It would also let those owners set up automated approvals so that requests are handled promptly, without bureaucratic red tape.
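The routing idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product: the Dataset and AccessRequest types, the "low"/"high" sensitivity labels, and the team names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical types: names, fields, and sensitivity labels are assumptions
# made for illustration only.


@dataclass(frozen=True)
class Dataset:
    name: str
    business_owner: str
    technical_owner: str
    sensitivity: str  # "low" or "high"


@dataclass(frozen=True)
class AccessRequest:
    requester: str
    dataset: Dataset
    purpose: str


def route_request(request: AccessRequest) -> dict:
    """Decide whether a request is rejected, auto-approved, or sent to owners."""
    dataset = request.dataset
    if not request.purpose.strip():
        # Access should be purpose-driven, so a blank purpose is rejected.
        return {"status": "rejected", "approvers": []}
    if dataset.sensitivity == "low":
        # Low-sensitivity data skips manual review entirely.
        return {"status": "auto_approved", "approvers": []}
    # Everything else waits for both the business and the technical owner.
    return {
        "status": "pending",
        "approvers": [dataset.business_owner, dataset.technical_owner],
    }


payroll = Dataset("employee_payroll", "employee-engagement", "internal-it", "high")
levels = Dataset("benchmark_by_level", "hr-ops", "internal-it", "low")

decision = route_request(AccessRequest("analyst", payroll, "compa ratio review"))
print(decision["status"], decision["approvers"])
```

In this sketch the sensitivity label is the only rule that decides between automatic and manual approval. A real workflow would likely combine several rules, such as the requester's role or the stated purpose.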

Create a mechanism to request enhancements

Now that your data consumers have access to these data sets, they should be able to request changes or enhancements to data and metadata attributes. This helps the organization innovate at the right pace while keeping changes visible and orderly. The correct data ownership model would not just pass these change requests to the right data owners. It would also inform the necessary data stakeholders, so that they can approve the changes or provide business and technical context on what to watch for when the changes are implemented. In the long term, this also keeps the data set cleaner by avoiding unnecessary duplication and inconsistencies.

Audit and monitor regularly for compliance

Access and provisioning should always be time-bound and purpose-driven. The correct data ownership model would enable your data stakeholders to sleep peacefully at night, knowing that compliance and security are built intentionally into the model. Once a data consumer's purpose for requesting access to the data set is complete, the right set of workflows should initiate an audit to ensure that access has been provisioned on a need-to-know basis, and trigger compliance notifications to the right data stakeholders to review offboarding or expiry for those past requests.
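As a rough sketch of what time-bound, purpose-driven access could look like in code, the example below flags grants for review once they expire or once their purpose is complete. The AccessGrant type, its fields, and the sample users and dates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical grant record: field names are assumptions for illustration.


@dataclass
class AccessGrant:
    user: str
    dataset: str
    purpose: str
    expires_on: date
    purpose_complete: bool = False


def audit_grants(grants, today):
    """Return (grant, reason) pairs that owners should review for revocation."""
    findings = []
    for grant in grants:
        if grant.purpose_complete:
            # The stated purpose is done, so access is no longer need-to-know.
            findings.append((grant, "purpose complete"))
        elif grant.expires_on < today:
            # The time box has passed without an explicit renewal.
            findings.append((grant, "expired"))
    return findings


grants = [
    AccessGrant("alice", "employee_payroll", "promotion cycle", date(2022, 12, 31)),
    AccessGrant("bob", "employee_payroll", "one-off report", date(2022, 10, 1)),
    AccessGrant("carol", "benchmark_by_level", "benchmarking", date(2023, 3, 1), True),
]

findings = audit_grants(grants, today=date(2022, 11, 17))
for grant, reason in findings:
    print(grant.user, grant.dataset, reason)
```

Passing the audit date in as a parameter, rather than reading the clock inside the function, keeps the check easy to test and to rerun for past dates.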

An Ideal Data Ownership Model: Federated Ownership With Shared Accountability

Let’s go back to the HR example. Given the ideal ownership model we described here, all the data stakeholders involved in the HR data catalog scenario should be able to discover, access, enhance, and then seamlessly relinquish access once their work is done. To turn this ideal ownership model into reality, you first need to unblock your stakeholder team so that they can discover all these data sets within the data catalog in question. Finding the right set of data, followed by a shared understanding of how the data is produced, processed, and consumed throughout the data catalog, is key to bringing this model to life.

The platforms built on today’s modern data stack fail to solve the discovery and shared understanding problem effectively because they only manage technical metadata (SQL code, schemas) for a given data catalog. Technical metadata, combined with business metadata (what do users call it?) and behavioral metadata (how, where, and who uses it), can help you discover new data sets and build a consistent shared language for your teams.
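To make the three kinds of metadata concrete, here is a minimal sketch of a catalog entry that carries all of them, plus a search that matches on technical or business names and ranks results by usage. The CatalogEntry type, its fields, and the sample tables are hypothetical. They do not describe how any specific catalog stores metadata.

```python
from dataclasses import dataclass, field

# Hypothetical catalog record: field names are assumptions for illustration.


@dataclass
class CatalogEntry:
    # Technical metadata: what the warehouse knows.
    table: str
    columns: list
    # Business metadata: what users call it.
    business_terms: list = field(default_factory=list)
    # Behavioral metadata: how much it is used, and by whom.
    queries_last_30_days: int = 0
    top_users: list = field(default_factory=list)


def search(entries, term):
    """Match on technical or business names, then rank by recent usage."""
    needle = term.lower()
    hits = [
        entry
        for entry in entries
        if needle in entry.table.lower()
        or any(needle in column.lower() for column in entry.columns)
        or any(needle in name.lower() for name in entry.business_terms)
    ]
    return sorted(hits, key=lambda entry: entry.queries_last_30_days, reverse=True)


entries = [
    CatalogEntry("hr.compensation_old", ["emp_id", "salary"], [], 3, ["bob"]),
    CatalogEntry("hr.pay_bands_v2", ["lvl", "base_amt"],
                 ["compensation benchmark"], 120, ["alice", "carol"]),
    CatalogEntry("it.pipeline_runs", ["run_id", "status"], ["pipeline health"], 40),
]

results = search(entries, "compensation")
print([entry.table for entry in results])
```

In this sketch a search on technical names alone would miss hr.pay_bands_v2, which only matches through its business term, and the usage count is what places it ahead of the rarely queried legacy table.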

These data ownership principles with multiple owners might seem utopian at first. However, with the right platform, they can be realized quickly and painlessly to implement a model that federates ownership but does not pass the buck on data quality and governance accountability.

Before Metaphor, the worldview was divided between Operations or Data/IT being the de facto owners and guardians of the data, depending on where the data being consumed was held or generated. However, regardless of how you consume the data, we want to create shared responsibility and accountability for the data as it flows downstream. This accountability covers compliance, understanding and fixing data lineage issues, and incentivizing good citizenship among the people who create, manage, and consume the data as it flows through your org landscape. Once users' purpose for consuming the data is fulfilled, Metaphor enables them to review all the data assets across the stack as part of a periodic compliance review.

Conclusion: It Takes a Village

Data ownership is not a one-and-done problem. Nor is it a minor issue that can be relegated to one part of the enterprise, absolving the rest of the organization of accountability. Instead, a reasonably federated ownership model would compel each data stakeholder to use data responsibly and to investigate data quality issues. It would also motivate each producer and consumer to play their part in the one essential role they can play to perfection:

Making trusted, reliable data available to anyone driving business value.

About Metaphor

The Metaphor Metadata Platform represents the next evolution of the Data Catalog: it combines best-in-class Technical Metadata (learnt from building DataHub at LinkedIn) with Behavioral and Social Metadata. It supercharges an organization’s ability to democratize data with state-of-the-art capabilities for Data Governance, Data Literacy and Data Enablement, and provides an extremely intuitive user interface that turns even the most non-technical user into a fan of the catalog.