A Guide to Displaying Snowflake PII Tags in Metaphor

If your data team is struggling to discover the sensitive data assets in your Snowflake warehouse, their search for a solution ends here.

Co-Founder & Founding Engineer
April 10, 2023

If your data team is struggling to discover the sensitive data assets in your Snowflake warehouse, their search for a comprehensive metadata platform ends here. Metaphor takes the complexity out of managing sensitive data assets by automatically pulling Snowflake tags during ingestion into the Metaphor app. Your business and technical users can see how these data assets are being used and work with the organization's data experts on the appropriate certification and quality workflows.

In this article, we walk through identifying and classifying PII (personally identifiable information) data assets in Snowflake and show how PII-tagged columns are surfaced within the Metaphor platform.

Before these PII data assets can be displayed in Metaphor, data admins should prepare them in Snowflake in two steps:

Step 1: Identify and classify your PII data assets in Snowflake

Snowflake has built-in capabilities to identify and categorize the information stored in table and view columns. It classifies data using two category types: privacy categories and semantic categories.

Privacy category tags classify columns whose values can uniquely identify an individual on their own (IDENTIFIER) or in combination with other such columns (QUASI_IDENTIFIER). There is also a SENSITIVE privacy category for columns that do not identify an individual but contain information that person would prefer to keep private, such as salary.

Semantic categories, on the other hand, describe the kind of personal attribute a column contains, such as a name or an address. Combining a privacy category value (IDENTIFIER, for instance) with a semantic category value (US_SSN, for example) gives each column a precise, PII-specific classification.

Use the Snowflake function EXTRACT_SEMANTIC_CATEGORIES to analyze a table or view and suggest categories for its columns, then call the stored procedure ASSOCIATE_SEMANTIC_CATEGORY_TAGS to apply the resulting tags to that table or view.
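As a rough sketch of this step, the Python below builds the two SQL statements for a single table and filters the JSON that EXTRACT_SEMANTIC_CATEGORIES returns down to PII columns. It assumes each top-level key in that JSON is a column name whose entry may include `privacy_category` and `semantic_category` fields. The table name and sample payload are made up for illustration, and actually executing the SQL (for example through the Snowflake Python connector) is left out.

```python
import json

# Privacy category values treated as PII in this sketch.
PII_PRIVACY_CATEGORIES = {"IDENTIFIER", "QUASI_IDENTIFIER", "SENSITIVE"}

# A made-up payload in the assumed output shape, for illustration only.
SAMPLE_RESULT = """{
  "EMAIL": {"privacy_category": "IDENTIFIER", "semantic_category": "EMAIL"},
  "AGE": {"privacy_category": "QUASI_IDENTIFIER", "semantic_category": "AGE"},
  "ORDER_ID": {"extra_info": {"alternates": []}}
}"""


def classification_statements(table: str) -> list[str]:
    """Return SQL that classifies `table` and then applies the category tags.

    `table` is a fully qualified name. No quoting or escaping is done,
    so only pass trusted table names.
    """
    extract = f"EXTRACT_SEMANTIC_CATEGORIES('{table}')"
    return [
        f"SELECT {extract};",  # preview the suggested categories
        f"CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS('{table}', {extract});",  # apply tags
    ]


def pii_columns(result_json: str) -> dict[str, tuple[str, str]]:
    """Map each PII column to its (privacy_category, semantic_category) pair."""
    result = json.loads(result_json)
    found = {}
    for column, info in result.items():
        privacy = info.get("privacy_category")
        # Columns without a PII privacy category are skipped.
        if privacy in PII_PRIVACY_CATEGORIES:
            found[column] = (privacy, info.get("semantic_category", ""))
    return found


if __name__ == "__main__":
    # Hypothetical table name used only to show the generated SQL.
    for stmt in classification_statements("my_db.my_schema.customers"):
        print(stmt)
    print(pii_columns(SAMPLE_RESULT))
```

Reviewing the SELECT output before running the CALL lets an admin check Snowflake's suggestions before any tags are written.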

Step 2: (Optional) Secure your PII assets within Snowflake

Once PII is identified and tagged within Snowflake, the data can also be masked or tokenized so that people who are not authorized to view PII cannot see it inadvertently. Apply a Dynamic Data Masking policy to the sensitive columns of your tables or views to mask data selectively when queries run against them. Alternatively, enable Snowflake's External Tokenization capabilities to replace sensitive values with tokens. Depending on the querying user's role, results are then returned as plain text, partially masked, or fully masked.
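The role-dependent behavior described above can be mimicked in plain Python. This is only a local simulation of what a CASE-based masking policy does, not how Snowflake evaluates policies. The role names (`PII_ADMIN`, `SUPPORT_ANALYST`), the policy name `ssn_mask`, and the table and column names are hypothetical. A Snowflake DDL sketch of the same policy is kept in a string constant for reference.

```python
# Hypothetical Snowflake DDL for the same policy, kept for reference only.
# Inside Snowflake, CURRENT_ROLE() supplies the role this sketch passes in.
MASKING_POLICY_DDL = """
CREATE OR REPLACE MASKING POLICY ssn_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    WHEN CURRENT_ROLE() IN ('SUPPORT_ANALYST') THEN '***-**-' || RIGHT(val, 4)
    ELSE '***MASKED***'
  END;

ALTER TABLE my_db.my_schema.customers MODIFY COLUMN ssn SET MASKING POLICY ssn_mask;
"""

# Hypothetical role groupings used by the simulation below.
FULL_ACCESS_ROLES = {"PII_ADMIN"}
PARTIAL_ACCESS_ROLES = {"SUPPORT_ANALYST"}


def mask_ssn(value: str, role: str) -> str:
    """Return what a user with `role` would see for an SSN value."""
    if role in FULL_ACCESS_ROLES:
        return value  # plain text
    if role in PARTIAL_ACCESS_ROLES:
        return "***-**-" + value[-4:]  # partially masked: last four digits only
    return "***MASKED***"  # fully masked for every other role


if __name__ == "__main__":
    for role in ("PII_ADMIN", "SUPPORT_ANALYST", "MARKETING"):
        print(role, mask_ssn("123-45-6789", role))
```

Keeping the rule in a single policy attached to the column means every query path goes through the same role check, rather than each view reimplementing it.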

Finally, let's look at how the PII tags applied in these steps show up in Metaphor.

Step 3: View PII assets in Metaphor

Identifying potentially sensitive data assets in the data warehouse and applying policies to them is only the beginning of data governance. It is also essential to have a comprehensive view of all such sensitive data assets across the data stack. Making these assets discoverable also helps stakeholders build a shared understanding of how they can be used effectively across the organization, which in turn supports better data quality efforts.

When a user views a sensitive data asset, Metaphor displays its PII tags so the asset is easy to identify. Metaphor also lets users search the entire data stack for assets that carry PII tags, and its advanced search operators make it possible to run coverage analysis on which assets do or do not contain these tags.
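Conceptually, the coverage analysis mentioned above splits assets into those that carry at least one PII tag and those that carry none. The sketch below does that over an in-memory list. It does not use Metaphor's search syntax or API; the `Asset` type, asset names, and tag strings are hypothetical stand-ins for whatever a catalog search would return.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A hypothetical catalog entry: a fully qualified name plus its tags."""
    name: str
    tags: set[str] = field(default_factory=set)


def pii_coverage(assets: list[Asset], pii_tags: set[str]) -> tuple[list[str], list[str], float]:
    """Split assets by whether they carry any PII tag and return the tagged share."""
    tagged = [a.name for a in assets if a.tags & pii_tags]
    untagged = [a.name for a in assets if not (a.tags & pii_tags)]
    share = len(tagged) / len(assets) if assets else 0.0
    return tagged, untagged, share


if __name__ == "__main__":
    # Hypothetical tag strings and assets, for illustration only.
    pii = {"PRIVACY_CATEGORY:IDENTIFIER", "PRIVACY_CATEGORY:QUASI_IDENTIFIER",
           "PRIVACY_CATEGORY:SENSITIVE"}
    assets = [
        Asset("db.crm.customers", {"PRIVACY_CATEGORY:IDENTIFIER"}),
        Asset("db.crm.orders"),
        Asset("db.hr.salaries", {"PRIVACY_CATEGORY:SENSITIVE"}),
        Asset("db.web.events", {"TIER:GOLD"}),
    ]
    print(pii_coverage(assets, pii))
```

The untagged list is usually the more useful output: it points stewards at assets that still need review.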

For data stewards, discovering and securing sensitive data assets can be overwhelming. Metaphor supports you on the data stewardship journey and brings order to your privacy and security workflows by organizing your sensitive data assets wherever they reside in your Snowflake warehouse.

For organizations that do not use Snowflake or have already adopted other PII detection tools, Metaphor's open data architecture allows for a straightforward mapping of PII tags from source systems onto assets within Metaphor.

If you work with custom PII data assets and want to see what effective PII data management looks like, schedule a demo with Metaphor today!

About Metaphor

The Metaphor Metadata Platform represents the next evolution of the Data Catalog: it combines best-in-class Technical Metadata (learnt from building DataHub at LinkedIn) with Behavioral and Social Metadata. It supercharges an organization's ability to democratize data with state-of-the-art capabilities for Data Governance, Data Literacy and Data Enablement, and provides an extremely intuitive user interface that turns even the most non-technical user into a fan of the catalog. See Metaphor in action today!