Anonymization/masking of data in SnowPark
by Mauricio Rojas, on Sep 28, 2022 11:41:16 AM
Data governance is a strategy for handling data within an organization. It is a set of rules, policies, standards, practices, etc. whose main purpose is to ensure data has high quality and integrity, is safely stored and there are no ambiguities in the meaning of common terms.
Applying this strategy is a long process, engaging the whole organization, especially IT and data-consuming departments. Certain data governance tools help apply these hypothetical plans in real life.
But.... it is something that Snowflake really takes into consideration, and there are some features to can leverage for this purpose.
Snowflake’s approach to access control combines aspects from both of the following models:
- Discretionary Access Control (DAC): Each object has an owner, who can in turn grant access to that object.
- Role-based Access Control (RBAC): Access privileges are assigned to roles, which are in turn assigned to users.
But you can also secure your data at different levels:
- Row Level Access Policies: with these policies, the data at rest is not masked, but at query runtime, Snowflake will generate the query output for the user, and the query output only contains rows based on the policy definition evaluating to
And it can be set on tables or views:
- Column Level Access Security: Also known as Dynamic Data Masking. In this case sensitive data in plain text is loaded into Snowflake, and it is dynamically masked/encrypted at the time of query for unauthorized users. You can learn a lot from this blog post by Vikas Jain
For example with Dynamic Data Masking you can ensure under which circumstances data is actually shown.
You can apply a masking policy directly on a column, or more recently with new feature like object_tagging, you can associate masking policies to a tag, so they will apply to object that have that tag applied to them.
How does this data-masking work?
I think is good to understand a little of how this works. In general Snowflake transparently rewrites the query at runtime wherever it finds a column with a masking policy attached. For authorized users, query results return sensitive data in plain text, while for unauthorized users sensitive data is masked, encrypted, or tokenized, as can be seen on the table from Vikas's article.
Your policy is not restricted to work on just one column. It can use several values a.k.a conditional data masking:
And to apply it you can use an statement like:
This is just a brief introduction. However it shows how sensitive information can be easily taken in Snowflake and protected with great ease. This can also apply to data shares eliminating the need to copy data.