Redshift data masking is a way to create a fake but realistic version of your organization’s data. The goal is to protect sensitive data while providing a functional alternative when real data is not needed, such as user training, sales demonstrations, or software testing. Redshift data masking changes data values while preserving the original format, producing a version that cannot be deciphered or reverse-engineered. There are several ways to change the data, including character shuffling, word or character substitution, and encryption.
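As a minimal sketch of the format-preserving idea described above, the following Python function (a hypothetical helper, not part of any Redshift feature) substitutes each character with a random one of the same class, so a masked Social Security number still looks like a Social Security number:

```python
import random
import string

def mask_value(value: str, rng: random.Random) -> str:
    """Replace each character with a random character of the same class,
    preserving the original format: digits stay digits, letters stay
    letters (and keep their case), punctuation is left untouched."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)

rng = random.Random(42)
masked = mask_value("555-01-2345", rng)
# masked keeps the NNN-NN-NNNN shape, but the original digits are gone
```

Because the substitution is random rather than reversible, the masked value retains its functional shape for testing while revealing nothing about the original.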
Why is data masking important?
Here are some reasons why data masking is important for many organizations:
- Mitigates several critical threats: data loss, data theft, insider threats or account compromise, and insecure interfaces with third-party systems.
- Reduces data risks associated with cloud adoption.
- Makes data useless to an attacker while retaining many of its inherent functional properties.
- Allows the exchange of data with authorized users such as testers and developers without revealing production data.
- Can be used to sanitize data: normal file deletion still leaves traces of the data on the storage medium, while sanitization replaces the old values with masked ones.
Types of data masking
There are several types of data masking that are commonly used to protect sensitive data.
Static data masking
Static data masking creates a sanitized copy of a database. The process modifies all sensitive values so that the copy is safe to share. Typically, it involves backing up the database in a production environment, loading the backup into a separate environment, deleting any unnecessary data, and then masking the data while it is at rest. The masked copy can then be moved to the target location.
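The back-up-then-mask flow above can be sketched in a few lines of Python. This is an illustrative example, not a Redshift utility; `mask_field` is a hypothetical helper that blanks out all but the last two characters:

```python
import copy

SENSITIVE = {"email", "ssn"}  # assumption: the columns deemed sensitive

def mask_field(value: str) -> str:
    # Hypothetical helper: blank out everything but the last two characters.
    return "*" * max(len(value) - 2, 0) + value[-2:]

def static_mask(rows):
    """Return a masked copy of the dataset. The production rows are
    deep-copied first and never modified in place, mirroring how static
    masking operates on a separate copy of the database."""
    sanitized = copy.deepcopy(rows)
    for row in sanitized:
        for col in SENSITIVE & row.keys():
            row[col] = mask_field(row[col])
    return sanitized

prod = [{"id": 1, "email": "john@example.com", "ssn": "555-01-2345"}]
dev_copy = static_mask(prod)
# prod is unchanged; dev_copy carries masked email and ssn values
```

The key design point is that masking happens on the copy, so the production dataset is never touched.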
Deterministic data masking
Deterministic masking maps values of the same data type so that a given value is always replaced by the same substitute. For example, the name “John Smith” is always replaced by “Jim Jameson” wherever it appears in the database. This method is convenient for many scenarios, but it is inherently less secure, because the consistent mapping preserves patterns (such as value frequencies) that an attacker can analyze.
Data masking on the fly
Masking data as it travels from production systems to test or development systems before saving the data to disk. Organizations that frequently deploy software cannot back up the original database and apply masking — they need a way to continuously stream data from a production environment to multiple test environments.
On-the-fly masking sends smaller subsets of masked data as they are needed. Each subset is stored in the development/test environment for use on a non-production system. It is important to apply on-the-fly masking to every channel from the production system to the development environment early in a project, to prevent compliance and security issues.
Dynamic data masking
Similar to on-the-fly masking, but the data is never saved to secondary storage in the development/test environment. Instead, it is passed directly from the production system and consumed by another system in the development/test environment.
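A minimal sketch of this streaming behavior, assuming in-memory rows rather than a real production pipeline: a Python generator masks each row as it flows through, so nothing masked is ever written to disk:

```python
def stream_masked(rows, sensitive=("email",)):
    """Yield masked rows one at a time. No intermediate copy is stored,
    mirroring dynamic masking, where data flows straight from the
    production system to the consuming environment."""
    for row in rows:
        masked = dict(row)  # shallow copy so production rows stay intact
        for col in sensitive:
            if col in masked:
                masked[col] = "***MASKED***"
        yield masked

production = [{"id": 1, "email": "jane@example.com"},
              {"id": 2, "email": "bob@example.com"}]

masked_rows = list(stream_masked(production))
# each consumer sees only the masked view; production is unchanged
```

The same generator pattern also fits on-the-fly masking; the difference is only whether the consuming environment persists the masked subset.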
Data protection for AWS Redshift
Organizations continue to move more data to the cloud to take advantage of storage scalability and cloud analytics. Data warehouse solutions like AWS Redshift provide flexible access to analytics over vast amounts of data. With this shift to the speed and flexibility of the cloud, security can be left behind or treated as secondary. And when data protection measures are discussed, they are often seen as too disruptive to applications and business intelligence efforts, or too complex to implement.
Baffle Data Protection Services (DPS) for Redshift is a purpose-built software solution that simplifies end-to-end security for today’s data pipeline. Baffle DPS lets you deploy a transparent data security mesh that de-identifies data migrated to cloud storage or staging environments, and supports data masking and access control for AWS Redshift.