A common challenge is using Vault to secure application secrets that are stored outside of Vault. In many cases, highly sensitive information such as credit card numbers, social security numbers, and other Personally Identifiable Information (PII) is stored in databases rather than within Vault. Applications need to access and store PII securely, in a way that meets the organization's stringent security standards. This requirement means balancing accessibility against the security protocols for handling sensitive information.
This article explores common solutions and challenges around protecting application secrets, then details how you can use the Vault Transform Secrets Engine to address these challenges.
The following sections describe common solutions and processes for protecting sensitive data and the challenges around using these solutions. You'll learn the information you need to decide on the appropriate solution for protecting secrets stored outside of Vault, and why the Vault Transform secrets engine is a recommended solution.
Typically, data encryption is the initial solution for protecting secrets. While data encryption is a good starting point, it can be a complex and expensive solution to implement on your own.
That's where Vault's Transit secrets engine can be helpful. Vault can encrypt your sensitive data and provide you with the resulting ciphertext. You can then store this ciphertext and rely on Vault to manage the encryption keys. The following diagram illustrates a credit card number going into the Transit secrets engine and returning as encrypted data.
However, there is a tradeoff associated with this approach. The ciphertext is often much longer than the original data. Additionally, it may contain characters not allowed in the original data, which can cause your system to reject the encrypted data. As a result, you may need to modify your system to accept the ciphertext. These modifications could involve changing the database schema or adjusting the data validation system, among other possible changes.
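The following Python sketch illustrates that tradeoff. It does not call Vault and performs no real encryption: `simulated_transit_ciphertext` is a hypothetical helper that fills an assumed layout (a 12-byte nonce, ciphertext as long as the plaintext, and a 16-byte tag, base64-encoded behind a `vault:v1:` prefix) with random bytes, and `CARD_PATTERN` is an assumed validation rule standing in for your own format checker.

```python
import base64
import os
import re

# Hypothetical validation rule: 16 digits in dash-separated groups of four.
CARD_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}")


def simulated_transit_ciphertext(plaintext: str) -> str:
    """Return a string shaped like a Transit ciphertext.

    The payload is random bytes, NOT real encryption. Assumed layout:
    12-byte nonce + ciphertext the size of the plaintext + 16-byte tag,
    base64-encoded and prefixed with the key version.
    """
    size = 12 + len(plaintext.encode("utf-8")) + 16
    payload = base64.b64encode(os.urandom(size)).decode("ascii")
    return "vault:v1:" + payload


card = "1111-2222-3333-4444"
ciphertext = simulated_transit_ciphertext(card)

# The ciphertext is several times longer than the original value.
print(len(card), len(ciphertext))

# The original passes the format check; the ciphertext does not.
print(CARD_PATTERN.fullmatch(card) is not None)
print(CARD_PATTERN.fullmatch(ciphertext) is not None)
```

A column sized for 19 characters, or a validator that accepts only digits and dashes, would reject the second value, which is the kind of schema or validation change described above.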
Tokenization is another method that companies commonly use to safeguard sensitive information. It substitutes sensitive data with a randomly generated token, produced through a one-way cryptographic hash function. The token is stored in an external database, while a secret management system maintains a mapping between the original data and the token.
Effective tokenization, combined with a robust random number generator, keeps the protected data secure regardless of where the token is stored. However, a token may retain the length of the original data but not necessarily its exact format. Format checkers might then reject the tokenized value as "invalid data," preventing the system from accepting it for use in transactions.
Another challenge is scale: as the number of tokenized values grows, so does the hash table that stores them. Performance also becomes a critical concern, because hash lookups must stay fast without weakening the cryptographic security of the data.
Considering these challenges, implementing a scalable tokenization solution on your own is a complex task. Proper protection of the hash table is crucial for ensuring its integrity and security.
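To make the scaling concern concrete, here is a minimal sketch of a do-it-yourself token store in Python. `TokenVault` is a hypothetical class, it keeps the mapping in memory only, and it draws tokens from a random generator rather than a hash function; a production system would need durable, access-controlled storage for the same table.

```python
import secrets


class TokenVault:
    """Toy in-memory token store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # The mapping that has to be protected and scaled.
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Random token with no mathematical relationship to the input.
        token = secrets.token_urlsafe(16)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]

    def __len__(self) -> int:
        return len(self._token_to_value)


vault = TokenVault()
first = vault.tokenize("1111-2222-3333-4444")
second = vault.tokenize("1111-2222-3333-4444")

print(first != second)           # each call issues a fresh token
print(vault.detokenize(first))   # the mapping recovers the original
print(len(vault))                # one table entry per tokenized value
```

Every call adds a row, so the table grows with the number of tokenized values, and anyone who can read the table can recover the plaintext.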
Vault's Transform secrets engine is our recommended solution for protecting secrets outside of Vault. The Transform secrets engine offers three types of transformations: format preserving encryption, data masking, and tokenization.
The Transform secrets engine is available with Vault Enterprise and requires the Advanced Data Protection (ADP) module license.
Format Preserving Encryption (FPE) is a two-way transformation that allows you to encrypt external application secrets while maintaining the input's original data format and length. The following example shows that Vault transforms the credit card number into an encoded ciphertext that follows a similar structure.
Maintaining a similar data structure for encoded sensitive data allows for flexible storage without requiring extensive database changes. For instance, consider a scenario with a substantial customer database containing billing information that requires protection. By ensuring that the encoded ciphertext maintains a similar format, you can store the ciphertext back into your original systems with minimal changes to the database.
FPE, in contrast to conventional tokenization methods, produces encrypted text that preserves the original data's structure and format while ensuring the security of the encoded values. With FPE in Vault, you can create custom transformation templates using regular expressions. A template specifies which values to transform and enforces a schema on the encoded value. In addition, pre-existing templates, such as those for credit card numbers and US social security numbers, are readily available to help you get started quickly.
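As a rough illustration of how a regular-expression template can select what gets transformed, the Python sketch below treats the capture groups as the characters to transform and passes everything else through untouched. `TEMPLATE`, `apply_template`, and `placeholder_transform` are hypothetical names, and the placeholder shifts digits only to make the output visibly different: it is not FF3-1 and offers no security.

```python
import re
from typing import Callable

# Hypothetical template: capture groups mark the characters to transform;
# the dashes between the groups are left as they are.
TEMPLATE = re.compile(r"(\d{4})-(\d{4})-(\d{4})-(\d{4})")


def apply_template(value: str, transform: Callable[[str], str]) -> str:
    match = TEMPLATE.fullmatch(value)
    if match is None:
        raise ValueError("value does not match the template")
    digits = "".join(match.groups())
    transformed = transform(digits)
    if len(transformed) != len(digits):
        raise ValueError("transform must preserve length")
    # Write the transformed characters back into the captured positions.
    out = list(value)
    pos = 0
    for group in range(1, TEMPLATE.groups + 1):
        start, end = match.span(group)
        out[start:end] = transformed[pos:pos + (end - start)]
        pos += end - start
    return "".join(out)


def placeholder_transform(digits: str) -> str:
    # Stand-in for the cipher: shifts each digit by 7 modulo 10. NOT encryption.
    return "".join(str((int(d) + 7) % 10) for d in digits)


encoded = apply_template("1111-2222-3333-4444", placeholder_transform)
print(encoded)  # 8888-9999-0000-1111
```

The output still matches the template, which is why a value encoded this way can go back into a column or validator built for the original format.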
To secure the encoded ciphertexts, Vault uses the FF3-1 encryption algorithm, which has been vetted by the National Institute of Standards and Technology (NIST). This algorithm has been updated and tested to protect against specific types of attacks and against potential threats from future supercomputers.
FPE transformation is stateless, meaning that Vault does not store the protected secret; rather it protects only the encryption key material necessary to decrypt the secret's ciphertext. Storing only the necessary material to decrypt ensures maximum performance for encoding and decoding operations while also minimizing the possibility of exposure of that secret.
- Developer documentation
- Transform secrets engine tutorial
- Encrypting data with Transform secrets engine
- Transform secrets engine API
Data masking is a one-way transformation that replaces matched characters in the input value with a chosen masking character. The transformation is non-reversible, so the decode operation cannot retrieve the original value.
Use data masking when you need to show or print sensitive data without making it fully readable. For instance, online banking systems commonly mask users' account numbers so that a bad actor watching the screen without the account owner's knowledge cannot read them.
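The idea can be sketched in a few lines of Python. This is an illustration of masking in general, not Vault's implementation: `mask` and `ACCOUNT_PATTERN` are hypothetical, and the sketch assumes that characters inside capture groups are masked while everything else is shown.

```python
import re


def mask(value: str, pattern: str, masking_character: str = "*") -> str:
    """Replace every character inside the pattern's capture groups."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError("value does not match the pattern")
    out = list(value)
    for group in range(1, match.re.groups + 1):
        start, end = match.span(group)
        if start == -1:  # optional group that did not participate
            continue
        out[start:end] = masking_character * (end - start)
    return "".join(out)


# Hypothetical pattern: mask the first three groups, keep the last four digits.
ACCOUNT_PATTERN = r"(\d{4})-(\d{4})-(\d{4})-\d{4}"

masked = mask("1111-2222-3333-4444", ACCOUNT_PATTERN)
print(masked)  # ****-****-****-4444
```

Because the masked characters are overwritten rather than encrypted, nothing in the output can be used to recover them.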
Similar to Format Preserving Encryption, the tokenization transformation is a two-way transformation that allows you to encode secrets as random tokens and decode tokens back to their original plaintext. In the following example, Vault transforms the credit card number into a token with no algorithmic relationship to the original value. Unlike Format Preserving Encryption, the token does not retain the format of the original input. Because the token is entirely random, it satisfies the PCI-DSS requirement for data irreversibility.
Vault protects a cryptographic mapping of tokens and plaintext values in its internal storage. The tokenization transform prevents attackers from recovering the plaintext, even if they steal the underlying transformation key and mapping values from Vault.
Since tokenization is stateful, the encode operation must perform writes to storage on the primary node. As a result, the primary node's storage performance limits the scalability of the encode operation. This differs from Transit encryption and FPE, where the encode operation does not require a write to storage and can be horizontally scaled using performance standby nodes.
By default, Vault stores the cryptographic mapping of tokens within its internal storage. For high-performance use cases, we recommend that you configure Vault to store this mapping in an external database. Users often prefer external stores because they can achieve a much higher performance scale and reduce the load on Vault's internal storage. Vault currently supports the following databases: PostgreSQL, MySQL, and MSSQL. For more information on external storage, visit the Tokenization transform storage documentation.
Similar to Transit encryption and FPE, tokenization requires keys and their management. Vault creates a new key for each tokenization transformation. This allows you to ensure a strong cryptographic distinction between different tokenization use cases. For example, a credit card processor might have different merchants for which it would like to tokenize credit card numbers, such that there is a strong cryptographic distinction between even the same credit card number used by different merchants.
The tokenization transform supports automatic key rotation based on a user-defined time interval. Each configured tokenization transformation keeps a set of versioned keys, so that when a key rotates, older key versions are still available for decoding tokens generated in the past. You can also specify the minimum key version that Vault can use to decode a value. Tokens generated with a key version below the minimum cannot be decoded.
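The version bookkeeping described above can be modeled with a small Python sketch. `VersionedTokenizer` is hypothetical and omits the keys themselves; it only records which version was current when each token was issued, and it refuses to decode tokens that fall below an assumed `min_decode_version` setting.

```python
import secrets


class VersionedTokenizer:
    """Toy model of key-version bookkeeping (hypothetical; no real keys)."""

    def __init__(self) -> None:
        self.latest_version = 1
        self.min_decode_version = 1
        # token -> (key version at encode time, plaintext)
        self._store: dict[str, tuple[int, str]] = {}

    def rotate(self) -> None:
        # New tokens use the new version; older versions stay decodable.
        self.latest_version += 1

    def encode(self, value: str) -> str:
        token = secrets.token_urlsafe(16)
        self._store[token] = (self.latest_version, value)
        return token

    def decode(self, token: str) -> str:
        version, value = self._store[token]
        if version < self.min_decode_version:
            raise PermissionError(
                f"key version {version} is below the minimum decode version"
            )
        return value


tokenizer = VersionedTokenizer()
old_token = tokenizer.encode("1111-2222-3333-4444")
tokenizer.rotate()
new_token = tokenizer.encode("5555-6666-7777-8888")

print(tokenizer.decode(old_token))  # still decodable after rotation

tokenizer.min_decode_version = 2
print(tokenizer.decode(new_token))  # version 2 token remains decodable
# tokenizer.decode(old_token) would now raise PermissionError
```

Raising the minimum version is a one-way door for older tokens in this model, so it is worth confirming that nothing still depends on them before doing so.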
Automatic key rotation requires Vault Enterprise 1.12.0 or later.
By default, tokenization transformation produces a unique token for every encode operation. The Vault tokenization solution also supports convergent tokenization, which allows you to have the same encoded value for a given input so that you can query your database to count the number of entries for a given token.
Convergent tokenization is useful when you want to do statistical analysis of the tokens as they relate to some other field in a database (without decoding the token), or if you need to tokenize in two different systems but be able to relate the results.
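The difference between the two modes can be sketched in Python. This is not how Vault derives its tokens: the sketch assumes a keyed HMAC-SHA256 as a stand-in for a deterministic token, and `random_token`, `convergent_token`, and the two merchant keys are hypothetical names introduced for illustration.

```python
import hashlib
import hmac
import secrets
from collections import Counter

# Hypothetical keys, one per tokenization transformation.
MERCHANT_A_KEY = secrets.token_bytes(32)
MERCHANT_B_KEY = secrets.token_bytes(32)


def random_token(value: str) -> str:
    # Default behavior in this sketch: a fresh token on every call.
    return secrets.token_hex(16)


def convergent_token(value: str, key: bytes) -> str:
    # Same input and key always give the same token (HMAC as a stand-in).
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


purchases = [
    "1111-2222-3333-4444",
    "5555-6666-7777-8888",
    "1111-2222-3333-4444",
]

# Stored column of tokens; the card numbers themselves are never stored.
rows = [convergent_token(card, MERCHANT_A_KEY) for card in purchases]
counts = Counter(rows)

# Count entries for a card by tokenizing the query value, without decoding.
lookup = convergent_token("1111-2222-3333-4444", MERCHANT_A_KEY)
print(counts[lookup])  # 2

# With random tokens the same card cannot be correlated across rows.
print(random_token(purchases[0]) == random_token(purchases[0]))  # False

# A different key yields an unrelated token for the same card.
print(lookup == convergent_token("1111-2222-3333-4444", MERCHANT_B_KEY))  # False
```

The same property that enables counting also reveals when two rows share an input, so convergent tokens trade some privacy for the ability to query and join on them.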