I recently had the pleasure of a stimulating dialog with an acquaintance, which all started from a simple question. He asked, "When would I use tokenization and when would I use encryption?" His short question spawned several extended calls and emails as we explored the implications together. I will share parts of the conversation over the next few posts, and invite you to join the dialog. Also, if you find this discussion interesting, please join me next week, when I will explore this topic during a webinar, "Contrast & Compare: Tokenization vs. Encryption for Data Protection".
First, let's define each of the technologies -- both are means of protecting data. Encryption is the process of applying algorithms (sometimes referred to as ciphers) to readable sensitive data, referred to as cleartext or plaintext, so that the outcome is unreadable -- that is, ciphertext. Generally, a small set of these algorithms is used broadly, for the purposes of standardization. If the plaintext were the only input to such a widely shared algorithm, the body of encrypted data would be subject to cracking, for example via frequency analysis. This is avoided by adding cryptographic keys as additional input to the process, so that the outcome varies based on the key used. The process is, by definition, reversible: applying the appropriate algorithm and key to the ciphertext yields an exact copy of the original plaintext. Because it rests on algorithmic calculation and reversibility, encryption is used to protect both persistent data (i.e., data stored for long periods) and transient data (i.e., data passing briefly over a communications channel). However, when the cryptographic key must be exchanged between parties, that exchange is subject to man-in-the-middle attack, so preventing compromise of the key during exchange is crucial.
[A future post will touch on Public Key Infrastructure (PKI) and public/private key pairs, but that is set aside for this post.]
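To make the role of the key concrete, here is a minimal sketch in Python. It is a toy cipher that I put together for illustration only (a SHA-256 keystream XORed with the plaintext), not a vetted algorithm and not something to use in production; the function names are my own. It shows the two properties described above: the same key reverses the process exactly, and a different key does not.

```python
import hashlib
import secrets

# Toy illustration only: NOT a vetted cipher. Real systems should use a
# standard algorithm (for example AES) from a maintained library.

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key, a nonce, and a counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce + ciphertext; a fresh nonce makes each encryption differ."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Reverse toy_encrypt: regenerate the same keystream and XOR it back out."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

key = secrets.token_bytes(32)
secret = b"4111 1111 1111 1111"          # sample card-like value
ciphertext = toy_encrypt(key, secret)
recovered = toy_decrypt(key, ciphertext)  # same key: exact copy of the plaintext
```

Note that nothing but the ciphertext, the algorithm, and the key is needed to get the plaintext back, which is why protecting the key matters so much.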
Tokenization takes a different approach to protecting sensitive data. With that technology, a surrogate value is generated, a mapping of that value to the original sensitive plaintext data is recorded, and the surrogate is substituted for the original plaintext. There is no algorithmic relationship between the two values, so there is no means to reverse a process and derive the original plaintext from the surrogate token value. Since the surrogate token will only ever be useful in the context of its mapping to the original value, its predominant use is persistent data protection. Therefore, connectivity to the cross-reference table or database that holds the mapping of original plaintext to surrogate token is critical and, by extension, protection of that mapping source is crucial.
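The mapping idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `TokenVault` class and its method names are hypothetical, and the in-memory dictionaries stand in for what would be a hardened, centralized datastore in a real deployment.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory stand-in for the plaintext/token cross-reference."""

    def __init__(self) -> None:
        self._token_to_plain: dict[str, str] = {}
        self._plain_to_token: dict[str, str] = {}

    def tokenize(self, plaintext: str) -> str:
        """Return the surrogate for `plaintext`, creating and recording one if needed."""
        if plaintext in self._plain_to_token:
            return self._plain_to_token[plaintext]
        # The token is random: it has no algorithmic relationship to the plaintext.
        token = secrets.token_hex(8)
        while token in self._token_to_plain:  # avoid the (unlikely) collision
            token = secrets.token_hex(8)
        self._token_to_plain[token] = plaintext
        self._plain_to_token[plaintext] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look the token up; raises KeyError if this vault never issued it."""
        return self._token_to_plain[token]

vault = TokenVault()
token = vault.tokenize("4111111111111111")  # sample card-like value
original = vault.detokenize(token)          # only possible via the vault's mapping
```

Unlike the ciphertext in an encryption scheme, the token by itself carries nothing to work with: a process that cannot reach the vault, or that asks a different vault, has no way to get back to the original value.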
These two scales -- reversibility/non-reversibility and persistence/transience -- frame any discussion of the appropriate use of either technology. Since tokenization has limited value for transient data protection, determining the persistence/transience character of a given use case quickly dictates which to use. Consequently, virtually all communication channel data protection, such as secure sockets layer (SSL) or transport layer security (TLS), relies on encryption, using keys that are not expected to be used again ("session keys").
From this, it becomes plain that any debate over which technology to use applies only to sensitive data that needs to be persistently protected, and turns on how reversibility applies to the projected use case scenarios. Since encryption is inherently reversible and is based on standard algorithms, only the ciphertext, algorithm identifier, and the cryptographic key are required to reverse the process. This makes encryption ideal for the asynchronous exchange of data between parties -- that is, the exchange can be separated by time, by geography, and by lack of real-time network connection. That is also why encryption is so easily accessible -- even inexpensive desktop file managers and data compression utilities include strong encryption algorithms and can be used to persistently protect data.
Conversely, use of tokenization, for practical purposes, assumes network connectivity between any process that needs to "detokenize" protected data and the central plaintext/token cross-reference. If there is no means to connect from the point of use to the token-to-plaintext mapping datastore, the token will remain unusable. This means tokenization is best used (if not only used) in situations that support real-time online transaction processing (OLTP). While encryption can be used in OLTP scenarios, too, and frequently is, tokenization has very limited use in any other environment. Moreover, since the token mapping datastore must be centralized to reduce the surface it exposes to attackers, the detokenized plaintext must transit the network, which directly implies that transient communications encryption must be applied to the transaction.
From this, already, you see that the dialog shifts from tokenization vs encryption to tokenization & encryption.
Next week, look for a post that extends this discussion, particularly on the implications of tokenization's network connectivity dependency. In the meantime, what do you think -- when would you use tokenization and when would you use encryption? Share your thoughts in the Comments section below.