Prime Factors recently presented a well-attended webinar, "Contrast & Compare: Tokenization vs. Encryption for Data Protection." The audience was engaged, offering many questions - more than I had time to answer before the end of the session. As promised, we are responding to each of the questions in this blog, focusing today on the question "If it is accurate to say that tokenization is irreversible, then why is encryption reversible?" This excellent question cuts to the heart of one of the key differences between the two data protection strategies.
Summarizing the two approaches in just a few words: tokenization substitutes a randomly generated surrogate string with no extrinsic value for a target sensitive or regulated string that does have extrinsic value, such as a primary account number (PAN). In other words, a cardholder's PAN, which can be used to make purchase transactions via the payment networks, is replaced with a surrogate string of numbers that might superficially look like a card number but cannot be used to pay for goods or services. The mapping of such pairs is maintained in electronic tables.
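To make the substitution concrete, here is a minimal sketch of a token vault in Python. The in-memory dictionaries, the `tokenize`/`detokenize` names, and the sample PAN are all hypothetical illustrations, not a production design - a real vault would live in a hardened, access-controlled data store.

```python
import secrets

# Hypothetical in-memory cross-reference tables (illustration only;
# a real system would persist these in a hardened database).
pan_to_token = {}
token_to_pan = {}

def tokenize(pan: str) -> str:
    """Replace a PAN with a random digit string of the same length --
    a surrogate with no extrinsic value."""
    while True:
        token = "".join(str(secrets.randbelow(10)) for _ in range(len(pan)))
        # Avoid collisions with existing tokens or the PAN itself.
        if token not in token_to_pan and token != pan:
            break
    pan_to_token[pan] = token
    token_to_pan[token] = pan
    return token

def detokenize(token: str) -> str:
    """Recover the PAN -- possible only with access to the mapping table."""
    return token_to_pan[token]
```

Note that the token is drawn at random rather than computed from the PAN, so nothing about the token itself leads back to the original value.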
Encryption, by contrast, is a method of applying a mathematical algorithm and a selected cryptographic key to target sensitive or regulated data, such as a PAN, to generate an outcome that is different from the input. That outcome may have the same data type, string length, and even a Luhn check digit, but there is no practical means to determine what the original value was without knowledge of both the algorithm and the cryptographic key applied.
It is the latter relationship that marks encryption as a reversible process. With knowledge of the algorithm and access to the decryption key, it is trivially easy to reverse the encryption process and derive the original value -- or PAN, in the example above. Equally, it is reproducible, in that the same data input encrypted using the same algorithm and the same key will always result in the same output. It is this aspect that makes encryption so useful in exchanging data between different parties -- they need only both possess knowledge of the algorithm and the shared secret of the encryption key for secure data exchange. The data protection does not necessarily rely on any network connections, shared server access, or other systems integration. It also makes protection of encryption keys extremely important -- if a key is compromised, then all the data encrypted with that key is at risk.
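The reversibility and reproducibility properties can be demonstrated with a deliberately insecure toy cipher (a keyed XOR) in Python. This is not a real encryption algorithm - it exists only to show that, given the algorithm and the key, decryption is trivial, and that the same input and key always produce the same output.

```python
import hashlib

def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Toy XOR cipher -- deterministic and reversible.
    NOT secure; illustrates the reversibility property only."""
    keystream = hashlib.sha256(key).digest()  # fixed keystream per key
    if len(plaintext) > len(keystream):
        raise ValueError("toy cipher supports at most 32 bytes")
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

toy_decrypt = toy_encrypt  # XOR is its own inverse

pan = b"4111111111111111"
key = b"shared-secret"  # hypothetical shared secret

c1 = toy_encrypt(pan, key)
c2 = toy_encrypt(pan, key)
assert c1 == c2                       # reproducible: same input + key -> same output
assert toy_decrypt(c1, key) == pan    # reversible with knowledge of the key
```

The same two assertions are exactly why key compromise is so serious: anyone holding the key can run the second line against every ciphertext produced with it.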
Tokenization, on the other hand, when well implemented with cryptographically sound pseudo-random number generation support, does not have the same sort of predictability and reproducibility. Since the generation of a token is random, and the association of a token to a PAN is random, executing a tokenization process for a given PAN at different times will result in a completely different outcome (unless controls are imposed to reuse previously generated tokens -- a topic for a later post). Moreover, the randomization that is the foundation of tokenization makes it impossible to reverse the process without access to the original value-to-token cross-reference table. This is the burden that tokenization brings with it -- all parties participating in a tokenization scheme must share access to the same cross-reference tables, with real-time online transaction processing capabilities to support on-demand lookup. Hence, tokenization does require network connectivity, shared server access and, frequently, other systems integration. It does, however, have a different profile than encryption when it comes to compromise -- if, through some means, one PAN-token pair is compromised, that does not yield the knowledge to reverse engineer all pairings in a given tokenization system.
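A short self-contained Python sketch makes the contrast visible: tokenizing the same PAN twice yields two unrelated tokens, and reversal is possible only through the shared table. Again, the table, function name, and sample PAN are hypothetical illustrations.

```python
import secrets

token_to_pan = {}  # hypothetical shared cross-reference table

def tokenize(pan: str) -> str:
    """Each call draws a fresh random surrogate; no reuse controls here."""
    token = "".join(str(secrets.randbelow(10)) for _ in range(len(pan)))
    token_to_pan[token] = pan
    return token

pan = "4111111111111111"
t1 = tokenize(pan)
t2 = tokenize(pan)
# Randomness means repeated tokenization of the same PAN yields different
# tokens (with overwhelming probability in a 10^16 token space)...
assert t1 != t2
# ...and reversal works only via lookup in the cross-reference table.
assert token_to_pan[t1] == pan and token_to_pan[t2] == pan
```

Compare this with the encryption case: here there is no key that, once known, unlocks every pairing -- each token stands alone, and learning one PAN-token pair teaches an attacker nothing about the others.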
For additional discussion of tokenization and encryption, see the excellent white paper "Tokenization vs. Encryption: Options for Compliance" by our friend Adrian Lane, CTO of Securosis.