Prime Factors Blog

How Anonymous is that Anonymized Data in Your Testbed?

Posted by Jeff Cherrington on Feb 4, 2015 8:53:00 PM

Catching up on Slashdot late last week, I came across an article, highlighted by one of the contributors, reporting some surprising research on anonymization.  Researchers at MIT demonstrated that, starting from anonymized payment data, they needed as few as four transactions to identify a cardholder.  With price included, as few as three transactions were enough.  This may have implications for best practices when copies of production data are moved into quality assurance or development testbeds.



Working from three months' worth of transactions for 1.1 million cardmembers, and applying big data technology and pattern recognition techniques, the research team was able to identify individuals based on the date and location of their purchases.  The article concludes that current approaches to anonymizing payment data, whether the data is shared with third parties for marketing purposes or used in testbeds, are insufficient.
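To make the idea concrete, here is a minimal sketch in Python of how one might measure this kind of re-identification risk on a toy dataset.  It is not the researchers' code; the function name unicity, the record layout (cardholder, date, shop), and the sample data are all assumptions made for illustration.  It simply asks: if an outsider knows k of a cardholder's purchases, how often does that knowledge match exactly one cardholder?

```python
import random
from collections import defaultdict

def unicity(transactions, k, seed=0):
    """Fraction of cardholders singled out by k of their own (date, shop) points.

    `transactions` is an iterable of (cardholder, date, shop) tuples.
    The layout and the name `unicity` are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Group each cardholder's distinct (date, shop) points.
    points_by_holder = defaultdict(set)
    for holder, day, shop in transactions:
        points_by_holder[holder].add((day, shop))

    # Only cardholders with at least k points can be tested.
    eligible = [h for h, pts in points_by_holder.items() if len(pts) >= k]
    if not eligible:
        return 0.0

    singled_out = 0
    for holder in eligible:
        # Pretend an outsider knows k of this cardholder's purchases.
        known = set(rng.sample(sorted(points_by_holder[holder]), k))
        # Count every cardholder whose history contains all known points.
        matches = [h for h, pts in points_by_holder.items() if known <= pts]
        if len(matches) == 1:
            singled_out += 1
    return singled_out / len(eligible)

# Hypothetical toy data: names are already removed, yet patterns remain.
toy = [
    ("A", "2015-01-05", "bakery"), ("A", "2015-01-06", "cafe"),
    ("B", "2015-01-05", "bakery"), ("B", "2015-01-07", "bookshop"),
    ("C", "2015-01-05", "bakery"), ("C", "2015-01-06", "cafe"),
]
risk = unicity(toy, k=2)
```

In this toy set, cardholders A and C share the same two purchases and hide each other, while B's bookshop visit makes B stand out.  Real transaction histories are far more varied, which is why so few points can be enough.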

Many organizations currently take steps to anonymize copies of production transaction data pulled into testbeds, principally focusing on the primary account number (PAN), and some go as far as substituting random names.  What this research shows, however, is that more steps are required to protect consumers' privacy: specifically, devising the means to randomize dates, locations, and payment amounts.  As the research is independently verified and the technical community becomes more broadly aware of it, there is every chance that more diligent anonymization across a wider range of fields will become the best practice expected by internal and external auditors.
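As a rough illustration of what randomizing those extra fields could look like, here is a short Python sketch.  It is not how EncryptRIGHT or any particular product works; the function scrub_record, the field names, the placeholder location list, and the shift and jitter ranges are all assumptions chosen for the example, and the sketch makes no claim about how much perturbation is enough.

```python
import random
from datetime import date, timedelta

# Hypothetical stand-in locations for test data; not real merchants.
PLACEHOLDER_LOCATIONS = ["STORE-001", "STORE-002", "STORE-003", "STORE-004"]

def scrub_record(record, rng, max_day_shift=14, amount_jitter=0.25):
    """Return a copy of `record` with date, location, and amount randomized.

    Assumes `record` is a dict with "date" (datetime.date),
    "location" (str), and "amount" (float) keys; other keys are copied as-is.
    """
    scrubbed = dict(record)

    # Shift the date by a random number of days in either direction.
    shift = rng.randint(-max_day_shift, max_day_shift)
    scrubbed["date"] = record["date"] + timedelta(days=shift)

    # Replace the real location with a randomly chosen placeholder.
    scrubbed["location"] = rng.choice(PLACEHOLDER_LOCATIONS)

    # Scale the amount by a random factor within +/- amount_jitter.
    factor = 1 + rng.uniform(-amount_jitter, amount_jitter)
    scrubbed["amount"] = round(record["amount"] * factor, 2)

    return scrubbed

# Example with a made-up record; the PAN is assumed to be tokenized already.
rng = random.Random(2015)
original = {"pan": "TOKENIZED", "date": date(2015, 1, 5),
            "location": "Main St Bakery", "amount": 12.40}
test_copy = scrub_record(original, rng)
```

Passing in the random generator keeps the choice of seed, or of a stronger randomness source, with the caller.  A fixed seed as shown is convenient for repeatable test runs, but anyone who knows the seed can reproduce the shifts, so it is not a choice to make for data that leaves the organization.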

Learn more about how Prime Factors' EncryptRIGHT offers you the means to anonymize data in testbeds, even fields never considered before, here.