When personal consumer data becomes a liability instead of a competitive advantage

The Holy Grail for brands and retailers is having great customer data that they can act on. Understanding consumer shopping habits, searches, and interests is a serious competitive advantage, one that tech giants like Facebook, Google, and Amazon have already mastered. As big data becomes more accessible, more companies are able to leverage consumer insights to drive their business forward, personalize customer experiences, and differentiate their brand from competitors.

But storing, maintaining, and accessing the same data that drives business is also a major liability for businesses of all sizes. The minute companies start acquiring consumer data, they set a time bomb ticking: a security breach can expose names, dates of birth, Social Security numbers, and email addresses and passwords. And once customer data is breached, whether by an external bad actor or unintentionally by an employee, consumers lose trust in the brand, which can be devastating to smaller businesses that may never recover.

The California Consumer Privacy Act (CCPA) requires companies that collect customer data to disclose the kinds of personal data they are harvesting. As regulations like GDPR make their way to the U.S., companies are going to be held more accountable for how they collect and store consumer information and for what they disclose to the public.

How, then, in today's era of personalized shopping, do companies balance the liability created by holding the very information they need to stay competitive and win customers' business? How can they maintain customer trust and protect themselves from security incidents while knowing enough about their customers to sell to them better, smarter, and faster than their competitors?

The simple answer is that businesses are collecting far too much personal information from consumers: information that they do not need to operate or sell competitively, and that only puts them and their customers at risk of security breaches.

First, let's examine what consumer data businesses really need to build good products and services and better understand their customers. Artificial intelligence (AI), or more accurately machine learning (ML), relies on good training data to build up its knowledge base. Traditionally, companies building products that rely on ML acquire fully identified data, containing both identity and behavior data, to train their models.

In reality, ML needs only behavior data to train its models. Identity data is pure risk. For example, retailers do not need shopper identity data to optimize product pricing, in-store placement, and the consumer experience. Credit card fraud detection ML products do not need consumer identity data to train their models to spot fraudulent charges. Personal finance ML products do not need consumer identity data to train their models to recommend ways consumers can be more financially responsible.
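As a rough illustration of training on behavior data alone, the sketch below keeps only an allowlist of behavior fields from a raw record before it would be handed to a model. The field names and the `behavior_only` helper are hypothetical and chosen purely for illustration. Note that which fields count as "behavior" is itself an assumption that would need review for any real dataset.

```python
# Hypothetical field names: an allowlist of behavior fields that are
# assumed to be safe to pass on to model training.
BEHAVIOR_FIELDS = {"basket_total", "items_viewed", "store_region", "hour_of_day"}


def behavior_only(record: dict) -> dict:
    """Return a copy of the record containing only allowlisted behavior fields."""
    return {key: value for key, value in record.items() if key in BEHAVIOR_FIELDS}


# A made-up raw record mixing identity and behavior data.
raw = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "basket_total": 42.50,
    "items_viewed": 7,
    "store_region": "west",
    "hour_of_day": 18,
}

# Only the behavior fields survive; name and email never reach training.
training_row = behavior_only(raw)
```

An allowlist is used here rather than a blocklist so that any new field added upstream is excluded by default until someone decides it is needed.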

Similarly, data scientists for brick & mortar and online retailers do not need consumer identity data to spot macro shopper behavior trends or product trends. Data scientists for game developers do not need gamer identity information to understand where gamers are getting stuck on a level or why they are spending more or less on in-app purchases at a particular point in a game.

Since it's evident that ML companies and data scientists don't need identity data to train their models and spot behavior trends, the question becomes: What should companies do with that consumer identity data?

The answer is to think long and hard about how it is collected, stored, and used. Before collecting this data, companies should ask: How, specifically, will I use this information? And how will I protect it? If you're not sure how you will use the data beyond generally thinking “one day someone may find this valuable,” then it's best not to collect the information. And if you do know how you're going to use consumer identity data, then you should defang it, while still maintaining its utility, by both anonymizing it and storing it separately from consumer behavior data.

This allows companies to keep acting on customer information, like shopping patterns, while retaining the option to reconnect behavior data and identity data when needed. Keeping the databases separate and anonymizing personally identifiable information (PII) greatly diminishes the value of either database on its own to malicious actors. There's no reason that personal customer data needs to become a liability if businesses understand what data they actually need to make decisions, and how to separate consumer identity and behavior data and store them safely.
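One way to picture this separation is the minimal sketch below. It splits each incoming record into an identity part and a behavior part, joined only by a random token. The store names, field names, and helper functions are all hypothetical, and the in-memory dictionaries stand in for what would be two separately secured databases. Because the link can be restored, this approach is often called pseudonymization rather than full anonymization.

```python
import secrets

# Stand-ins for two separately stored and separately secured databases.
identity_store = {}   # token -> identity fields only
behavior_store = []   # behavior rows, keyed only by an opaque token

# Hypothetical list of fields treated as identity data.
IDENTITY_FIELDS = ("name", "email")


def split_record(record: dict) -> str:
    """Store identity and behavior parts separately; return the linking token."""
    token = secrets.token_hex(16)  # random, so it reveals nothing about the person
    identity_store[token] = {f: record[f] for f in IDENTITY_FIELDS if f in record}
    behavior = {k: v for k, v in record.items() if k not in IDENTITY_FIELDS}
    behavior["customer_token"] = token
    behavior_store.append(behavior)
    return token


def relink(behavior_row: dict) -> dict:
    """Rejoin a behavior row with its identity; needs access to both stores."""
    identity = identity_store[behavior_row["customer_token"]]
    return {**identity, **behavior_row}


# Made-up example record.
token = split_record(
    {"name": "Jane Doe", "email": "jane@example.com", "basket_total": 42.50}
)
analytics_row = behavior_store[0]   # what data scientists would work with
full_row = relink(analytics_row)    # only possible with both stores in hand
```

For simplicity, this sketch issues a new token per record. A real system would more likely reuse one token across all of a customer's events, and would restrict who can read the identity store.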

Changing this behavior is going to require discipline and commitment from companies. It's going to require a real shift in how we think about technologies like ML and how models should be trained, and about what data scientists actually need in order to analyze and spot product trends. But this shift will be necessary for brands and retailers to remove risk while staying competitive.