With machine learning, AI and privacy all becoming priority initiatives for companies, why has the data tug-of-war between IT and developers become such a challenge?
The tension between data science teams and IT departments stems in part from cultural differences. Traditionally, data science is rooted in a “hacker,” outside-the-box, experimental culture, while IT professionals come from a process-oriented “design & build” mindset.
The tools most data scientists use are open workbenches that combine many different programs, mostly open source, with a “the more data the better” orientation. Data scientists usually want ALL the data, and need to explore it before they even know what to try. IT, by contrast, must provide only the necessary data and ensure it stays private and safe.
GDPR and ML – a tricky combination
GDPR has been called one of the most important, and rigid, data privacy standards of the last two decades. Its requirements sent shockwaves through the technology sector leading up to the May 25, 2018, implementation date. The need to comply with GDPR played a strong role in the development of new solutions, especially those powered by ML and AI.
The biggest hurdles for ML technologies relate to GDPR’s requirements of “explainability” and “transparency.” In the current state of ML, however, the models that typically produce the best outcomes rely on deep learning and deep neural networks, which are traditionally opaque. That opacity is acceptable, even ideal, for the many ML-powered applications meant to work discreetly in the background, but it conflicts directly with GDPR’s transparency and explainability requirements.
As a result, the most effective ML algorithms often have to be passed over in favor of more transparent ones, such as decision trees. Data scientists who are unaware of this naturally reach for the more powerful, more opaque algorithms, producing a system that is out of compliance and poses a significant risk to the organization.
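To make the transparency point concrete, here is a minimal sketch of why a decision tree is considered explainable: every prediction can be traced back to a short list of plain comparisons. The tree, the feature names (income, missed_payments) and the thresholds are hypothetical and hand-written for illustration; a real model would be learned from data.

```python
# Hypothetical, hand-written tree for a loan decision.
# Feature names and thresholds are illustrative assumptions, not a real model.
TREE = {
    "feature": "income",
    "threshold": 40000,
    "below": {"decision": "decline"},
    "at_or_above": {
        "feature": "missed_payments",
        "threshold": 2,
        "below": {"decision": "approve"},
        "at_or_above": {"decision": "decline"},
    },
}


def decide(tree, applicant):
    """Return (decision, reasons).

    reasons lists every comparison made on the way from the root to the leaf,
    which is the human-readable explanation of the outcome.
    """
    reasons = []
    node = tree
    while "decision" not in node:
        feature, threshold = node["feature"], node["threshold"]
        value = applicant[feature]
        if value < threshold:
            reasons.append(f"{feature} = {value} is below {threshold}")
            node = node["below"]
        else:
            reasons.append(f"{feature} = {value} is at or above {threshold}")
            node = node["at_or_above"]
    return node["decision"], reasons


decision, reasons = decide(TREE, {"income": 52000, "missed_payments": 3})
print(decision)
for reason in reasons:
    print(" -", reason)
```

The list of reasons is the kind of account a compliance reviewer can read directly. A deep neural network offers no comparably short trace for an individual decision, which is the trade-off described above.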
This lack of awareness tends to cause friction between the data science teams leading ML initiatives and the IT departments responsible for compliance.
So, how can organizations and developers continue to take advantage of the rich data available while adhering to the data privacy and transparency standards in place?
ML innovation & compliance don’t have to be at odds
While data scientists will always push back on IT, the relationship between the two groups has begun to shift as both look to adopt tools that make business processes easier. Fortunately, technology companies have recognized the disconnect and are actively rolling out solutions with risk management, security, and privacy standards built in. These safeguards are essential components of the solutions rather than afterthoughts.
For example, a data platform that maintains source data and presents an anonymized view to standard ML environments can resolve the impasse around access. A tool that takes GDPR into account and only presents algorithms that are explainable in the ML workbench can alleviate the risk of noncompliance.
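As a rough illustration of the first idea, the sketch below keeps the source records untouched and hands the ML environment a derived view in which direct identifiers are dropped and record keys are replaced with a keyed hash. Everything here is an assumption made for the example: the column names, the sample rows, the secret, and the function names are hypothetical, and no particular product works this way. Note also that keyed hashing is strictly pseudonymization; whether a given view counts as anonymized under GDPR depends on what else could be used to re-identify a person.

```python
import hashlib
import hmac

# Hypothetical column classification; a real platform would likely
# load this from a data catalog maintained by IT.
DIRECT_IDENTIFIERS = {"name", "email"}   # removed from the view entirely
PSEUDONYMIZED_KEYS = {"customer_id"}     # replaced with a keyed hash


def pseudonymize(value, secret):
    """Replace a value with a keyed hash: stable for joins, unreadable without the key."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymized_view(records, secret):
    """Yield new dicts for the ML environment; the source records are never modified."""
    for record in records:
        view = {}
        for column, value in record.items():
            if column in DIRECT_IDENTIFIERS:
                continue
            if column in PSEUDONYMIZED_KEYS:
                view[column] = pseudonymize(str(value), secret)
            else:
                view[column] = value
        yield view


# Illustrative source data held by the platform (made-up rows).
source = [
    {"customer_id": 1001, "name": "Ada Example", "email": "ada@example.com",
     "age": 36, "churned": False},
    {"customer_id": 1002, "name": "Bo Sample", "email": "bo@example.com",
     "age": 51, "churned": True},
]

# Hypothetical secret, held by IT and never shared with the workbench.
SECRET = b"held-by-it-not-by-data-science"

view_rows = list(anonymized_view(source, SECRET))
for row in view_rows:
    print(row)
```

Data scientists still get every row and every analytic column to explore, while the mapping back to real people stays with whoever holds the secret.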
Finding a balance with the right tools
In an age of rapid innovation fueled by the valuable currency of data, it is increasingly important to identify comprehensive, enterprise-wide platforms to manage information. With such platforms, organizations can gain a more accurate and complete view of their data and identify operational and compliance risks and suspicious activities faster and more thoroughly than before. IT departments, in turn, can address data compliance issues more easily and work more smoothly with data science teams without derailing projects, opening up opportunities for machine learning-driven innovation.
Jeff Fried is director of product management for InterSystems.