Hao Li, associate professor of computer science at the University of Southern California, speaks at a deepfakes session at the 2020 World Economic Forum Annual Meeting in Switzerland last January. (Copyright by World Economic Forum/Jakob Polacsek.
"World Economic Forum Annual Meeting" by World Economic Forum is licensed under CC BY-NC-SA 2.0)

Audio deepfakes currently represent the greatest social engineering threat involving the misuse of synthetic media, but live video-based deepfakes over Zoom and other visual platforms are not far away and may necessitate a mix of high- and low-tech countermeasures, according to a Black Hat presentation this week.

Matthew Canham, CEO of consultancy Beyond Layer 7 and research assistant professor of cybersecurity at the University of Central Florida, envisioned scenarios in which scammers could create convincing audiovisual deepfakes of bosses asking employees to execute fraudulent wire transfers, or of kidnappers appearing to hold a target's loved one hostage.

The predictions came as a natural extension of Canham's discussion of a new, work-in-progress deepfakes framework he created to help researchers describe and categorize synthetic media attacks, and to help security practitioners enhance their threat modeling so they can anticipate future attacks before they happen.

The framework is broken down into five segments: medium (text, audio or video), control (human- or software-driven), interactivity (pre-recorded vs. real-time), intended target, and the familiarity of the deepfaked person to the target. Canham demonstrated how a number of attacks, including the aforementioned kidnapping and Zoom-based phishing scenarios, could be mapped to his framework.
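The five dimensions can be sketched as a simple taxonomy. The field names and example values below are an illustrative reconstruction for mapping the two scenarios Canham described, not his exact terminology:

```python
from dataclasses import dataclass

# Illustrative sketch of the five-dimension framework; the labels below
# are hypothetical, not Canham's own notation.
@dataclass
class SyntheticMediaAttack:
    medium: str         # "text", "audio" or "video"
    control: str        # "human" or "software"
    interactivity: str  # "pre-recorded" or "real-time"
    target: str         # who the attack is aimed at
    familiarity: str    # how well the target knows the impersonated person

# The virtual-kidnapping scenario mapped onto the framework:
kidnapping = SyntheticMediaAttack(
    medium="video",
    control="human",
    interactivity="real-time",
    target="family member",
    familiarity="close relative",
)

# The Zoom-based wire-fraud scenario:
zoom_phish = SyntheticMediaAttack(
    medium="video",
    control="human",
    interactivity="real-time",
    target="finance employee",
    familiarity="boss or CEO",
)
```

Mapping attacks onto fixed dimensions like this is what lets defenders enumerate combinations that have not yet been observed in the wild.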

“Obviously, [a] synthetic media deepfaked video would be very effective in conveying the legitimacy of this kidnapping, really upping that emotional aspect and forcing that victim to… maybe not think very critically about the situation,” said Canham.

In their haste and panic, victims could easily be compelled to pay the so-called kidnappers or, in the Zoom-based phishing case, initiate a wire transfer ordered by what seemed to be their CEO. “This ability to impersonate those that are close to us is going to be really critical going forward,” Canham continued.

After all, the scam already works with audio alone, as evidenced by the 2019 case of fraudsters tricking a U.K.-based energy firm’s CEO into initiating a $243,000 transfer, after using AI to impersonate the voice of his boss on a phone call.

Real-time, human-controlled interactivity with the target such as this is especially convincing compared to, say, a pre-recorded voice message. So imagine just how believable a real-time deepfake video would be, particularly in the aforementioned kidnapping scenario.

“I am only aware of pre-recorded synthetic media being used in these virtual kidnappings, I expect that to change very quickly,” said Canham. “I think we're going to start to see some instances of virtual kidnappings in which the victim is able to interact with the criminal,” who appears to be in the same room with someone the targeted individual knows well.


Canham also speculated on the impact deepfakes could have when used to imitate individuals on Zoom and similar video-based collaboration platforms.

“We've already seen Zoom bombing and other sorts of things happening, and I think it's just a matter of time before social engineers start to take this to the next level and use this to social engineer people in Zoom rooms,” said Canham. The deepfaked individual would likely ”be somebody who we're at least familiar with,” which in a business compromise case would likely include a co-worker or boss.

Canham also foresees deepfakes eventually being used to fool not just people, but biometrics-based authentication systems.

While deepfake AI is advancing, so are technologies designed to sniff out synthetic media-based fraud. Canham cited Facebook’s announcement last June that it had developed a research method for detecting deepfakes by reverse engineering them and tracing them back to the generative model that created them. But while tech will play a part in the solution, Canham largely lobbied for more low-tech, human-centric policies to help protect potential victims.

This will be especially important because, as Canham noted, cybercriminals have already found ways to circumvent cyber defenses such as email filters and gateways by tricking targets into placing a phone call to a malicious number, where the scam can continue outside of the original email.

“This is where I see synthetic media is posing a very serious threat… and as we're transitioning more and more to online work I think these out of band comms are going to be much more significant,” he noted.

One low-tech solution is a shared secret policy. Essentially, users within a trusted group would share a special code phrase that can prove the person they hear over the phone or see on screen is actually who he or she claims to be.

And that code “should be something that is not something that you often talk about. If you never talk about purple unicorns, then purple unicorn is a great shared secret to have,” said Canham.
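A minimal sketch of how such a code phrase might be checked in software, assuming the secret is stored only as a hash and compared in constant time (the function name and stored phrase are hypothetical, borrowing Canham's "purple unicorn" example):

```python
import hashlib
import hmac

# Store only a digest of the shared secret, never the phrase itself.
STORED_DIGEST = hashlib.sha256(b"purple unicorn").hexdigest()

def verify_code_phrase(candidate: str) -> bool:
    """Return True if the candidate phrase matches the stored secret.

    hmac.compare_digest performs a constant-time comparison, avoiding
    timing side channels that a plain == comparison could leak.
    """
    candidate_digest = hashlib.sha256(candidate.encode()).hexdigest()
    return hmac.compare_digest(candidate_digest, STORED_DIGEST)

print(verify_code_phrase("purple unicorn"))  # True: caller knows the secret
print(verify_code_phrase("pink unicorn"))    # False: possible impostor
```

In practice the check would usually be performed by a human listener rather than software, but the same principle applies: the secret's value comes entirely from it never appearing in ordinary conversation.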

Then there’s what Canham calls the “never do” policy, whereby executives or managers within your organization make clear to the workforce that there are certain types of requests that they would never ask of another employee, such as sharing their passwords or purchasing gift cards as part of a financial transaction.

“By having that established, then when employees do receive text messages or emails or some other sort of communication purporting to come from someone in a position of authority, they're not going to have that question. They're going to know right away that these are fake, and they're going to ignore them,” said Canham.

Another helpful policy is to require a second person to sign off on any major financial or data transfer before it is executed, to double the chances that someone might catch a scam.

“This can be difficult sometimes with situations or companies that are doing multiple transfers in a short amount of time, but I've actually investigated cases where $4 million have been transferred [fraudulently],” said Canham. “It may be worth taking that extra minute or two to have the second person review that transfer before it's authorized.”

Finally, organizations can require employees to confirm a requested data or money transaction through a secondary channel. So if the initial request came over the phone, the employee can log into a verified email account to confirm the order was a genuine one.
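The two-person sign-off and the secondary-channel confirmation can be combined into one approval gate. The sketch below is a hypothetical illustration of that combined policy, not a description of any real payment system; the class and method names are invented:

```python
from dataclasses import dataclass, field

@dataclass
class TransferRequest:
    """A requested transfer that must clear two controls before execution:
    at least two distinct approvers, and a confirmation on a channel
    different from the one the request originally arrived on."""
    amount: float
    origin_channel: str                        # e.g. "phone", "email", "zoom"
    approvers: set = field(default_factory=set)
    confirmed_channels: set = field(default_factory=set)

    def approve(self, approver: str) -> None:
        self.approvers.add(approver)

    def confirm(self, channel: str) -> None:
        self.confirmed_channels.add(channel)

    def may_execute(self) -> bool:
        out_of_band = any(c != self.origin_channel
                          for c in self.confirmed_channels)
        return len(self.approvers) >= 2 and out_of_band

req = TransferRequest(amount=250_000.0, origin_channel="phone")
req.approve("first employee")
req.confirm("phone")           # same channel as the request: not sufficient
print(req.may_execute())       # False
req.approve("second employee")
req.confirm("verified email")  # confirmation arrives out of band
print(req.may_execute())       # True
```

The key property is that a deepfaked phone call alone can never satisfy the gate: the attacker would also need to compromise the second approver and a separate, verified channel.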

Future research in the deepfakes space may also yield further protections. For instance, Canham said he’s currently working with a Ph.D. student looking at neural signatures in humans as they watch deepfakes, to determine if, subconsciously, their brains are able to detect a fake.

“Our hope is just by looking at some of the electromagnetic signals that we may actually see that there are points where people are picking up on these, even if they're not consciously recognizing it,” said Canham, noting that the experiment will compare and contrast reactions when a test subject is looking at a deepfaked stranger and a deepfaked person they know.