Cybercriminals could develop malicious voice apps that turn Amazon Alexa devices and Google Home smart speakers into spy equipment that eavesdrops on users and even phishes for their passwords, according to a new report.

The report, from Germany-based Security Research Labs (SRLabs), warns that security lapses in how Google Home and Alexa devices (such as the Echo smart speaker) accept voice commands and communicate with users could allow adversaries to craft malicious “Skills” (the official term for Alexa apps) and “Actions” (the official term for Google Home apps) that turn the devices into “Smart Spies.”

Among the biggest deficiencies, according to SRLabs, is that Amazon and Google let Skill and Action developers change certain functionality after their voice apps have passed the security review and approval process, without any further scrutiny.

SRLabs also identified two so-called “intents” as additional problems. Intents are the tasks that users routinely ask Alexa and Google Home to perform, such as looking up the local weather forecast. When a device’s virtual assistant cannot match a user’s spoken command to a given function, it defaults to a pre-programmed “fallback intent.” This fallback intent is the first of the two exploitable intents. The other is the “stop intent,” which is triggered when users say the word “stop.”

A third issue the researchers discovered is that malicious developers can feed unpronounceable characters to Alexa’s and Google Home’s text-to-speech engines, which makes it sound as if the virtual assistant has stopped speaking and gone dormant when in fact it remains active and is simply taking a long pause.

By combining these issues, attackers can trick users into revealing sensitive data or into unknowingly letting outsiders spy on their communications.

To get smart speaker users to give away their passwords, attackers can give their voice app an intent that is launched by the word “Start” and that treats whatever words are spoken next as variable user input (a slot value) to be recorded and forwarded to the app’s back end. In other words, as soon as users say “Start,” whatever they say next is recorded and sent directly to the malicious app’s developers.

The trick, of course, is to get users to speak their passwords after saying “Start.” The attackers can do this by changing their voice app’s welcome message to a phony error message, which fools users who launch the voice app into thinking it doesn’t work. The adversaries simply wait until after Amazon and Google have reviewed and approved the app before replacing the greeting message with the fake error message.

The attackers then make the virtual assistant recite a long list of unpronounceable characters to feign silence and inactivity when in reality the smart speaker is still very much active and listening. At the end of this silence, the attackers have the device speak a phishing message asking users to install an important security update by saying “Start update,” followed by their passwords. Everything spoken after the word “Start” is saved as a slot value and sent to the attackers.

In its report, SRLabs also showed how the Alexa and Google Home weaknesses can be leveraged to eavesdrop. The process differs between the two platforms.

To use Alexa as a spying tool, attackers could create a Skill with two intents: one that is launched by the word “stop” and another that acts like a fallback intent and is triggered by a very common word such as “the” or “I.” After the Skill passes Amazon’s approval process, the adversaries change the first intent to say “goodbye” and the second intent to give no audible response. The attackers then use a long string of unpronounceable characters to make the Alexa device appear inactive, although it is still listening. If users talk during this time and happen to speak the common word chosen to trigger the second intent, whatever they say is recorded and sent to the attacker.
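Because both intents in this pattern already exist when the Skill is submitted (only their spoken responses change afterward), a reviewer could in principle look for them statically. The Python sketch below is a hypothetical review heuristic, not Amazon’s actual certification logic. It assumes the Alexa Skills Kit interaction-model JSON layout (`interactionModel.languageModel.intents`, each intent with a `name` and `samples`); the word lists and the function name `flag_suspicious_intents` are illustrative assumptions.

```python
import json
import re
from typing import List, Tuple

# Hypothetical word lists for illustration only; a real reviewer would need
# locale-aware lists and far more context than an utterance's first word.
STOP_LIKE = {"stop", "cancel", "exit", "quit", "goodbye"}
COMMON_WORDS = {"the", "i", "a", "and", "to", "it"}

# Matches slot placeholders such as "{phrase}" in sample utterances.
SLOT_RE = re.compile(r"\{[^}]+\}")


def flag_suspicious_intents(model_json: str) -> List[Tuple[str, str, str]]:
    """Return (intent, sample, reason) tuples for custom intents whose sample
    utterances shadow a stop command or act as a catch-all recorder."""
    model = json.loads(model_json)
    intents = model["interactionModel"]["languageModel"]["intents"]
    findings = []
    for intent in intents:
        name = intent["name"]
        if name.startswith("AMAZON."):
            continue  # built-in intents are supposed to handle these words
        for sample in intent.get("samples", []):
            # Drop slot placeholders so only the literal words remain.
            words = SLOT_RE.sub(" ", sample).lower().split()
            has_slot = SLOT_RE.search(sample) is not None
            if not words and has_slot:
                findings.append((name, sample, "utterance is only a free-form slot"))
            elif words and words[0] in STOP_LIKE:
                findings.append((name, sample, "shadows a built-in stop command"))
            elif words and words[0] in COMMON_WORDS and has_slot:
                findings.append((name, sample, "common word followed by a free-form slot"))
    return findings


# Usage: a hypothetical model containing one benign and two suspicious intents.
example = json.dumps({
    "interactionModel": {"languageModel": {"invocationName": "demo skill", "intents": [
        {"name": "WeatherIntent", "samples": ["what is the weather"]},
        {"name": "ByeIntent", "samples": ["stop"]},
        {"name": "CatchIntent", "samples": ["the {phrase}"]},
    ]}}
})
for finding in flag_suspicious_intents(example):
    print(finding)
```

A first-word check like this is deliberately crude and would produce false positives; its point is only that the two-intent structure SRLabs describes leaves a visible footprint in the submitted model.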

“For Google Home devices, the hack is more powerful: There is no need to specify certain trigger words and the hacker can monitor the user’s conversations infinitely,” the report states. “This is achieved by putting the user in a loop where the device is constantly sending recognized speech to the hacker’s server while only outputting short silences in between.”

The report offers a full step-by-step description of this technique, along with a series of videos demonstrating all three hacks.

Amazon sent SC Media the following statement: “Customer trust is important to us, and we conduct security reviews as part of the Skill certification process. We quickly blocked the Skill in question and put mitigations in place to prevent and detect this type of Skill behavior and reject or take them down when identified.”

As part of these mitigations, Amazon is now preventing Skills from asking customers for their Amazon passwords. Amazon says it is not aware of any past Skills that exhibited this or the other malicious behaviors demonstrated by SRLabs.

Meanwhile, Google sent SC Media the following statement: “All Actions on Google are required to follow our developer policies, and we prohibit and remove any Action that violates these policies. We have review processes to detect the type of behavior described in this report, and we removed the Actions that we found from these researchers. We are putting additional mechanisms in place to prevent these issues from occurring in the future.” Google also noted that Google Home’s virtual assistant (Google Assistant) never requests account passwords from users.  

“Although it is questionable how effective such [attacks] might be, it’s a good reminder of the risks posed by filling our homes with always-on microphones and cameras,” said Craig Young, computer security researcher with Tripwire’s vulnerability and exposure research team (VERT). “The attacks detailed in this paper are effectively UI redressing attacks for voice-first user interfaces. One of the things I recommend to anyone with a Google Home product is to enable accessibility tones. With this setting enabled, the Google Home devices will always play a chime to let you know it is listening and then another chime to let you know it has stopped listening.”

SRLabs had recommendations for Amazon and Google as well: “To prevent ‘Smart Spies’ attacks, Amazon and Google need to implement better protection, starting with a more thorough review process of third-party Skills and Actions made available in their voice app stores,” the report states.

“The voice app review needs to check explicitly for copies of built-in intents,” SRLabs continues. And “unpronounceable characters… and silent SSML messages should be removed to prevent arbitrary long pauses in the speaker’s output.”
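Since the report says voice apps can change their responses after approval, the second recommendation would be most useful if applied to responses as they are served, not only at review time. The Python sketch below is one hypothetical filter along those lines. The 1,000 ms cap, the choice to shorten rather than delete `<break>` tags, and the treatment of control, surrogate, private-use and unassigned Unicode code points as “unpronounceable” are all assumptions, not documented Amazon or Google behavior. It also handles only `<break time="..."/>` tags, not other SSML constructs that could produce silence.

```python
import re
import unicodedata

# Hypothetical policy value for illustration only.
MAX_BREAK_MS = 1000

# Matches explicit pauses such as <break time="30s"/> or <break time="500ms"/>.
BREAK_RE = re.compile(r'<break\s+time="(\d+(?:\.\d+)?)(ms|s)"\s*/>', re.IGNORECASE)

# Unicode categories treated here as "unpronounceable": control, surrogate,
# private-use and unassigned code points (an assumption, not a vendor rule).
UNPRONOUNCEABLE = {"Cc", "Cs", "Co", "Cn"}


def _cap_break(match: re.Match) -> str:
    """Rewrite one <break> tag so its pause never exceeds MAX_BREAK_MS."""
    value, unit = float(match.group(1)), match.group(2).lower()
    ms = value if unit == "ms" else value * 1000
    return f'<break time="{int(min(ms, MAX_BREAK_MS))}ms"/>'


def sanitize_ssml(ssml: str) -> str:
    """Cap explicit pauses and drop characters the TTS engine cannot speak."""
    capped = BREAK_RE.sub(_cap_break, ssml)
    # Keep ordinary whitespace even though newline and tab are category Cc.
    return "".join(
        ch for ch in capped
        if ch in "\n\t" or unicodedata.category(ch) not in UNPRONOUNCEABLE
    )


print(sanitize_ssml('<speak>Goodbye.<break time="30s"/>Hello again</speak>'))
```

Capping instead of stripping keeps legitimate short pauses intact while still denying an app the “arbitrary long pauses” SRLabs warns about; a platform that preferred the report’s stricter wording could simply replace the matched tags with an empty string.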