Sep 18th, 2014

Android users who have grown to love their voice commands could be in for a treat: Google’s been working on some crazy improvements. Recent updates have already improved how voice search works with Google Now, tightly integrated voice commands with Android Wear, and added an always-listening “Ok Google” voice command to all search screens, but perhaps the best is yet to come.

Deep inside a September 4th patent – titled “detecting the end of a user question” – we find some hidden gems that show us how Google could soon add “active watching” technology to futurize Google Search and beyond. But at what cost?

The patent starts off in a fairly straightforward way, describing features already available on some Android devices such as the new Moto X (2nd gen). The phone uses multiple microphones to detect the location of voices, enabling it to ignore speech input coming from unintended sources. But the patent takes an interesting turn when it mentions capturing visual indicators, and not just any visual indicators: video.

The digital capture device may be a digital video recorder, digital camera, a webcam, etc. The visual capture device may capture visuals and represent the visuals as a stream of images that may form a video.

On the surface (at least to some) this might seem innocent and run-of-the-mill: earlier this year we showed you some Galaxy S5 Tips & Tricks which include a Samsung feature called Smart Stay that keeps the screen on whenever it detects you’re looking. As depicted below, Google is suggesting combining visual indicators with audio indicators to improve Google Search functionality. It doesn’t take long for things to get a lot more interesting, though.


Google attempts to detect “deliberation” between people with audio/visual indicators, and depending on what their algorithmic statistical mojo recommends, can offer answers as if engaged in an ongoing dialogue or choose to stop actively listening altogether.
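To make the idea concrete, here is a minimal sketch of how audio and visual indicators could be combined into a single decision. It is not taken from the patent: the `Indicators` fields, the `decide` function, and the 0.8 second pause threshold are all hypothetical, chosen only to illustrate the general approach.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    # All fields are illustrative assumptions, not terms from the patent.
    silence_seconds: float  # audio: time since speech was last detected
    lips_moving: bool       # visual: is the speaker's mouth still moving?
    facing_device: bool     # visual: is the speaker's head turned toward the device?
    people_in_view: int     # visual: number of people detected in the frame


def decide(ind: Indicators, pause_threshold: float = 0.8) -> str:
    """Return 'keep_listening', 'stop_listening', or 'answer'."""
    # Either signal can say the user is still talking.
    if ind.lips_moving or ind.silence_seconds < pause_threshold:
        return "keep_listening"
    # The speaker went quiet and turned away while others are present:
    # treat it as deliberation between people and back off.
    if ind.people_in_view > 1 and not ind.facing_device:
        return "stop_listening"
    # Otherwise assume the question has ended and respond.
    return "answer"


print(decide(Indicators(1.5, False, True, 1)))
```

A real system would presumably weigh these signals statistically rather than with fixed rules, but the sketch shows why video helps: a pause alone cannot distinguish a finished question from someone turning to ask a friend.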

Capturing video and identifying people

This Google patent goes deeper though, not only actively watching and capturing visuals, but collecting a bunch of other information along the way.

“the visual analyzer may determine the number of people in an area represented by the visual data, the identity of the people, the vertical and horizontal angles of the heads of the people, and lip movement of the people”

Determine the identity of people? Yes… and then using audio and video together they can further:

“determine the identity of the person providing the voice input based on the lip movement of people and the acoustic characteristics of the voice”
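As a rough illustration of that pairing, the sketch below picks the most likely speaker by multiplying a lip-movement score by a voice-similarity score for each person in view. The function name, the score ranges, and the multiplication rule are assumptions made for this example; the patent does not spell out how the two signals are combined.

```python
def identify_speaker(candidates):
    """Pick the most likely speaker.

    candidates maps a person's name to a pair of hypothetical scores,
    (lip_activity, voice_similarity), each between 0 and 1.
    """
    if not candidates:
        return None
    # A person must score well on BOTH signals to win, so multiply them.
    return max(candidates, key=lambda name: candidates[name][0] * candidates[name][1])


people = {
    "alice": (0.9, 0.8),  # lips moving, voice matches well
    "bob": (0.1, 0.9),    # voice matches, but lips are barely moving
}
print(identify_speaker(people))
```

Here the voice alone would be ambiguous between the two people, and the lip movement breaks the tie.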

If you thought Google only wanted to extrapolate this data to differentiate between the different speakers it was hearing, you’d be wrong… it doesn’t stop there. Not only does it watch, listen, and capture information about the people it successfully identifies, it also stores this data in user profiles. Directly from the patent:

“the system may analyze audio and visual data and store information in a user profile…”

Google even describes some examples of the data it may want to collect and store, further explaining the data would be used to serve content that is more relevant to the user:

  • User’s social networks
  • Social actions or activities
  • Profession
  • User preferences
  • Current location

My reaction when reading this? Awesome!

What’s your reaction? Today’s Google searches are already able to identify this type of information and store it in your Google Account history to improve performance and features, but that won’t stop some people from feeling paranoid. This patent is open-ended and far-reaching; I’m sure plenty of freedom fighters have already rushed to the comments in defense of our civil liberties and the protection of our privacy.

Ok Google… stop reading my lips

There are often two opposing camps in the privacy debate:

  1. If you’re not doing anything wrong you shouldn’t have to worry
  2. Private life should be private. After all, what happens if Google gets hacked, or shares my info with law enforcement, or people with access to this information abuse their privileges?

Regardless of which camp you belong to, Google has made it clear in its patent that these settings will be optional, allowing the user to choose whether or not their personal information is stored:

“For situations in which the systems discussed here collect personal information about users, or may make use of personal information, the users may be provided with an opportunity to control whether programs or features collect personal information”

Notice that users “may” be provided with that opportunity; it’s not necessarily guaranteed. The patent also mentions anonymizing data so that it can still be collected in aggregate without any connection to personally identifiable information.

Patents serve to protect intellectual property, not to act as an operating guide, so any critique made ahead of a formal Google announcement is hypothetical. Google would presumably attempt to address privacy concerns when implementing the technology, and if it overlooked them, it would be sure to face backlash.

What type of backlash? Probably the type Microsoft faced last year, leading up to the launch of the Xbox One and Kinect. The main issue? Kinect was always listening, always watching, and you couldn’t turn it off. Sound familiar?

Microsoft later backtracked on that requirement and most of the Kinect privacy hysteria has subsided. Dissenters likely purchased the PlayStation 4 or Nintendo Wii U instead… and the world went on.

Whether the tech from this patent sees the light of day remains to be seen. If it does, privacy will certainly be an important issue to discuss, and I urge you to begin that discussion in the comments below. But I’m hopeful that Google would implement it responsibly. A more interesting discussion, I believe, is what this patent could mean for the future of Android devices.

How will Google use this data?

I love Google voice commands and it’s a feature I use daily. If you don’t, you really need to try it. It’s not perfect, though, and can be especially irritating when you’re not the only person in the room. At the very least, Google’s hope of improving voice commands through visual indicators is promising.

This patent could mean much more than voice search improvements, and its parallels with Kinect aren’t only in the privacy department. When most people think “Android” they think “smartphone”, but Google’s scope is much broader and its motives more sweeping. The “detecting the end of a user question” patent may feature a mobile phone in its illustrations, but it explicitly mentions computers, webcams, and other types of video and audio equipment that can collect information from what seems like a much larger and more physically static area than is likely with your phone.

Three obvious places (beyond smartphones and laptops) where Google could awesomely employ these features to create stunning new experiences:

  • Android TV
  • Android Auto
  • Android @Home

Television is still in the Stone Age, begging to be revolutionized. Google’s initial attempt – Google TV – failed quite miserably, but they’ve since announced Android TV. In its current alpha form it’s a direct competitor to products and services like Apple TV, Amazon Fire TV, and Netflix, but it could be so much more. Advanced voice operation that smartly “lives” with you and the people in your house could make the difference between a cheap set-top box and a truly next-generation multimedia solution. At the ground level, consider an Android TV that greatly improves upon the voice functionality already found on Xbox and PS4.

The auto industry hasn’t changed a whole lot in the past few decades, either. Google isn’t waiting two decades for their self-driving cars to become a reality; they’re launching consumer vehicles with Android built-in later this year through Android Auto. What interesting experiences, apps, and games could Google create by knowing who is in the car, where each person is sitting, when each person is talking, and what they’re each saying? That’s a challenge I’m sure developers would love to tackle.

Then we’ve got the ever-cliché, George Jetson style “home connectivity” vision. We’ve been hearing about and seeing Android appliances since 2010, but even with the advent of Android @Home, truly connected technology has made few inroads into the typical home. Google has shown a renewed commitment to home connectivity by buying Nest for $3.2 billion. Being able to communicate with your home, hands-free and with great accuracy, could be the missing link in helping the connected home emerge as the next cultural revolution.

Creepy? Awesome? Or both?

The three main takeaways from this article (and Google’s patent):

  1. This could help immediately improve Google Voice Search
  2. Extending the idea could revolutionize voice commands across many devices
  3. There will be no lack of privacy concerns

Using visual cues and pairing them with audio cues is a brilliant way to improve an already wonderful product, but is capturing video, listening to voices, watching lips, identifying real people, and correlating it all with personally identifiable information going further than you want your relationship with Google to go? Let us know in the comments!

Note: the term “active watching” is not used by Google in this patent. They do, however, call the existing audio functionality “active listening”. I’m using the term “active watching” for this article as a logical extension of an already understood and well-accepted concept. In reality, I’d hope Google would announce this feature using a term that seems less intrusive, such as “active aware” or “always aware” (which could include both audio and video).
