Google AI Blog: Developing Real-Time, Automatic Sign Language Detection for Video Conferencing

Ok, this is cool. A team at Google built a detector for sign language—not to translate that sign language into audio or text, but just to detect that a given set of frames was in fact sign language. Then, they do something that I wouldn't have thought of:

When the sign language detection model determines that a user is signing, it passes an ultrasonic audio tone through a virtual audio cable, which can be detected by any video conferencing application as if the signing user is “speaking.” The audio is transmitted at 20kHz, which is normally outside the hearing range for humans. Because video conferencing applications usually detect the audio “volume” as talking rather than only detecting speech, this fools the application into thinking the user is speaking.
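The tone-generation half of that trick is easy to sketch. Here's a minimal, hypothetical illustration using only the Python standard library: it synthesizes one second of a 20 kHz sine wave and writes it to a WAV file (filenames and amplitude are my own choices, not from the post). The routing through a virtual audio cable is platform-specific and not shown.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100   # Hz; Nyquist limit of 22.05 kHz can represent a 20 kHz tone
TONE_FREQ = 20000     # Hz; at the edge of human hearing, as the post describes
DURATION = 1.0        # seconds
AMPLITUDE = 0.2       # quiet is fine; it only needs to register as "volume"

def tone_samples(freq, duration, rate, amplitude):
    """Generate 16-bit PCM samples of a sine tone."""
    n = int(duration * rate)
    return [int(amplitude * 32767 * math.sin(2 * math.pi * freq * t / rate))
            for t in range(n)]

samples = tone_samples(TONE_FREQ, DURATION, SAMPLE_RATE, AMPLITUDE)

# Write the tone to a WAV file; playing this through a virtual audio
# cable (a loopback device) would feed it to the conferencing app.
with wave.open("ultrasonic.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

In the real system this tone would only be emitted while the detection model reports active signing, so the "speaking" signal tracks the signer.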

How frustrating it must be to be on a call with a bunch of folks signing, only to have the camera never focus on the correct person! I like this solution so much because it's an excellent choice of problem and intervention: a beautiful example of how to improve a product using ML.
