Oct 15 2018
Smart devices can sometimes come across as stupid: they don't understand what the people around them are doing, or even where they are. Researchers at Carnegie Mellon University (CMU) say that this may soon change. Devices may soon gain 'environmental awareness', based on acoustic and other sensory input.
A smart speaker sitting on a kitchen countertop cannot figure out if it is in a kitchen, let alone know what a person is doing in a kitchen. But if these devices understood what was happening around them, they could be much more helpful.
Chris Harrison, Assistant Professor, Human-Computer Interaction Institute (HCII), CMU
Harrison and his colleagues in the Future Interfaces Group are currently presenting their findings at the Association for Computing Machinery's User Interface Software and Technology Symposium (UIST) in Berlin. Until 17 October, the team is presenting two ideas for making smart devices context-aware. One makes the most of one of our most ubiquitous pieces of technology, the microphone; the other uses a modern version of an eavesdropping technique employed by the KGB in the 1950s.
In the first project, the team set out to build a sound-based activity recognition system, which they call 'Ubicoustics'. The system would use the microphones already built into smartphones, smart speakers, and smartwatches, enabling these devices to identify sounds associated with places such as kitchens, entrances, workshops, bedrooms, and offices.
“The main idea here is to leverage the professional sound-effect libraries typically used in the entertainment industry,” said Gierad Laput, a Ph.D. student in HCII. “They are clean, properly labeled, well-segmented and diverse. Plus, we can transform and project them into hundreds of different variations, creating volumes of data perfect for training deep-learning models.”
“This system could be deployed to an existing device as a software update and work immediately,” he added.
The plug-and-play system could function in any environment. For example, it could alert the user when a person knocks on the front door, or it could move to the next step in a recipe when it picks up the sound of a blender, chopping or any other sound relating to cooking.
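To make the plug-and-play idea concrete, here is a minimal Python sketch of how recognized sounds might be routed to actions. It assumes a recognizer that emits (label, confidence) pairs; the label names, the 0.7 confidence threshold, and the `dispatch` function are hypothetical illustrations, not part of Ubicoustics.

```python
from typing import Dict, List, Tuple

# Hypothetical label-to-action table; these labels are illustrative assumptions,
# not the actual Ubicoustics label set.
ACTIONS: Dict[str, str] = {
    "door_knock": "Alert: someone is at the front door",
    "blender": "Recipe: advance to the next step",
    "chopping": "Recipe: advance to the next step",
}


def dispatch(predictions: List[Tuple[str, float]], threshold: float = 0.7) -> List[str]:
    """Return the actions triggered by (label, confidence) predictions.

    Predictions below the (assumed) confidence threshold, or with labels that
    have no mapped action, are ignored.
    """
    triggered: List[str] = []
    for label, confidence in predictions:
        if confidence >= threshold and label in ACTIONS:
            triggered.append(ACTIONS[label])
    return triggered


# The knock is confidently recognized; the hair dryer has no mapped action,
# and the blender prediction is too uncertain to act on.
print(dispatch([("door_knock", 0.91), ("hair_dryer", 0.85), ("blender", 0.40)]))
# prints ['Alert: someone is at the front door']
```

Requiring a minimum confidence is one simple way to keep a device from acting on uncertain recognitions.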
Laput and his colleagues Karan Ahuja, a Ph.D. student in HCII, and Mayank Goel, an assistant professor in the Institute for Software Research, started with an existing model for labeling sounds and fine-tuned it using sound effects from the professional libraries, such as kitchen appliances, keyboards, hair dryers, power tools, and other context-specific sounds. They then synthetically altered the sounds to create numerous variations.
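The article does not say which transformations the team used. As a rough illustration, the following Python sketch applies three common audio augmentations (random gain, a circular time shift, and mixing in background noise at a target signal-to-noise ratio) to turn one clean clip into many variants. All function names and parameter ranges here are assumptions.

```python
import math
import random
from typing import List


def change_gain(signal: List[float], gain_db: float) -> List[float]:
    """Scale the clip's amplitude by gain_db decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in signal]


def time_shift(signal: List[float], shift: int) -> List[float]:
    """Circularly shift the clip right by `shift` samples."""
    if not signal:
        return []
    shift %= len(signal)
    if shift == 0:
        return list(signal)
    return signal[-shift:] + signal[:-shift]


def mix_noise(signal: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add background noise scaled to a target SNR (assumes non-empty inputs)."""
    # Tile the noise to the clip's length so the power estimate matches what is added.
    tiled = [noise[i % len(noise)] for i in range(len(signal))]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in tiled) / len(tiled)
    if p_noise == 0:
        return list(signal)
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, tiled)]


def augment(signal: List[float], noise: List[float], n_variants: int,
            seed: int = 0) -> List[List[float]]:
    """Produce n_variants randomized copies of one clip (assumed parameter ranges)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        v = change_gain(signal, rng.uniform(-6.0, 6.0))
        v = time_shift(v, rng.randrange(len(signal)))
        v = mix_noise(v, noise, rng.uniform(5.0, 20.0))
        variants.append(v)
    return variants


# Usage: a hypothetical 0.1 s, 440 Hz tone at 16 kHz plus Gaussian "room noise".
clip = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise_rng = random.Random(1)
hum = [noise_rng.gauss(0.0, 1.0) for _ in range(800)]
variants = augment(clip, hum, n_variants=100)
print(len(variants))  # prints 100
```

Each variant keeps the original label, so one professionally recorded clip yields many labeled training examples.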
However, Laput admits that identifying sounds and placing them in the right context is difficult, partly because several sounds often occur at once, which can degrade the devices' performance. In their tests, Ubicoustics achieved an accuracy of around 80%, comparable to human listeners, but this is still not good enough to support user applications. With further research, better microphones, higher sampling rates, and different model architectures might all improve accuracy.
In a separate paper, Ph.D. student Yang Zhang worked with Laput and Harrison to develop 'Vibrosight', which uses laser vibrometry to detect vibrations at specific locations in a room. It is similar to the light-based devices the KGB once used to sense vibrations on reflective surfaces (such as windows), enabling them to listen in on conversations.
“The cool thing about vibration is that it is a byproduct of most human activity,” Zhang said. Pounding a hammer, running on a treadmill, or typing on a keyboard all generate vibrations that can be sensed at a distance. “The other cool thing is that vibrations are localized to a surface,” he added. Unlike sounds picked up by a microphone, the vibrations of one activity do not interfere with those of another. And because it monitors vibrations only at specific locations, this method, unlike cameras and microphones, is unobtrusive and preserves privacy.
This technique requires a special sensor: a low-power laser combined with a motorized, steerable mirror. The researchers built their experimental device for just $80. Reflective tags, made of the same material used to make pedestrians and bikes more visible at night, were applied to the objects to be tracked. A single sensor positioned in the corner of a room can monitor vibrations on several objects.
Zhang said the sensor can tell whether a device is on or off with 98% accuracy and identify the device with 92% accuracy, based on the object's vibration profile. It can also detect movement, such as the vibrations of a chair when someone sits in it, and can tell when a person has blocked the sensor's view of a tag, for example while using an eyewash station or a sink.
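As a toy illustration of how a vibration profile could support these decisions, the Python sketch below treats a device as 'on' when its vibration RMS amplitude exceeds a threshold, identifies a device by matching its dominant frequency against stored profiles, and flags a tag as occluded when reflected light intensity drops well below its baseline. The thresholds, the single-frequency profiles, and all function names are hypothetical; this is not Vibrosight's actual classifier.

```python
import math
from typing import Dict, List


def rms(x: List[float]) -> float:
    """Root-mean-square amplitude after removing any constant (DC) offset."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))


def dominant_frequency(x: List[float], sample_rate: float) -> float:
    """Frequency in Hz of the strongest non-DC bin of a naive O(n^2) DFT."""
    n = len(x)
    mean = sum(x) / n
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum((x[t] - mean) * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum((x[t] - mean) * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n


def is_on(x: List[float], threshold: float = 0.05) -> bool:
    """Hypothetical rule: a device counts as 'on' if vibration RMS exceeds threshold."""
    return rms(x) > threshold


def identify(x: List[float], sample_rate: float, profiles: Dict[str, float]) -> str:
    """Return the device whose stored dominant frequency is closest to the signal's."""
    f = dominant_frequency(x, sample_rate)
    return min(profiles, key=lambda name: abs(profiles[name] - f))


def tag_occluded(intensity: float, baseline: float, ratio: float = 0.2) -> bool:
    """Hypothetical rule: a tag is blocked if reflected intensity < ratio * baseline."""
    return intensity < ratio * baseline


# Usage with synthetic signals sampled at an assumed 1 kHz.
RATE = 1000.0
fan = [0.5 * math.sin(2 * math.pi * 50 * t / RATE) for t in range(200)]
profiles = {"fan": 50.0, "drill": 200.0}  # hypothetical device profiles
print(is_on(fan), identify(fan, RATE, profiles), tag_occluded(0.1, 1.0))
# prints True fan True
```

A practical system would more likely use richer spectral features and a trained classifier than a single frequency per device.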