
Google AI can pick out voices in a crowd

Humans are usually good at isolating a single voice in a crowd, but computers? Not so much — just ask anyone trying to talk to a smart speaker at a house party. Google may have a surprisingly straightforward solution, however. Its researchers have developed a deep learning system that can pick out specific voices by looking at people’s faces when they’re speaking. The team trained its neural network model to recognize individual people speaking by themselves, and then created virtual “parties” (complete with background noise) to teach the AI how to separate multiple voices into distinct audio tracks.
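For a rough sense of how those virtual “parties” might be built, here’s a minimal sketch (hypothetical, not Google’s actual pipeline): mix clean single-speaker clips with background noise, keeping the clean clips around as the targets a separation model learns to recover.

```python
import numpy as np

def make_virtual_party(speaker_clips, noise, snr_db=10.0):
    """Mix clean single-speaker clips into one noisy 'party' track.

    speaker_clips: list of equal-length 1-D float arrays, one per speaker.
    noise: 1-D float array of background noise, same length.
    Returns the mixture plus the clean sources, which serve as
    training targets for a separation model.
    """
    mixture = np.sum(speaker_clips, axis=0)
    # Scale the noise so the mixture sits at the requested SNR.
    sig_power = np.mean(mixture ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return mixture + scale * noise, speaker_clips
```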

The results, as you can see below, are uncanny. Even when people are clearly trying to compete with each other (such as comedians Jon Dore and Rory Scovel in the Team Coco clip above), the AI can generate a clean audio track for one person just by focusing on their face. That’s true even if the person partially obscures their face with hand gestures or a microphone.

Google is currently “exploring opportunities” to use this feature in its products, but there are more than a few prime candidates. It’s potentially ideal for video chat services like Hangouts or Duo, where it could help you understand someone talking in a crowded room. It could also be helpful for speech enhancement in video recording. And there are big implications for accessibility: it could lead to camera-linked hearing aids that boost the sound of whoever’s in front of you, and more effective closed captioning. There are potential privacy issues (this could be used for public eavesdropping), but it wouldn’t be too difficult to limit the voice separation to people who’ve clearly given their consent.

Low-latency JPEG XS format is optimized for live streaming and VR

You might only know JPEG as the default image compression standard, but the group behind it has now branched out into something new: JPEG XS, a low-latency, energy-efficient format designed for streaming live video and VR, even over WiFi and 5G networks. It’s not a replacement for JPEG, and the files themselves won’t be smaller; the new format is simply optimized for lower latency and energy efficiency. In other words, JPEG is for downloading, but JPEG XS is for streaming.

The new standard was introduced this week by the Joint Photographic Experts Group, which says that the aim of JPEG XS is to “stream the files instead of storing them in smartphones or other devices with limited memory.” So in addition to getting faster HD content on your large displays, the group also sees JPEG XS as a valuable format for faster stereoscopic VR streaming plus videos streamed by drones and self-driving cars.

“We are compressing less in order to better preserve quality, and we are making the process faster while using less energy,” says JPEG leader Touradj Ebrahimi in a statement. According to Ebrahimi, JPEG XS video compression is less severe than JPEG photo compression: while JPEG photos are compressed by a factor of 10, JPEG XS compresses by a factor of 6. The group promises “visually lossless” quality for JPEG XS images.
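As a back-of-the-envelope illustration of what those compression factors mean for bandwidth (the frame size and frame rate below are illustrative, not figures from the standard): compressing less means noticeably bigger streams.

```python
# Rough bitrate comparison for a 1080p, 8-bit RGB, 60 fps stream.
width, height, bytes_per_pixel, fps = 1920, 1080, 3, 60
raw_bits_per_sec = width * height * bytes_per_pixel * 8 * fps  # ~3.0 Gbit/s

for name, factor in [("JPEG (10:1)", 10), ("JPEG XS (6:1)", 6)]:
    print(f"{name}: ~{raw_bits_per_sec / factor / 1e6:.0f} Mbit/s")
# JPEG (10:1): ~299 Mbit/s
# JPEG XS (6:1): ~498 Mbit/s
```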

MIT Unleashes a Hypnotic Robot Fish to Help Save the Oceans

Like a miniaturized Moby Dick, the pure-white fish wiggles slowly over the reef, ducking under corals and ascending, then descending again, up and down and all around. Its insides, though, are not flesh, but electronics. And its flexible tail flicking back and forth is not made of muscle and scales, but elastomer.

The Soft Robotic Fish, aka SoFi, is a hypnotic machine, the likes of which the sea has never seen before. In a paper published today in Science Robotics, MIT researchers detail the evolution of the world’s strangest fish, and describe how it could be a potentially powerful tool for scientists to study ocean life.

Scientists designed SoFi to solve several problems that bedevil oceanic robotics. Problem one: communication. Underwater vehicles are typically tethered to a boat because radio waves don’t do well in water. What SoFi’s inventors have opted for instead is sound.

A Robot Does the Impossible: Assembling an Ikea Chair Without Having a Meltdown

And just like that, humanity draws one step closer to the singularity, the moment when the machines grow so advanced that humans become obsolete: A robot has learned to autonomously assemble an Ikea chair without throwing anything or cursing the family dog.

Researchers report today in Science Robotics that they’ve used entirely off-the-shelf parts—two industrial robot arms with force sensors and a 3D camera—to piece together one of those Stefan Ikea chairs we all had in college before they collapsed after two months of use. From planning to execution, it took only 20 minutes, compared to the human average of a lifetime of misery. It may all seem trivial, but this is in fact a big deal for robots, which struggle mightily to manipulate objects in a world built for human hands.

To start, the researchers give the pair of robot arms some basic instructions—like those cartoony illustrations, but in code: this piece goes into that piece first, then this other one, and so on. Then they place the pieces in a random pattern in front of the robots, which eyeball the wood with the 3D camera. So the researchers give the robots a list of tasks, and the robots take it from there.
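Here’s a hypothetical sketch of that plan-then-execute loop (every function name below is invented for illustration, not the researchers’ code): the assembly order is fixed up front, while perception and motion get worked out at run time.

```python
# Illustrative only: the plan encodes "this piece goes into that one" in order.
assembly_plan = [
    ("dowel", "side_frame"),
    ("seat", "side_frame"),
    ("back_rest", "seat"),
]

def assemble(arms, camera, plan):
    poses = camera.locate_parts()      # 3D camera finds the randomly placed pieces
    for part, target in plan:
        arms.pick(arms.plan_grasp(poses[part]))
        # Force sensing lets the arms feel their way past small pose errors.
        arms.insert(part, into=poses[target], force_limit_newtons=30)
        poses = camera.locate_parts()  # re-scan: the scene has changed
```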

Artificial intelligence can scour code to find accidentally public passwords

Sometimes sensitive data, like passwords or keys that unlock encrypted communications, are accidentally left open for anybody to see. It’s happened everywhere from the Republican National Committee to Verizon, and as long as information can be public on the internet the trend isn’t going to stop.

But researchers at software infrastructure firm Pivotal have taught AI to locate this accidentally public sensitive information in a surprising way: by looking at the code as if it were a picture. Since modern artificial intelligence is arguably better than humans at spotting minute differences in images, telling a password apart from normal code becomes, for the computer, like telling a dog from a cat.
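One hedged guess at what “looking at code as a picture” could mean in practice (an assumed preprocessing step, not Pivotal’s published method): map each character to a pixel, so that random-looking secrets show up as a distinctive visual texture.

```python
import numpy as np

def code_to_image(source: str, width: int = 80, height: int = 40) -> np.ndarray:
    """Render source code as a grayscale grid, one character per pixel.

    Hypothetical preprocessing: high-entropy secrets produce a different
    'texture' than ordinary code, which an image classifier can learn to spot.
    """
    grid = np.zeros((height, width), dtype=np.uint8)
    for row, line in enumerate(source.splitlines()[:height]):
        for col, ch in enumerate(line[:width]):
            grid[row, col] = min(ord(ch), 255)
    return grid  # feed to an ordinary image classifier
```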

The best way to check whether private passwords or sensitive information has been left public today is to use hand-coded rules called “regular expressions.” These rules tell a computer to find any string of characters that meets specific criteria, like length and included characters. But passwords are all different, and this method means that the security engineer has to anticipate every kind of private data they want to guard against.
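For context, here’s the regular-expression approach in miniature (the patterns are illustrative; production scanners ship far larger rule sets, and any secret format they don’t anticipate slips through):

```python
import re

# One hand-coded rule per known secret format.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "hardcoded_password": re.compile(r"password\s*=\s*['\"][^'\"]{8,}['\"]", re.IGNORECASE),
}

def scan(source: str):
    """Return (rule_name, matched_text) pairs for every hit in the source."""
    return [(name, m.group(0))
            for name, pattern in SECRET_PATTERNS.items()
            for m in pattern.finditer(source)]
```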

Artificial intelligence is writing fairy tales now, and humanity is doomed

If it’s started to feel like all summer blockbuster movies are being written by robots [INSERT FORMER PRO WRESTLER, INSERT GIANT CGI ANIMAL], you’ll be disquieted to learn that that future may not be too far off.

The meditation app Calm teamed up with the tech team at Botnik to write a new Brothers Grimm-style fairy tale entirely through artificial intelligence. By inputting the data from existing Brothers Grimm stories and using predictive text technology (and with a few human writers stitching things together), the group at Botnik crafted “The Princess and the Fox,” a story about “a talking fox [who] helps the lowly miller’s son to rescue the beautiful princess from the fate of having to marry a dreadful prince who she does not love.”
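A toy version of the predictive-text idea (a bare-bones Markov chain, far cruder than Botnik’s actual tools): learn which words tend to follow which in the source stories, then offer candidate next words the way a predictive keyboard does.

```python
import random
from collections import defaultdict

def train(words):
    """Map each word to the words observed to follow it in the corpus."""
    following = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        following[prev].append(nxt)
    return following

def suggest(following, word, k=3):
    """Offer up to k candidate next words, predictive-keyboard style."""
    options = following.get(word, [])
    return random.sample(options, min(k, len(options)))

corpus = "once upon a time there lived a king and a clever fox".split()
model = train(corpus)
print(suggest(model, "a"))  # e.g. ['time', 'clever', 'king']
```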

“We’re doing for the Brothers Grimm what Jurassic Park did for dinosaurs,” says Michael Acton Smith, co-founder of Calm, in a press release. “We’re bringing them back from the dead, with modern science.” (It perhaps bears remembering here that Jurassic Park famously did not end well.)

Discovery VR, Oculus Veterans Launch New AR/VR Studio Tomorrow Never Knows (EXCLUSIVE)

Four virtual reality (VR) veterans from Discovery Digital, Oculus Story Studio and Lightshed officially launched their new company out of stealth mode in San Francisco this week. Dubbed Tomorrow Never Knows, the new studio aims to use virtual and augmented reality as well as other emerging technologies including artificial intelligence for groundbreaking storytelling projects, said co-founder and CEO Nathan Brown in an interview with Variety this week.

“The thesis behind the company is to consistently violate the limits of storytelling, forcing the creation of new tools, methodologies and workflow and to do this intentionally so we create original creative and technology IP,” he said.

Before founding Tomorrow Never Knows, Brown co-founded Discovery VR, which has become one of the most ambitious network-backed VR outlets. Also hailing from Discovery VR is Tomorrow Never Knows co-founder Tom Lofthouse. They are joined by Gabo Arora, whose previous work as the founder of Lightshed included VR documentaries like “Clouds Over Sidra” and “Waves of Grace,” as well as Oculus Story Studio co-founder Saschka Unseld, the director of the Emmy Award-winning VR animation short “Henry” and the Emmy-nominated VR film “Dear Angelica.”

What Will the Automated City of the Future Look Like?

Many large cities (Seoul, Tokyo, Shenzhen, Singapore, Dubai, London, San Francisco) serve as test beds for autonomous vehicle trials in a competitive race to develop “self-driving” cars. Ports and warehouses, too, are increasingly automated and robotized. Testing of delivery robots and drones is gathering pace beyond the warehouse gates. Automated control systems are monitoring, regulating and optimizing traffic flows. Automated vertical farms are transforming food production in “non-agricultural” urban areas around the world. New mobile health technologies carry the promise of healthcare “beyond the hospital.” Social robots in many guises – from police officers to restaurant waiters – are appearing in urban public and commercial spaces.


Tokyo, Singapore and Dubai are becoming prototype ‘robot cities,’ as governments start to see automation as the key to urban living.
