Robot Learns to Cook by Watching YouTube Videos, Just Like Us

By watching YouTube videos about cooking, I mean.


Unlike human cooks, when robots break a glass they never accidentally cut themselves and have to go get stitches, because they’re made of indestructible titanium frames that they would never use for evil.

But as with many human cooks, learning to cook from YouTube videos is more of an aspiration than an accomplishment. University of Maryland researchers Yezhou Yang, Yi Li, Cornelia Fermüller, and Yiannis Aloimonos have programmed a robot to watch YouTube videos of people cooking and then mimic the actions the hands performed on screen, using the corresponding kitchen implements. Specifically, the robot first attempted to determine what kind of grip the hands on screen were using, and then to identify the implement being manipulated. Then it chose its own matching implement and attempted to mimic the action.
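For the code-curious, here is a toy Python sketch of that watch-then-mimic loop. It is not the researchers' system: the Observation record, the plan_step and plan_video names, and the example labels are all made up for illustration, and the hard part (actually recognizing grips and implements in video) is assumed to have happened already.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Observation:
    """One moment from a video, already labeled (hypothetical structure)."""
    grasp: str      # grip type seen on screen, e.g. "power"
    implement: str  # tool the hand is holding, e.g. "knife"
    action: str     # what the hand did with it, e.g. "cut"


def plan_step(obs: Observation, robot_tools: Set[str]) -> Optional[Tuple[str, str, str]]:
    """Return (grasp, tool, action) for the robot to copy, or None if it owns no matching tool."""
    if obs.implement not in robot_tools:
        return None
    return (obs.grasp, obs.implement, obs.action)


def plan_video(observations: List[Observation], robot_tools: Set[str]) -> List[Tuple[str, str, str]]:
    """Turn a sequence of observations into the steps the robot can actually mimic."""
    steps = []
    for obs in observations:
        step = plan_step(obs, robot_tools)
        if step is not None:
            steps.append(step)
    return steps
```

Feed plan_video a list of observations and the set of tools the robot has on hand, and it returns only the steps the robot is equipped to copy; anything involving a tool it lacks is skipped.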

In their tests, the robot managed to pick the right grasp 90% of the time, and the right implement and action 80% of the time. This is pretty impressive, considering that the researchers just showed it random videos, not a curated playlist chosen for visual clarity or other factors that would play better to the robot’s programming.

So we made a robot that’s kind of good at a very specifically constrained version of Simon Says, and it still sometimes mistakes tofu for a plastic bowl. Why is this a big deal? Well, as with most evolving intelligences, it’s all about the baby steps. From Yang et al.’s abstract:

In order to advance action generation and creation in robots beyond simple learned schemas we need computational tools that allow us to automatically interpret and represent human actions. This paper presents a system that learns manipulation action plans by processing unconstrained videos from the World Wide Web. Its goal is to robustly generate the sequence of atomic actions of seen longer actions in video in order to acquire knowledge for robots.

If robots are going to respond to their environment in ways that are easy for humans to understand and interact with, they’re gonna have to have programs in place that allow them to interpret and “learn” from human behavior, starting with visually interpreting it well enough to mimic it. And then maybe some day we’ll have a whole new class of beings that we can offensively tell to get back in the kitchen. In the meantime: teaching a robot to put on makeup by watching YouTube videos. Then, maybe, we can tackle nail art tutorials or ASMR videos.

(via Discover Magazine)

Susana Polo thought she'd get her Creative Writing degree from Oberlin, work a crap job, and fake it until she made it into comics. Instead she stumbled into a great job: founding and running this very website (she's Editor at Large now, very fancy). She's spoken at events like Geek Girl Con, New York Comic Con, and Comic Book City Con, wants to get a Batwoman tattoo and write a graphic novel, and one of her canine teeth is in backwards.