“If I plunk you down in front of the Atari 2600, and I say, ‘Here you go! Here’s a game you’ve never played,’ what do you do?”
This is Associate Professor of Computer Science Erik Talvitie, who’s coaching me over the phone through his research in machine learning, a rapidly advancing branch of artificial intelligence.
Lucky for him, I’m a child of the mid-1980s and early 1990s, so I’ve played my fair share of video games—from Nintendo cornerstones like Super Mario Brothers to arcade games such as Pac-Man and Teenage Mutant Ninja Turtles. I’m terrible at all of them.
“Usually, I find out how quickly I can make a mistake, which is pretty fast,” I quip.
Talvitie is thorough and patient—a trait his students fondly emphasize—but I’m not helping him make the point he wants to make.
“Well, how do you know what a mistake is?” he challenges. “What is literally the first thing that you do?”
He’s a good teacher, and the jokester in me relents—sort of. “You push a button,” I say. “Or if you’re me, you wait too long and something bad happens.”
“But not pressing a button is also a decision, right?” confirms Talvitie. “So, you start making decisions about what to do, and you start seeing how the screen changes, and you start making predictions about what’s going to change and what the consequences of your actions are going to be,” he explains.
“Based on those predictions, you start to think about the future and say, ‘All right. I need to move this paddle over so it catches the ball, so I don’t lose points.’ I would call that model-based reinforcement learning.”
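Talvitie’s paddle-and-ball description can be sketched in a few lines of code. This is a minimal, hypothetical illustration (the toy “model” and the function names are mine, not from his research): a model-based agent consults a prediction of the next state for each possible action, then picks the action whose predicted outcome looks best.

```python
# A toy sketch of model-based decision-making (not Talvitie's actual code):
# the agent asks a model "what happens if I do this?" for each action,
# then chooses the action with the best predicted outcome.

def model(state, action):
    """Predict the next state. Here the 'model' is hand-written; in
    model-based reinforcement learning it would be learned from experience."""
    ball, paddle = state
    return (ball, paddle + action)  # the action moves the paddle: -1, 0, or +1

def choose_action(state):
    """Pick the action whose predicted next state puts the paddle closest to the ball."""
    def predicted_miss(action):
        ball, paddle = model(state, action)
        return abs(ball - paddle)
    return min([-1, 0, 1], key=predicted_miss)

# Ball at row 5, paddle at row 3: the model predicts that moving up (+1)
# closes the gap, so that is the action the agent chooses.
print(choose_action((5, 3)))  # -> 1
```

The key feature is that the decision flows entirely through the model’s predictions; if the model is wrong, every decision built on it can go wrong too, which is exactly the fragility Talvitie describes next.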
This summer, along with his two Hackman Scholars, Daniel Foley ’17 and Zhiru Zhu ’17, Talvitie is trying to figure out if he can design a piece of code that will help computers learn to anticipate what’s coming in the next frame of an Atari game like Pong or Space Invaders—and make a decision that will score points.
That’s right—the model of the video game Talvitie’s encouraging me to build in my mind’s eye can, after a fashion, be reproduced in code.
Unlike humans, though, computer programs have trouble learning how to make decisions based on models. If the model is even slightly wrong, the program makes terrible decisions. It turns out that computers often make better decisions when they don’t use a model to make predictions at all, an approach Talvitie calls “model-free” reinforcement learning.
“In model-free reinforcement learning, you don’t actually attempt to figure out what’s going to happen in the world,” says Talvitie. “You only attempt to figure out how good different decisions are in different situations. There’s actually very good evidence that humans and other animals use both model-based and model-free reasoning.”
As an example of model-free reasoning, Talvitie describes something that happens to everyone: the moment you take a wrong turn on a familiar route because you’re thinking about heading to work or school, rather than a new, unfamiliar destination.
“You weren’t really thinking about what would happen if you turned left,” he says. “And then you turned left, and your model-based reasoning kind of caught up with you, and you’re like, ‘Oh, no, wait! This is not the consequence that I wanted to create.’”
When a computer program is given the loose guidance of model-free reinforcement learning, it tends to be more successful at navigating unknown worlds—something that Talvitie says is counterintuitive. “The project is in large part trying to untangle this mystery,” he says of his work with the Atari 2600.
“We do not make perfect predictions about the world, and somehow we still get around,” Talvitie says of human reasoning. “We still function. So what the project is really about is asking the question, ‘Let’s just take it as given that [computers will] learn an incorrect model. It’s going to be wrong. It’s going to have errors. How can we get ourselves into a situation where it’s wrong in the right ways, that it’s still useful for making decisions?’”
While plenty of research advances have been made in model-free reinforcement learning, Talvitie is committed to understanding how model-based reinforcement learning might be more successful—and how these two forms of reinforcement learning might work together to help computers make informed predictions.
The National Science Foundation recently awarded Talvitie $500,000 to pursue this line of inquiry. And while Talvitie’s research begins with a hypothesis he’s testing on an Atari 2600, his findings may just have wide-reaching implications for the computer programs that will come to define our future.
Over the past five years, major tech companies like Google, Facebook and Apple have all invested incredible sums in machine learning. In 2014, according to WIRED, “Google shelled out an estimated $400 million for a little-known artificial intelligence company called DeepMind.” Based in London, DeepMind is at the forefront of “deep learning,” a specific area within machine learning. Deep learning uses sophisticated computer software called “neural networks,” which are loosely modeled on the inner workings of the human brain.
Last February, DeepMind published a paper in Nature that describes its success with the Atari problem using model-free deep learning—a success that prompted Talvitie to pose some of the questions that now guide his own research.
“This past year, my student Yitao [Liang ’16] and I, along with our collaborators at the University of Alberta, published a paper in which we achieved very similar performance in the Atari problem, but with a far, far simpler, far more computationally practical approach,” Talvitie says.
Still, the failures of model-based learning in the Atari problem perplex Talvitie, which is why he’s tackling the problem again this summer.
To the untrained eye, it might not seem like a big deal that a computer can become an Atari whiz over the course of a few months. We’ve even gotten used to the idea of computers beating human chess players; think of the famed 1997 match between the supercomputer Deep Blue and chess grandmaster Garry Kasparov.
But, according to Talvitie, what he’s attempting to do with the Atari problem is completely different.
“Computer chess programs know the rules of chess. But in the problem we’re facing, that’s not true. The [computer] is starting a game that it doesn’t know about,” Talvitie explains. “Just imagine a 4-year-old sitting down to play their first Atari game—that is what it’s like. There’s no context. Here’s the joystick, points are good. Go. And only through experience does the thing start to learn how to manipulate the world.”
More accurate machine learning algorithms will impact some of the online tools we use most often, including our inboxes, Facebook accounts and smartphones. Think about how often Google’s Gmail flags messages as spam, or how Facebook uses facial recognition to suggest tagging friends in your vacation photos. Each of these tasks is powered by machine learning algorithms.
If computer programs can be trained to make more effective decisions, newer technologies will be even more customizable. For instance, a driverless car could account for differences in driving habits throughout the country rather than stick to the strict rules guiding its software, or a personal assistant on your phone—such as Apple’s Siri—could get far more personal.
“Siri can remind you of things on your calendar, but Siri isn’t learning about you and what you need, the way an actual human assistant would,” says Talvitie. “It’s pretty easy for me to imagine that kind of service would become much more effective if we could get to a point where these things could adapt what they’re doing based on feedback.”
As useful as the future applications of Talvitie’s research might be, training a computer takes a lot of time and processing power. In order to get enough data to evaluate the first phase of his research, says Talvitie, the team ran 24 trials on each of 50 Atari games.
“For each trial we trained [the computer program] for 200 million frames,” Talvitie explains as he crunches numbers in his office. He whips through the math, murmuring steps out loud. After a minute, he responds.
“If you were to sit down at an actual Atari,” he says, “it would take about 127 years to get all that data.”
Because their computers run faster than real time, Talvitie and Liang did it in a matter of weeks.
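Talvitie’s back-of-the-envelope math is easy to check. Assuming the standard NTSC Atari 2600 rate of roughly 60 frames per second (the frame rate is my assumption; the trial counts are from the article):

```python
# Verify the "127 years" estimate from the trial counts in the article.
FPS = 60                            # NTSC Atari 2600: roughly 60 frames per second
frames = 24 * 50 * 200_000_000      # 24 trials on each of 50 games, 200M frames apiece
seconds = frames / FPS
years = seconds / (60 * 60 * 24 * 365.25)
print(round(years))  # -> 127
```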
The Pattern-Analysis Problem
In the age of big data, training computers to sift through massive amounts of information and quickly attach a meaningful label or classification is a huge boon. It’s what makes our automated spam folders and Google’s search engine so powerful. And as the amount of information that’s catalogued and indexed continues to increase, complex sorting and classification tasks have simply grown too large for any human to accomplish. That’s why machine learning continues to rule so much of our lives online. No human on Earth could possibly watch and classify billions of YouTube videos.
As it happens, machine-learning algorithms also can help us understand more about how the human brain works—a problem that’s taken scientists centuries to unravel. According to Michael Anderson, a cognitive scientist and chair of F&M’s scientific and philosophical studies of mind program, machine learning makes a major impact on his ability to study the inner workings of the brain. Trained in both philosophy and computer science, Anderson depends on classification algorithms to help him look for patterns in fMRI scans—images of the brain snapped in the moment of accomplishing a task. (For more on Anderson’s research, see the Winter 2015 issue of Franklin & Marshall Magazine at fandm.edu/magazine).
To write his book, “After Phrenology: Neural Reuse and the Interactive Brain,” Anderson—whose research is upending old models of how the brain works—needed to sort through more than 4,000 fMRI scans to look for specific patterns of brain activity. While many of us have grown up with the notion that each area of the brain is highly specialized, Anderson has unearthed data that suggests this model isn’t quite right.
“In the classic picture of brain organization, each part is highly specialized,” explained Anderson. “The back of the brain is supposed to be all about vision, your left medial temporal lobe is supposed to be all about language, right frontal areas are about attention and executive control. And when you look at neuroimaging studies, like fMRI, one study at a time, that’s pretty much how it looks.”
But when you look at thousands of fMRI images at once, as Anderson did, the picture’s quite different. “We can show that for each piece of the brain, it’s active across multiple different kinds of tasks and multiple different kinds of contexts. Each part has a unique fingerprint, but it’s nothing like the super-specialized functional attribution that we might have expected 10 or 12 or 20 years ago,” he added.
Without a machine-learning algorithm to help sort through this complex data, Anderson would have fewer pieces of evidence in an already complicated puzzle. “What you really have to do is to look for complex patterns of activation, and it’s the complex patterns of activation that will tell you what’s going on in an individual brain,” said Anderson. “And so this is an application for machine learning. It’s essentially a classification, or pattern-analysis problem.”
But it’s a problem that happens on a massive scale. Instead of three-dimensional space, Anderson encourages us to think of a 20,000-dimensional space—the complicated network of neurons and synapses that makes up the human brain. When your research involves thousands of images, complex pattern recognition, and high-dimensional space, strong computing power is a must—which is why machine-learning algorithms play such an important role in Anderson’s research.
Before Anderson started using machine learning to collect data for his neurological research, some of his questions about artificial intelligence overlapped with Talvitie’s. Anderson noted that autonomous systems—such as robots—didn’t function well when their model of the world failed to match up with the real world. And, like Talvitie, Anderson wanted to figure out a way around this problem.
Anderson’s solution was to design “a kind of metacognition...a kind of self-monitoring” system. “It would actually recognize when things weren’t going the way that it expected things to go, and then it would make adjustments based on that. Humans do that quite naturally, but AI systems didn’t,” said Anderson.
Beyond the Code
Both Anderson and Talvitie are working at the top of their fields, conducting complex, interdisciplinary research into problems that will define how we understand the way the world works. In fact, each of their projects sounds like it could be incubating in the lab of a major research university, rather than a small liberal arts college in Lancaster.
According to Daniel Foley ’17, that’s part of what makes his time in the lab with Talvitie all the more meaningful, especially as he prepares to enter graduate school or a tough job market next year.
“The idea of doing research is an entirely new concept to me. I’m used to being in a classroom setting, where there’s a specific problem, and I’m able to solve it,” he says. He’s a first-generation college student and excited about the prospect of potentially going to graduate school—an option he’d never considered before his time at F&M and in the lab.
As for Talvitie? He’d be hard-pressed to trade helping his students find their way in computer science for a fancy lab and major funding from Google. “When a student clicks, when they see the beauty of a problem, or when they have a realization, and I feel like I was a part of that—that’s really gratifying to me. I wouldn’t want to give that up.”
The computer science major has grown rapidly in recent years. This year, the number of graduating computer science majors is expected to more than triple.
Talvitie came to F&M in 2010. “At the time we were offering two sections of computer science, they had 20 seats apiece, and they were each about half full. We had maybe 20 students total,” the professor says of the fledgling department, which officially started offering a major in 2012. “This past year we offered four sections of Computer Science I, and they were all full with a wait list.”
He attributes student interest to an increase in computational thinking across disciplines—professors are encouraging students to take the course—as well as student satisfaction in solving a problem and building something tangible.
This is certainly what caught the interest of both Foley and Zhu, who are warming up to the idea of long-term research, too.
“Research is not always about outcomes; it’s about experience,” said Zhu, who is applying to doctoral programs this fall. “I like discovering new ideas, even though that idea might not be working,” she said. “I like the idea of wandering in the wild.”
Human vs. Machine
- Current computer players usually do better than human players in games where precision of movement is the biggest challenge. For example, state-of-the-art computer players rarely lose a point in Pong after enough experience.
- Computer players often struggle when they have to perform long, specific sequences of actions before any points are gained or lost. This makes it difficult to discover how to get points at all, let alone how to get them consistently. Games such as Pitfall and Montezuma’s Revenge require the navigation of obstacles before obtaining any points.
- Pursuing long-range goals also poses problems for computer players. In Ms. Pac-Man, computers are often too eager to eat the power pellets because they give large point bonuses, while human players wait until they need to use them to avoid losing a life. As a result, computer players tend to get their points quickly and then die, while humans go for a more effective “slow-burn” strategy.
Atari Fun Facts
- Atari 2600 games are not random. If you repeatedly unplug and plug in the machine and perform the same particular series of joystick movements, the exact same things will happen in the game every time.
- People generally think that in Pong, the joystick moves the paddle up or down; but the movement of the paddle actually depends on the last 16 positions of the joystick.
- You can do well in some games by using counterintuitive strategies. For example, you succeed in Centipede by holding the joystick to the right with the button pressed.