Neal Swisher

Learning On-Board an Autonomous Mobile Robot

  

Summer, 2003

The goal of this project was to create a robot that was capable of learning how to respond to its environment and to coordinate its sensors and motors to display goal-oriented behaviors. Both the design and the programming of the robot were stressed throughout the project.

The robot was programmed, and allowed to move around in its environment in order to learn how to navigate towards a light source placed at one end of the arena, while avoiding the walls, and other obstacles that were placed in the arena. Artificial life generally concerns itself with simulated robots in simulated environments. This project combined artificial life and mobile robotic, so it was important that the robot physically move around in its environment, and that we not concern ourselves with simulated robots in simulated environments. To accomplish this, we had to construct a robot that had a relatively simple structure.

 

 

  

Figure 1: The robot used in our experiments

 

 

The robot’s body consists of an aluminum disk, 10” in diameter, as the base, with three metal rods placed in a triangle formation, sticking up about 5”, and oriented towards the front of the robot (see Figure 1). We mounted one photocell sensor and one infrared sensor on the two rods that are left and right of center, if you are looking at the robot head-on. The handyboard, the brains of the robot, is located directly behind the sensors and is mounted on top of the rechargeable battery pack. The handyboard features a serial port that we used to communicate between the computer and the robot.

 

 

 

 

 

Figure 2: Infrared and photocell sensors

 

 

The infrared sensors (see Figure 2) detect proximity, and the raw values are represented between 1 and 145, with higher number indicating close proximity to an object. The photocell sensors (again, see Figure 2) are light-detecting sensors, with raw values that are represented between 5 and 225, with lower number indicating a high light level, meaning that the sensor is oriented towards the light. The light source located at one end of the arena is polarized light so that the photocell sensors only detect light that travels on one plane. Light traveling on other planes is ignored, and thus a more accurate reading results The made it possible to not concern ourselves with ambient light in the room.

 

 

 

 

 

Figure 3: Bumper detail

 

 

Six wooden bumpers are positioned around the circumference of the metal base. The bumpers are attached to switches that indicate if the robot is hit. The switches are digital sensors, and return either an on or off signal to the handyboard (see Figure 3). There are two slots cut in the metal base for the wheels, which stick up slightly on the top of the robot. The two motors are located on the underside of the robot. The motor values are represented as values between -100 and 100, with 100 being full power. (Negative numbers indicate that the motor is going in reverse.) A wooden ball is placed on the underside of the robot, towards the back, to provide balance.

 

The specific design of the robot is important for our goal of the project to combine artificial life and mobile robotics. Because this robot is interacting in a real environment and not in a simulated computer environment, the sensors had to be limited to two of each type. In order to successfully maneuver the arena, the sensors have to be coordinated with the motors, and this gets increasingly difficult the more sensors are available. Therefore, only four sensors were used in total. All of the sensors face a relative forward direction, to encourage the robot to move forward in navigating its environment.

 

 

 

 

 

 

Figure 4: Neural network. Click here for a cleaner view (PDF file).

 

 

To successfully navigate its environment, the robot has to avoid obstacles placed in the arena, avoid the walls, and orient itself towards the light and eventually seek it, and all of this must occur as a result of learning. To accomplish this, we utilized a neural network (see Figure 4). Neural networks are based on the way in which our brains work. A network consists of layers of units, and connections between those layers. The units are simple processing elements that have an activation level. The collection of units and the connections between the units form a pattern of activation. The layers in a neural network resemble neurons and the connections resemble synapses. The connections are weighted, first randomly and then change according to the learning that takes place. The changing of the connection weights is what makes our robot an example of artificial life. The weights adjust themselves to better suit the robot, and thus control the behavior of the robot.

 

Neural networks, also referred to as Parallel Distributed Processing (PDP) models, are a way to understand intelligence as emerging from the interactions of a large amount of simple processing units. In a neural network, intelligence comes from all of the connection weights combined, not any single connection. Neural networks are founded on the idea that the ket to understanding our intelligence as humans is in the architechture of the brain. Therefore, if a computer program was created that mimicked the architecture of the brain with different units and layers and connections, then each small, unintelligent part of the network would combine to form the complete, intelligent network. The intelligence is in the interaction of the simple units, not the units themselves. The processing units of a neural network have activation levels that reflect the state of the network at any one time. The activation level of any one unit is calculated according the previous units in the network using he units and their connection weights.1

 

For our purposes, choosing a neural network was the best option because of the great possibilities for learning inherent in a neural network. Knowledge is stored in the connections between the units, therefore learning could be accomplished by finding the right connection weightsthat produce the pattern of activation that produces the correct behavior. In trials, the robot randomizes the connection weights, then by a process of trial and error, determines which connections between activations form a network that suits the robot’s needs.2

 

Our robot needed to complete a relatively complex task—to learn how coordinate its sensors to move appropriately in its environment. The best way to do this is via a network in which many processing elements combine to accomplish the task. With the network of connection weights, the connections can be changed and then the change can be evaluated by the robot and then determined to be beneficial or not. Making small changes in the network over time will produce connection weights that lead to better behavior. Each unit in the network has an activation level that

 

The first layer of the network is the input units, in our case the photocell and infrared sensors (the bumpers are not considered inputs because they directly influence the motors, without passing through the network). Learning works by introducing a random component into the network in the Stochastic Real Value Units (SRVU). This random element makes up one connection weight and therefore shows its influence on the rest of the network when the activation levels are calculated. Once the wheels turn, sensor readings are taken and compared to the sensor readings before the wheels turned. If the move was a “good” change, then the change is reinforced by adding or subtracting a small amount from the connection weights. At the beginning of a trial, the network is random, and this randomization is combined with the additional random element of the SRVUs. As learning progresses, the robot’s movement gets increasingly less random and more oriented to its goals of seeking the light and avoiding obstacles.

 

After the specifications of the trial are selected, the code enters into a loop that repeats until the stop button on the robot is pressed. First, initial sensor readings are taken. Then the random values for the SRVU connection weights are set. Then, starting at the first layer(the input layer), the code runs through the network. The unit irhidden1 for example, is calculated by taking ir1, multiplying it by the ir1_irsrv1 connection weight, then multiplying ir2 by the ir2_irsrv1 connection weight, then adding those two sums together. That unit is then multiplied by the irsrv1_irhidden1 connection weights (a random weight) to get the irhidden1 unit. After the hidden layer, the photocell and infrared sections of the network join to calculate the left and right motor values. It is in this way that the code runs through the network, and arrives at the final two outputs. After the motors turn, the sensor readings are taken again. If the difference in the readings reflects a good move, then the random element that was introduced into the network resulted in a good change, and that change is therefore reinforced by adding or subtracting a small amount to the connection weights. If the connection weight had a previous negative value, then the connection is made more negative by subtracting a small amount from it. If the connection weight had a previous positive value, then the connection is made more positive by adding a small amount to it. These small changed gradually lead to better behavior. The SRVU connection weights are randomized again after learning takes place(or doesn’t take place), and the cycle continues.

 

If one of the bumpers is hit, the code takes that into account when calculating the motor values. If the front bumper is pressed, then both motors are forced to turn backwards. If the back bumper is pressed, both motors are forced to turn forwards. If one of the side bumpers is pressed, then the motor values reflect this, and the motors turn in a way that avoids the obstacle. Once a bumper is pressed, it stays pressed for a short interval afterwards. During this interval, no learning can take place. This is to prevent the robot from reinforcing behavior in which it ran into something. The robot can not learn, for example, to avoid a wall because a bumper forced it to back up. Reinforcing this behavior would actually reinforce the behavior that led it to run into the wall, not the backing up after the collision. Therefore, learning is forced off for a short period of time after a bumper is hit, enough time for the wheels to turn away from the obstacle, and for the robot to continue it movement.

 

In each cycle of the code, the SRVU is changed randomly, according to our specified range of random numbers (we started it between .9 and 1.1, but later made the range larger). The robot takes the raw input values of the sensors, runs calculations on them according to the weights of the connections, including the random weights, and then turns the motors accordingly. If the movement of the robot is beneficial, then the random change that was made in the SRVU connection weight is made a permanent change by adding to or subtracting the “Learning Rate” from the other connection weights. This type of learning uses positive reinforcement—if the robot does something “good,” it is rewarded. If it does something bad, nothing happens. The robot does not learn from its bad behavior, only its good behavior. Over time, the robot should learn to exhibit only good behavior.

 

The use of “persistent variables” in the code allowed us to turn the robot off without losing the data of the connection weights. This allowed us to have extended trials during which we could turn the robot off and back on again without losing any data, and it gave us the ability to record the values of the connection weights for better observation of the learning that was (or was not) taking place. The change to persistent variables for the connection weights gave us data that was more meaningful because of our ability to follow its changes it, and made the trials of the robot more meaningful because they no longer had to be limited to the duration of the batteries.

 

When the robot is first switched to “on,” several questions appear on the screen that determine the specification of the trial. Infrared and Photocell learning is separate in the neural network, and thus one can occur without the other. The first set of questions deal with the distinction between these two types of learning. Infrared and Photocell learning can both be on. Infrared learning can be on without the additional Photocell learning; likewise, Photocell learning can be on without Infrared learning. We can run a trial in which Infrared learning is off, but the Infrared sensors still affect the motors, and a trial in which Photocell learning is off, but the Photocell sensors still affect the motors, or any combination of the two. We can also run a trial in which no learning takes place, and the robot behaves according to its static connection weights. This is useful to determine if learning is resulting in appropriate behavior. The last question is where we specify if we want a random network, or we want to keep the same connection weights as the previous trial, using the persistent variables.

 

These questions in the code allow us to control the trial of the robot to our exact specifications. One of the challenges of running trials with the robot is that it is difficult to causally link the behavior of the robot with the sensor readings and the connection weights. There is always a random element in the robot’s behavior, and sometimes it is difficult to observe the magnitude of the randomness. When we can specify exactly what is controlling the behavior of the robot, and which sensors are in use, we can more accurately observe if appropriate learning is taking place. A beep was placed in the code every time the robot ran through one learning cycle, so we could tell simply by listening if the robot utilized the learning portion of its code on any given cycle.

 

A normal trial lasted approximately 25 minutes, after which data was collected. The trial was also stopped at various times to collect data. To determine the success of the trial, the change of the connection weights throughout the trial was analyzed and combined with observation of the robot’s behavior. In a neural network, it is the combination of all of the connection weights that produce the behavior of the robot. Therefore, it is impossible to look at individual connection weights and identify them as good connection weights. Rather, the change in the connection weights over time must be analyzed, combined with the observable behavior of the robot.

 

The ultimate goal of this project is to get to a point where the robot learns how to seek the light and avoid obstacles. Besides controlling the trials to our exact specifications, it was important that the code be designed to give us the most meaningful trials possible. The code includes various types of variables that we changed to make learning easier and more productive. One such variable is the maximum change for the SRVUs. Originally, we set the SRVUs to be a random number between .9 and 1.1. This range turned out to be too small, and was later increased to include some negative numbers as low as -.1 and as high as 2.2. This range gave us a large variation in the random component of the code, and allowed to robot to move randomly enough to accidentally exhibit good behavior, and thus learn. The Learning Rate was a variable we specified to control by what margin the robot learned. The Learning Rate was the variable that was added to or subtracted from the connection weights every time the robot learned. This value we set at .1. We did not want this value to be too high or too low because we wanted the robot to learn gradually, but not too slowly. The Learning Interval was the variable that specified the amount of time between changing the motors and taking new sensor readings. The motors would turn for the duration of theLearning Interval, after which new sensor readings would be taken, and compared with the previous sensor readings to see if the behavior was beneficial. This value we set at .3. As in the Learning Rate, we had to balance this variable between too slow and too quick. A slow Learning Interval would mean the robot would miss meaningful sensor readings because it was still turning its motors. A quick Learning Interval would not give the motors enough time to turn, therefore making the new sensor readings not as meaningful.

 

Finding a balance between movements that were too quick and ones that were too slow was a consistent problem throughout the project. This problem led us to place a threshold on the motor values in an attempt to slow down the movement of the robot. Often, because of the initial randomization of the neural network, the robot would spin, with one motor on full speed forwards, and the other motor on full speed backwards. With the robot spinning so quickly in place, the sensor readings were not being translated into behavior that led the robot to escape the spinning. Since the neural network is often randomized at the beginning of every trial, the spinning problem was as frequent as it was frustrating. After thresholding the motor values between -60 and 60, the robot slowed down and the sensor readings were more meaningful and allowed the robot to escape the spinning.

 

Another consistent problem occurred when the robot did not move fast enough. Also as a result of randomization, the connection weights were such that the wheels of the robot did not run very much, if at all. This was unacceptable because if the robot never moved, it never had a chance to experience positive reinforcement and therefore learning never occurs. To combat this problem, we included a portion of code that increased or decreased the connection weights by a small margin in order to discourage the robot from sitting still. If the program ran through one cycle and no learning took place, we assumed the robot needed “help,” and help flags were activated and the code took effect. This achieved the desired effect, and in time the robot would begin moving. However, we discovered that by increasing the range of the SRVUs by making the SRVU Maximum Change variable higher, we could produce the same effect in the behavior of the robot. We chose the latter approach over the former because we achieved the same result without explicitly helping the robot. The robot made the necessary changed itself, through the SRVUs, rather than by explicit increasing of the connection weights.

 

We included the “help” code and later the increased SRVU range to help the robot when no learning was taking place, but there were also times when the robot was learning too much, and was being reinforced for its behavior at inappropriate times. The fix for this problem came in the form of thresholds on the change in PC and IR sensor readings that had to occur for learning to take place. For IR learning to take place, only a change of 2 or greater in the raw sensor readings would result in learning. Also, the sum of the IR sensors had to be greater than 55 in order to learn. This ensured the fact that IR learning would only occur when it was actually close to an obstacle, not when it was merely in the middle of the arena. Placing the threshold at 2 was a good value because it ensured that a large enough change occurred to warrant learning. Without the threshold, random variations in the readings of the sensors can result in learning, when learning is not appropriate. The threshold on the Photocells was set at 1, in order to guard against random fluctuations in the photocells.

 

All of these changes were the result of observing the robot and analyzing data. Observing the robot was made easier by isolating which parts of the code and which parts of the robot were active. Analyzing data, however, remained a difficulty throughout the project. Physically, it was tedious to collect the data from the robot. The robot had to be connected to the computer, and each individual variable had to be typed in and recorded into a spread sheet. To make this process more effective, we attempted to implement a radio link to automatically send data back to the computer while the robot was running instead of ending a trial and collecting it manually.

 

The code written for the radio link worked in theory, but the radio connection itself was plagued by technical limitations that were impossible to overcome. The radio link concept was abandoned for manual collection of data.

 

©2009 Franklin & Marshall College  |  Lancaster, PA  |  717-291-3911