SegMan technical notes on driving

by Rob St. Amant

This document gives a very brief introduction to the driving code in the SegMan perceptual substrate. There are several issues that a visual system for driving must address: steering, speed control, and the integration of these activities.

1. Steering (directional control)
2. Speed control
3. Integrating directional and speed control
4. Running the driving simulation
5. An index of functions

The visual environment in which the driving code operates is shown below. The simulated car is approximately straddling the center line. Visual information is extracted from this scene by a call to attend-road. All function calls in the driving simulation are in the vision package. See the instructions for running the simulation and the index for a discussion of coding issues.

1. Steering

The fragment of code below first computes the current center of the field of vision, offset by some amount to account for the driver's seat being on the left side of the car. It then computes the goal location in the visual field, which is the center of the lane. These two locations should be the same, but usually are not. The code moves the mouse pointer from the current location to the goal location. Notice that in order to reach the goal location, the driver must steer in the opposite direction of the movement of the mouse.

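(attend-road)  ; refresh the visual data before reading from it
(let* ((scan-line (nominal-scan-line))
       ;; 4 degrees of visual angle, expressed in pixels
       (horizontal-offset (* *pixels-per-degree* 4))
       (current-x (point-x (far-focus)))
       (goal-x (+ (field-vertical-center) horizontal-offset)))
  ;; Trace the correction with the mouse: current location, then goal.
  (driver-move-mouse-to (make-point current-x scan-line))
  (driver-move-mouse-to (make-point goal-x scan-line))
  ;; Print both locations and the signed difference between them.
  (print (list current-x goal-x (- goal-x current-x))))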
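The relationship between the mouse movement and the steering direction can be made explicit with a small helper. This is only a sketch: steering-direction is a hypothetical function, not part of the vision package, and the keywords it returns are made up for illustration.

```lisp
;; Sketch only: steering-direction is a hypothetical helper.
;; The mouse moves from current-x to goal-x; the driver steers the
;; opposite way, as described in the text above.
(defun steering-direction (current-x goal-x)
  "Return :left, :right, or :straight for the given x locations."
  (cond ((> goal-x current-x) :left)    ; mouse moves right, steer left
        ((< goal-x current-x) :right)   ; mouse moves left, steer right
        (t :straight)))

(steering-direction 100 140)  ; => :LEFT
```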

The code for determining the center of the lane can break down if the road is curving off too fast in one direction or another. In the image below, the left side of the road is no longer visible, suggesting that the driver needs to steer hard left to stay on the road.

The system can detect this in a few different ways. One is to use a function that computes the average slope of the road edge at some point. If this exceeds some value in the positive or negative direction, then that edge of the road has left the visual field (because the right and left edges of the visual field are vertical). The fragment of code below shows this approach. Another possibility is to use the points at which the road starts and ends, with the functions road-edge-right-start, road-edge-right-end, and the left equivalents, computing slopes from these. The two approaches do not give equivalent results when the road edges are too curved, however.

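(let ((scan-line (nominal-scan-line))
      ;; Limit taken from the original fragment; a practical threshold
      ;; is presumably much smaller.
      (slope-limit most-positive-fixnum))
  ;; abs covers both the positive and the negative direction.
  (cond ((> (abs (road-edge-left-slope scan-line)) slope-limit)
         (print "Can't see the left edge of the road"))
        ((> (abs (road-edge-right-slope scan-line)) slope-limit)
         (print "Can't see the right edge of the road"))))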
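The second approach, computing a slope from the start and end points of an edge, might look roughly like the sketch below. To keep it self-contained, points are represented as (x . y) conses rather than SegMan points, and edge-slope is a hypothetical helper. The slope is assumed to be dy/dx, so a vertical edge has no finite slope and the helper returns nil for it.

```lisp
;; Sketch only: points are (x . y) conses here, not SegMan points, and
;; edge-slope is a hypothetical helper, not part of the vision package.
(defun edge-slope (start end)
  "Return the slope (dy/dx) of the line from START to END, or NIL if vertical."
  (let ((dx (- (car end) (car start)))
        (dy (- (cdr end) (cdr start))))
    (if (zerop dx)
        nil
        (/ dy dx))))

;; An edge running from (120 . 200) to (160 . 100), with made-up coordinates.
(edge-slope '(120 . 200) '(160 . 100))  ; => -5/2
```

Because only the two endpoints are used, this measures the chord of the edge rather than its slope at the scan line, which is one way to see why the two approaches diverge on strongly curved roads.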

2. Speed control

Controlling speed is slightly more complex than steering. Deciding which direction to steer can be done by looking at a snapshot of the current environment. Braking and acceleration, on the other hand, depend on how the visual environment is changing. In the images below, taken within a few seconds of one another, the curvature of the road has changed and the simulated car has moved further into the lane of oncoming traffic.

Ideally we'd have some sophisticated functionality for computing optical flow, but this is beyond our abilities at this point. Instead, we detect changes by recording the locations of specific points in the visual field, and measuring the distance they move from one snapshot to the next. The code below shows how this can be done. It is straightforward to define functions that abstract away the recording of values over time, but I've made it explicit here because it presumably needs to be explicit in the memory of a cognitive model. We have a loop here so that new values can be generated from successive calls to attend-road.

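(loop with last-left = nil
      with last-right = nil
      with last-scan-line = nil
      repeat 10
      do (attend-road)
         (let ((flow (if (and last-left last-right last-scan-line)
                         ;; Later passes: measure the road edges on the
                         ;; previous scan line and sum how far each moved.
                         (let ((left (point-x (road-edge-left last-scan-line)))
                               (right (point-x (road-edge-right last-scan-line))))
                           (prog1
                               (print (+ (abs (- left last-left))
                                         (abs (- right last-right))))
                             (setf last-left left
                                   last-right right
                                   last-scan-line (nominal-scan-line))))
                         ;; First pass: nothing to compare against, so the
                         ;; flow is 0; just record the current positions.
                         (let ((scan-line (nominal-scan-line)))
                           (prog1
                               (print 0)
                             (setf last-left (point-x (road-edge-left scan-line))
                                   last-right (point-x (road-edge-right scan-line))
                                   last-scan-line scan-line))))))
           ;; flow holds the total edge displacement for this snapshot;
           ;; it is only printed here, not used further.
           (declare (ignorable flow))))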
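For comparison, here is one way the recording of values over time could be abstracted away. This is a minimal sketch, not SegMan code: make-flow-tracker is a hypothetical helper that keeps the previous edge positions in a closure, and the edge positions are passed in as plain numbers so the example runs on its own.

```lisp
;; Sketch only: make-flow-tracker is a hypothetical helper, not part of
;; the vision package. It hides the "last value" bookkeeping in a closure.
(defun make-flow-tracker ()
  "Return a function of LEFT and RIGHT edge positions that reports how far
the edges moved since the previous call (0 on the first call)."
  (let ((last-left nil)
        (last-right nil))
    (lambda (left right)
      (prog1
          (if (and last-left last-right)
              (+ (abs (- left last-left))
                 (abs (- right last-right)))
              0)
        ;; Remember the current positions for the next call.
        (setf last-left left
              last-right right)))))

;; Usage with made-up edge positions in pixels.
(let ((tracker (make-flow-tracker)))
  (list (funcall tracker 100 300)    ; first snapshot, nothing to compare
        (funcall tracker 104 297)    ; left moved 4, right moved 3
        (funcall tracker 104 297)))  ; no movement
;; => (0 7 0)
```

In the simulation, the arguments would presumably come from point-x applied to road-edge-left and road-edge-right after each call to attend-road.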

3. Integration

One of the interesting difficulties we face in the integration of the control of direction and speed arises from our simulation of the process in discrete time and the way that optical flow is handled. Suppose that at time t the model analyzes the road, records the data for estimating flow, and determines that steering in one direction or the other is appropriate. At time t+1 some steering command is issued, and the simulated car moves in that direction. At time t+1 or later the road is again analyzed so that flow can be computed, but at this point the actions of the model have made some contribution to the changes in the visual field, independent of changes that would have occurred otherwise. This contribution needs to be accounted for, or the car might end up braking every time it steers.
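One simple way to account for this contribution would be to subtract an estimate of the self-induced shift from the measured flow. The sketch below assumes, purely for illustration, that this shift is proportional to the size of the last steering command. The function name, the gain parameter, and the linear model are all hypothetical; none of them is part of SegMan.

```lisp
;; Sketch only: external-flow, its gain parameter, and the linear model
;; of self-induced flow are assumptions made for illustration.
(defun external-flow (measured-flow steering-amount &key (gain 1))
  "Estimate the part of MEASURED-FLOW not explained by the last steering
command. STEERING-AMOUNT is the signed size of that command."
  (max 0 (- measured-flow (* gain (abs steering-amount)))))

(external-flow 12 5 :gain 2)  ; => 2  (most of the flow came from steering)
(external-flow 4 5 :gain 2)   ; => 0  (all of the flow is explained by steering)
```

With a correction of this kind, a braking decision would be based on the remaining flow rather than on the raw measurement.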

4. Running the driving simulation

Running the driving code is straightforward, but a few preliminary steps are necessary. First, for ease of development, the system processes a small visual field in a fixed location on the screen. The top left corner of the simulation scene (i.e., not the browser window, but the actual road image) must be at a specific location, currently (300, 315). The entire image must be visible. To reach this arrangement, I usually put an Emacs or Lisp window above the browser window with the driving simulation, and place the cursor at the top left corner of the image. I can then iteratively call the function (get-cursor) to return the cursor position, and repeatedly (painfully) nudge the position of the window until it matches the location (300, 315). We'll fix this problem shortly.
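The nudging can be made a little less painful by computing how far the window still has to move. The sketch below is a hypothetical helper, not part of SegMan; it takes the cursor coordinates as plain numbers because the return format of get-cursor is not described here.

```lisp
;; Sketch only: window-nudge is a hypothetical helper. The target
;; defaults to the required image corner, (300, 315).
(defun window-nudge (cursor-x cursor-y &key (target-x 300) (target-y 315))
  "Return (dx dy): how far to move the window so that the image corner,
currently under the cursor, lands on the target location."
  (list (- target-x cursor-x)
        (- target-y cursor-y)))

;; Cursor at the image corner reads (312, 298).
(window-nudge 312 298)  ; => (-12 17)
```

Here the window would need to move 12 pixels in the negative x direction and 17 pixels in the positive y direction.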

Once the window is in place, and (attend-road) has been called, the code fragments above for retrieving steering information and speed information can be evaluated.

5. An index of functions

The functions accessible to the model are at two different levels. The first set gives access to data computed by the C++ code; these are the equivalent of primitive visual operators in a model of visual processing. They're not entirely compatible with real vision, but they're reasonable.

Another, smaller set of functions combines these primitives into an approximation of visual routines (the intention is that these routines should operate as visual routines in the intermediate vision sense). The values that some of these functions return are still under discussion and development; for example, should everything be computed in degrees or in pixels?
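Whichever unit is chosen, converting between the two is a matter of scaling by the number of pixels per degree of visual angle. The helpers below are a hypothetical sketch, not existing SegMan functions; the scale is passed as an argument so the example runs on its own, and the value 10 is made up. In SegMan the scale would presumably be *pixels-per-degree*, as in the steering fragment.

```lisp
;; Sketch only: hypothetical conversion helpers, not part of SegMan.
(defun degrees->pixels (degrees pixels-per-degree)
  "Convert a visual angle in degrees to a distance in pixels."
  (* degrees pixels-per-degree))

(defun pixels->degrees (pixels pixels-per-degree)
  "Convert a distance in pixels to a visual angle in degrees."
  (/ pixels pixels-per-degree))

;; With a made-up scale of 10 pixels per degree:
(degrees->pixels 4 10)   ; => 40
(pixels->degrees 25 10)  ; => 5/2
```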