SegMan technical notes

by Mark O. Riedl and Rob St. Amant

In this document you should be able to find enough information to use and to build on the SegMan perceptual substrate. The system described in this document can be downloaded from this URL: http://www.csc.ncsu.edu/faculty/stamant/code/segman-03-10-14.zip.


Table of contents

1. Overview
2. Architecture
3. Segmentation fundamentals
    3.1. Pixel neighbors
    3.2. Pixel patterns
    3.3. Pixel pattern definition
    3.4. A pixel pattern definition utility
4. Feature recognition
5. Memory management
6. Description of files
    6.1. segman.dll and the *.cpp files in src/
    6.2. foreign-interface.lisp
    6.3. wrappers.lisp
    6.4. segmentation.lisp
    6.5. segman.lisp
    6.6. state.lisp


1. Overview

SegMan is a perceptual substrate that uses computational vision to "see" the Microsoft Windows graphical direct-manipulation interface. SegMan enables other programs to see the graphical interface screen as a human would see it. This allows programs to interact with Microsoft Windows as if they were users sitting at the console instead of relying on low-level APIs. With SegMan we can create and test more realistic cognitive models of direct-manipulation interface usage, build AI agents that can reason about and use the graphical interface, and write scripts and programs that learn and perform routine tasks in the graphical interface.

SegMan is a substrate because it is a layer of functionality that sits just above the level of the operating system and provides hook functions that other programs can use to perceive and manipulate the graphical user interface. SegMan itself does not perform any functionality except segmenting the screen into well-understood features and widgets that other programs and scripts can utilize for their own ends.

The computational vision routines that SegMan uses are fairly rudimentary -- coming nowhere close to the sophistication of human vision. However, the Microsoft Windows graphical user interface is highly rectilinear and highly standardized, so we can use short-cuts for detecting features and widgets on the screen. Section 3 goes into more detail on how SegMan segments the screen into useful visual components.


2. Architecture

The architecture is a layered architecture. On the bottommost level lies the operating system. For the current version of SegMan, the only operating system supported is Microsoft Windows; SegMan's feature detection routines are geared specifically for recognizing screen widgets that are defined by the Microsoft Windows look-and-feel.

Segman.dll is a dynamic-link library that SegMan loads into memory at load time. Segman.dll is a platform-specific piece of C++ code that captures the Windows screen and breaks it into groups of contiguous like-colored pixels. These groups are called pixel-groups. At capture time, the pixel-groups are not recognized or sorted in any fashion.

SegMan, a collection of lisp routines, accesses the DLL and retrieves the pixel-groups from the DLL's memory. Pixel-groups are subjected to predicates that identify their shapes and are categorized. SegMan determines the state of the Windows screen as a list of all pixel-groups and symbolic references for what they might look like and what they might be used for. More complicated routines can be run on the screen-state to identify increasingly more complicated features such as windows, buttons, and text.

Above this is a functional substrate. Programs and scripts can be written that access the data structures and functions representing the screen to solve problems. At the time of this document's creation, scripting of programs that use the graphical direct-manipulation interface can be accomplished at this level.

Just above this is a state-oriented substrate. The state-level representation is intended to abstract away the procedures for identifying specific types of objects. The relevant functions are designed to generate information for specific states and to create transitions between states. Programs can be constructed based either on the functional or the state-based layer, at the discretion of the designer. State functions are described in Section 6.6.

Finally, an additional, optional level is being built on top of the substrate. It provides a collection of common interface functions that can be used by planners and cognitive models. Planners and cognitive models require a certain amount of robustness, consistency, and predictability of the screen if they are to operate effectively. The controller interface will provide robustness, consistency, and predictability that might not otherwise exist in the direct-manipulation graphical interface. The controller interface is not yet complete.


3. Segmentation fundamentals

SegMan uses simple computational vision routines to pick out features of interest in the Microsoft Windows graphical interface. At the base of the SegMan system is a dynamic-link library (DLL) that captures the screen as a bitmap and then processes the bitmap into lists of pixel-groups. A pixel-group is a region of the screen where all pixels are like-colored. All pixels on the screen are assigned to pixel-groups. Pixel-groups are non-overlapping.

Group 1 is the pixel-group composed of the pixels in the letter 'F'. Group 2 is the pixel-group composed of the pixels in the dot. Group 3 is the pixel-group composed of the pixels in the stem of the 'i'. Group 4 is the pixel-group consisting of the background pixels.

Pixel-groups can then be examined for specific shapes and for relationships between shapes. For shapes that consist of a single pixel-group, such as the letter 'F', recognition is simple. One could either look at the arrangement of pixels within the group, or one could look at the pixel-neighbor numbers.

3.1. Pixel neighbors

Each pixel-group has an array of pixel-neighbor numbers associated with it. The pixel-neighbor numbers are encodings of the relationships between pixels within the group. Each pixel in a group has 0-8 neighbors. Looking at an individual pixel in a pixel-group, there are eight possible positions that a neighbor can be in: west, southwest, south, southeast, east, northeast, north, and northwest. We assign a numerical value to each neighbor position, respectively. Thus the west position is assigned to "0" and the northwest position is assigned to "7".

Each of these numbers corresponds to a bit in a single integer. Thus if a pixel is the top-left corner of a box, it has neighbors to the south (position 2), southeast (position 3), and to the east (position 4). The pixel-neighbor value of that top-left corner pixel is 2^2 + 2^3 + 2^4 = 28.
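The bit encoding can be sketched in a few lines of C++. This is only an illustration of the arithmetic described above, not code taken from segman.dll; the names Direction and neighbor_value are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>

// Neighbor positions in the order given in the text:
// west is bit 0 and northwest is bit 7. (Names are hypothetical.)
enum Direction {
    WEST = 0, SOUTHWEST, SOUTH, SOUTHEAST, EAST, NORTHEAST, NORTH, NORTHWEST
};

// Combine the positions of a pixel's same-group neighbors into a
// single pixel-neighbor value in the range 0..255.
int neighbor_value(std::initializer_list<Direction> neighbors) {
    int value = 0;
    for (Direction d : neighbors) {
        assert(d >= WEST && d <= NORTHWEST);
        value |= 1 << d;  // set the bit for this neighbor position
    }
    return value;
}
```

Called with SOUTH, SOUTHEAST, and EAST, the function reproduces the corner example above: 4 + 8 + 16 = 28. A pixel with no neighbors has the value 0, and a fully surrounded pixel has the value 255.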

For an entire pixel-group, we count how many member pixels have each of the pixel-neighbor values 0, 1, 2, ..., 255. For example, for a pixel-group representing a 5-by-5 solid box, the pixel-neighbor numbers will look like:

pixel_neighbors[0] = 0
pixel_neighbors[1] = 0
pixel_neighbors[2] = 0
pixel_neighbors[3] = 0
pixel_neighbors[4] = 0
pixel_neighbors[5] = 0
pixel_neighbors[6] = 0
pixel_neighbors[7] = 1
....
pixel_neighbors[28] = 1
pixel_neighbors[29] = 0
pixel_neighbors[30] = 0
pixel_neighbors[31] = 3
....
pixel_neighbors[112] = 1
....
pixel_neighbors[124] = 3
....
pixel_neighbors[193] = 1
....
pixel_neighbors[199] = 3
....
pixel_neighbors[241] = 3
....
pixel_neighbors[255] = 9

The four corner pixels generate unique pixel-neighbor values (7, 28, 112, 193). There are three pixels on each edge if you exclude the corners (31, 124, 199, 241). There are nine pixels in the center that are completely surrounded by neighbors (255).
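To make the counting concrete, here is a minimal C++ sketch that builds the 256-entry array from a boolean mask of one group. It is not the DLL's implementation. The mask input format and the function name are assumptions, and the sketch assumes screen coordinates in which y grows downward, so that "south" means y + 1.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Offsets (dx, dy) for bit positions 0..7: W, SW, S, SE, E, NE, N, NW.
// Assumption: screen coordinates, so y grows downward.
const int DX[8] = {-1, -1, 0, 1, 1, 1, 0, -1};
const int DY[8] = {0, 1, 1, 1, 0, -1, -1, -1};

// mask[y][x] is true where the pixel belongs to the group
// (a hypothetical input format chosen for this sketch).
std::array<int, 256> neighbor_histogram(
        const std::vector<std::vector<bool>>& mask) {
    std::array<int, 256> hist{};  // all counts start at zero
    const int h = static_cast<int>(mask.size());
    for (int y = 0; y < h; ++y) {
        const int w = static_cast<int>(mask[y].size());
        for (int x = 0; x < w; ++x) {
            if (!mask[y][x]) continue;
            int value = 0;
            for (int bit = 0; bit < 8; ++bit) {
                const int nx = x + DX[bit];
                const int ny = y + DY[bit];
                if (ny >= 0 && ny < h && nx >= 0 &&
                    nx < static_cast<int>(mask[ny].size()) && mask[ny][nx]) {
                    value |= 1 << bit;  // neighbor present at this position
                }
            }
            assert(value >= 0 && value < 256);
            ++hist[value];  // one more pixel with this neighbor value
        }
    }
    return hist;
}
```

For a 5-by-5 mask that is true everywhere, the result has a count of 1 at indices 7, 28, 112, and 193, a count of 3 at indices 31, 124, 199, and 241, and a count of 9 at index 255, matching the listing above.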

We can detect a single-colored box by looking for pixel-groups with the following combination of pixel-neighbor numbers:

pixel_neighbors[7] == 1 
AND 
pixel_neighbors[28] == 1 
AND 
pixel_neighbors[112] == 1 
AND 
pixel_neighbors[193] == 1 
AND 
pixel_neighbors[31] > 1 
AND 
pixel_neighbors[124] > 1 
AND 
pixel_neighbors[199] > 1 
AND 
pixel_neighbors[241] > 1
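
The conjunctive test above translates directly into a predicate. The following C++ sketch is an illustration only; looks_like_box and solid_box_histogram are hypothetical names, and the helper simply writes down the counts that a solid box produces so that the predicate can be exercised.

```cpp
#include <array>
#include <cassert>

// The box test from the text: exactly one pixel for each corner value
// and more than one pixel for each edge value.
bool looks_like_box(const std::array<int, 256>& n) {
    const bool corners =
        n[7] == 1 && n[28] == 1 && n[112] == 1 && n[193] == 1;
    const bool edges =
        n[31] > 1 && n[124] > 1 && n[199] > 1 && n[241] > 1;
    return corners && edges;
}

// Hypothetical helper: the pixel-neighbor counts of a solid box of the
// given size, following the 5-by-5 example.
std::array<int, 256> solid_box_histogram(int width, int height) {
    assert(width >= 2 && height >= 2);
    std::array<int, 256> n{};
    n[7] = n[28] = n[112] = n[193] = 1;           // four corners
    n[31] = n[241] = width - 2;                   // top and bottom edges
    n[124] = n[199] = height - 2;                 // left and right edges
    n[255] = (width - 2) * (height - 2);          // interior pixels
    return n;
}
```

Because the edge tests use "> 1", the predicate as written accepts solid boxes of 4-by-4 pixels and larger; a 3-by-3 box has only one non-corner pixel per edge and is rejected.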

3.2. Pixel patterns

Conjunctive tests such as the one described above for finding a box were used in the original C++ code for SegMan and were then ported to the Lisp side for flexibility. Eventually we developed a declarative form in which pixel patterns could be specified. The form allows not only neighboring values to be specified, but also other properties of a group:

  1. count is the number of pixels in the group;
  2. size is the area of the group's bounding box;
  3. area (poor naming, for historical reasons) is the ratio of count to size;
  4. height is the height of the group's bounding box;
  5. width is the width of the group's bounding box;
  6. red is a component of the group's RGB value;
  7. green is a component of the group's RGB value;
  8. blue is a component of the group's RGB value;
  9. color is the group's numerical RGB value;
  10. proportion is the group's height / width, and 0 if width is 0.
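
As a rough illustration of how the derived properties relate to one another, consider the C++ sketch below. The struct and function names are hypothetical, and the packing of red, green, and blue into a single color number is inferred from the example values in Section 3.4 (red 212, green 208, blue 200, color 13947080) rather than taken from the source code.

```cpp
#include <cassert>

// Hypothetical summary of one pixel-group, for illustration only.
struct GroupSummary {
    int count;   // number of pixels in the group
    int width;   // width of the bounding box
    int height;  // height of the bounding box
    int red;
    int green;
    int blue;
};

// size: the area of the bounding box.
int size_of(const GroupSummary& g) { return g.width * g.height; }

// area: the ratio of count to size (0 if the bounding box is empty).
double area_ratio(const GroupSummary& g) {
    const int s = size_of(g);
    return s == 0 ? 0.0 : static_cast<double>(g.count) / s;
}

// proportion: height / width, and 0 if width is 0.
double proportion(const GroupSummary& g) {
    return g.width == 0 ? 0.0 : static_cast<double>(g.height) / g.width;
}

// color: assumed packing, with red in the high byte and blue in the low byte.
int packed_color(const GroupSummary& g) {
    assert(g.red >= 0 && g.red <= 255);
    assert(g.green >= 0 && g.green <= 255);
    assert(g.blue >= 0 && g.blue <= 255);
    return (g.red << 16) | (g.green << 8) | g.blue;
}
```

For a group with count 42, width 7, and height 6, this gives a size of 42 and an area of 1. The listing in Section 3.4 reports :PROPORTION 5/6 for such a group, so the actual implementation may measure height and width differently for that property; the sketch follows the definition as stated in the list.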

We define patterns to capture combinations of group properties. Below is a description of how patterns can be specified.

pattern-def :=   (DEFINE-PATTERN name () pattern-list)

pattern-list :=  pattern-form pattern-list || pattern-form

pattern-form :=  (bool pattern more-patterns) || pattern

more-patterns := pattern more-patterns || pattern

pattern := code ||
           (code count) ||
           (code count comp) ||
           (:NEIGHBOR code count) ||
           (:NEIGHBOR code count comp) ||
           (accessor count) ||
           (accessor count comp)

bool :=  :AND || :OR

comp :=  < || <= || = || >= || >

accessor := :COUNT  ||
            :SIZE   ||
            :AREA   ||
            :HEIGHT ||
            :WIDTH  ||
            :RED    ||
            :GREEN  ||
            :BLUE   ||
            :COLOR  ||
            :PROPORTION

Terminal types:
   name :=  SYMBOL
   code :=  INTEGER
   count := INTEGER
   area :=  INTEGER

The variables count and comp default to 1 and =, respectively, if they are not specified. An atomic code is equivalent to the form (:NEIGHBOR code 1 =). In other words, a single instance of the pixel pattern exists in the pixel group. Unfortunately, these forms are not mutually exclusive, but confusions are minor.

Patterns can thus be specified either verbosely or concisely (in both cases somewhat cryptically) as shown in the patterns below. The single number 20 in the letter-e form expands to a test equivalent to pixel_neighbors[20] == 1; the form (1 3) expands to pixel_neighbors[1] == 3; the form (28 1 >=) in the :rectangles form expands to pixel_neighbors[28] >= 1.
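
The expansion of neighbor forms can be sketched as follows. This C++ fragment is not how define-pattern is implemented (that is Lisp code); it only illustrates the semantics of a (code count comp) form and of an :AND over several forms. All type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

enum class Comp { LT, LE, EQ, GE, GT };

// One neighbor test, i.e. the form (:NEIGHBOR code count comp).
// count and comp default to 1 and =, as in the pattern language.
struct NeighborTest {
    int code;
    int count;
    Comp comp;
    NeighborTest(int code_, int count_ = 1, Comp comp_ = Comp::EQ)
        : code(code_), count(count_), comp(comp_) {}
};

// (code count comp) expands to: pixel_neighbors[code] comp count.
bool holds(const NeighborTest& t, const std::array<int, 256>& n) {
    assert(t.code >= 0 && t.code < 256);
    const int actual = n[t.code];
    switch (t.comp) {
        case Comp::LT: return actual < t.count;
        case Comp::LE: return actual <= t.count;
        case Comp::EQ: return actual == t.count;
        case Comp::GE: return actual >= t.count;
        case Comp::GT: return actual > t.count;
    }
    return false;
}

// An :AND form holds when every one of its tests holds.
bool all_hold(const std::vector<NeighborTest>& tests,
              const std::array<int, 256>& n) {
    for (const NeighborTest& t : tests) {
        if (!holds(t, n)) return false;
    }
    return true;
}
```

Under these assumptions, the first letter-e alternative would be written as the list {{1, 3}, {20}, {80}, {84}}, and the form (28 1 >=) as NeighborTest(28, 1, Comp::GE).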

(define-pattern letter-e (:translation #\e)
  (:and (1 3) 20 80 84)                 ;upper E
  (:and 18 9 72 2 (17 3)))              ;lower E

(define-pattern :rectangles ()
  (:and (28 1 >=)			;top-left
	(112 1 >=)			;bottom-left
	(193 1 >=)			;bottom-right
	(7 1 >=))			;top-right
  (:and (28 1 >=)			;top-left
	(112 1 >=)			;bottom-left
	(193 1 >=)			;bottom-right
	(5 1 >=)))

3.3. Pixel pattern definition

We have included a few very simple debugging functions, group-under-cursor, show, and pixel-pattern, to help the developer construct new pixel patterns. The function call (group-under-cursor) will return a list of groups whose centers are nearest the current position of the cursor. Some experimentation is usually required to figure out exactly which pixel group is which. The function call (show group), where group is one of the elements of the list returned by group-under-cursor, or for that matter any pixel group at all, will color that pixel group red. Finally, given a pixel group, the function pixel-pattern will generate a pixel pattern specification of that group.

For example, here is one way I might define a pattern for the letter E: I position the cursor over a specific pattern and call (group-under-cursor). This returns a list of groups. For each group, I call (show group) and see if the pixel group I'm interested in is colored. When I get the right one, I call (pixel-pattern group), wrap a define-pattern form around the result, and save it in a file.

SEGMAN(56): (segment-screen)

Beginning segmentation of entire screen.
Completed segmentation; 2736 groups found.
((:WIN-LOGOS (217164432)) (:CHECK-BOX (243214072 243216392))
 ...)

;;; Move cursor to specific location

SEGMAN(57): (group-under-cursor) ; The two forms returned are the
                                 ; pixel group list and each group's
                                 ; distance from the cursor position.
(244391488 244411208 244414688 244390328 244384528)
(2.828427 3.1622777 3.1622777 3.6055512 3.6055512)

SEGMAN(59): (setf groups *) ; Put the result in a variable.
(244391488 244411208 244414688 244390328 244384528)

;;; A caveat: these fixnums reference structures created and
;;; maintained by the DLL but not on the Lisp side.  Further calls to
;;; group-under-cursor will work, but another call to segment-screen
;;; will rebuild the structures, and attempts to access the pixel
;;; groups referenced by these specific pointers/fixnums will result
;;; in an error.

SEGMAN(60): (show 244391488) ; Color one of the groups.  This one
                             ; turns out not to be the intended one.
244391488

SEGMAN(61): (show 244411208) ; Color another group, the right one.
244411208

SEGMAN(62): (pixel-pattern 244411208) ; Generate a pattern.
(:AND (:COUNT 34) (:AREA 2/7) (10 2) (17 8) (34 6) (46 2) 48 (49 2)
 (81 3) ...)
SEGMAN(63): (print *) ; Oops, can't see it all. . .

(:AND (:COUNT 34) (:AREA 2/7) (10 2) (17 8) (34 6) (46 2) 48 (49 2)
 (81 3) 129 (136 5) 142 (145 2) 162) 
(:AND (:COUNT 34) (:AREA 2/7) (10 2) (17 8) (34 6) (46 2) 48 (49 2)
 (81 3) ...)

;;; Now cut this form, insert it into a define-pattern form, and save
;;; it away in a file for loading in a different context. 

(define-pattern new-pattern ()
  (:AND (:COUNT 34) (:AREA 2/7) (10 2) (17 8) (34 6) (46 2) 48 (49 2)
        (81 3) 129 (136 5) 142 (145 2) 162))

Note that there's no generalization in the result returned by pixel-pattern, in the sense that there are no "don't care" neighbor values. The pattern returned is exhaustive over the 256 neighbors and uses equality for its comparisons. It would be an interesting problem to try to learn minimal representations of different patterns that changed dynamically based on the input of new patterns. This is a classic machine learning problem, which unfortunately we haven't had the time to look at.

3.4. A pixel pattern definition utility

We have built a small graphical application to help users define patterns without having to resort to a great deal of coding. The application runs in the Allegro IDE, version 6.0. It loads automatically when SegMan is loaded into the IDE. Here's what this looks like in our system:
;;; Note that the listener starts in the common-graphics-user package.
CG-USER(0): :ld /research/systems/segman-v2/load-system
;;; . . . . . .

CG-USER(1): (capture)
#(PATTERN-SPECIFICATION-WINDOW :CAPTURE in Listener 1 @ #x20f874a2)

At this point a window will pop up, as shown below. This window gives limited access to the functionality described in the previous section. It is a prototype, so not everything works, but it has some useful characteristics.

Start by clicking the "Grab Screen" button. A dialog box will come up, explaining how to drag over a region of the screen without using the mouse keys (so that you don't select other applications; you end up using the left control key as a substitute for mouse down and mouse up.) Once you select a region, the window will expand to show what you have selected, as shown below. The redisplay process can be somewhat slow, in that it is carrying out the segmentation process in addition to displaying the bitmap. Unfortunately, one of the current limitations of the application is that you can't move the window once you've selected a region to be displayed; if you do, you'll need to grab the screen again.

Once you have a region displayed in the window, the Pixel Groups list box will display the pixel groups in that region. These may not be all of the pixel groups; a text box shows the minimum number of pixels that must be contained in a group for it to be displayed. (Currently this box cannot be edited; this is a bug that we will take care of shortly.)

Clicking on a group in the Pixel Groups list box will cause that group to be colored in the display. If the "Move pointer to selected group" flag is set, the mouse pointer will move automatically to the selected group as well. A selection action will also cause the properties of the group to be displayed in the Group Properties list box. This is not a complete list of the group's properties, in that it leaves out pixel neighbor information, but it should be enough to give an overview of the group's properties.

It would be tedious to click through all of the potentially hundreds of groups in the Pixel Groups list box, searching for a specific visual element. You can also click on groups in the image. This will cause the system to search for every group in the image whose center is within a constant value of the mouse click event. It will display these groups in the Pixel Groups list box. You'll probably have to click through these to find exactly the group you're interested in, but it's a much smaller search. To go back to all the groups in the image, you can click the Reset Groups button.

When you've selected a group for which you are interested in creating a specification, you can click the Show Pattern button. This will print out a Lisp form to the Lisp listener. The specification form for the group shown in the picture above is as follows:

(SEGMAN:DEFINE-PATTERN T1 ()
  (:AND (:COUNT 42) (:AREA 1) (:SIZE 42) (:HEIGHT 6) (:WIDTH 7)
   (:RED 212) (:GREEN 208) (:BLUE 200) (:COLOR 13947080)
   (:PROPORTION 5/6) 7 28 (31 5) 112 (124 4) 193 (199 4) (241 5)
   (255 20)))

You would edit the form slightly, to change the name T1 to a symbol more descriptive of the pattern, such as small-box, and perhaps remove some of the properties that may be overly restrictive, such as the RGB color information. The resulting form can be saved away in a file for later use, or could be evaluated and tested interactively, as shown below. The first form is obtained by cutting and pasting from the result of the Show Pattern operation above.

CG-USER(5): (SEGMAN:DEFINE-PATTERN small-box ()
                  (:AND (:COUNT 42) (:AREA 1) (:SIZE 42) (:HEIGHT 6) (:WIDTH 7)
                        (:RED 212) (:GREEN 208) (:BLUE 200) (:COLOR 13947080)
                        (:PROPORTION 5/6)
                        7 28 (31 5) 112 (124 4) 193 (199 4) (241 5) (255 20)))
((:AND (:COUNT 42) (:AREA 1) (:SIZE 42) (:HEIGHT 6) (:WIDTH 7) (:RED 212) (:GREEN 208)
  (:BLUE 200) (:COLOR 13947080) ...))

CG-USER(6): (segman::pattern-groups 'small-box)
;;; These are the small boxes visible on the entire screen, at least
;;; the last time segment-screen was called.
(143093736 142246920 142088000 142030000 142020712 141069656 140787768)

CG-USER(7): (segman::show *)
;;; Now all the small boxes should be colored on the screen.
(143093736 142246920 142088000 142030000 142020712 141069656 140787768)

This application is only appropriate for defining patterns associated with individual pixel groups, rather than combinations of groups, as is often necessary. An additional application is under development, as discussed below.


4. Feature recognition

Simple segmentation of the screen into pixel-groups gives us a lot of power in terms of recognizing features. However, simple segmentation only allows us to see shapes that consist of a single pixel-group. Often it is valuable to recognize features on the screen that are made up of more than one pixel-group. Examples of features made up of multiple pixel-groups are icons, buttons, window borders, and strings of letters.

To recognize features that are not made up of a single pixel-group we must employ a two-step process. The first step is to find the pixel-groups that might be part of the overall feature. We do this by selecting pixel-groups that have the right shape (the correct pixel-neighbor numbers). Not all pixel-groups with the correct shape are necessarily going to be part of the feature we are trying to detect. The second step is to choose from the candidate pixel-groups the ones that are in proximity to each other and in the correct spatial configuration. The SegMan system provides a variety of functions that find pixel-groups based on their spatial relationship to others.

For example, a standard Windows button is a rectilinear feature that appears to be raised out of the screen. This raised effect is created by applying a thin strip of color around the edges: lighter on the top and darker on the bottom. As far as SegMan is concerned, a button is made up of three pixel-groups: a rectangle and two L-shaped regions. However, these three groups must be in the correct relationship to each other in order to form what looks like a button. The lighter L-shape (upper shading) must be directly above and to the left of the rectangle and the darker L-shape (lower shading) must be directly below and to the right of the rectangle. When these relationships hold, there is a feature recognizable to the human user as a button.

The following is pseudocode for recognizing a button:

PROCEDURE find_buttons (screen) DO
   rectangles = find_all(rectangles, screen)
   upper_shadings = find_all(upper_shading, screen)
   lower_shadings = find_all(lower_shading, screen)
   buttons = EMPTY_LIST

   FORALL rect IN rectangles DO
         upshade = find_group_containing(rect, upper_shadings)
         lowshade = find_group_containing(rect, lower_shadings)
         IF distance_between(upshade, rect) < 5 AND
            distance_between(lowshade, rect) < 5 AND
            color(upshade) > color(lowshade) THEN DO append(rect, buttons)

   RETURN buttons

In the first stage, we find all the pixel-groups of the shapes we need: rectangles, upper-shading, and lower-shading. Buttons is an empty list into which we will collect all features that look like buttons. In the second stage, we iterate through the rectangles, looking for those in the proper relationship to the other shapes we have indicated. We find a pixel-group in the upper_shadings list that most closely contains the rectangle. We find a pixel-group in the lower_shadings list that most closely contains the rectangle. Containment is a useful relationship because, even though upper_shadings and lower_shadings are L-shaped, their bounding boxes enclose a much larger area that, ideally, will contain a rectangle if the feature is a button. The next check is proximity of the L-shapes to the rectangle. This is very important because a button might be contained in a window, and windows are also bounded by L-shaped shaded areas. But if the shading belonged to a window, one or both shadings would probably be more than five pixels away. Finally, we must make sure that the L-shape above the rectangle is lighter in color than the L-shape below the rectangle. If the upper L-shape were darker than the lower L-shape, the feature would look recessed into the screen instead of raised.

All features that consist of more than one pixel-group can be detected by applying one or more of the following relationships: contains, above, below, to the left, to the right. Additional details such as distance may be required to ensure robust recognition.
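
The "contains" relationship on bounding boxes can be sketched as below. This C++ fragment is an illustration, not the SegMan implementation. The Bounds type, the inclusive-coordinate convention, and the reading of "most closely contains" as "the containing bounding box with the smallest area" are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <vector>

// Bounding box in screen coordinates; edges are assumed to be inclusive.
struct Bounds { int left, top, right, bottom; };

// True if inner lies entirely within outer.
bool contains(const Bounds& outer, const Bounds& inner) {
    return outer.left <= inner.left && outer.top <= inner.top &&
           outer.right >= inner.right && outer.bottom >= inner.bottom;
}

long area(const Bounds& b) {
    assert(b.left <= b.right && b.top <= b.bottom);
    return static_cast<long>(b.right - b.left + 1) * (b.bottom - b.top + 1);
}

// Index of the candidate whose bounding box most tightly contains the
// target (smallest containing area), or -1 if no candidate contains it.
int find_tightest_container(const Bounds& target,
                            const std::vector<Bounds>& candidates) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(candidates.size()); ++i) {
        if (!contains(candidates[i], target)) continue;
        if (best < 0 || area(candidates[i]) < area(candidates[best])) {
            best = i;
        }
    }
    return best;
}
```

With this interpretation, a shading that hugs a button's rectangle is preferred over the much larger shading of the enclosing window, which is the situation the proximity check is meant to guard against.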


5. Memory management in SegMan

Memory management is tricky with SegMan. The pixel-groups are data objects that are stored on the C++ side (in segman.dll) and are not subject to the Lisp interpreter's garbage collection. Pointers to pixel-groups are passed to the lisp interpreter through calls to the built-in iterator. Pixel-groups, on the lisp side, are essentially integer addresses of the corresponding pixel-group objects in the DLL. Thus pixel-groups in lisp are not true objects themselves. Special helper functions are used to access the data objects stored in the DLL.

Segmentation:

In order to ensure that memory is not leaked, SegMan deletes all pixel-groups when segmentation occurs and creates a new list of pixel-groups from scratch. However, this means that any pixel-group pointers that are held as lisp values become dangling pointers; there is no way for the lisp interpreter to know that these values should be invalidated (to the lisp interpreter, pixel-group pointers look like fixnums). When segmentation is performed, all pixel-groups from the previous screen become dangling and any attempt to access the old pixel-group pointers through helper functions will result in seg-faults.

Pixel-groups that are created during segmentation appear red on the screen when (show) is called. Pixel-groups that show up red are transient; they will be deleted automatically at the next screen segmentation. These pixel-groups are also called "unsafe."

"Safe" pixel-groups:

Pixel-groups can be created "safely." A "safe" pixel-group is one that will not be deleted when the next screen segmentation occurs. A safe pixel-group is created using special functions such as (make-pixel-group) and (make-pixel-group-by-bounds). The pixel-groups created through these function calls are still kept in the DLL and their addresses are returned, but these pixel-groups will not be deleted until the user explicitly asks the DLL to delete them. The pixel-groups can be deleted using the function (delete-pixel-group). However, the pointers to these pixel-groups only exist on the lisp side. So, if a variable holding a safe pixel-group pointer is lost due to garbage collection, there is no way to recover the address of the safe pixel-group. Its memory has been effectively leaked. Memory leakage will impact SegMan's performance over time.

Safe pixel-groups show up blue on the screen when (show) is called.

Other considerations:

Most functions that perform feature detection in SegMan return non-safe pixel-group pointers. For example, (find-buttons) returns a list of pixel-group pointers referring to unsafe pixel-groups. Some functions, such as (find-string), return safe pixel-groups. (find-string) returns a list of pixel-groups that effectively bound the sequence of characters making up a string on the screen. Because a string consists of many pixel-groups that are not necessarily adjacent, we must make a new, safe pixel-group instead of returning unsafe pixel-groups. It is important, therefore, to know whether you are retrieving safe or unsafe pixel-groups when you call a function, so that you know whether to delete the memory after use or whether the memory is transient.

Sometimes it is important to remember pixel-groups after the next screen segmentation. Special functions are provided to convert unsafe pixel-groups into safe pixel-groups. (make-pixel-group (get-bounds unsafe-group)) will return a safe pixel-group with the same bounded region as the unsafe pixel-group. (memory-safe unsafe-group-list) will return a list of safe pixel-groups given a list of unsafe pixel-groups. However, converting an unsafe pixel-group to a safe pixel-group means information is lost. Safe pixel-groups do not store information about individual pixels inside the group, only the bounded region. Therefore a safe pixel-group is not equivalent to an unsafe pixel-group although the bounded regions are equivalent.


6. Description of files

The following discussion describes what can be found in each file that makes up the SegMan system. The core system is located in systems/segman. All other directories contain supporting systems such as planners and cognitive models. Lisp source files are contained in systems/segman/segman. The Microsoft Developer Studio files used to build segman.dll are contained within systems/segman/src.

6.1. segman.dll and the *.cpp files in src/

This dynamic-link library is created by compiling the Microsoft Developer Studio C++ project. The DLL contains the code for capturing the screen and segmenting the captured bitmap into groups of like-colored pixels called "pixel-groups". The pixel-groups are stored on the C++ side but pointers to the pixel-group objects can be obtained and passed to the lisp side via iterator calls (c_is_next), (c_get_next), and (c_reset_iterator). Since pointers are passed to the lisp side, the DLL also provides functions for manipulating the objects referred to by the pointers. I have created a convention of prefixing exported DLL calls with "c_", although I have also written wrapper functions that make calling DLL calls easier (see wrappers.lisp).

c_segment_screen(0, 0, 1024, 768);   //initialize the pixel-group list

while (c_is_next()) {                //iterate through the list
   CPixelGroup* g = c_get_next();
   //do something with g.
}

c_reset_iterator();                  //reset the list for the next iteration.

Segman.dll also provides facilities for debugging. Calling c_show(CPixelGroup* p) will cause the system to color the pixels belonging to the pixel-group on the screen so you can see a visual representation of the pixel-group. The pixel-group will show up as red pixels. If the pixel-group is created by means other than segmentation (e.g. (make-pixel-group)), the pixel-group will be displayed in blue.

6.2. foreign-interface.lisp

This file contains the foreign-function interface for Allegro CL 5.01. Each def-foreign-call corresponds to a function exported by segman.dll, so that you can call the functions as if they were lisp functions.

Cursor position functions:

Segmentation and iteration functions:

Screen manipulation functions:

Pixel-group creation and deletion:

Pixel-group helper functions:

CRect* helper functions:

6.3. wrappers.lisp

This file duplicates the functions in foreign-interface.lisp but with lisp-friendly function names. (c_double_click) is wrapped by a new function called (double-click) with the same parameters. The wrapper functions are superior to the foreign-interface functions in that they perform some pointer error-checking. Pointers in lisp and pointers in C++ are not always handled the same so conversions are made to ensure certain errors do not occur. Other wrapper functions simplify calls to the DLL such as (get-cursor) which wraps calls to (c_get_cursor_x) and (c_get_cursor_y) and returns the results as a single list-value. Some wrapper functions do not correspond to any functions in foreign-interface.lisp but provide helper-routines for pixel-groups that can be derived from foreign-interface functions such as (get-height) which wraps (c_get_top) and (c_get_bottom) with some processing.

Cursor position functions:

Pixel-group iterator and screen difference functions:

Pixel-group creation and deletion:

Pixel-group helper functions:

Screen manipulation functions:

RGB color helper functions:

The wrapper for (c_segment_screen) is in segmentation.lisp. It combines all the pixel-group iterator routines plus extra processing to identify and record pixel-groups.

6.4. segmentation.lisp

This file contains the functions used to initiate segmentation of the screen, collect pixel-groups into lisp data structures, and begin classification of pixel-groups into single-group features. The function (segment-screen) causes SegMan to capture the screen and break it down into its constituent pixel-groups. Each pixel-group is collected and a series of predicates is used to identify each pixel-group. Pixel-groups are all unknown when they are retrieved from the built-in iterator. Pixel-groups are categorized and inserted into an association list according to the predicates that recognize them. For example, all pixel-groups retrieved from the screen that cause the predicate (rectangle-p) to return true are collected into the association list under the key :rectangles. If a pixel-group is not recognized by any predicate, it is categorized under the key :unknown. The following is an example of the association list returned by (segment-screen):

((#\5 (93580260))
 (#\8 (102169832))
 (#\y (93302820 102005680 102469236))
 (#\x (93750192 102884240))
 (:down-triangles (93301664 102614892 97748952 98491104))
 (:check-marks (97966280 98642540))
 (:rectangles (122619108 103739436 93224212 93227680 93234616 93245020 93246176 ...))
 ...
)

In the example, there are one character 5, one character 8, three character y's, two character x's, four downward-pointing triangles, two check marks, and a multitude of rectangle shapes. The association list can be quite large, since there are a lot of pixel-groups on any given screen and a lot of predicates. The association list returned by (segment-screen) effectively represents the state of the screen at the time it was captured. All pixel-groups are enumerated at least once in the data structure. Furthermore, this is the first pass at recognizing features on the screen. Letters, check marks, and other features that are represented by a single contiguous set of like-colored pixels can be found in the association list.

Predicates used for recognition of pixel-groups are listed in a special global variable called *segmentation-predicates*. This variable lists all segmentation predicates and the keys that matching pixel-groups should be categorized under. Without going into details about each predicate, most predicates are built on the principle of using pixel-neighbor numbers to detect salient features of the pixel-group. The (segment-screen) function iteratively applies each predicate to every pixel-group. The segmentation process can thus be computationally expensive. It should be noted that a pixel-group may be recognized by more than one predicate and show up under more than one key entry.
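
The classification pass described above can be sketched in C++ as follows. The actual code is Lisp and works on pixel-group pointers. Here groups are stood in for by plain integers, and NamedPredicate and classify are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A segmentation predicate together with the key under which matching
// groups are filed (hypothetical stand-in for *segmentation-predicates*).
struct NamedPredicate {
    std::string key;
    std::function<bool(int)> test;
};

// Apply every predicate to every group. A group may be filed under more
// than one key; a group that matches nothing is filed under ":unknown".
std::map<std::string, std::vector<int>> classify(
        const std::vector<int>& groups,
        const std::vector<NamedPredicate>& predicates) {
    std::map<std::string, std::vector<int>> state;
    for (int g : groups) {
        bool recognized = false;
        for (const NamedPredicate& p : predicates) {
            assert(static_cast<bool>(p.test));  // predicate must be callable
            if (p.test(g)) {
                state[p.key].push_back(g);
                recognized = true;
            }
        }
        if (!recognized) state[":unknown"].push_back(g);
    }
    return state;
}
```

The nested loop makes the cost visible: the work grows with the number of groups times the number of predicates, which is why segmentation can be expensive on a screen with thousands of groups.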

By default, the (segment-screen) call captures and segments the entire screen. However, the optional parameter bounds, given as a list of the form (left top right bottom), can be used to constrain the screen capture area.

6.5. segman.lisp

This file contains higher-level recognition routines for finding screen features that are made up of more than one pixel-group, for example buttons, windows, and strings of text. Multi-group features are detected by selecting single-group features out of the screen-state association list and by making comparisons between the candidate pixel-groups. For example, a button is found by finding all rectangles that have shading above and below.

Most functions take the screen-state as a parameter because they must search for pixel-groups that are the right shape and in the right relationship with other pixel-groups. The screen-state referred to here is the association list returned by (segment-screen).

Many of the functions described in this section have been changed; updates to the documentation are in progress.

Debugging functions:

Pixel-group search functions:

Widget detection functions:

Word detection functions:

Adobe Illustrator widget detection functions:

The Adobe Illustrator canvas's border can be found using (second (assoc :illustrator-canvases screen-state)). The border around objects that are selected in the Adobe Illustrator canvas can be found using (second (assoc :illustrator-selections screen-state)).

Functions for detecting change:

Menu navigation functions:

6.6. state.lisp

The state substrate is intended for SegMan controllers that reason about states of the user interface, rather than its procedural behavior. A small number of functions provide this kind of access; the state substrate is entirely equivalent to the functional substrate.