Professor Fred DePiero

Reference and Review Page

This page contains various references, cited on other parts of this web site.


In addition to the various references that are listed below, a number of the techniques for graph matching are reviewed. The reviews include a discussion of ‘comparison horizon’ associated with a given technique. This term describes the nature of the node-to-node comparisons that are performed. The horizon refers to the size of the subgraph neighborhood that is used when two nodes are compared. This size is measured as a path distance out to the furthest node incorporated in the comparison. For example, if only node colors were compared then the horizon distance would be zero. If the current node together with its immediate neighbors were compared, then the horizon would equal one, and so forth.

The review also describes whether a given technique is approximate or exact. Exact techniques will enumerate all possible mappings in the worst case. Approximate techniques avoid this enumeration, but are not guaranteed to report a maximal subgraph.

The LeRP algorithm is approximate, requiring polynomial effort. It also uses a dynamic comparison horizon. This latter feature is novel for LeRP, and is a key element to the mapping accuracy.

Methods for Object Recognition

Graphical Techniques

R.M Haralick, L.G. Shapiro, The consistent labeling problem I, IEEE Trans. Pattern Analysis and Machine Intelligence, 1 (1979) 173-184.

Maximal Clique of Association Graph

An association graph has nodes that designate a node-to-node mapping between scene and model graphs. Edges in the association graph are present when the adjacent nodes (of the association graph) are each consistent. This consistency includes features that are associated with nodes in the scene graphs (short line maps to short line, for example) and features that are relational between scene elements (relative orientation between lines is consistent, for example). Note the horizon distance for comparisons of consistency equals one.

A clique is a subset of nodes in a graph that are all fully interconnected. A maximal clique is the largest such subgraph that exists within some other graph. The maximal clique of an association graph is the largest set of corresponding features that are have consistent node features (colors) and consistent edge features (colors). Finding the maximal clique has the result of finding a maximal neighborhood (maximal horizon distance) for the reported subgraph. The problem of finding a maximal clique is proven to be NP-Complete.

H.G. Barrow, R.M. Burstall, Subgraph Isomorphism, Matching relational structures and maximal cliques, Information Processing Letters, 4 (1976) 83-84.

L.G. Shapiro, R.M Haralick, A metric for comparing relational descriptions, IEEE Trans. Pattern Analysis and Machine Intelligence, 7 (1985) 90-94.

Interpretation Tree

Uses a depth first search through all possible mappings. Technique prunes search paths when inconsistent relations are found. The method will backtrack, potentially exploring an exponential number of paths. This is not an approximate technique. Rather it is exact – by searching all possibilities. The basic technique can be used to find all possible mappings, or the best possible mapping (see relational distance matching). Individual comparisons of consistency are local, but a path from the root of the search tree to a leaf yields a larger comparison horizon. See grimson90.

Relational Distance Matching

Method improves search efficiency, by using a measure of structural similarity. Tree search will traverse all possible mappings. This is an exact technique – it will enumerate all possible mappings in the worst-case.

The structural similarity measure weights a given mapping by the number of differences present in the two graphs (under the mapping) such as missing edges. Earlier versions add dummy graph components to make the two graphs have the same number of nodes.

L.G. Shapiro, R.M Haralick, Structural descriptions and inexact matching, IEEE Trans. Pattern Analysis and Machine Intelligence, 3 (5) (1981) 504-519.

Discrete Relaxation

Initially assigns all possible mappings to a node (or all appropriate mappings, if colors are available). Potential mappings are then eliminated by examining local constraints – for example, by comparing the legitimacy of the mapping of two adjacent nodes. Comparisons of consistency are local (a distance of one node away), but after iterating the comparison horizon effectively increases. Note that the iteration is done, in general, with multiple mapping assignments being in place for any given node. No guarantee that the best mapping will be found. This is an approximate technique. No guarantee that iterations terminate with a single assignment remaining for each node (could be more than one).

A. Rosenfeld, R. Hummel, S. Zucker, Scene labeling by relaxation operators, IEEE Trans. Systems, Man and Cybernetics, 6 (1976) 420-453.

Version below performs comparisons using ‘superclique’ of each node. This consists of one central node and all of its neighbors. This comparison horizon distance is still one, but the neighborhood does include all nodes within the distance one.

The technique also uses a padded dictionary of graphs for comparisons. Assumptions are made regarding the degree of occlusion in scenes to determine how much padding is appropriate (how many nodes might be missing). Then, many graphs are generated – for each object that is to be recognized – which contain all permutations of missing nodes. This results in exponential growth of the stored graphs, and more comparisons that must be performed.

R.C. Wilson, E.R. Hancock, Structural matching by discrete relaxation, IEEE Trans. Pattern Analysis and Machine Intelligence, 19 (6) (1997) 634-648.

Graph Edit Distance

Compares the structural similarity of graphs by counting the number of editing changes that are necessary to make identical structures. The number of missing or extra edges and nodes are counted. This eliminates the problem of having a padded dictionary of graphs to compare against, as with [hancock97]. This approach is approximate in nature. Local comparisons are made to determine if nodes or edges should be edited. When combined into a sequence, the edit operations describe global changes to the entire graph.

R. Myers, R.C. Wilson, E.R. Hancock, Bayesian graph edit distance, IEEE Trans. Pattern Analysis and Machine Intelligence, 22 (6) (1997) 628-635.

The following technique is further optimized for large databases of known objects. It eliminates redundant subgraphs in the database models, storing each subgraph only once.

B.T.Messmer, H. Bunke, A new algorithm for error-tolerant subgraph isomorphism detection, IEEE Trans. Pattern Analysis and Machine Intelligence, 20 (5) (1998) 493-504.

Continuous Relaxation

Technique is similar in concept to discrete relaxation. But rather than a potential mapping being present or not, this method allows for a probability that describes the legitimacy of each mapping. Probabilities are updated iteratively. This is an approximate technique. No guarantee to find an optimal solution. As in the discrete relaxation case, comparisons are inherently local, but have a broader effect due to iterations.

R. Hummel, S. Zucker, On the foundations of relaxation labeling processes, IEEE Trans. Pattern Analysis and Machine Intelligence, 5 (1983) 267-287.

K. Price, Relaxation matching techniques – a comparison, IEEE Trans. Pattern Analysis and Machine Intelligence, 7 (1985) 617-623.

Continuous Assignment

The algorithm iteratively updates probabilities in a match matrix, which is similar to continuous version of a permutation matrix. Constraints are used to force the row sums and column sums to equal 1.0. Reported results appear to indicate that the technique requires a relatively high level of connectivity to yield accurate results. The technique is approximate in nature.

S. Gold, A Rangarajan, A graduated assignment algorithm for graph matching, IEEE Trans. Pattern Analysis and Machine Intelligence, 18 (4) (1996) 377-388.

Relational Indexing

Technique finds pairs of features. Each feature becomes a node in a 2-node graph; the relative position colors the edge. The feature values associated with the 2 nodes and edge are binned and used as an index into an accumulator array. Votes are accumulated; the model ending up with the largest number of votes is selected. Technique is approximate in nature and uses a fixed horizon (2-nodes with connecting edge).

M.S. Costa, L.G. Shapiro, Scene analysis using appearance-based models and relational indexing, IEEE Symposium on Computer Vision, (Nov. 1995) 103-108.

Methods for Object Recognition

Methods Involving Special Kinds of Graphs

Method using association graphs, but matching is between tree structures, not graphs in general. Hence method would not apply to relational models of objects, nor to electronic circuits, nor chemical molecules, and many other applications of interest.

M. Pelillo, K. Siddiqi, S.W.Zucker, Matching hierarchical structure using association graphs, IEEE Trans. Pattern Analysis and Machine Intelligence, 21 (11) (1999) 1105-1120.

 

Methods for Object Recognition

Non-Graphical Techniques

Local-Feature-Focus

Method finds a ‘focus feature’ on an object that is particularly distinctive. Then examines neighboring features and compares to models. Feature correspondence is hypothesized and verified by solving for the geometric transform and checking residual errors. Has been applied to 2-D objects. Uses a limited horizon of comparison, just 1 node distance.

R.C. Bolles, R.A. Cain, Recognizing and locating partially visible objects: the local-feature-focus method, Int. J. of Robotics Research, 1 (3) (1982) 1236-1253

Pose Clustering

Compute geometric transform using all possible mappings of a pair of corresponding points to another pair of corresponding points (or ‘control points’). Find clusters of geometric transform parameters that are present after all possible pairings have been examined. Technique is simple and on the order N^2. Application-specific parameters needed to detect occurrence of a cluster. Strictly local comparisons are performed.

G. Stockman, Object recognition and localization via pose clustering, Computer Vision, Graphics and Image Processing, 40 (1987) 361-387.

Geometric Hashing

Use each possible triplet of feature points for form a basis. Examine all other feature points, and express location for the given basis, a 2-D location. Bin the 2-D location, use it as an index to hash table, which contains a list of links to (model, triplet-feature-basis) data sets. For recognition, select all possible triplets of features in image, for each basis then compute the 2-D location off all other features. Go to the hash table to access the list of (model, basis) data sets that are appropriate for this location. Increment an accumulator for each of the (model, basis) data sets. After processing all triplets, find accumulator peaks.

Designed for better performance in applications with large databases, having many objects. Also optimized for fast on-line processing at the expense of more off-line processing and memory. Worst-case effort is O(N^4) during on-line, best-case is O(N). Effort for the off-line processing is O(M N^4), for M models. Requires binning of Affine transform parameters (bin size is application-specific). Binning problem exacerbated due to errors in feature locations. False peaks can occur in accumulator array due to occlusion, multiple objects, and bad feature computations.

Y. Lamden, H. Wolfson, Geometric hashing: a general and efficient model-based recognition scheme, Proc. 2nd Int. Conf. On Computer Vision, Tarpon Springs, FL (Nov. 1988) 238-249.

Range Image Registration

NON-Graphical Methods

ICP

Estimate transform (by any means), then compute residual error associated with the closest points found via the transform. Iteratively adjust transform to reduce residual error.

P.J. Besl, N.D. McKay, A method for registration of 3-D shapes, IEEE Trans. Pattern Analysis and Machine Intelligence, 14 (2) (1992) 239-256.

RANSAC

Uses randomly chosen correspondences, then pose is computed and verified. Geared for range images that lack distinctive features.

M. Fischler, R. Bolles, Random Consensus: a paradigm for model fitting with applications in image analysis and automated cartography, Communications of the ACM, 24 (1981) 381-395.

Spin Images

Spin Images are robust and descriptive features that are computed for each vertex of a mesh of range data. Spin images are invariant with respect to orientation – defined with respect to a local coordinate frame, formed by the local surface norm. Spin Images in two sets of range data are compared (N^2) and pairs with high correlation are matched. (Note this is equivalent to matching just the nodes of a graph, without regard for the edges, or any edge relationships.) Iterations are performed by computing a geometric transformation, comparing the transformed data, removing outliers, and repeating.

A.E. Johnson, M. Hebert, Efficient multiple model recognition in cluttered 3-D scenes, Proc. IEEE Conf. On Computer Vision and Pattern Recognition, Hawaii, (June, 1998).

 

Feature Tracking with Affine Motion Model

These techniques assume very little motion in the scene, 1/5% of the image, in some cases. The affine model describes the warping of the image that is necessary to describe the intensity shift in pixels associated with the slight movement.

These techniques find features which are (ideally) corners in images. These are found by examining the eigenvalues of a gradient matrix. When both eigenvalues are large, then a corner is likely to be present. Given a set of feature points in each image, these algorithms attempt to determine feature correspondance, and then to find camera movement by a method like Horn's.

B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In IJCAI, pages 674–679, 1981.

J. Shi and C. Tomasi. “Good Features To Track.” Proc. Computer Vision and Pattern Recognition (CVPR’94), pp. 593-600, 1994.

Much work has been done to improve the robustness of the tracking. By making an assumption of small camera displacements and high frame rate calculations, it is possible to assume that the (ideal) motion of all the features is all due to the camera with the translation, rotation, and perspective effects all described by each feature's affine model. Then the residual error of each feature may be examined and robust statistics (based on medians) may be used to eliminate poor correspondances.

T. Tommasini, A. Fusiello, E. Trucco and V. Roberto. Making good features track better. in Proc. IEEE Computer Society Conference on Computer Vision Pattern Recognition, 1998, pp. 145-149.

Although robust methods improve tracking, these methods do not include inter-feature constraints, as does the landmark-graph technique.

Range Image Registration

Graphical Methods

Combines problem of graph matching (for point correspondence) and problem of absolute orientation together in order to register range data. This is an iterative approach.

A.D.J. Cross, E.R. Hancock, Graph matching with a dual-step EM algorithm, IEEE Trans. Pattern Analysis and Machine Intelligence, 20 (11) (1998) 1236-1253

Uses simulated annealing to match graphs (a slow, but effective iterative optimization technique – See Numerical Recipes). Graphs of facial features are stretched to match faces.

Tefas, C. Kotropoulos, I. Pitas, Using support vector machines to enhance the performance of elastic graph matching for frontal face authentication, IEEE Trans. Pattern Analysis and Machine Intelligence, 23 (7) (2001) 735-746

Uses LeRP algorithm to eliminate iterative calculations. Graph structure greatly improves robustness of feature tracking, compared to only examining the residual error of individual tracked paths. The graph structure also constrains the inter-node separation when establishing correspondance. Much greater displacements between images can be handled than with methods involving affine motion models.

F. W. DePiero, "Fast Landmark-Based Registration via Deterministic and Efficient Processing, Some Preliminary Results," Proc. 1st Intl. Symp. on 3-D Data Processing, Visualization and Transmission (3DPVT), (Padova, Italy), June, 2002.

Representation of 3-D Data

2 ½-D Mesh - Delaunay Triangulation

Forms a connected mesh (connects ‘nearest neighbors’) of 2-d points. Mesh is unique, given a set of points. Selection of neighbors is insensitive to ordering of points.

J.R. Shewchuk, Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator, Proc. First Workshop on Applied Computational Geometry, Philadelphia, Pennsylvania, pages 124-133, ACM, May 1996.

Surface-Edge-Vertex Model

Data structure stores modes of surface patches, linked to curves that form borders of patches, linked to points that form end points of the curves.

Generalized Cylinder

A space curve defines the ‘centerline’ of the cylinder. A separate function defines the radius, which is swept 180 degrees to form the cross-section.

Octrees

Formed by an 8-ary tree of cubic regions (‘voxels’).

Superquadrics

A high-order 3-D surface parameterized by a longitudinal and latitudinal degree of freedom.

Aspect Graph

Aspect graphs describe all possible views of an object (view classes).

Balloons and Snakes

Balloons and snakes are deformable models that are adjusted in shape via iterative algorithms. Shape adjusted to minimize an energy function that can weight the perpendicular distance to each associated point, and that can weight the curvature of the model. Models can also adjust over time to changes in data.

Absolute Orientation, Pose Estimation without Recognition, And Coordinate Transforms

Absolute Orientation

Finds optimal rotation (in least squares sense) via a maximum eigenvalue formulation. Uses quaternion representation for rotation, avoiding the orthonomality constraint associated with a standard rotation matrix. Quaternions describe a rotation via a unit rotation axis, together with a scalar that designates the rotation angle.

B.K.P. Horn, Closed-form solution of absolute orientation using unit quaternions, J. Optical Society of America A, 4 (4) (1987) 629-642.

Affine Transformation

Can be used to map 2-D points into 2-D points, including the effect of rotation, scaling, and translation. Does not include an explicit rotation matrix which is orthonormal. Generalized form can also include shear and reflection. See shapiro2001.

Homogenous Coordinate Transform

Can be used to map 2-D points into 2-D points, or 3-D. Includes the effect of rotation, scaling, and translation. Does include an explicit rotation matrix that is orthonormal.

T. Yoshikawa, Foundations of Robotics: Analysis and Control, MIT Press, NJ, 1990.

Perspective 3-Point Problem

Determines the pose of a triangle observed in an image. True dimensions of triangle must be known. First version used an iterative solution method. Then, Fischler and Bolles derived a closed form solution, as part of their RANSAC system. See bolles81.

 

Camera Calibration

Tsai’s Method

Solves for extrinsic calibration parameters (camera rotation and translation). Does not solve for intrinsic camera parameters – and assumes a form that governs these parameters (focal length, pixels per square inch, a lens distortion factor). Also assumes all rays pass through a perfect lens center.

R. Tsai, A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf cameras and lenses, IEEE Trans. Robotics and Automation, 3 (4) (1987).

Two Planes Method

Finds mapping between image plane and two different (world) planes. Makes no assumptions about intrinsic or extrinsic camera parameters. Makes no assumption about the form of lens distortion. Does not assume all rays pass through a perfect lens center. May permit improved modeling of distortion, with lower order models that Tsai’s method.

A. Isaguirre, P. Pu, J. Summers, A new development in camera calibration for a pair of mobile cameras, Proc. Int. Conf. On Robotics and Automation, (1985) 74-79.

G.Q. Wei, S.D. Ma, Two plane camera calibration: a unified model, Proc. IEEE Conf. On Computer Vision and Pattern Recognition, Hawaii, (June, 1991) 133-138.

 

Image Processing

L.G. Shapiro, G.C. Stockman, Computer Vision, Prentice-Hall, NJ, 2001.

B.K.P. Horn, Robot Vision, McGraw-Hill, Cambridge, MA, 1984.

W.E.L. Grimson, Object recognition by computer, MIT Press, 1990

 

Signal Processing, Probability and Statistics

R.E. Kalman, A new approach to linear filtering and prediction problems, Trans ASME J. Basic Eng., 83 (1960) 35-45.

Y. Bar-Shalom, T.E.Fortmann, Tracking and Data Association, Academic Press, New York, 1989.

G.R. Cooper, C.D. McGillem, Probabilistic Methods of Signal and System Analysis, Harcourt Brace and Jovanovich, Orlando, FL, 1986.

A.V. Oppenheim, R.W. Shafer, Discrete-Time Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989.

 

Pattern Recognition

R. Duda, P. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, New York, 1973.

 

Mathematics

G. Strang, Linear Algebra and Its Applications, 2nd ed., Academic Press, New York, 1980.

G. Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, Wellesley MA, 1986.

I.M. Sobol, A Primer for the Monte Carlo Method, CRC Press, Boca Raton, 1994.

W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, Numerical Recipes in C, Cambridge University Press, New York, 1988.

G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, New Jersey, 1976.

G. H. Golub, C. F. Van Loan, Matrix Computations, 2nd. Ed., Hopkins University Press, Baltimore, MD, 1989.

Graph Theory

D.B. West, Introduction to Graph Theory, 2nd ed., Prentice-Hall, Upper Saddle River, New Jersey, 2001.

L.R. Foulds, Graph Theory Applications, Springer-Verlag, New York, 1992.

E. M. Palmer, Graphical Evolution – An Introduction to the Theory of Random Graphs, Wiley-Interscience, 1985.


For More Information, Contact:

Fred DePiero
Electrical Engineering Department
Cal Poly State University