Professor Fred DePiero

The Challenge of Subgraph Matching

Subgraph isomorphism (or 'graph matching') is proven to be in a combinatorically-wicked category of problems (NP-Complete). It requires an amount of computational effort that increases exponentially with the problem size (as given by the number of nodes). Computing a subgraph isomorphism via an exact algorithm would typically be considered intractable; this is why an approximate algorithm (such as LeRP or Basis-Graphs) may be attractive for some applications.

What are Graphs?

Graphs are a discrete structure that can be used to describe many kinds of information or physical systems, like chemical molecules or electric circuits. Two graphs (G and H) appear below:

These graphs are meant to serve as a simple (abstract) example - not representing any particular physical system. In general, graphs contain a set of "nodes" (shown as the small letters above) and "edges" which are connections between nodes. Nodes and edges may also have application-specific attributes, or "coloring". Formally, a graph G is defined to have a set of N nodes, gi, and Q edges, eij

What is a Graph Isomorphism?

Graphs G and H are said to be isomorphic if there exists a mapping, hk = m(gi), that preserves the adjacency of all nodes [foulds92]. Any application-specific node and edge colorings must also be consistent under the mapping. Subgraph isomorphism is a condition of isomorphism that exists between two subgraphs of G and H. The goal for a subgraph-matching algorithm is to have a maximal number of nodes in each subgraph.

Why is Determining Graph Isomorphism Difficult?

The amount of effort associated with an exact solution would typically be considered intractable, requiring an amount of effort that is exponential in the number of nodes, N. Note there are N! possible orderings of nodes. The subgraph isomorphism problem is even worse combinatorically, as the size of the matching subgraph is unknown. The subgraph isomorphism is proven to be NP-Complete. However, the class of difficulty for the graph isomorphism problem is unknown [West01]. It is due to the NP-Completeness of the subgraph isomorphism problem that an approximate solution was sought in the LeRP algorithm.

A factor increasing the difficulty of the isomorphism problem can be the (lack of) variation in degree. The degree of a node forms a natural classification for nodes, permitting candidate node mappings to be pruned out. Note however, that this type of classification may be less useful in the subgraph matching case, where the degree of some nodes can be altered due to structural differences.

The dynamic range of node and edge coloring is another factor that affects the computational demand of matching algorithms. Higher dynamic ranges greatly reduce potential matches. (Depending on the nature of the application, norms may be required to decide whether colors are equivalent).

To appreciate the combinatoric nature of the graph matching problem, consider the following two graphs... Are these isomorphic?

The answer is: YES! The mapping is simply a to a, b to b... Spot checking the nodes will verify that this is true. Such a verification would require on the order N^2 effort. 

This discussion has close ties to the matter of NP-Completeness. This categorization makes no guarantee of the effort needed to find a solution, but it does ensure that such a solution can be verified via a polynomial amount of effort.

Adjacency Matrix

The adjacency matrix [foulds92] is NxN for an N-node graph. It contains a one at a location (i,j) if an edge is present from node i to node j. In the LeRP algorithm the main diagonal is also set to one, although this is not part of the standard definition. Also in LeRP, edges are not considered to be directional (digraphs are not processed). Hence the adjacency matrix is symmetric.

The adjacency matrix plays an important role in verifying isomorphism. Given a node-to-node mapping, m(), between two graphs G and H, isomorphism may be  verified by reordering the nodes and relocating the edges of H. The reordered adjacency matrices A’ and B’, of graphs G and H, respectively, will then be identical – if an isomorphism exists. (This operation requires N^2 effort to visit each location in the adjacency matrices.) Verification using the reordered adjacency matrices can be used to eliminate the possibility of a false-positive result being reported for an isomorphism.

Automorphisms

An automorphism is an isomorphic mapping of the nodes of G onto itself. Automorphisms arise due to symmetries in graph structures [foulds92]. These are significant because the symmetries can result in more than one isomorphic mapping. In these cases two different mappings will result in identical adjacency matrices. We consider both to be valid, provided graph colorings are also in agreement.

The nodes that may be interchanged as part of the automorphism are said to form an automorphism group. The graphs G and H below are isomorphic and contain two automorphism groups.

A

a

b

c

d

e

f

g

a

1

 

 

 

 

 

1

b

 

1

 

 

 

 

1

c

 

 

1

 

1

 

1

d

 

 

 

1

 

1

1

e

 

 

1

 

1

 

 

f

 

 

 

1

 

1

 

g

1

1

1

1

 

 

1

B

b

a

d

c

f

e

g

b

1

 

 

 

 

 

1

a

 

1

 

 

 

 

1

d

 

 

1

 

1

 

1

c

 

 

 

1

 

1

1

f

 

 

1

 

1

 

 

e

 

 

 

1

 

1

 

g

1

1

1

1

 

 

1

G and H are two isomorphic graphs. G and H also contain automorphisms, as indicated in their adjacency matrices, A and B.

The first automorphism group contains nodes a and b. Notice that interchanging the rows and columns associated with nodes a and b results in an identical adjacency matrix. The second group involves two exchanges: [c, e]<->[d, f], where  “<->” means “exchange with”. Both exchanges must be made in this group before the adjacency matrix will appear as above.

This example demonstrates that the symmetries can be of different ‘order’, as in 2 sets of 2 nodes each with [c, e]<->[d, f]. The ‘order’ of an automorphism group describes the symmetry using the notation (s,t), where s gives the number of sets of nodes and t gives the number of nodes in each set. Hence the group with nodes a and b has order (2,1) and the larger group has order (2,2).

When the nodes of an automorphism group of order (2,1) appear on two subsequent rows, i and i+1, of an adjacency matrix, then the entries on these rows will be identical but for the 2x2 diagonal block. The element on the main diagonal will be 1 on both rows, by the LeRP convention. Also the super- and sub-diagonal elements on each of the two rows may be either 1 or 0. This location corresponds to an edge that directly connects ni to ni+1. Entries outside the 2x2 diagonal block are identical for an automorphism of order (2,1).


For More Information, Contact:

Fred DePiero
Electrical Engineering Department
Cal Poly State University