TreeMatcher
TreeMatcher is a toolkit we have developed in the
past few years. Basically, the toolkit contains
several programs for comparing two ordered
or unordered trees. The comparison is based on
the edit operations on trees, which are an extension
of string edit operations. These edit operations
include inserting a node, deleting a node, and
changing the label of a node.
Ordered Trees
Ordered, labeled trees are trees in which each node
has a label and the left-to-right order of its
children (if it has any) is fixed.
We have developed 3 useful programs for comparing
ordered trees: TMATCH, TMATCH-VLDC and TDISCOVER.
TMATCH:
Given two ordered trees T1 and T2, TMATCH compares
T1 and T2, calculating the minimum edit
distance from T1 and T2.
Please click
here
to see some examples about TMATCH.
TMATCH-VLDC:
Given an ordered pattern tree P and an ordered data tree D
where D contains multiple variable length don't cares (VLDCs),
TMATCH-VLDC compares P and D, and
-
allow each VLDC to substitute for part of a path
from the root to a leaf of D;
-
allow each VLDC to match part of such a path and
all the subtrees emanating from the nodes of the path;
-
allow either type of substitution with
the additional possibility that subtrees can be
freely removed from D.
TDISCOVER:
Given two ordered trees T1 and T2, TDISCOVER can find
the consensus of T1 and T2, i.e. the largest
common substructures of the two trees.
The common substructures are allowed to differ by
a user-specified edit distance.
Unordered Trees
Unordered, labeled trees are trees in which
the order among siblings is unimportant and
only ancestor-descendant relationships are
important. Finding the edit distance
between unordered trees T1 and T2 is computationally
expensive (an NP-complete problem).
We solve the problem using two approaches:
-
develop efficient heuristics
to find an approximate distance of T1 and T2;
-
impose a restriction on the edit
operations and find a constrained edit distance
between T1 and T2.
We have developed algorithms using both approaches.