TreeMatcher

TreeMatcher is a toolkit we have developed in the past few years. Basically, the toolkit contains several programs for comparing two ordered or unordered trees. The comparison is based on the edit operations on trees, which are an extension of string edit operations. These edit operations include inserting a node, deleting a node, and changing the label of a node.

Ordered Trees

Ordered, labeled trees are trees in which each node has a label and the left-to-right order of its children (if it has any) is fixed. We have developed 3 useful programs for comparing ordered trees: TMATCH, TMATCH-VLDC and TDISCOVER.

TMATCH:

Given two ordered trees T1 and T2, TMATCH compares T1 and T2, calculating the minimum edit distance from T1 and T2. Please click here to see some examples about TMATCH.

TMATCH-VLDC:

Given an ordered pattern tree P and an ordered data tree D where D contains multiple variable length don't cares (VLDCs), TMATCH-VLDC compares P and D, and

TDISCOVER:

Given two ordered trees T1 and T2, TDISCOVER can find the consensus of T1 and T2, i.e. the largest common substructures of the two trees. The common substructures are allowed to differ by a user-specified edit distance.

Unordered Trees

Unordered, labeled trees are trees in which the order among siblings is unimportant and only ancestor-descendant relationships are important. Finding the edit distance between unordered trees T1 and T2 is computationally expensive (an NP-complete problem). We solve the problem using two approaches: We have developed algorithms using both approaches.