Complexity and Algorithms for MUL-Tree Pruning
A multiply-labeled tree (or MUL-tree) is a rooted tree in which every leaf is labeled by an element from some set X, but in which more than one leaf may be labeled by the same element of X. MUL-trees have applications in many fields. In phylogenetics, they can represent the evolution of gene families, where genes are represented by the species they belong to, the non-uniqueness of leaf-labels coming from the fact that a given genome may contain many paralogous genes. In this paper, we consider two problems related to the pruning of MUL-tree leading to single-labeled-phylogenetic trees on X. First, given a set of MUL-trees, we consider the MUL-tree Set Pruning for Consistency (MULSETPC) Problem asking for a leaf-pruning of each tree leading to a set of consistent trees, i.e. a collection of label-isomorphic single-labeled trees. Second, proceeding each gene tree at a time, we consider the MUL-tree Pruning for Reconciliation (MULPR) Problem, asking for a pruning minimizing a reconciliation cost with a given species tree. The two problems have applications in the species tree and synteny tree reconstruction fields. We show that MULTSETPC is NP-complete and that MULPR is W[2]-hard when parameterized by the duplication cost. We then develop a greedy heuristic for MULPR and show its accuracy on a set of Ensembl gene trees.
https://link.springer.com/chapter/10.1007/978-3-030-79987-8_23