二叉树最近公共祖先详解（LCA问题详解）

Lowest Common Ancestor of a Binary Tree 问题描述

Given a binary tree, find the lowest common ancestor of two given nodes in the tree.

给定一棵二叉树，求最近公共祖先。

最近公共祖先定义为：“对于有根树 T 的两个结点 p、q，最近公共祖先表示为一个结点 x，满足 x 是 p、q 的祖先且 x 的深度尽可能大（一个节点也可以是它自己的祖先）。”

二叉搜索树 BST

先从最简单的二叉搜索树BST讲起，给定一棵 BST如下：

     _______6______
    /              \
 ___2__          ___8__
/      \        /      \
0      _4       7       9
      /  \
      3   5

比如如求2和8的LCA，那么为6，博主简单记为LCA(2,8) = 6

更多例子：

LCA(2,4) = 2
LCA(0,5) = 2

很简单的思路就是看两个值在root的哪边：

两个值都在左边，则LCA在左边
两个值都在右边，则LCA在右边
一个在左一个在右，则说明LCA就是当前的root节点。

用Python简单的实现如下，这里总是p < q的，复杂度O(n)

class Solution(object):
    def lowestCommonAncestor(self, root, p, q):
        """
        :type root: TreeNode
        :type p: TreeNode
        :type q: TreeNode
        :rtype: TreeNode
        """
        if not root:return root;
        if p.val > q.val:return self.lowestCommonAncestor(root, q, p)
        if root.val >= p.val and root.val <= q.val: return root;
        if root.val < p.val:return self.lowestCommonAncestor(root.right, p, q)
        if root.val > q.val:return self.lowestCommonAncestor(root.left, p, q)

ps:其实这也是Lowest 235 Common Ancestor of a Binary Search Tree 的解法

普通二叉树

节点定义如下，有左右子节点：(用java貌似看起来清晰点)

public class TreeNode {
     int val;
     TreeNode left;
     TreeNode right;
     TreeNode(int x) { val = x; }
}

像刚才那样看左右子节点肯定不行的啦，看下面的树:

     _______3______
    /              \
 ___5__          ___1__
/      \        /      \
6      _2       0       8
      /  \
      7   4

在举个栗子

LCA(5,2) = 5
LCA(5,1) = 3

思路：

一种简单的方法是DFS分别寻找到两个节点p和q的路径，然后对比路径，查看他们的第一个分岔口，则为LCA。

class Solution {
    bool find_path(TreeNode *root, const TreeNode * const p, vector<TreeNode *> &path){
        if(!root) return false;
        path.push_back(root);
        if(root == p){
            return true;
        }
        int len = path.size();
        if(find_path(root->left, p, path))
            return true;
        path.resize(len);
        return find_path(root->right, p, path);
    }
    
public:
    TreeNode* lowestCommonAncestor(TreeNode* root, TreeNode* p, TreeNode* q) {
        vector<TreeNode *> first, second;
        find_path(root, p, first);
        find_path(root, q, second);
        int len = min(first.size(), second.size());
        for(int i = 0; i < len; ++i){
            if(first[i] != second[i])
                return first[i - 1];
        }
        return first.size() < second.size()? first.back() : second.back();
    }
};

这个思路比较简单，代码写起来不如下面这种方法优雅：

Using a bottom-up approach, we can improve over the top-down approach by avoiding traversing the same nodes over and over again.

We traverse from the bottom, and once we reach a node which matches one of the two nodes, we pass it up to its parent. The parent would then test its left and right subtree if each contain one of the two nodes. If yes, then the parent must be the LCA and we pass its parent up to the root. If not, we pass the lower node which contains either one of the two nodes (if the left or right subtree contains either p or q), or NULL (if both the left and right subtree does not contain either p or q) up.

Sounds complicated? Surprisingly the code appears to be much simpler than the top-down one.

from : http://articles.leetcode.com/2011/07/lowest-common-ancestor-of-a-binary-tree-part-i.html

我们仍然可以用递归来解决，递归寻找两个带查询LCA的节点p和q，当找到后，返回给它们的父亲。如果某个节点的左右子树分别包括这两个节点，那么这个节点必然是所求的解，返回该节点。否则，返回左或者右子树（哪个包含p或者q的就返回哪个）。

复杂度O(n)

代码很简单，也写得很漂亮：

class Solution(object):
    def lowestCommonAncestor(self, root, p, q):
        """
        :type root: TreeNode
        :type p: TreeNode
        :type q: TreeNode
        :rtype: TreeNode
        """
        if not root or root == p or root == q: return root
        left = self.lowestCommonAncestor(root.left, p, q)
        right = self.lowestCommonAncestor(root.right, p, q)
        if left and right:
            return root
        return left if left else right

ps:其实这也是leetcode 236 Lowest Common Ancestor of a Binary Tree 的解法

带有父节点信息的二叉树

扩展，如果带有父节点的信息呢？

换句话说，树的定义为

public class TreeNode {
     int val;
     TreeNode left;
     TreeNode right;
     TreeNode parent;
     TreeNode(int x) { val = x; }
}

那么，一个简单的思路，对p和q向上走，用hashtable记录访问过的节点，

如果某个节点已经被访问过了，那么返回该节点。

复杂度O(h),h为树的高度

def LCA(root, p, q):
    vis = set()
    while p or q:
        if p:
            if p in vis:
                return p
            vis.add(p)
            p = p.parent
        if q:
            if q in vis:
                return q
            vis.add(q)
            q = q.parent
    return None

更好的方法是求出p和q的高度，高度比较高的向上移动，直到两个节点相遇。

复杂度O(h),h为树的高度

int getHeight(Node *p) {
  int height = 0;
  while (p) {
    height++;
    p = p->parent;
  }
  return height;
}
 
// As root->parent is NULL, we don't need to pass root in.
Node *LCA(Node *p, Node *q) {
  int h1 = getHeight(p);
  int h2 = getHeight(q);
  // swap both nodes in case p is deeper than q.
  if (h1 > h2) {
    swap(h1, h2);
    swap(p, q);
  }
  // invariant: h1 <= h2.
  int dh = h2 - h1;
  for (int h = 0; h < dh; h++)
    q = q->parent;
  while (p && q) {
    if (p == q) return p;
    p = p->parent;
    q = q->parent;
  }
  return NULL;  // p and q are not in the same tree
}

多次查询的普通二叉树

如果是多次查询呢?每次都递归一次?

这样有q次查询就要O(q * n) !

显然这复杂度太高。

我们可以用Tarjan离线算法来求LCA

Tarjan算法基于DFS和并查集进行求解。

我们知道，树的后序遍历顺序是：左子树，右子树，根节点。

所以，如果LCA(p,q) = x ,那么，进行后续遍历的时候，必然先访问完x的左右子树，才会返回x所在的节点。

嗯，所以我们进行如下操作：处理完一棵子树(root为x)的时候，把子树的节点放入x所在的集合，并设置祖先为x（并查集）

这样做的话，当我们访问了一个节点的时候，如果该节点为待查节点q,且p已经被访问过了，那么，p的祖先就是LCA.

     _______3______
    /              \
 ___5__          ___1__
/      \        /      \
6      _2       0       8
      /  \
      7   4

以上面的树为例，假设求

LCA(6,4) ,那么，由于是后续遍历，我们遍历到4的时候，6已经被访问过，且祖先设置为5，所以答案就是5 （后序遍历的是标志是否被访问~）

再比如LCA(5,4),我们访问到4的时候，5还没被访问，返回上级，然后是2节点，把2的祖先与4合并为2，在返回上级，此时是节点5，把5节点的右子树祖先和5合并，接着发现2已经被访问了，答案就是2的祖先，就是5

看到这里，你应该知道了用并查集的原因：可以快速的找到某个节点的祖先，这对多个查询是非常有利的。

C++

struct UnionFindSet {
	unordered_map<TreeNode* , TreeNode*> fa;

	TreeNode* find(TreeNode *p) {
		return fa[p] == NULL ? p : fa[p] = find(fa[p]);
	}

	void unionSet(TreeNode *p, TreeNode * q) {
		TreeNode *root_p = find(p);
		TreeNode *root_q = find(q);
		fa[root_p] = root_q;
	}
};

class Solution {
	unordered_set<TreeNode *> vis;
	TreeNode *p, *q,*ans;
	UnionFindSet unionFindSet;
public:
	
	TreeNode* lowestCommonAncestor(TreeNode* root, TreeNode* p, TreeNode* q) {
		this->p = p;
		this->q = q;
		Tarjan(root);
		return ans;
	}

	void Tarjan(TreeNode * root) {
		if (!root) return;
		unionFindSet.fa[root] = NULL;

		if (root->left) {
			Tarjan(root->left);
			unionFindSet.unionSet(root->left, root);
		}
			
		if (root->right) {
			Tarjan(root->right);
			unionFindSet.unionSet(root->right, root);
		}

		vis.insert(root);
		if (root == p  && vis.find(q) != vis.end())
			ans = unionFindSet.find(q);
		else if (root == q  && vis.find(p) != vis.end())
			ans = unionFindSet.find(p);
	}
};

上面的代码只是单个查询的情况，多个查询的情况只需要把最后的改为：遍历与当前结点root相关联的查询即可。

细语呢喃

二叉树最近公共祖先详解（LCA问题详解）

二叉搜索树 BST

普通二叉树

带有父节点信息的二叉树

多次查询的普通二叉树

参考资料