执行交换操作后的最小汉明距离

难度: Medium

给你两个整数数组 source 和 target ，长度都是 n 。还有一个数组 allowedSwaps ，其中每个 allowedSwaps[i] = [a_i, b_i] 表示你可以交换数组 source 中下标为 a_i 和 b_i（下标从 0 开始）的两个元素。注意，你可以按任意顺序多次交换一对特定下标指向的元素。

相同长度的两个数组 source 和 target 间的 汉明距离 是元素不同的下标数量。形式上，其值等于满足 source[i] != target[i] （下标从 0 开始）的下标 i（0 <= i <= n-1）的数量。

在对数组 source 执行任意数量的交换操作后，返回 source 和 target 间的 最小汉明距离 。

示例 1：

输入：source = [1,2,3,4], target = [2,1,4,5], allowedSwaps = [[0,1],[2,3]]
输出：1
解释：source 可以按下述方式转换：
- 交换下标 0 和 1 指向的元素：source = [2,1,3,4]
- 交换下标 2 和 3 指向的元素：source = [2,1,4,3]
source 和 target 间的汉明距离是 1 ，二者有 1 处元素不同，在下标 3 。

示例 2：

输入：source = [1,2,3,4], target = [1,3,2,4], allowedSwaps = []
输出：2
解释：不能对 source 执行交换操作。
source 和 target 间的汉明距离是 2 ，二者有 2 处元素不同，在下标 1 和下标 2 。

示例 3：

输入：source = [5,1,2,4,3], target = [1,5,4,2,3], allowedSwaps = [[0,4],[4,2],[1,3],[1,4]]
输出：0

提示：

n == source.length == target.length
1 <= n <= 10⁵
1 <= source[i], target[i] <= 10⁵
0 <= allowedSwaps.length <= 10⁵
allowedSwaps[i].length == 2
0 <= a_i, b_i <= n - 1
a_i != b_i

Submission

运行时间: 226 ms

内存: 51.2 MB

class Solution:
    def minimumHammingDistance(self, source: List[int], target: List[int], allowedSwaps: List[List[int]]) -> int:
        n = len(source)
        parent = list(range(n))  # Initialize parent array for union-find

        def find(x):
            if x != parent[x]:
                parent[x] = find(parent[x])
            return parent[x]

        def union(x, y):
            root_x = find(x)
            root_y = find(y)
            if root_x != root_y:
                parent[root_x] = root_y

        # Build union-find data structure based on allowed swaps
        for swap in allowedSwaps:
            union(swap[0], swap[1])

        # Create a dictionary to map root indices to sets of elements
        groups = {}
        for i in range(n):
            root = find(i)
            if root not in groups:
                groups[root] = set()
            groups[root].add(source[i])

        # Iterate over target array and remove elements from sets
        hamming_distance = 0
        for i in range(n):
            root = find(i)
            if target[i] not in groups[root]:
                hamming_distance += 1
            else:
                groups[root].remove(target[i])

        return hamming_distance

class Solution:
    def minimumHammingDistance(self, source: List[int], target: List[int], allowedSwaps: List[List[int]]) -> int:
        n = len(source)
        parent = list(range(n))  # Initialize parent array for union-find

        def find(x):
            if x != parent[x]:
                parent[x] = find(parent[x])
            return parent[x]

        def union(x, y):
            root_x = find(x)
            root_y = find(y)
            if root_x != root_y:
                parent[root_x] = root_y

        # Build union-find data structure based on allowed swaps
        for swap in allowedSwaps:
            union(swap[0], swap[1])

        # Create a dictionary to map root indices to dictionaries of elements and their frequencies
        groups = {}
        for i in range(n):
            root = find(i)
            if root not in groups:
                groups[root] = {}
            groups[root][source[i]] = groups[root].get(source[i], 0) + 1

        # Iterate over target array and remove elements from dictionaries
        hamming_distance = 0
        for i in range(n):
            root = find(i)
            if target[i] not in groups[root] or groups[root][target[i]] == 0:
                hamming_distance += 1
            else:
                groups[root][target[i]] -= 1

        return hamming_distance

Explain

这个题解使用了并查集（union-find）数据结构来管理元素交换的可能性。首先，通过allowedSwaps提供的索引对，构建并查集，以此确定source中哪些元素可以通过交换被重新组织。接着，使用一个映射（字典）来记录每个并查集根节点下所有元素的出现频率。最后，遍历target数组，并尝试从相对应的并查集根节点的字典中删除target中的元素。如果在字典中找不到相应元素，汉明距离就增加1。这样，最终的汉明距离就是source和target在所有允许的交换操作后，仍然不匹配的元素总数。

时间复杂度: O(n)

空间复杂度: O(n)

class Solution:
    def minimumHammingDistance(self, source: List[int], target: List[int], allowedSwaps: List[List[int]]) -> int:
        n = len(source)  # 数组长度
        parent = list(range(n))  # 初始化并查集的父节点数组

        def find(x):
            if x != parent[x]:
                parent[x] = find(parent[x])  # 路径压缩
            return parent[x]

        def union(x, y):
            root_x = find(x)
            root_y = find(y)
            if root_x != root_y:
                parent[root_x] = root_y  # 合并两个集合

        # 根据allowedSwaps构建并查集
        for swap in allowedSwaps:
            union(swap[0], swap[1])

        # 为每个根节点构建元素频率字典
        groups = {}
        for i in range(n):
            root = find(i)
            if root not in groups:
                groups[root] = {}
            groups[root][source[i]] = groups[root].get(source[i], 0) + 1

        # 计算汉明距离
        hamming_distance = 0
        for i in range(n):
            root = find(i)
            if target[i] not in groups[root] or groups[root][target[i]] == 0:
                hamming_distance += 1  # 找不到匹配元素，增加距离
            else:
                groups[root][target[i]] -= 1  # 找到匹配元素，减少频率

        return hamming_distance

Explore

并查集通过将可以互相交换的元素组织成一个集合来确保这些元素可以自由重组。每次调用union操作，它将两个元素的根节点合并，这样所有连接的元素最终都会共享同一个根节点。当多个交换操作涉及相互连接的元素时，这些元素通过不断的union操作最终都会归属于同一个根节点下，从而形成一个可以自由交换元素的大集合。

元素频率字典是在构建并查集之后、计算汉明距离之前构建的。在所有的union操作完成后，每个元素都已经被分配到一个确定的集合中，此时构建频率字典可以准确地记录下每个集合中不同元素的出现次数。这样做的原因是，只有在集合构建完成后，我们才能得到每个元素最终所属的集合，进而正确统计和利用这些信息来与target数组进行比较和匹配。

题解中通过构建每个集合的元素频率字典来处理重复元素的匹配问题。这个字典记录了每个集合中每种元素出现的次数。在计算汉明距离时，对于target中的每个元素，程序会尝试从相应集合的字典中减少这个元素的计数。如果字典中该元素的计数已经为零或者元素不存在于字典中，就增加汉明距离。这种方法确保了即使在source和target中元素有重复时，也能正确地匹配相同元素的数量，只有无法匹配的元素才会增加汉明距离。

当target中的元素在对应的并查集组中不存在时，直接增加汉明距离是因为这表示source数组中的元素，即使经过所有允许的交换，仍无法匹配target中的该元素。增加汉明距离是为了反映这种不匹配的情况。潜在的遗漏可能发生在错误地管理并查集或元素频率字典时，例如并查集的合并操作没有正确执行，或者频率统计有误。但如果这些数据结构和操作都正确无误，直接增加汉明距离是合理且必要的，因为它确实反映了两个数组在经过所有可能的合法交换后的差异。