最短单词距离 II

难度: Medium

Submission

运行时间: 39 ms

内存: 23.8 MB

class WordDistance:

    def __init__(self, wordsDict: List[str]):
        wordIdx = DefaultDict(int)
        for w in wordsDict:
            if w not in wordIdx:
                wordIdx[w] = len(wordIdx)
        idxs = [[] for _ in range(len(wordIdx))]
        for i, w in enumerate(wordsDict):
            idx = wordIdx[w]
            idxs[idx] .append(i)
        self.wordIdx = wordIdx
        self.idxs = idxs

    def shortest(self, word1: str, word2: str) -> int:
        idxs1, idxs2 = self.idxs[self.wordIdx[word1]
                                 ], self.idxs[self.wordIdx[word2]]
        ans = inf
        i, j = 0, 0
        while i < len(idxs1):
            while j < len(idxs2) and idxs2[j] < idxs1[i]:
                ans = min(ans, idxs1[i] - idxs2[j])
                j += 1
            if j < len(idxs2):
                ans = min(ans, idxs2[j] - idxs1[i])
            else:
                break
            i += 1
        return ans

Explain

该题解采用了预处理+双指针的思路。在初始化时，先用哈希表 wordIdx 将单词映射到对应的序号，然后用列表 idxs 存储每个单词在原数组中出现的所有下标。在计算最短距离时，对于给定的两个单词 word1 和 word2，分别取出它们在 idxs 中存储的下标列表 idxs1 和 idxs2，然后用双指针 i 和 j 同时遍历这两个列表，更新最短距离 ans。

时间复杂度: O(n+m1+m2)

空间复杂度: O(n)

class WordDistance:

    def __init__(self, wordsDict: List[str]):
        # 将单词映射到对应的序号
        wordIdx = DefaultDict(int)
        for w in wordsDict:
            if w not in wordIdx:
                wordIdx[w] = len(wordIdx)
        
        # 存储每个单词在原数组中出现的所有下标
        idxs = [[] for _ in range(len(wordIdx))]
        for i, w in enumerate(wordsDict):
            idx = wordIdx[w]
            idxs[idx].append(i)
        
        self.wordIdx = wordIdx
        self.idxs = idxs

    def shortest(self, word1: str, word2: str) -> int:
        # 取出 word1 和 word2 对应的下标列表
        idxs1, idxs2 = self.idxs[self.wordIdx[word1]], self.idxs[self.wordIdx[word2]]
        ans = inf
        i, j = 0, 0
        
        # 双指针遍历下标列表，更新最短距离
        while i < len(idxs1):
            while j < len(idxs2) and idxs2[j] < idxs1[i]:
                ans = min(ans, idxs1[i] - idxs2[j])
                j += 1
            if j < len(idxs2):
                ans = min(ans, idxs2[j] - idxs1[i])
            else:
                break
            i += 1
        
        return ans

Explore

双指针方法适用于处理两个已排序的列表来寻找最小差值的问题，这种方法可以高效地通过逐步逼近最小差值来减少不必要的比较。对于本题，每个单词的索引列表是有序的，使用双指针可以在线性时间内找到两个列表中索引的最小差距，相比于暴力方法的平方时间复杂度要高效得多。

当`idxs2[j] < idxs1[i]`时，意味着当前`idxs2[j]`的值小于`idxs1[i]`的值，此时增加`j`可以使`idxs2[j]`向`idxs1[i]`靠近，可能得到更小的差值。如果增加`i`，则`idxs1[i]`将远离`idxs2[j]`，从而增大两者之间的差距。因此，为了寻找最小距离，我们应优先考虑增加`j`。

即使当一个单词出现次数远多于另一个单词时，双指针算法仍然是效率较高的方法。这是因为双指针方法只需遍历两个列表一次，总的时间复杂度依然是O(m+n)，其中m和n是两个单词的出现次数。双指针算法能够有效地跳过不必要的比较，直接逼近最小距离。

使用`DefaultDict(int)`可以自动处理新单词的索引分配问题。当访问一个尚未存储在字典中的键时，`DefaultDict(int)`会自动为该键创建一个默认值，这里的默认值是从0开始的整数，这样可以自动为新单词分配序号。如果使用普通字典，我们则需要手动检查键是否存在并适当地添加新键和值，这会使代码更加复杂和容易出错。