Replace Words

Difficulty: Medium

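In English, there is a concept called a root, which can be followed by other words to form a longer word, which we call a successor. For example, the root "an" followed by the word "other" forms the new word "another".

Given a dictionary consisting of many roots and a sentence, replace every successor in the sentence with the root that forms it. If a successor can be formed by more than one root, replace it with the shortest such root.

Return the sentence after the replacement.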

Example 1:

Input: dictionary = ["cat","bat","rat"], sentence = "the cattle was rattled by the battery"
Output: "the cat was rat by the bat"

Example 2:

Input: dictionary = ["a","b","c"], sentence = "aadsfasf absbs bbab cadsfafs"
Output: "a a b c"

Example 3:

Input: dictionary = ["a", "aa", "aaa", "aaaa"], sentence = "a aa a aaaa aaa aaa aaa aaaaaa bbb baba ababa"
Output: "a a a a a a a a bbb baba a"

Example 4:

Input: dictionary = ["catt","cat","bat","rat"], sentence = "the cattle was rattled by the battery"
Output: "the cat was rat by the bat"

Example 5:

Input: dictionary = ["ac","ab"], sentence = "it is abnormal that this solution is accepted"
Output: "it is ab that this solution is ac"

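Constraints:

1 <= dictionary.length <= 1000
1 <= dictionary[i].length <= 100
dictionary[i] consists of lowercase letters only.
1 <= sentence.length <= 10^6
sentence consists of lowercase letters and spaces only.
The number of words in sentence is in the range [1, 1000].
The length of each word in sentence is in the range [1, 1000].
Words in sentence are separated by exactly one space.
sentence has no leading or trailing spaces.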

Note: This problem is the same as problem 648 on the main site: https://leetcode-cn.com/problems/replace-words/

Submission

Runtime: 41 ms

Memory: 27.5 MB

from typing import List


class Solution:
    def replaceWords(self, dictionary: List[str], sentence: str) -> str:
        trie = {}
        for word in dictionary:
            cur = trie
            for c in word:
                if c not in cur:
                    cur[c] = {}
                cur = cur[c]
            cur['#'] = {}
        
        words = sentence.split(' ')
        for i, word in enumerate(words):
            cur = trie
            for j, c in enumerate(word):
                if '#' in cur:
                    words[i] = word[:j]
                    break
                if c not in cur:
                    break
                cur = cur[c]
        return ' '.join(words)

Explain

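This solution uses a trie (prefix tree) to handle root replacement efficiently. First, build the trie by inserting every root, one character at a time. Each node maps a letter to its child node, and the node reached by a root's last letter carries an end marker (the key '#'). Next, split the input sentence into words and check each word against the trie. For each word, start at the trie's root node and follow the word's letters. As soon as the current node carries the end marker, the path walked so far is a valid root, and because the walk goes from short prefixes to long ones, it is also the shortest matching root, so the word is replaced by that prefix. If a letter has no matching child, or the word ends first, the original word is kept. Finally, join the processed words back into a sentence.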

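Time complexity: O(S + L), where S is the length of the sentence and L is the total length of all roots in the dictionary.

Space complexity: O(L) for the trie, plus O(S) for the list of split words.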

# Solution class definition

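from typing import List


class Solution:
    def replaceWords(self, dictionary: List[str], sentence: str) -> str:
        # Build the trie as nested dicts: each key is a letter, each value a child node.
        trie = {}
        for word in dictionary:
            cur = trie
            for c in word:
                if c not in cur:
                    cur[c] = {}
                cur = cur[c]
            # '#' marks the node where a complete root ends.
            cur['#'] = {}

        # Split the sentence into words (single spaces, per the constraints).
        words = sentence.split(' ')
        # Replace each word with the shortest root that prefixes it, if any.
        for i, word in enumerate(words):
            cur = trie
            for j, c in enumerate(word):
                # The first j letters already spell a root: take it and stop.
                if '#' in cur:
                    words[i] = word[:j]
                    break
                # No root continues with this letter: keep the word unchanged.
                if c not in cur:
                    break
                cur = cur[c]
        # Join the processed words back into a sentence.
        return ' '.join(words)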

Explore

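When the trie is built, roots that share a prefix share the same path for that prefix. Each trie node represents one letter and stores its children as key-value pairs. To insert a new root, start at the root node and walk through the letters: if the current letter already exists as a child of the current node, reuse that path; otherwise create a new node. Sharing prefixes saves space, and a lookup costs time proportional to the length of the word being checked, regardless of how many roots the dictionary holds.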
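To make the prefix sharing concrete, here is a minimal sketch that builds the same kind of nested-dict trie and counts its letter nodes. The helper names `build_trie` and `count_nodes` and the sample roots are illustrative assumptions, not part of the original submission.

```python
import json


def build_trie(roots):
    """Build a nested-dict trie; the key '#' marks the end of a root."""
    trie = {}
    for root in roots:
        node = trie
        for ch in root:
            # setdefault reuses an existing child or creates a new one.
            node = node.setdefault(ch, {})
        node['#'] = {}
    return trie


def count_nodes(node):
    """Count letter nodes below `node` (the '#' markers are not counted)."""
    return sum(1 + count_nodes(child)
               for key, child in node.items() if key != '#')


# Hypothetical sample: three roots that all start with "ca".
trie = build_trie(["cat", "catt", "car"])
print(json.dumps(trie))
print(count_nodes(trie))  # 5 letter nodes for 10 letters of input
```

The three roots contain 10 letters in total, but the trie stores only 5 letter nodes, because "c" and "a" are stored once and "catt" extends the path of "cat".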

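The shortest root is chosen because the problem statement requires it. The trie walk satisfies this requirement naturally: prefixes are examined from shortest to longest, so the first end marker encountered belongs to the shortest matching root. Outside this problem, the shortest root is not always the best choice, since a longer root may carry more information or fit the context better. In some applications, such as search or text-processing tools, the shorter replacement may still be preferred.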
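The "shortest first" rule can also be illustrated without a trie. The sketch below is an alternative approach, not the submission's method: it stores the roots in a set and tries each word's prefixes in increasing length, so the first hit is the shortest root. The function name `replace_with_shortest_root` is a hypothetical name chosen for this example.

```python
def replace_with_shortest_root(dictionary, sentence):
    """Set-based alternative: test prefixes from shortest to longest."""
    roots = set(dictionary)
    # No prefix longer than the longest root can match.
    max_len = max((len(r) for r in roots), default=0)
    result = []
    for word in sentence.split(' '):
        replacement = word
        for end in range(1, min(len(word), max_len) + 1):
            if word[:end] in roots:
                # First match is the shortest root, so stop here.
                replacement = word[:end]
                break
        result.append(replacement)
    return ' '.join(result)


# Example 4 from the problem: both "cat" and "catt" prefix "cattle".
print(replace_with_shortest_root(
    ["catt", "cat", "bat", "rat"],
    "the cattle was rattled by the battery"))
# the cat was rat by the bat
```

Each prefix test builds a new substring, so this version does more work per word than the trie walk, but it shows why scanning prefixes in increasing length yields the shortest root.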

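The '#' key marks the end of a root in the trie, so that nodes where a root ends can be told apart from nodes that are only part of a longer path. While walking the trie, this marker shows exactly when a complete root has been reached. Using '#' is a common convention, but any character or symbol that cannot collide with the letters in the roots works equally well. Instead of a special character, each node can also carry a dedicated boolean flag (such as `isEnd`) to indicate the end of a root; this approach is equally valid.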
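The boolean-flag variant mentioned above could look like the following minimal sketch. The `TrieNode` class, the `is_end` attribute (a Python-style spelling of `isEnd`), and the function name `replace_words` are assumptions made for illustration; the original submission uses nested dicts with '#'.

```python
class TrieNode:
    """Trie node with an explicit end-of-root flag instead of a '#' key."""
    __slots__ = ("children", "is_end")

    def __init__(self):
        self.children = {}   # letter -> TrieNode
        self.is_end = False  # True if a root ends at this node


def replace_words(dictionary, sentence):
    root = TrieNode()
    for word in dictionary:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def shortest_root(word):
        node = root
        for i, ch in enumerate(word):
            node = node.children.get(ch)
            if node is None:
                return word           # no root continues along this word
            if node.is_end:
                return word[:i + 1]   # shortest root found
        return word

    return ' '.join(shortest_root(w) for w in sentence.split(' '))


print(replace_words(["cat", "bat", "rat"],
                    "the cattle was rattled by the battery"))
# the cat was rat by the bat
```

Here the flag is checked right after stepping into a child node, so a word that equals a root is matched inside the loop rather than falling through to the final return. The result is the same as with the '#' marker.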