找出字符串中第一个匹配项的下标

难度: Easy

给你两个字符串 haystack 和 needle ，请你在 haystack 字符串中找出 needle 字符串的第一个匹配项的下标（下标从 0 开始）。如果 needle 不是 haystack 的一部分，则返回 -1 。

示例 1：

输入：haystack = "sadbutsad", needle = "sad"
输出：0
解释："sad" 在下标 0 和 6 处匹配。
第一个匹配项的下标是 0 ，所以返回 0 。

示例 2：

输入：haystack = "leetcode", needle = "leeto"
输出：-1
解释："leeto" 没有在 "leetcode" 中出现，所以返回 -1 。

提示：

1 <= haystack.length, needle.length <= 10⁴
haystack 和 needle 仅由小写英文字符组成

Submission

运行时间: 56 ms

内存: 15.9 MB

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        if len(needle) == 0:
            return 0

        nxt = []
        def search():
            tar = 0
            pos = 0

            while tar < len(haystack):
                if haystack[tar] == needle[pos]:
                    tar += 1
                    pos += 1
                elif pos != 0:
                    pos = nxt[pos-1]
                else:
                    tar += 1

                if pos == len(needle):
                    return tar - pos

            return -1

        def gen_nxt():
            nxt.append(0)
            x = 1
            now = 0  # next[x-1]

            while x < len(needle):
                if needle[x] == needle[now]:
                    x += 1
                    now += 1
                    nxt.append(now)
                elif now != 0:
                    now = nxt[now-1]
                else:
                    x += 1
                    nxt.append(0)

        gen_nxt()
        return search()

Explain

这个题解使用了 KMP 算法来解决字符串匹配问题。首先通过 gen_nxt() 函数生成 needle 字符串的 next 数组，next[i] 表示 needle[0:i] 这个子串的最长相等前后缀的长度。然后在 search() 函数中，使用 next 数组来优化匹配过程，当出现字符不匹配时，通过 next 数组将 needle 向右移动尽可能远的距离，避免了暴力解法中的重复匹配，从而提高了匹配效率。

时间复杂度: O(m+n)

空间复杂度: O(m)

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        if len(needle) == 0:
            return 0

        nxt = []
        def search():
            tar = 0
            pos = 0

            while tar < len(haystack):
                if haystack[tar] == needle[pos]:
                    tar += 1
                    pos += 1
                elif pos != 0:
                    pos = nxt[pos-1]  # 如果不匹配，则通过 next 数组将 needle 向右移动
                else:
                    tar += 1

                if pos == len(needle):  # 匹配成功
                    return tar - pos

            return -1

        def gen_nxt():
            nxt.append(0)
            x = 1
            now = 0  # next[x-1]

            while x < len(needle):
                if needle[x] == needle[now]:  # 如果相等，则 next[x] = now + 1
                    x += 1
                    now += 1
                    nxt.append(now)
                elif now != 0:  # 如果不相等，则通过 next 数组移动 now 指针
                    now = nxt[now-1]
                else:  # 如果 now 已经为 0，则 next[x] = 0
                    x += 1
                    nxt.append(0)

        gen_nxt()  # 生成 next 数组
        return search()  # 进行匹配

Explore

在 gen_nxt() 函数中，next 数组记录了模式串 needle 的每个位置的前缀与后缀最长公共元素长度。这意味着当在 haystack 中进行匹配时，如果遇到不匹配的情况，我们可以利用 next 数组找到 needle 中前一部分已匹配的最长前后缀，直接跳过这部分已知的匹配，避免从头开始匹配。这样，算法就不需要重复检查已经知道匹配的部分，从而减少不必要的比较次数，提高匹配效率。

在 search() 函数中，使用 next 数组可以实现快速匹配。当在 haystack 中匹配 needle 时，如果发现当前字符不匹配，而且不是 needle 的第一个字符，那么可以查看 next 数组。next 数组告诉我们 needle 中已匹配的部分的最长相等前后缀长度，我们可以将 needle 滑动，使得 needle 的这部分后缀对齐到已匹配的前缀，继续匹配过程。这种方式避免了重复匹配已知的部分，从而加速了整体的匹配进程。

当 needle 的第一个字符在 haystack 中就不匹配时，KMP 算法会简单地将 haystack 的匹配位置向右移动一位，然后重新开始匹配。由于此时 needle 的匹配位置（pos）为 0，next 数组对于 pos 为 0 的位置通常也是 0，所以这种情况下的处理非常直接，即 tar（target 在 haystack 中的位置）自增一位，尝试下一个字符。

虽然在提供的代码示例中没有直接检查 haystack 的长度是否小于 needle 的长度，但这是一个实际应用中应当考虑的情况。理论上，如果 haystack 的长度小于 needle，那么不可能存在匹配，因此可以在搜索开始前进行一次简单的长度比较，如果 haystack 较短，则直接返回 -1，避免执行无用的匹配逻辑。这样不仅提升效率，也避免了不必要的计算。