Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LeetCode] 1316. Distinct Echo Substrings #1316

Open
grandyang opened this issue May 30, 2019 · 0 comments
Open

[LeetCode] 1316. Distinct Echo Substrings #1316

grandyang opened this issue May 30, 2019 · 0 comments

Comments

@grandyang
Copy link
Owner

grandyang commented May 30, 2019

Return the number of distinct non-empty substrings of text that can be written as the concatenation of some string with itself (i.e. it can be written as a + a where a is some string).

Example 1:

Input: text = "abcabcabc"
Output: 3
Explanation: The 3 substrings are "abcabc", "bcabca" and "cabcab".

Example 2:

Input: text = "leetcodeleetcode"
Output: 2
Explanation: The 2 substrings are "ee" and "leetcodeleetcode".

Constraints:

  • 1 <= text.length <= 2000
  • text has only lowercase English letters.

这道题定义了一种回声字符串 Echo String,就是说可以拆分成两个相同的子串的字符串,比如 abcabc,可以拆分成两个 abc,这就是一个回声串。现在给定了一个字符串,问其有多少个不同的回声子串。这道题的难点就在于如何查找回声串,我们之前应该做过很多回文串的题目,跟这里的回声串很像,只不过重复的字母顺序不同,还是需要逐个字母的来比较。这道题是要求不同的回声字符串的个数,为了避免重复,可以将找到的回声串放到一个 HashSet 中,利用其自动去重复的特点。

根据回声串的定义可以得知其最小长度应该为2,其重复的字符长度最小为1,最大为 n/2,可以按长度来遍历所有可能的回声串。len 是重复字符串的长度,从1遍历到 n/2,这里还是要用双指针来做,i是第一个重复字符串的起始位置,j是第二个重复字符串的起始位置,初始化为 len,还需要一个变量 cnt 来记录当前已匹配的字符个数。在循环中判断,若 text[i] 等于 text[j],则 cnt 自增1,否则 cnt 重置为0,若此时 cnt 等于 len 了,则将整个回声串取出加到 HashSet 中,cnt 自减1。这里为何要自减1呢,因为两个指针同时右移一位后,若字母相等,则 cnt 还会自增1,若之前不减1,则无法触发加入 HashSet 操作。循环结束后返回 HashSet 的长度即可,参见代码如下:

解法一:

class Solution {
public:
    int distinctEchoSubstrings(string text) {
        unordered_set<string> st;
        int n = text.size();
        for (int len = 1; len <= n / 2; ++len) {
            for (int i = 0, j = len, cnt = 0; i < n - len; ++i, ++j) {
                if (text[i] == text[j]) ++cnt;
                else cnt = 0;
                if (cnt == len) {
                    st.insert(text.substr(i, len));
                    --cnt;
                }
            }
        }
        return st.size();
    }
};

再来看一种滚动哈希 Rolling Hash 的解法,其经常被用于快速匹配字符串的 Rabin-Karp 算法,这种算法实际上是把每个子串都编码成一个独特的长整型数,根据这个编码值就可以快速的判断子串是否相等。具体来说,这里取了两个互质的数字 BASE 和 M,对于字符串 abcd,定义哈希函数为:

(a * BASE^3 + b * BASE^2 + c * BASE^1 + d * BASE^0) % M

这里的字符是用其 ASCII 码来表示的, BASE^k 是固定量,可以放在一个数组 power 中,同时可以计算出 [0, 1], [0, 2], [0, 3], ... [0, n] 范围内的所有哈希值,保存在 hash 数组中,则此时 hash[i] 就表示前i个字符组成的字符串的哈希值(注意这里为了计算方便,字符串的高位是在最右边)。这实际上就类似于累加和数组,可以快速求出任意区间的哈希值。

此时按照长度来遍历所有区间,len 的范围是 [2, n-i],i为此时的左边界位置,取其中点位置 i + len/2,现在的目标是判断区间 [i, mid] 和 [mid, i+len] 之间的子串是否相等。可以通过计算其哈希值来判断,计算哈希值的方法就类似于累加和数组求任意区间之和,计算区间 [i, j] 的哈希值的方法为 (hash[j] - hash[i] * power[j - i] % M + M) % M,这里不停的对M取余就是为了防止溢出,参见代码如下:

解法二:

class Solution {
public:
    int distinctEchoSubstrings(string text) {
        unordered_set<long> st;
        int n = text.size(), BASE = 29, M = 1e9 + 7;
        vector<long> hash(n + 1), power(n + 1);
        power[0] = 1;
        for (int i = 1; i <= n; ++i) {
            hash[i] = (hash[i - 1] * BASE + text[i - 1]) % M;
            power[i] = power[i - 1] * BASE % M;
        }
        for (int i = 0; i < n; ++i) {
            for (int len = 2; i + len <= n; len += 2) {
                int mid = i + len / 2;
                long hash1 = getHash(i, mid, hash, power, M);
                long hash2 = getHash(mid, i + len, hash, power, M);
                if (hash1 == hash2) st.insert(hash1);
            }
        }
        return st.size();
    }
    long getHash(int i, int j, vector<long>& hash, vector<long>& power, int M) {
        return (hash[j] - hash[i] * power[j - i] % M + M) % M;
    }
};

Github 同步地址:

#1316

类似题目:

Find Substring With Given Hash Value

参考资料:

https://leetcode.com/problems/distinct-echo-substrings/

https://leetcode.com/problems/distinct-echo-substrings/discuss/492704/Intuitive100-Sliding-Counter-(with-pictures)

https://leetcode.com/problems/distinct-echo-substrings/discuss/477217/Java-Brute-force-and-Hash-Solution-Clean-code

LeetCode All in One 题目讲解汇总(持续更新中...)

(欢迎加入博主的知识星球,博主将及时答疑解惑,并分享刷题经验与总结,快快加入吧~)

知识星球 喜欢请点赞,疼爱请打赏❤️~.~

微信打赏

|

Venmo 打赏


---|---

@grandyang grandyang changed the title [LeetCode] 1316. Missing Problem [LeetCode] 1316. Distinct Echo Substrings Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant