leetcode 30 Substring with Concatenation of All Words

You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and without any intervening characters.

For example, given: S: "barfoothefoobarman" L: ["foo", "bar"]

You should return the indices: [0,9]. (order does not matter).

题目地址：leetcode Substring with Concatenation of All Words

题目大意：给定一些单词words(可能重复，长度相同）和一个长字符串s，要求求出长字符串的所有起始位置，使得给定的所有单词在之后出现一次且仅一次

思路：

方法一

首先用对words中每个单词进行计数count_word(hash表)，

枚举起始位置和终点位置，每次用cur_cnt计数，若单词存在且cur_cnt[word] <= count_word[word]，总计数total_in+1, 如果rotal_in等于words的长度，放入答案ans

复杂度O(len(s) * len(words) * t)， t为计算每个单词hash值的时间。

C++ 125ms

class Solution {
public:
	vector<int> findSubstring(string s, vector<string>& words) {
		vector<int> ans;
		if (s.empty() || words.empty()) return ans;

		int word_len = words[0].size();
		int n = words.size();
		unordered_map<string, int> count_word;
		for (const string &word : words)
			++count_word[word];

		for (int i = 0; i + n * word_len <= s.size(); ++i) {
			int total_in = 0;
			unordered_map<string, int> cur_cnt;

			for (int j = i; j < i + n * word_len; j += word_len) {
				string word = s.substr(j, word_len);
				if (++cur_cnt[word] > count_word[word]) break;
				++total_in;
			}
			if (total_in == n)
				ans.push_back(i);
		}
		return ans;
	}
};

Python

class Solution:
    # @param S, a string
    # @param L, a list of string
    # @return a list of integer
    def findSubstring(self, S, L):
        ans=[]
        Dict = dict.fromkeys(L,0)
        for word in L:
            Dict[word]=Dict[word]+1
              
        
        totWord = len(L)
        wordLen = len(L[0])
        slen =len(S) - totWord * wordLen;
        for  i in range(0,slen+1):
            cnt =dict.fromkeys(L,0)
            okNum=0
            for k in range(0,totWord):
                cur=S[i+k*wordLen:i+(k+1)*wordLen]
                if(cur in Dict ):
                    cnt[cur]=cnt[cur] + 1
                    if(cnt[cur] > Dict[cur]):
                        break
                    okNum=okNum + 1
            
                          
            if(okNum==totWord):
                ans.append(i)
        
        return ans

方法二

我们可以用滑动窗口思想优化上面的过程。

设word_len为单词的长度，枚举可能的起点为[0, word_len]，

然后枚举s中的单词，令start = i表示当前窗口的起点，令j = i表示当前枚举的字符串s的起点，每次枚举结束j+= word_len

在枚举s中单词的过程中保持一个合法的窗口使得cur_cnt[word] <= count_word[word]，若窗口不合法，则说明之前已经有单词存在，不断的令start对应的单词-=1，start+=word_len即可。

复杂度O(word_len * len(s) / word_len * 2)= O(len(s) * 2)，乘上2是因为一次遍历中，每个单词最多两次。（PS：这个复杂度没有计算哈希的代价，假设为O(1）)

C++

class Solution {
public:
	vector<int> findSubstring(string s, vector<string>& words) {
		vector<int> ans;
		if (s.empty() || words.empty()) return ans;

		int word_len = words[0].size();
		unordered_map<string, int> count_word;
		for (const string &word : words)
			++count_word[word];

		for (int i = 0; i < word_len; ++i) {
			int total_in = 0;
			unordered_map<string, int> cur_cnt;

			for (int j = i, start = i; j + word_len <= s.size(); j += word_len) {
				string word = s.substr(j, word_len);
				if (count_word.find(word) == count_word.end()) {
					cur_cnt.clear();
					total_in = 0;
					start = j + word_len;
				}
				else {
					++cur_cnt[word];
					while (cur_cnt[word] > count_word[word]) {
						--cur_cnt[s.substr(start, word_len)];
						start += word_len;
						--total_in;
					}
					if (++total_in == words.size()) {
						ans.push_back(start);
					}
				}
			}
		}
		return ans;
	}
};

Python

class Solution(object):
    def findSubstring(self, s, words):
        """
        :type s: str
        :type words: List[str]
        :rtype: List[int]
        """
        if not s or not words: return []
        word_len = len(words[0])
        word_total = (len(words) - 1) * word_len
        ans = []
        word_cnt = collections.Counter(words)
        for i in range(word_len):
            start = i
            cur_cnt = collections.Counter()
            for j in range(i, len(s) - word_len + 1, word_len):
                word = s[j: j + word_len]
                if word in word_cnt:
                    cur_cnt[word] += 1
                    while cur_cnt[word] > word_cnt[word]:
                        cur_cnt[s[start: start + word_len]] -= 1
                        start += word_len
                else:
                    cur_cnt.clear()
                    start = j + word_len
                
                if(start + word_total == j):
                    ans.append(start)
        return ans

更多题解可以查看： https://www.hrwhisper.me/leetcode-algorithm-solution/