LeetCode 139. Word Break

题目

Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = "leetcode",
dict = ["leet", "code"].
Return true because "leetcode" can be segmented as "leet code".

思路1

写的第一个版本是超时的，但是思路应当是正确的，反正就是动态规划，但我用的是递归方法。
就是如果找到s[start,end]是单词表中的单词时，有两种处理，一种是算入然后往后走；另一种是忽略这次匹配成功，继续往后寻找。
代码大概这样：

class Solution {
public:
	int findNextWord(const string& s, int start, int curr, unordered_set<string>& wordDict) {
		//Find the end position of next possible word, or return -1
		string word = s.substr(start, curr - start + 1);
		for (int i = curr + 1; i < s.size(); i++) {
			word += s[i];
			if (wordDict.count(word) > 0) {
				return i;
			}
		}
		return -1;
	}
	bool parseSeq(const string& s, unordered_set<string>& wordDict, int start, int end, map<int, bool>& cache) {
		//Judge s[start...end]
		if (end >= s.size())
			return false;
		if (cache.count(start) > 1)
			return cache[start];
		string word = s.substr(start, end - start + 1);
		if (wordDict.count(word) > 0) {
			bool withThisWord = parseSeq(s, wordDict, end + 1, end + 1, cache);
			if (end + 1 == s.size() || withThisWord) {
				cache[start] = true;
				return true;
			}
		}
		int nextPos = findNextWord(s, start, end, wordDict);
		if (-1 != nextPos) {
			bool appendWord = parseSeq(s, wordDict, start, nextPos, cache);
			if (appendWord) {
				cache[start] = true;
				return true;
			}
		}
		cache[start] = false;
		return false;
	}
	bool wordBreak(string s, unordered_set<string>& wordDict) {
		map<int, bool> cache;
		return parseSeq(s, wordDict, 0, 0, cache);
	}
};

但是如果这里出现了超级多的单字，比如字符串是

aaaaaaaaaaaaaaaaaaaaaaa

然后字典是

{a,aa,aaa,aaaa}

这就搞笑了，得匹配死，所以肯定是超时。

思路2

这个是讨论组里看来的，突然发现一般来说DP递归能做的，就可以从后往前用非递归的方法写。对于这个问题来说，从后往前并不像做LCS那样时要完全倒着写。举个例子，有字符串str=thisisawesome 和字典{this, is, a, we, some, awesome} 的时候，可以用一个缓存cache来记录是否在str[i]位置有单词结束，比如str[3]位置就是”this”这个词结束的地方。当顺着继续搜索的时候，往前查找缓存记录，可以知道如果cache[i]=true的话，那才有可能从i这个位置往后到当前位置组成新单词，接着去查找字典看看有没有这个单词，如果没有的话，就得继续往前找到新的可用的开头i。这里加黑的“才有可能”是比较关键的。
代码大概是这样

bool wordBreak(string s, unordered_set<string>& wordDict) {
		if (wordDict.empty())
			return false;
		vector<bool> cache(s.size() + 1, false); //If a word ends at i, cache[i] = true
		cache[0] = true;
		for (int i = 0; i <= s.size(); i++) {
			for (int j = i - 1; j >= 0; j--) {
				if (cache[j]) {
					//Look backward and start at the end place of previous word
					string word = s.substr(j, i - j); // s[j...i-1]
					if (wordDict.find(word) != wordDict.end()) {
						cache[i] = true;
						break;
					}
				}
			}
		}
		return cache[s.size()];
	}

这里面向前迭代不断寻找cache[i]并尝试的情况，对应于之前思路1里面找到可用单词后的决定是否使用的分支情况。但是由于减少了一大堆递归调用，所以实际情况下的复杂度降低了不少。

题目

思路1

思路2

Leave a Reply Cancel reply