Problem 2707 Extra Characters in a String
Table of Contents
Problem Statement
Link - Problem 2707
Question
You are given a 0-indexed string s
and a dictionary of words dictionary
. You have to break s
into one or more non-overlapping substrings such that each substring is present in dictionary
. There may be some extra characters in s
which are not present in any of the substrings.
Return the minimum number of extra characters left over if you break up s
optimally.
Example 1:
Input: s = "leetscode", dictionary = ["leet","code","leetcode"] Output: 1 Explanation: We can break s in two substrings: "leet" from index 0 to 3 and "code" from index 5 to 8. There is only 1 unused character (at index 4), so we return 1.
Example 2:
Input: s = "sayhelloworld", dictionary = ["hello","world"] Output: 3 Explanation: We can break s in two substrings: "hello" from index 3 to 7 and "world" from index 8 to 12. The characters at indices 0, 1, 2 are not used in any substring and thus are considered as extra characters. Hence, we return 3.
Constraints:
1 <= s.length <= 50
1 <= dictionary.length <= 50
1 <= dictionary[i].length <= 50
dictionary[i]
ands
consists of only lowercase English lettersdictionary
contains distinct words
Solution
class Solution {
int solve(string s,unordered_map<string, int>& um, unordered_map<string, int>& memo) {
if (s.empty()) {
return 0;
}
if(memo.find(s) != memo.end()){
return memo[s];
}
int extra = INT_MAX;
string tmp = "";
for (int i = 0; i < s.size(); i++) {
tmp += s[i];
if (um.find(tmp) != um.end()) {
extra = min(extra, solve(s.substr(i + 1),um,memo));
}
}
extra = min(extra, 1 + solve(s.substr(1),um,memo));
memo[s] = extra;
return extra;
}
public:
int minExtraChar(string s, vector<string>& dict) {
unordered_map<string, int> um;
unordered_map<string, int> memo;
for (auto& val : dict) {
um[val]++;
}
return solve(s,um,memo);
}
};
Complexity Analysis
| Algorithm | Time Complexity | Space Complexity |
| --------------------------- | --------------- | ---------------- |
| Memoized Depth-First Search | O( m 2^n ) | O(n) |
Explanation
Intial Thoughts
To solve this problem, we need to break down the string into substrings that are present in the dictionary. We can start by checking each substring of the string to see if it is in the dictionary. If it is, we can recursively check the remaining part of the string. We should also keep track of the minimum number of extra characters we find. Some key points to consider are: checking all possible substrings, using recursion to solve the problem, keeping track of the minimum number of extra characters, using a dictionary to store the words for quick lookup, and avoiding duplicate work by storing the results of subproblems.
Intuitive Analysis
One way to intuitively solve this problem is to think of it like trying to cut a piece of string into smaller pieces that match a set of given templates. We start with the whole string and try to cut it into the largest possible piece that matches one of the templates. If we can’t find a match, we cut off the smallest possible piece and try again. We keep track of how many pieces we have to cut off that don’t match any template, and that’s our answer. Some key points to consider are: working from left to right, using the longest possible matching template, keeping track of the number of extra characters, using recursion to solve the problem, and avoiding duplicate work by storing the results of subproblems. We should also consider using a memoization technique to store the results of subproblems and avoid redundant work.
1. Intuition
- The problem can be approached using dynamic programming, where we try to break the string into substrings that are present in the dictionary.
- We need to consider all possible substrings of the input string and check if they are present in the dictionary.
- If a substring is found in the dictionary, we recursively try to break the remaining part of the string.
- The key insight is to use memoization to store the results of subproblems to avoid redundant calculations.
- We also need to consider the case where a character is not part of any substring in the dictionary, in which case we increment the count of extra characters.
- The solution works by finding the minimum number of extra characters left over after breaking the string into substrings.
- The time complexity is reduced by using memoization, which stores the results of subproblems and avoids redundant calculations.
- The space complexity is also optimized by using an unordered map to store the dictionary words and memoized results.
2. Implementation
- We define a recursive function
solve
that takes the input strings
, an unordered mapum
of dictionary words, and an unordered mapmemo
to store memoized results. - We use a loop to iterate over the input string and try to find substrings that are present in the dictionary using
um.find(tmp) != um.end()
. - If a substring is found, we recursively call
solve
with the remaining part of the string and update the count of extra characters usingmin(extra, solve(s.substr(i + 1),um,memo))
. - We also consider the case where a character is not part of any substring in the dictionary using
extra = min(extra, 1 + solve(s.substr(1),um,memo))
. - We store the memoized results in
memo[s] = extra
to avoid redundant calculations. - The final result is returned by calling
solve
with the input string and dictionary words usingreturn solve(s,um,memo)
. - We use an unordered map
um
to store the dictionary words and an unordered mapmemo
to store memoized results, which reduces the time complexity of the solution.
Complexity Analysis
Time Complexity:
- The time complexity is driven by the recursive
solve
function, which in the worst-case scenario, needs to explore all possible substrings of the inputs
. This leads to a time complexity of O(2^n * m), where n is the length of the input strings
and m is the average length of the strings in the dictionarydict
. Thefor
loop and recursive call insolve
contribute to this exponential time complexity. - The dominant operation is the recursive call to
solve
within thefor
loop, which is repeated for each character in the string. Theum.find(tmp)
andmemo.find(s)
operations have an average time complexity of O(1) due to the use ofunordered_map
, but are negligible compared to the recursive calls. - Justification of the O(2^n * m) time complexity lies in the fact that in the worst-case scenario (when no substring matches any dictionary word), the function explores all possible splits of the input string, resulting in 2^n possible recursive calls, each of which involves a string operation (substrings and dictionary lookups) that takes O(m) time.
Space Complexity:
- The space complexity is primarily driven by the use of
unordered_map
for memoization (memo
) and storing the dictionary (um
), as well as the recursive call stack. The space used bymemo
is proportional to the number of unique substrings ofs
, which is O(n), where n is the length ofs
. - The dictionary
um
stores m strings, but since it’s using anunordered_map
, its space complexity is O(m), where m is the number of strings in the dictionary. However, since m is not directly dependent on the input size n in terms of complexity analysis, it’s often not the dominant factor in space complexity for large inputs. - Justification of the O(n) space complexity lies in the maximum number of unique substrings that can be stored in
memo
, which directly correlates with the length of the input strings
. The recursive call stack’s depth can also reach up to n in the worst case, further supporting the O(n) space complexity classification.
Footnote
This question is rated as Medium difficulty.
Hints
Can we use Dynamic Programming here?
Define DP[i] as the min extra character if breaking up s[0:i] optimally.
Similar Questions:
Title | URL | Difficulty |
---|---|---|
Word Break | https://leetcode.com/problems/word-break | Medium |