Problem 1371 Find the Longest Substring Containing Vowels in Even Counts
Table of Contents
Problem Statement
Link - Problem 1371
Question
Given the string s
, return the size of the longest substring containing each vowel an even number of times. That is, 'a', 'e', 'i', 'o', and 'u' must appear an even number of times.
Example 1:
Input: s = "eleetminicoworoep" Output: 13 Explanation: The longest substring is "leetminicowor" which contains two each of the vowels: e, i and o and zero of the vowels: a and u.
Example 2:
Input: s = "leetcodeisgreat" Output: 5 Explanation: The longest substring is "leetc" which contains two e's.
Example 3:
Input: s = "bcbcbc" Output: 6 Explanation: In this case, the given string "bcbcbc" is the longest because all vowels: a, e, i, o and u appear zero times.
Constraints:
1 <= s.length <= 5 x 10^5
s
contains only lowercase English letters.
Solution
class Solution {
public:
int findTheLongestSubstring(string s) {
int prefixXOR = 0;
int characterMap[26] = {0};
characterMap['a' - 'a'] = 1;
characterMap['e' - 'a'] = 2;
characterMap['i' - 'a'] = 4;
characterMap['o' - 'a'] = 8;
characterMap['u' - 'a'] = 16;
vector<int> mp(32, -1);
int longestSubstring = 0;
for (int i = 0; i < s.length(); i++) {
prefixXOR ^= characterMap[s[i] - 'a'];
if (mp[prefixXOR] == -1 and prefixXOR != 0) mp[prefixXOR] = i;
longestSubstring = max(longestSubstring, i - mp[prefixXOR]);
}
return longestSubstring;
}
};
Complexity Analysis
| Algorithm | Time Complexity | Space Complexity |
| ---------- | --------------- | ---------------- |
| Prefix XOR | O(n) | O(1) |
Explanation
Intial Thoughts
The problem seems to involve tracking the occurrence of vowels in a string and finding the longest substring where each vowel appears an even number of times. It can be approached by maintaining a counter for each vowel or using a clever encoding scheme. An initial step could be to identify the possible encoding schemes for the vowels, such as using bits to represent each vowel. Another thought is to use a prefix sum or prefix XOR to efficiently track the occurrence of vowels. Breaking down the problem into smaller sub-problems, such as finding the longest substring ending at each position, could also be a viable strategy.
Intuitive Analysis
One way to intuitively solve this problem is to think of it as finding a ‘balance’ in the string where the counts of vowels are even. Using a bit encoding scheme, where each vowel is associated with a bit, can help in efficiently tracking this balance. The XOR operation can then be used to ‘cancel out’ the occurrence of vowels, effectively finding substrings where each vowel appears an even number of times. Another intuitive approach is to consider the problem as finding a ‘matching’ between the start and end of the substring, where the counts of vowels are balanced. This can be visualized as a ‘journey’ through the string, where the goal is to find the longest path with balanced vowel counts.
1. Intuition
- The problem requires finding the longest substring with each vowel appearing an even number of times, which can be solved using a prefix XOR approach.
- The key insight is to use a bitmask to track the parity of each vowel, where each bit corresponds to a vowel.
- The prefix XOR approach allows us to efficiently track the parity of vowels in the substring.
- Using a hashmap to store the prefix XOR values and their corresponding indices enables us to find the longest substring in O(n) time complexity.
- The solution works by iterating through the string and updating the prefix XOR value based on the current character.
- The algorithmic choice of using a prefix XOR approach and a hashmap allows for an efficient solution with a time complexity of O(n).
- The problem constraints, such as the string length and character set, do not affect the overall approach but do impact the implementation details.
2. Implementation
- The
characterMap
array is used to map each vowel to a unique bitmask value, wherecharacterMap['a' - 'a'] = 1
,characterMap['e' - 'a'] = 2
,characterMap['i' - 'a'] = 4
,characterMap['o' - 'a'] = 8
, andcharacterMap['u' - 'a'] = 16
. - The
prefixXOR
variable is used to track the prefix XOR value, which is updated using the^=
operator based on the current character. - The
mp
vector is used to store the prefix XOR values and their corresponding indices, wheremp[prefixXOR]
stores the index of the first occurrence of the prefix XOR value. - The
longestSubstring
variable is used to store the length of the longest substring found so far, which is updated using themax
function. - The solution iterates through the string using a for loop, where
i
is the current index ands[i]
is the current character. - The
if (mp[prefixXOR] == -1 and prefixXOR != 0)
condition checks if the prefix XOR value has not been seen before and is not zero, in which case the index is stored in themp
vector. - The
longestSubstring = max(longestSubstring, i - mp[prefixXOR])
line updates the length of the longest substring found so far based on the current index and the stored index in themp
vector.
Complexity Analysis
Time Complexity:
- The algorithm iterates over the string
s
once, resulting in a linear time complexity. Thefor
loop runs fromi = 0
tos.length()
, traversing each character exactly once, which mathematically equates to O(n) where n is the length of the string. - The dominant operation within the loop is the XOR assignment
prefixXOR ^= characterMap[s[i] - 'a']
, which takes constant time. Additionally, themax
function call and array access also take constant time, supporting the overall linear time complexity. - Justification of the O(n) classification comes from the fact that the algorithm performs a constant amount of work for each character in the string, and no nested loops or recursive calls are involved, maintaining the linear time complexity.
Space Complexity:
- The space complexity is analyzed by considering the amount of extra memory used by the algorithm. In this case, the character map
characterMap
and the vectormp
are used, but their sizes are constant (26 forcharacterMap
and 32 formp
), not dependent on the input size. - The data structures used (
characterMap
andmp
) have a fixed size, resulting in a constant space complexity. The input strings
is not counted towards the space complexity as it is part of the input, not extra memory allocated by the algorithm. - Justification of the O(1) space complexity classification comes from the fact that the amount of extra memory used does not grow with the size of the input string, maintaining a constant space requirement.
Footnote
This question is rated as Medium difficulty.
Hints
Represent the counts (odd or even) of vowels with a bitmask.
Precompute the prefix xor for the bitmask of vowels and then get the longest valid substring.