Posts

Problem 1371 Find the Longest Substring Containing Vowels in Even Counts

Posted on Dec 22, 2024 6 mins

Hash-Table String Bit-Manipulation Prefix-Sum

Table of Contents

Problem Statement

Question

Given the string s, return the size of the longest substring containing each vowel an even number of times. That is, 'a', 'e', 'i', 'o', and 'u' must appear an even number of times.

Example 1:

Input: s = "eleetminicoworoep"
Output: 13
Explanation: The longest substring is "leetminicowor" which contains two each of the vowels: e, i and o and zero of the vowels: a and u.

Example 2:

Input: s = "leetcodeisgreat"
Output: 5
Explanation: The longest substring is "leetc" which contains two e's.

Example 3:

Input: s = "bcbcbc"
Output: 6
Explanation: In this case, the given string "bcbcbc" is the longest because all vowels: a, e, i, o and u appear zero times.

Constraints:

1 <= s.length <= 5 x 10^5
s contains only lowercase English letters.

Solution

class Solution {
public:
    int findTheLongestSubstring(string s) {
        int prefixXOR = 0;
        int characterMap[26] = {0};
        characterMap['a' - 'a'] = 1;
        characterMap['e' - 'a'] = 2;
        characterMap['i' - 'a'] = 4;
        characterMap['o' - 'a'] = 8;
        characterMap['u' - 'a'] = 16;
        vector<int> mp(32, -1);
        int longestSubstring = 0;

        for (int i = 0; i < s.length(); i++) {
            prefixXOR ^= characterMap[s[i] - 'a'];
            if (mp[prefixXOR] == -1 and prefixXOR != 0) mp[prefixXOR] = i;

            longestSubstring = max(longestSubstring, i - mp[prefixXOR]);
        }

        return longestSubstring;
    }
};

Complexity Analysis

| Algorithm  | Time Complexity | Space Complexity |
| ---------- | --------------- | ---------------- |
| Prefix XOR | O(n)            | O(1)             |

Explanation

Intial Thoughts

The problem seems to involve tracking the occurrence of vowels in a string and finding the longest substring where each vowel appears an even number of times. It can be approached by maintaining a counter for each vowel or using a clever encoding scheme. An initial step could be to identify the possible encoding schemes for the vowels, such as using bits to represent each vowel. Another thought is to use a prefix sum or prefix XOR to efficiently track the occurrence of vowels. Breaking down the problem into smaller sub-problems, such as finding the longest substring ending at each position, could also be a viable strategy.

Intuitive Analysis

One way to intuitively solve this problem is to think of it as finding a ‘balance’ in the string where the counts of vowels are even. Using a bit encoding scheme, where each vowel is associated with a bit, can help in efficiently tracking this balance. The XOR operation can then be used to ‘cancel out’ the occurrence of vowels, effectively finding substrings where each vowel appears an even number of times. Another intuitive approach is to consider the problem as finding a ‘matching’ between the start and end of the substring, where the counts of vowels are balanced. This can be visualized as a ‘journey’ through the string, where the goal is to find the longest path with balanced vowel counts.

1. Intuition

The problem requires finding the longest substring with each vowel appearing an even number of times, which can be solved using a prefix XOR approach.
The key insight is to use a bitmask to track the parity of each vowel, where each bit corresponds to a vowel.
The prefix XOR approach allows us to efficiently track the parity of vowels in the substring.
Using a hashmap to store the prefix XOR values and their corresponding indices enables us to find the longest substring in O(n) time complexity.
The solution works by iterating through the string and updating the prefix XOR value based on the current character.
The algorithmic choice of using a prefix XOR approach and a hashmap allows for an efficient solution with a time complexity of O(n).
The problem constraints, such as the string length and character set, do not affect the overall approach but do impact the implementation details.

2. Implementation

The characterMap array is used to map each vowel to a unique bitmask value, where characterMap['a' - 'a'] = 1, characterMap['e' - 'a'] = 2, characterMap['i' - 'a'] = 4, characterMap['o' - 'a'] = 8, and characterMap['u' - 'a'] = 16.
The prefixXOR variable is used to track the prefix XOR value, which is updated using the ^= operator based on the current character.
The mp vector is used to store the prefix XOR values and their corresponding indices, where mp[prefixXOR] stores the index of the first occurrence of the prefix XOR value.
The longestSubstring variable is used to store the length of the longest substring found so far, which is updated using the max function.
The solution iterates through the string using a for loop, where i is the current index and s[i] is the current character.
The if (mp[prefixXOR] == -1 and prefixXOR != 0) condition checks if the prefix XOR value has not been seen before and is not zero, in which case the index is stored in the mp vector.
The longestSubstring = max(longestSubstring, i - mp[prefixXOR]) line updates the length of the longest substring found so far based on the current index and the stored index in the mp vector.

Complexity Analysis

Time Complexity:

The algorithm iterates over the string s once, resulting in a linear time complexity. The for loop runs from i = 0 to s.length(), traversing each character exactly once, which mathematically equates to O(n) where n is the length of the string.
The dominant operation within the loop is the XOR assignment prefixXOR ^= characterMap[s[i] - 'a'], which takes constant time. Additionally, the max function call and array access also take constant time, supporting the overall linear time complexity.
Justification of the O(n) classification comes from the fact that the algorithm performs a constant amount of work for each character in the string, and no nested loops or recursive calls are involved, maintaining the linear time complexity.

Space Complexity:

The space complexity is analyzed by considering the amount of extra memory used by the algorithm. In this case, the character map characterMap and the vector mp are used, but their sizes are constant (26 for characterMap and 32 for mp), not dependent on the input size.
The data structures used (characterMap and mp) have a fixed size, resulting in a constant space complexity. The input string s is not counted towards the space complexity as it is part of the input, not extra memory allocated by the algorithm.
Justification of the O(1) space complexity classification comes from the fact that the amount of extra memory used does not grow with the size of the input string, maintaining a constant space requirement.

Footnote

This question is rated as Medium difficulty.

Hints

Represent the counts (odd or even) of vowels with a bitmask.

Precompute the prefix xor for the bitmask of vowels and then get the longest valid substring.