Problem 1233 Remove Sub-Folders from the Filesystem
Table of Contents
Problem Statement
Link - Problem 1233
Question
Given a list of folders folder
, return the folders after removing all sub-folders in those folders. You may return the answer in any order.
If a folder[i]
is located within another folder[j]
, it is called a sub-folder of it. A sub-folder of folder[j]
must start with folder[j]
, followed by a "/"
. For example, "/a/b"
is a sub-folder of "/a"
, but "/b"
is not a sub-folder of "/a/b/c"
.
The format of a path is one or more concatenated strings of the form: '/'
followed by one or more lowercase English letters.
- For example,
"/leetcode"
and"/leetcode/problems"
are valid paths while an empty string and"/"
are not.
Example 1:
Input: folder = ["/a","/a/b","/c/d","/c/d/e","/c/f"] Output: ["/a","/c/d","/c/f"] Explanation: Folders "/a/b" is a subfolder of "/a" and "/c/d/e" is inside of folder "/c/d" in our filesystem.
Example 2:
Input: folder = ["/a","/a/b/c","/a/b/d"] Output: ["/a"] Explanation: Folders "/a/b/c" and "/a/b/d" will be removed because they are subfolders of "/a".
Example 3:
Input: folder = ["/a/b/c","/a/b/ca","/a/b/d"] Output: ["/a/b/c","/a/b/ca","/a/b/d"]
Constraints:
1 <= folder.length <= 4 * 104
2 <= folder[i].length <= 100
folder[i]
contains only lowercase letters and'/'
.folder[i]
always starts with the character'/'
.- Each folder name is unique.
Solution
class Solution {
public:
vector<string> removeSubfolders(vector<string>& folder) {
sort(folder.begin(), folder.end());
vector<string> ans;
ans.push_back(folder[0]);
for(int i = 1; i < folder.size(); i++) {
string lastFolder = ans.back();
lastFolder.push_back('/');
if(folder[i].compare(0, lastFolder.size(), lastFolder) != 0) {
ans.push_back(folder[i]);
}
}
return ans;
}
};
Complexity Analysis
| Algorithm | Time Complexity | Space Complexity |
| -------------- | --------------- | ---------------- |
| Sorted Compare | O(n log n) | O(n) |
Explanation
1. Intuition
- The problem can be solved by sorting the folders and then iterating through them to find the non-subfolders.
- The key insight is that a subfolder will always have a longer path than its parent folder.
- By sorting the folders, we can easily identify the subfolders by comparing the path of the current folder with the path of the last non-subfolder.
- If the current folder’s path starts with the last non-subfolder’s path, it is a subfolder and should be skipped.
- This approach works because the sorting ensures that parent folders come before their subfolders.
- The time complexity of this approach is O(n log n) due to the sorting, where n is the number of folders.
- The space complexity is O(n) for storing the result.
2. Implementation
- The solution starts by sorting the input vector
folder
using thesort
function. - The first non-subfolder is added to the result vector
ans
. - Then, the solution iterates through the sorted folders starting from the second folder.
- For each folder, it checks if the folder’s path starts with the last non-subfolder’s path by comparing the first
lastFolder.size()
characters. - If the folder’s path does not start with the last non-subfolder’s path, it is a non-subfolder and is added to the result vector
ans
. - The solution uses the
compare
function to compare the folder’s path with the last non-subfolder’s path. - The
push_back
function is used to add the non-subfolders to the result vectorans
.
Complexity Analysis
Time Complexity:
- The time complexity of this solution is dominated by the sorting operation
sort(folder.begin(), folder.end())
. This operation has a time complexity of O(n log n) because it uses an efficient sorting algorithm, most likely a variation of quicksort or mergesort. - The subsequent for loop iterates over the sorted list and compares each folder path with the last accepted folder path. This operation has a time complexity of O(n) because it only iterates over the list once.
- However, since the for loop is dominated by the sorting operation, the overall time complexity remains O(n log n).
Space Complexity:
- The space complexity of this solution is primarily due to the extra space used to store the result vector
ans
. In the worst case, the size ofans
will be the same as the input vectorfolder
. - Additionally, the sorting operation may require some extra space for swapping elements, but this is typically negligible compared to the space used by the input and output vectors.
- Therefore, the overall space complexity is O(n).
Footnote
This question is rated as Medium difficulty.
Hints
Sort the folders lexicographically.
Insert the current element in an array and then loop until we get rid of all of their subfolders, repeat this until no element is left.