Posts

Problem 2416 Sum of Prefix Scores of Strings

Posted on Dec 10, 2024 7 mins

Table of Contents

Problem Statement

Question

You are given an array words of size n consisting of non-empty strings.

We define the score of a string term as the number of strings words[i] such that term is a prefix of words[i].

For example, if words = ["a", "ab", "abc", "cab"], then the score of "ab" is 2, since "ab" is a prefix of both "ab" and "abc".

Return an array answer of size n where answer[i] is the sum of scores of every non-empty prefix of words[i].

Note that a string is considered as a prefix of itself.

Example 1:

Input: words = ["abc","ab","bc","b"]
Output: [5,4,3,2]
Explanation: The answer for each string is the following:
- "abc" has 3 prefixes: "a", "ab", and "abc".
- There are 2 strings with the prefix "a", 2 strings with the prefix "ab", and 1 string with the prefix "abc".
The total is answer[0] = 2 + 2 + 1 = 5.
- "ab" has 2 prefixes: "a" and "ab".
- There are 2 strings with the prefix "a", and 2 strings with the prefix "ab".
The total is answer[1] = 2 + 2 = 4.
- "bc" has 2 prefixes: "b" and "bc".
- There are 2 strings with the prefix "b", and 1 string with the prefix "bc".
The total is answer[2] = 2 + 1 = 3.
- "b" has 1 prefix: "b".
- There are 2 strings with the prefix "b".
The total is answer[3] = 2.

Example 2:

Input: words = ["abcd"]
Output: [4]
Explanation:
"abcd" has 4 prefixes: "a", "ab", "abc", and "abcd".
Each prefix has a score of one, so the total is answer[0] = 1 + 1 + 1 + 1 = 4.

Constraints:

1 <= words.length <= 1000
1 <= words[i].length <= 1000
words[i] consists of lowercase English letters.

Solution

class TrieNode{
public:
    TrieNode* child[26];
    int countChild;

    TrieNode(){
        for(int i=0;i<26;i++){
            child[i]=nullptr;
        }
        countChild=0;
    }
};

class Trie{
public:
    TrieNode* root;

    Trie(){
        root=new TrieNode();
    }

    void insert(string word){
        TrieNode* node=root;
        for(auto c:word){
            int index=c-'a';
            if(node->child[index]==nullptr){
                node->child[index]=new TrieNode();
            }
            node=node->child[index];
            node->countChild++;
        }
    }

    int getScore(string word){
        int count=0;
        TrieNode* node=root;
        for(auto c:word){
            int index=c-'a';
            node=node->child[index];
            count+=node->countChild;
        }
        return count;
    }
};

class Solution {
public:
    vector<int> sumPrefixScores(vector<string>& words) {
        vector<int> ans;
        Trie trie;

        for(auto word:words){
            trie.insert(word);
        }

        for(auto word:words){
           int count=trie.getScore(word);
           ans.push_back(count);
        }

        return ans;
    }
};

Complexity Analysis

| Algorithm                       | Time Complexity | Space Complexity |
| ------------------------------- | --------------- | ---------------- |
| Trie construction and traversal | O(nm)           | O(nm)            |

Explanation

Intial Thoughts

To approach this problem, we should think of how strings can be interconnected, such as a tree-like structure where each node represents a character. We should also consider how to efficiently count the occurrences of each prefix. The given words can be seen as paths in this tree, and the score of a string can be determined by how many paths pass through its prefixes. Another key point is that a string is also a prefix of itself, which needs to be accounted for in our solution. We can use a data structure like a Trie to model this structure and calculate the scores. We can iterate over each word, and for each word, we can iterate over its prefixes to calculate the score.

Intuitive Analysis

Intuitively, solving this problem involves understanding the relationship between the given words and their prefixes. We can imagine a library where each word is a book, and the prefixes are like catalog numbers that help us find the books. We need to create a system that can efficiently count how many books (words) each catalog number (prefix) points to. A Trie data structure is perfect for this, as it allows us to visualize the interconnectedness of the words and their prefixes. By inserting each word into the Trie and then traversing the Trie for each word’s prefixes, we can intuitively understand how to calculate the scores. The key is to recognize that the score of a string is essentially the sum of the counts of its prefixes in the Trie.

1. Intuition

The problem requires calculating the sum of scores of every non-empty prefix of each word in the given array of words.
A score of a string term is defined as the number of strings that have term as a prefix.
To efficiently calculate the scores, a Trie data structure can be utilized to store the words and their prefixes.
The Trie allows for fast lookup and insertion of words, making it suitable for this problem.
By iterating through each word and its prefixes, the score of each prefix can be calculated using the Trie.
The sum of scores of all prefixes for each word can then be computed and returned as the result.
This approach ensures that the solution is both efficient and accurate.

2. Implementation

The TrieNode class is defined to represent each node in the Trie, containing an array child to store child nodes and a countChild variable to keep track of the number of child nodes.
The Trie class is implemented with methods insert and getScore to insert words into the Trie and calculate the score of a given word, respectively.
In the insert method, each character of the word is iterated through and the corresponding child node is created if it does not exist, with the countChild variable incremented accordingly.
The getScore method traverses the Trie for a given word, summing up the countChild values of each node to calculate the score.
The Solution class utilizes the Trie class to solve the problem, first inserting all words into the Trie and then calculating the sum of scores for each word using the getScore method.
The result is stored in the ans vector and returned as the final output.
The use of a Trie data structure enables an efficient solution with a time complexity of O(n*m), where n is the number of words and m is the maximum length of a word.

Complexity Analysis

Time Complexity:

The time complexity is determined by the two nested loops in the sumPrefixScores method: one for inserting words into the Trie and another for calculating the prefix scores. Each word is iterated over once for insertion and once for scoring, resulting in a total of 2n*m operations, where n is the number of words and m is the average length of a word.
The dominant operation is the traversal of each word’s characters, which occurs in both the insert and getScore methods. The insert method creates new nodes if necessary, and the getScore method accumulates the count of child nodes. Both operations have a time complexity of O(m), and since they are performed for each word, the overall time complexity becomes O(n*m).
The Big O classification of O(n*m) is justified because the algorithm’s running time grows linearly with the product of the number of words and the average length of a word. This is due to the fact that each word is processed twice (once for insertion and once for scoring), and each character in a word is processed once.

Space Complexity:

The space complexity is driven by the storage required for the Trie data structure. Each node in the Trie has an array of 26 possible child nodes, which corresponds to the 26 letters of the alphabet.
The impact of the Trie data structure on space complexity is significant because each unique prefix of a word requires a separate node in the Trie. The number of nodes in the Trie grows with the number of unique prefixes, which can be as high as nm in the worst case (when all words are unique and have different lengths).
The justification of the space complexity being O(nm) lies in the fact that in the worst-case scenario, every character in every word requires a new node in the Trie, resulting in a total of nm nodes. This occurs when all words are distinct and have distinct prefixes, maximizing the number of nodes needed to store all words.

Footnote

This question is rated as Hard difficulty.

Hints

What data structure will allow you to efficiently keep track of the score of each prefix?

Use a Trie. Insert all the words into it, and keep a counter at each node that will tell you how many times we have visited each prefix.

Title	URL	Difficulty
Design Add and Search Words Data Structure	https://leetcode.com/problems/design-add-and-search-words-data-structure	Medium
Maximum XOR of Two Numbers in an Array	https://leetcode.com/problems/maximum-xor-of-two-numbers-in-an-array	Medium
Map Sum Pairs	https://leetcode.com/problems/map-sum-pairs	Medium

Problem 2416 Sum of Prefix Scores of Strings

Problem Statement

Question

Solution

Complexity Analysis

Explanation

Intial Thoughts

Intuitive Analysis

1. Intuition

2. Implementation

Complexity Analysis

Time Complexity:

Space Complexity:

Footnote

Hints

Similar Questions: