Problem Summary
In this problem, we are given:
Our task is to count all substrings that contain exactly k distinct characters.
A substring is a continuous part of a string.
This is one of the most important Sliding Window problems in Data Structures and Algorithms and is commonly asked in coding interviews, competitive programming, and Java DSA interview preparation.
Example
Input
s = "aba", k = 2
Output
3
Valid Substrings
["ab", "ba", "aba"]
These substrings contain exactly 2 distinct characters.
Understanding the Main Idea
At first glance, counting substrings with exactly k distinct characters may look difficult.
Instead of solving it directly, we use a very smart and optimized technique.
Important Formula
Exactly(K) = AtMost(K) - AtMost(K - 1)
This is the key concept behind solving this Sliding Window algorithm efficiently.
Why Does This Formula Work?
Let us understand this in simple words.
AtMost(K)
This includes all substrings that contain:
AtMost(K - 1)
This includes all substrings that contain:
When we subtract them:
AtMost(K) - AtMost(K - 1)
we only get the substrings that contain exactly K distinct characters.
This optimization is widely used in advanced Sliding Window interview questions.
Step 1: Solve the AtMost(K) Problem
To solve the main problem efficiently, we first create a function that counts substrings with at most K distinct characters.
We use:
This approach gives an optimized O(n) time complexity solution.
How Sliding Window Works
The Sliding Window algorithm follows these steps:
Expand the Window
Move the right pointer forward and include new characters.
Track Character Frequency
Use a frequency array of size 26 because the string contains lowercase English letters.
Count Distinct Characters
Whenever a new character appears for the first time, increase the distinct count.
Shrink the Window
If distinct characters become greater than k, move the left pointer forward until the condition becomes valid again.
Count Valid Substrings
At every step, add:
(right - left + 1)
because all substrings ending at right are valid.
Java Implementation
class Solution {
public int countSubstr(String s, int k) {
return atMostK(s, k) - atMostK(s, k - 1);
}
private int atMostK(String s, int k) {
int left = 0, count = 0, distinct = 0;
int[] freq = new int[26]; // since lowercase letters
for (int right = 0; right < s.length(); right++) {
char ch = s.charAt(right);
if (freq[ch - 'a'] == 0) distinct++;
freq[ch - 'a']++;
// shrink window if distinct > k
while (distinct > k) {
char leftChar = s.charAt(left);
freq[leftChar - 'a']--;
if (freq[leftChar - 'a'] == 0) distinct--;
left++;
}
count += (right - left + 1);
}
return count;
}
}
Step-by-Step Dry Run
Input
s = "abc", k = 2
Step 1: Find AtMost(2)
Valid substrings are:
["a", "b", "c", "ab", "bc"]
So:
AtMost(2) = 5
Step 2: Find AtMost(1)
Valid substrings are:
["a", "b", "c"]
So:
AtMost(1) = 3
Final Calculation
Exactly(2) = 5 - 3 = 2
Correct Answer
2
Complexity Analysis
Time Complexity
O(n)
Each character is visited at most twice:
This makes the Sliding Window approach highly optimized.
Space Complexity
O(1)
We only use a fixed-size frequency array of 26 characters.
Why Frequency Array Is Better Than HashMap
For lowercase English letters, a frequency array is faster than a HashMap because:
Example:
int[] freq = new int[26];
This is a common Java optimization technique in string-based Sliding Window problems.
Key Takeaways
Exactly(K) = AtMost(K) - AtMost(K - 1)
Pattern Recognition
This Sliding Window pattern appears in many important DSA and coding interview problems.
Common Problems Using the Same Pattern
Subarrays with K distinct integers
Longest substring with K distinct characters
Fruit Into Baskets problem
Longest substring without repeating characters
Binary subarrays with sum
Learning this pattern helps in solving many advanced Sliding Window interview questions.
Summary
The problem of counting substrings with exactly K distinct characters can be solved efficiently using the Sliding Window technique and the mathematical relationship between Exactly(K) and AtMost(K). Instead of checking every substring using brute force, we optimize the solution to O(n) time complexity by maintaining a dynamic window and tracking character frequencies. This pattern is extremely important for Java DSA, coding interviews, competitive programming, and real-world string processing problems.