Data Structures and Algorithms (DSA)  

Substrings With Exactly K Distinct Characters

Problem Summary

In this problem, we are given:

  • A string s

  • An integer k

Our task is to count all substrings that contain exactly k distinct characters.

A substring is a continuous part of a string.

This is one of the most important Sliding Window problems in Data Structures and Algorithms and is commonly asked in coding interviews, competitive programming, and Java DSA interview preparation.

Example

Input

s = "aba", k = 2

Output

3

Valid Substrings

["ab", "ba", "aba"]

These substrings contain exactly 2 distinct characters.

Understanding the Main Idea

At first glance, counting substrings with exactly k distinct characters may look difficult.

Instead of solving it directly, we use a very smart and optimized technique.

Important Formula

Exactly(K) = AtMost(K) - AtMost(K - 1)

This is the key concept behind solving this Sliding Window algorithm efficiently.

Why Does This Formula Work?

Let us understand this in simple words.

AtMost(K)

This includes all substrings that contain:

  • 1 distinct character

  • 2 distinct characters

  • 3 distinct characters

  • ... up to K distinct characters

AtMost(K - 1)

This includes all substrings that contain:

  • 1 distinct character

  • 2 distinct characters

  • ... up to K - 1 distinct characters

When we subtract them:

AtMost(K) - AtMost(K - 1)

we only get the substrings that contain exactly K distinct characters.

This optimization is widely used in advanced Sliding Window interview questions.

Step 1: Solve the AtMost(K) Problem

To solve the main problem efficiently, we first create a function that counts substrings with at most K distinct characters.

We use:

  • Sliding Window technique

  • Two pointers (left and right)

  • Frequency array to track character counts

This approach gives an optimized O(n) time complexity solution.

How Sliding Window Works

The Sliding Window algorithm follows these steps:

Expand the Window

Move the right pointer forward and include new characters.

Track Character Frequency

Use a frequency array of size 26 because the string contains lowercase English letters.

Count Distinct Characters

Whenever a new character appears for the first time, increase the distinct count.

Shrink the Window

If distinct characters become greater than k, move the left pointer forward until the condition becomes valid again.

Count Valid Substrings

At every step, add:

(right - left + 1)

because all substrings ending at right are valid.

Java Implementation

class Solution {
    
    public int countSubstr(String s, int k) {
        return atMostK(s, k) - atMostK(s, k - 1);
    }
    
    private int atMostK(String s, int k) {
        int left = 0, count = 0, distinct = 0;
        int[] freq = new int[26]; // since lowercase letters
        
        for (int right = 0; right < s.length(); right++) {
            char ch = s.charAt(right);
            
            if (freq[ch - 'a'] == 0) distinct++;
            freq[ch - 'a']++;
            
            // shrink window if distinct > k
            while (distinct > k) {
                char leftChar = s.charAt(left);
                freq[leftChar - 'a']--;
                
                if (freq[leftChar - 'a'] == 0) distinct--;
                left++;
            }
            
            count += (right - left + 1);
        }
        
        return count;
    }
}

Step-by-Step Dry Run

Input

s = "abc", k = 2

Step 1: Find AtMost(2)

Valid substrings are:

["a", "b", "c", "ab", "bc"]

So:

AtMost(2) = 5

Step 2: Find AtMost(1)

Valid substrings are:

["a", "b", "c"]

So:

AtMost(1) = 3

Final Calculation

Exactly(2) = 5 - 3 = 2

Correct Answer

2

Complexity Analysis

Time Complexity

O(n)

Each character is visited at most twice:

  • Once by the right pointer

  • Once by the left pointer

This makes the Sliding Window approach highly optimized.

Space Complexity

O(1)

We only use a fixed-size frequency array of 26 characters.

Why Frequency Array Is Better Than HashMap

For lowercase English letters, a frequency array is faster than a HashMap because:

  • Array access is faster

  • Less memory overhead

  • Better performance in coding interviews and competitive programming

Example:

int[] freq = new int[26];

This is a common Java optimization technique in string-based Sliding Window problems.

Key Takeaways

  • Sliding Window is the core technique used in this problem.

  • Convert Exactly K problems into AtMost K problems.

  • Use the formula:

Exactly(K) = AtMost(K) - AtMost(K - 1)
  • Frequency arrays are faster than HashMaps for lowercase characters.

  • This approach gives an efficient O(n) solution.

Pattern Recognition

This Sliding Window pattern appears in many important DSA and coding interview problems.

Common Problems Using the Same Pattern

  • Subarrays with K distinct integers

  • Longest substring with K distinct characters

  • Fruit Into Baskets problem

  • Longest substring without repeating characters

  • Binary subarrays with sum

Learning this pattern helps in solving many advanced Sliding Window interview questions.

Summary

The problem of counting substrings with exactly K distinct characters can be solved efficiently using the Sliding Window technique and the mathematical relationship between Exactly(K) and AtMost(K). Instead of checking every substring using brute force, we optimize the solution to O(n) time complexity by maintaining a dynamic window and tracking character frequencies. This pattern is extremely important for Java DSA, coding interviews, competitive programming, and real-world string processing problems.