Understanding HashSet in C#

HashSet in C# is a collection type that stores unique elements and does not allow duplicates. It is an unordered collection optimized for fast lookup, insertion, and removal using hashing. HashSet is ideal when we need to ensure unique data and perform set operations like unions, intersections, and differences efficiently.

List<T> allows duplicate values and searches more slowly, because Contains checks items one by one (O(n)).
HashSet, on the other hand, stores only unique values and offers much faster lookups due to its hash-based structure.
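
The contrast can be sketched as follows. This is a minimal illustration rather than a benchmark; the class name LookupDemo and the sample data are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    public static void Main()
    {
        // Hypothetical sample data: the integers 0 through 99999.
        List<int> list = Enumerable.Range(0, 100000).ToList();
        HashSet<int> set = new HashSet<int>(list);

        // List<T>.Contains compares elements one by one: O(n).
        bool inList = list.Contains(99999);
        // HashSet<T>.Contains hashes the value and checks one bucket: O(1) on average.
        bool inSet = set.Contains(99999);
        Console.WriteLine($"{inList} {inSet}"); // True True

        // A list accepts a repeated value; a set rejects it.
        list.Add(5);
        Console.WriteLine(list.Count); // 100001
        Console.WriteLine(set.Add(5)); // False
        Console.WriteLine(set.Count);  // 100000
    }
}
```

Both lookups return the same answer; only the amount of work behind them differs, and the gap widens as the collection grows.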

What is HashSet in C#

HashSet belongs to the System.Collections.Generic namespace and was introduced in .NET Framework 3.5.

A HashSet<T> is a collection that implements the set data structure concept.

T stands for the type of the elements in the set (e.g., int, string, a custom class).
It is an unordered collection. Elements are not stored in any particular sequence, and you cannot access them by index.
It enforces uniqueness. If you try to add an element that already exists in the set, the operation is ignored, and the set remains unchanged.
It provides extremely fast performance for operations like adding, removing, and checking for the existence of an element. These operations are O(1) (constant time) on average; they slow down only when many elements collide in the same bucket or the set has to resize.
O(1), or constant time complexity, means that the time it takes to complete an operation is fixed and does not depend on the size of the input. In other words, whether we have a small dataset or a very large one, the operation takes the same amount of time to execute. This is considered highly efficient because the runtime remains constant regardless of data size.
For example, accessing an element by index in an array or checking membership in a HashSet typically operates in O(1) time, meaning it performs a fixed number of steps even if the collection grows larger. This is why HashSet methods like Add, Remove, or Contains are said to work in average O(1) time, offering very fast performance for these operations.

How Does it Achieve Uniqueness and Speed?

HashSet achieves its speed and uniqueness by using a technique called hashing.

When an element is added, the set first calls the object's GetHashCode() method to calculate an integer hash code.
This hash code determines a "bucket" or index where the element should be stored in the underlying data structure (often a specialized form of a hash table).
To ensure uniqueness, the set then checks whether an equal element already exists in that bucket by comparing hash codes and calling the object's Equals() method. If both the hash code and the value match an existing element, the new element is not added and Add returns false.

This hash-based approach lets the set go straight to the right bucket instead of scanning every element, which is why lookups stay fast even as the collection grows.
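
For a custom class, this means the type has to supply matching Equals() and GetHashCode() implementations; otherwise a reference type falls back to reference equality and two objects with the same data are both stored. A minimal sketch, assuming a hypothetical Attendee class that is identified by its email address:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type used only for illustration.
public sealed class Attendee
{
    public string Email { get; }

    public Attendee(string email)
    {
        Email = email ?? throw new ArgumentNullException(nameof(email));
    }

    // Two attendees are equal when their emails match.
    public override bool Equals(object obj) =>
        obj is Attendee other && Email == other.Email;

    // Equal objects must produce equal hash codes, so hash the same field.
    public override int GetHashCode() => Email.GetHashCode();
}

public static class HashingDemo
{
    public static void Main()
    {
        var attendees = new HashSet<Attendee>();

        Console.WriteLine(attendees.Add(new Attendee("a@example.com"))); // True
        // A different object with the same hash code and an Equals match is rejected.
        Console.WriteLine(attendees.Add(new Attendee("a@example.com"))); // False
        Console.WriteLine(attendees.Count); // 1
    }
}
```

The key rule is that the two overrides agree: any fields compared in Equals() should also feed GetHashCode().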

HashSet Declaration and Syntax

The HashSet<T> syntax is straightforward and uses the C# generics feature, where <T> specifies the type of element the set will hold.

1. The using Directive

Because HashSet<T> is part of System.Collections.Generic, include the following directive at the top of the C# file (projects with implicit usings enabled already import this namespace):

using System.Collections.Generic;

2. Basic Declaration

HashSet<T> variableName = new HashSet<T>();
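
Beyond the empty constructor, a set can also be created with a collection initializer, from an existing collection, or with a custom equality comparer. The sketch below shows these variants; the variable names and sample values are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeclarationDemo
{
    public static void Main()
    {
        // Empty set.
        var ids = new HashSet<int>();

        // Collection initializer; the repeated 2 is dropped.
        var primes = new HashSet<int> { 2, 3, 5, 2 };

        // Built from an existing collection, removing duplicates on the way in.
        string[] tags = { "csharp", "dotnet", "csharp" };
        var uniqueTags = new HashSet<string>(tags);

        // Custom comparer: strings that differ only by case count as equal.
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Alice", "ALICE" };

        Console.WriteLine(ids.Count);        // 0
        Console.WriteLine(primes.Count);     // 3
        Console.WriteLine(uniqueTags.Count); // 2
        Console.WriteLine(names.Count);      // 1
    }
}
```

The comparer overload is worth knowing for data such as email addresses, where "A@x.com" and "a@x.com" should usually be treated as the same value.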

Example

Imagine building a feature for an event management system where we want to keep track of unique attendees' email addresses. Using a list would require extra code to check for duplicates manually. With HashSet, we can add email addresses directly, and it ensures no duplicates exist automatically:

using System;
using System.Collections.Generic;

// Placeholder addresses for illustration.
HashSet<string> attendeeEmails = new HashSet<string>();
attendeeEmails.Add("alice@example.com");
attendeeEmails.Add("bob@example.com");
attendeeEmails.Add("alice@example.com"); // Ignored: Add returns false because the address is already present

foreach (var email in attendeeEmails)
    Console.WriteLine(email);

The output will be:

alice@example.com
bob@example.com

Explanation

The HashSet stores only unique elements.
When "alice@example.com" is added the second time, it is ignored because it is already present in the HashSet.
The foreach loop prints each unique email once, so "alice@example.com" appears only one time, followed by "bob@example.com".
The order in a HashSet is not guaranteed, so the printed order could vary, but the set will only contain the two unique emails added.

This behavior ensures no duplicates exist in the collection.

Common Operations in HashSet

1. Basic Manipulation Operations

Method	Description	Example	Time Complexity
Add(T item)	Attempts to add a specified element to the set. Returns true if the element was added (it was unique), and false if the element already existed (it was ignored).	mySet.Add("data");	O(1); O(n) when the internal storage must grow
Remove(T item)	Removes the specified element from the set. Returns true if the element was successfully found and removed, and false otherwise.	mySet.Remove("old");	O(1)
Contains(T item)	Checks if the specified element exists in the set. Returns true or false.	mySet.Contains("test");	O(1) (The most efficient lookup)
Count	Gets the number of elements currently in the set (a property, not a method).	int count = mySet.Count;	O(1)
Clear()	Removes all elements from the HashSet.	mySet.Clear();	O(n)
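
The return values in the table can be seen in a short run. This is a minimal sketch; mySet and the string values simply follow the examples in the table.

```csharp
using System;
using System.Collections.Generic;

public static class BasicOperationsDemo
{
    public static void Main()
    {
        var mySet = new HashSet<string>();

        Console.WriteLine(mySet.Add("data"));      // True: added
        Console.WriteLine(mySet.Add("data"));      // False: already present
        Console.WriteLine(mySet.Contains("data")); // True
        Console.WriteLine(mySet.Remove("old"));    // False: nothing to remove
        Console.WriteLine(mySet.Count);            // 1

        mySet.Clear();
        Console.WriteLine(mySet.Count);            // 0
    }
}
```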

2. Set Mathematics Operations

These methods combine a HashSet with any other collection that implements IEnumerable<T>, based on mathematical set theory principles. Each one modifies the set it is called on rather than returning a new set. These are the operations that truly differentiate HashSet from other collections like List<T>.

Operation	Method	Description	Analogy
Union	UnionWith(IEnumerable<T> other)	Modifies the current set to contain all elements that are present in either the current set OR the other collection.	Combining two contact lists into one master list.
Intersection	IntersectWith(IEnumerable<T> other)	Modifies the current set to contain only the elements that are present in BOTH the current set AND the other collection.	Finding the common users who belong to two separate security groups.
Difference	ExceptWith(IEnumerable<T> other)	Modifies the current set by removing all elements that are also present in the other collection.	Finding employees who are currently on the payroll but NOT signed up for the new training.
Symmetric Difference	SymmetricExceptWith(IEnumerable<T> other)	Modifies the current set to contain elements that are present in one set or the other, but NOT in both.	Finding elements that are unique to Set A or unique to Set B (the elements on the "outside" of the intersection).
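
The four operations in the table can be sketched on two small integer collections. The arrays a and b and the Show helper are assumptions made for the example; each operation starts from a fresh copy of a so that the results do not build on one another.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetMathDemo
{
    // Sorts before printing because HashSet<T> does not guarantee enumeration order.
    private static string Show(IEnumerable<int> items) =>
        string.Join(", ", items.OrderBy(x => x));

    public static void Main()
    {
        int[] a = { 1, 2, 3, 4 };
        int[] b = { 3, 4, 5 };

        var union = new HashSet<int>(a);
        union.UnionWith(b);
        Console.WriteLine(Show(union)); // 1, 2, 3, 4, 5

        var intersection = new HashSet<int>(a);
        intersection.IntersectWith(b);
        Console.WriteLine(Show(intersection)); // 3, 4

        var difference = new HashSet<int>(a);
        difference.ExceptWith(b);
        Console.WriteLine(Show(difference)); // 1, 2

        var symmetric = new HashSet<int>(a);
        symmetric.SymmetricExceptWith(b);
        Console.WriteLine(Show(symmetric)); // 1, 2, 5
    }
}
```

Because the methods mutate the receiver, copying the set first is the usual way to keep the original intact.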

3. Subset and Superset Relations

Method	Description	Analogy
IsSubsetOf(IEnumerable<T> other)	Checks if the current set is entirely contained within the other collection. Returns true if every element in the current set is also in the other set.	The Shopping List: Is everything on our small shopping list (current set) also available in the store's inventory (other set)?
IsSupersetOf(IEnumerable<T> other)	Checks if the current set contains all the elements of the other collection. Returns true if the current set holds every element in the other set.	The Recipe Ingredients: Does our pantry's contents (current set) contain all the mandatory ingredients required by the recipe (other set)?
Overlaps(IEnumerable<T> other)	Checks if the two collections share at least one common element. Returns true if their intersection is not empty.	The Movie Schedules: Do the showtimes for Movie A (current set) and Movie B (other set) conflict by sharing at least one common time slot?
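
Unlike the set mathematics methods, these three only return a bool and leave both collections unchanged. A minimal sketch, with hypothetical data loosely following the shopping-list analogy:

```csharp
using System;
using System.Collections.Generic;

public static class RelationsDemo
{
    public static void Main()
    {
        // Hypothetical sample data.
        var shoppingList = new HashSet<string> { "milk", "eggs" };
        var inventory = new HashSet<string> { "milk", "eggs", "bread" };
        var wishList = new HashSet<string> { "bread", "caviar" };

        Console.WriteLine(shoppingList.IsSubsetOf(inventory));   // True
        Console.WriteLine(inventory.IsSupersetOf(shoppingList)); // True
        Console.WriteLine(wishList.IsSubsetOf(inventory));       // False: caviar is not in the inventory
        Console.WriteLine(wishList.Overlaps(inventory));         // True: both contain bread
        Console.WriteLine(shoppingList.Overlaps(wishList));      // False: nothing in common
    }
}
```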

HashSet Use Cases and Advantages

1. Eliminating Duplicate Data

HashSet is primarily used when we must ensure uniqueness. It automatically removes duplicates without extra validation code. This is useful in real-world scenarios like importing holidays, student roll numbers, product codes, or any data where repeated values must be avoided.

2. Fast Lookup Performance

HashSet provides very fast lookups because it uses hashing internally. Checking whether an item exists is almost instant, even in large collections. This makes it suitable for scenarios like verifying if an email, employee ID, or token has already been used.

3. Efficient Set Operations

HashSet supports built-in set operations such as union, intersection, and difference (UnionWith, IntersectWith, ExceptWith). These operations combine or compare collections efficiently. For example, finding the common employees between two project teams becomes simpler and faster than manual looping.

4. Preventing Repeated Processing

When processing a bulk list, HashSet helps ensure that an item is processed only once. This prevents issues like double insertion, duplicate calculations, or repeated validations. It is useful in Excel uploads, log processing, or any batch system.
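
One common way to do this relies on the bool returned by Add, which makes the "have I seen this?" check and the insert a single call. The sketch below assumes a hypothetical ProcessOnce helper and made-up row identifiers; the real per-item work would replace the line that collects the id.

```csharp
using System;
using System.Collections.Generic;

public static class ProcessOnceDemo
{
    // Returns the ids in first-seen order, skipping repeats.
    public static List<string> ProcessOnce(IEnumerable<string> ids)
    {
        var seen = new HashSet<string>();
        var processed = new List<string>();
        foreach (string id in ids)
        {
            // Add returns false for a repeat, so no separate Contains call is needed.
            if (seen.Add(id))
                processed.Add(id); // stand-in for the real per-item work
        }
        return processed;
    }

    public static void Main()
    {
        // Hypothetical uploaded rows containing repeats.
        string[] uploadedRows = { "INV-1", "INV-2", "INV-1", "INV-3", "INV-2" };
        Console.WriteLine(string.Join(", ", ProcessOnce(uploadedRows)));
        // INV-1, INV-2, INV-3
    }
}
```

Keeping a separate list for the results preserves the original order, which the set alone would not guarantee.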

5. Cleaner and Maintainable Code

Using HashSet reduces the need for long “check-before-add” conditions. It results in cleaner, more readable, and less error-prone code. Whenever values must remain unique, such as roles, permissions, categories, or tags, HashSet is the most straightforward choice.

Conclusion

C# HashSet is a powerful collection designed to store only unique values, making it perfect for avoiding duplicates automatically. It offers very fast lookup and insert operations because it uses hashing internally. HashSet works best when we need to quickly check whether an item exists or not, rather than maintaining order. Overall, it helps improve performance while keeping our data clean, efficient, and duplicate-free.