Bulk Data Insertion with Entity Framework in C#

Introduction

In this blog post, we'll explore how to perform bulk data insertion using Entity Framework in a C# application. Bulk insertion is a common requirement when dealing with large datasets, and it's essential to handle errors gracefully and efficiently. We'll cover the step-by-step process, including setting up the Entity Framework context, implementing retry logic for failed insertions, and handling errors effectively.

Step 1. Setting up the Entity Framework Context

First, let's create an Entity Framework context to interact with our database. Assume we have a simple DbContext class named AppDbContext with a DbSet for our entity type MyEntity.

public class AppDbContext : DbContext
{
    public DbSet<MyEntity> MyEntities { get; set; }

    // Constructor to configure database connection
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options)
    {
    }
}

Step 2. Implementing Bulk Data Insertion

Next, let's implement the method to perform bulk data insertion using Entity Framework. We'll use the AddRange method to add multiple entities to the context and then call SaveChanges to persist the changes to the database.

public class BulkDataProcessor
{
    private readonly AppDbContext _dbContext;

    public BulkDataProcessor(AppDbContext dbContext)
    {
        _dbContext = dbContext;
    }

    public void InsertBulkData(List<MyEntity> entities)
    {
        _dbContext.MyEntities.AddRange(entities);
        _dbContext.SaveChanges();
    }
}
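
A quick way to exercise this processor end to end is the EF Core in-memory provider (an assumption on my part, since it requires the Microsoft.EntityFrameworkCore.InMemory package; MyEntity is also assumed to have a key property EF can generate):

```csharp
// Sketch: verifies InsertBulkData against the in-memory provider,
// using the AppDbContext, MyEntity, and BulkDataProcessor defined above.
var options = new DbContextOptionsBuilder<AppDbContext>()
    .UseInMemoryDatabase("bulk-insert-demo")
    .Options;

using var dbContext = new AppDbContext(options);
var processor = new BulkDataProcessor(dbContext);

var entities = Enumerable.Range(1, 500).Select(_ => new MyEntity()).ToList();
processor.InsertBulkData(entities);

Console.WriteLine(dbContext.MyEntities.Count()); // expect 500 rows
```

The in-memory provider is convenient for a smoke test like this, but it does not enforce relational constraints, so it is no substitute for testing against the real database.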

Step 3. Implementing Retry Logic for Failed Insertions

To handle scenarios where bulk insertion fails due to transient errors (e.g., database connection issues), we'll implement retry logic. We'll retry the insertion operation a configurable number of times with a delay between retries.

public class BulkDataProcessor
{
    // Previous code remains unchanged

    public void InsertBulkDataWithRetry(List<MyEntity> entities, int maxRetries = 3, TimeSpan delayBetweenRetries = default)
    {
        int retries = 0;
        bool success = false;

        while (!success && retries < maxRetries)
        {
            try
            {
                InsertBulkData(entities);
                success = true; // Mark insertion as successful
            }
            catch (DbUpdateException ex) when (IsTransientError(ex))
            {
                // Transient error occurred; retry after the delay.
                // Non-transient errors propagate to the caller.
                retries++;
                if (delayBetweenRetries != default)
                    Thread.Sleep(delayBetweenRetries);
            }
        }

        if (!success)
        {
            // All attempts failed with transient errors; log or rethrow as appropriate
            Console.WriteLine($"Bulk data insertion failed after {maxRetries} attempts.");
        }
    }

    private bool IsTransientError(DbUpdateException ex)
    {
        // Check if the exception is due to a transient database error
        // Implement logic to identify transient errors based on the exception type or message
        return true; // Placeholder implementation
    }
}
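
The IsTransientError placeholder above always returns true, which would retry even on permanent failures such as constraint violations. A more realistic SQL Server version inspects the inner SqlException's error number. The sketch below assumes the Microsoft.Data.SqlClient provider, and the set of error codes (timeouts plus common Azure SQL throttling and failover errors) is a typical but non-exhaustive choice:

```csharp
// Drop-in replacement for the placeholder IsTransientError above.
private static readonly HashSet<int> TransientSqlErrorNumbers = new HashSet<int>
{
    -2,     // client-side timeout
    4060,   // cannot open database
    10928,  // resource limit reached
    10929,  // resource governance limit
    40197,  // service error while processing the request
    40501,  // service busy
    40613   // database unavailable
};

private bool IsTransientError(DbUpdateException ex)
{
    // Walk the inner-exception chain looking for a SqlException
    // whose error number is in the transient set.
    for (var inner = ex.InnerException; inner != null; inner = inner.InnerException)
    {
        if (inner is Microsoft.Data.SqlClient.SqlException sqlEx &&
            TransientSqlErrorNumbers.Contains(sqlEx.Number))
        {
            return true;
        }
    }
    return false;
}
```

Which codes count as transient depends on your provider and environment, so treat this list as a starting point rather than a definitive set.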

Step 4. Integrating Batch Size Handling

Next, let's integrate batch size handling into the bulk data insertion process, so that large datasets are persisted in manageable chunks rather than in a single SaveChanges call.

public class BulkDataProcessor
{
    private readonly AppDbContext _dbContext;
    private const int DefaultBatchSize = 1000; // Default batch size

    public BulkDataProcessor(AppDbContext dbContext)
    {
        _dbContext = dbContext;
    }

    public void InsertBulkData(List<MyEntity> entities, int batchSize = DefaultBatchSize)
    {
        for (int i = 0; i < entities.Count; i += batchSize)
        {
            IEnumerable<MyEntity> batch = entities.Skip(i).Take(batchSize);
            _dbContext.MyEntities.AddRange(batch);
            _dbContext.SaveChanges();
        }
    }

    public void InsertBulkDataWithRetry(List<MyEntity> entities, int maxRetries = 3, TimeSpan delayBetweenRetries = default, int batchSize = DefaultBatchSize)
    {
        int retries = 0;
        bool success = false;

        while (!success && retries < maxRetries)
        {
            try
            {
                InsertBulkData(entities, batchSize);
                success = true; // Mark insertion as successful
            }
            catch (DbUpdateException ex) when (IsTransientError(ex))
            {
                // Transient error occurred; retry after the delay.
                // Non-transient errors propagate to the caller.
                retries++;
                if (delayBetweenRetries != default)
                    Thread.Sleep(delayBetweenRetries);
            }
        }

        if (!success)
        {
            // All attempts failed with transient errors; log or rethrow as appropriate
            Console.WriteLine($"Bulk data insertion failed after {maxRetries} attempts.");
        }
    }

    private bool IsTransientError(DbUpdateException ex)
    {
        // Check if the exception is due to a transient database error
        // Implement logic to identify transient errors based on the exception type or message
        return true; // Placeholder implementation
    }
}

In this updated code:

  • We've added a batchSize parameter to the InsertBulkData and InsertBulkDataWithRetry methods, allowing the caller to specify the size of each batch.
  • Inside the InsertBulkData method, we iterate over the list of entities in batches of the specified size and perform bulk insertion for each batch.
  • The InsertBulkDataWithRetry method now accepts an additional batchSize parameter and passes it to the InsertBulkData method for batched insertion.
  • By utilizing batched insertion, we can efficiently process large datasets while minimizing the impact on memory consumption and database performance. Note, however, that each batch commits independently, so batches saved before a failure remain in the database; the next section addresses this by wrapping all batches in a single transaction.
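
The Skip/Take arithmetic used above can be checked in isolation, without a database (a standalone sketch; the item counts are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class BatchDemo
{
    // Same partitioning logic as InsertBulkData: fixed-size slices,
    // with the final slice holding the remainder.
    static List<List<int>> Batch(List<int> items, int batchSize)
    {
        var batches = new List<List<int>>();
        for (int i = 0; i < items.Count; i += batchSize)
            batches.Add(items.Skip(i).Take(batchSize).ToList());
        return batches;
    }

    static void Main()
    {
        var items = Enumerable.Range(1, 2500).ToList();
        var batches = Batch(items, batchSize: 1000);

        Console.WriteLine(batches.Count);    // 3
        Console.WriteLine(batches[2].Count); // 500 (the remainder)
    }
}
```

One caveat: Skip/Take re-enumerates the list from the start for every batch, which is fine for a List<T> but worth replacing with index-based slicing for very large inputs.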

Step 5. Rolling Back Failed Batches and Retrying

To roll back failed batches and retry the insertion process, we need to enhance our code with error handling, retry logic, and transaction management. Below is an updated version of the BulkDataProcessor class incorporating these features.

public class BulkDataProcessor
{
    private readonly AppDbContext _dbContext;
    private const int DefaultBatchSize = 1000; // Default batch size
    private const int DefaultMaxRetries = 3;   // Default max retry attempts
    private const int DefaultDelayMs = 1000;   // Default delay between retry attempts in milliseconds

    public BulkDataProcessor(AppDbContext dbContext)
    {
        _dbContext = dbContext;
    }

    public void InsertBulkData(List<MyEntity> entities, int batchSize = DefaultBatchSize)
    {
        using (var transaction = _dbContext.Database.BeginTransaction())
        {
            try
            {
                InsertBatchedData(entities, batchSize);
                transaction.Commit(); // Commit transaction if all batches are successful
            }
            catch
            {
                transaction.Rollback(); // Roll back the transaction on error
                // Discard tracked entities so a retry starts from a clean context (requires EF Core 5.0+)
                _dbContext.ChangeTracker.Clear();
                throw; // Rethrow, preserving the original stack trace
            }
        }
    }

    private void InsertBatchedData(List<MyEntity> entities, int batchSize)
    {
        for (int i = 0; i < entities.Count; i += batchSize)
        {
            IEnumerable<MyEntity> batch = entities.Skip(i).Take(batchSize);
            _dbContext.MyEntities.AddRange(batch);
            _dbContext.SaveChanges();
        }
    }

    public void InsertBulkDataWithRetry(List<MyEntity> entities, int maxRetries = DefaultMaxRetries, int batchSize = DefaultBatchSize, int delayMs = DefaultDelayMs)
    {
        int retries = 0;

        while (retries < maxRetries)
        {
            try
            {
                InsertBulkData(entities, batchSize);
                return; // Exit method if successful
            }
            catch (Exception ex)
            {
                // Log the error; the transaction has already been rolled back
                Console.WriteLine($"Error occurred: {ex.Message}");
                retries++;
                if (retries < maxRetries)
                    Thread.Sleep(delayMs); // Delay before the next attempt
            }
        }

        // If max retries are exceeded, log or handle accordingly
        Console.WriteLine("Max retries exceeded. Unable to insert bulk data.");
    }
}

In this updated implementation:

  • We define default values for maximum retry attempts (DefaultMaxRetries), delay between retry attempts in milliseconds (DefaultDelayMs), and batch size (DefaultBatchSize).
  • The InsertBulkData method now encapsulates the entire batched insertion process within a database transaction. If an error occurs during any batch, the entire transaction is rolled back to maintain data consistency.
  • We introduce the InsertBulkDataWithRetry method, which attempts to insert bulk data with retry logic. If an exception occurs during insertion, the method retries the operation up to the specified maximum number of times (maxRetries). It also includes a delay between retry attempts (delayMs) to prevent excessive load on the system.
  • The retry loop continues until the insertion is successful or the maximum number of retries is reached. If the maximum number of retries is exceeded, an appropriate message is logged or handled accordingly.
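
The retry loop's behavior can likewise be exercised without a database by substituting a delegate for the insert call (a standalone sketch; the failure count and delay are illustrative, not part of the class above):

```csharp
using System;
using System.Threading;

class RetryDemo
{
    // Mirrors InsertBulkDataWithRetry's loop: try, delay, retry up to maxRetries.
    static bool TryWithRetry(Action action, int maxRetries, int delayMs)
    {
        for (int attempt = 1; attempt <= maxRetries; attempt++)
        {
            try
            {
                action();
                return true; // succeeded
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Attempt {attempt} failed: {ex.Message}");
                if (attempt < maxRetries)
                    Thread.Sleep(delayMs);
            }
        }
        return false; // all attempts failed
    }

    static void Main()
    {
        int calls = 0;
        // Fails twice, then succeeds, mimicking a transient database error.
        bool ok = TryWithRetry(
            () => { if (++calls < 3) throw new InvalidOperationException("transient failure"); },
            maxRetries: 3, delayMs: 10);

        Console.WriteLine(ok ? "inserted" : "gave up"); // prints "inserted" on the third attempt
    }
}
```

For production use, consider replacing the fixed delay with exponential backoff, or letting the provider handle retries via an EF Core execution strategy.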

Conclusion

In this blog post, we've learned how to perform bulk data insertion using Entity Framework in a C# application. We've covered setting up the Entity Framework context, implementing bulk insertion logic, and handling errors with retry logic. By following these steps, you can efficiently handle large-scale data insertion tasks in your applications.

You can use InsertBulkDataWithRetry in your application code to perform bulk data insertion with retry logic, providing the list of entities to be inserted and optionally specifying the maximum number of retries, batch size, and delay between retries.