Robots.txt For ASP.NET Core MVC Application

Introduction

In this article, I will explain what robots.txt is, why it matters, and how to serve a hosting-environment-specific robots.txt file in an ASP.NET Core MVC web application.

Robots.txt and Its Importance

Robots.txt is a simple text file that contains special instructions for web crawlers (mostly search engine crawlers). The file tells crawlers which areas of a site they are allowed to visit and which are disallowed, which makes it easier for search engines to crawl your site’s pages and directories. Not all crawlers honor robots.txt instructions, but the most popular search engine crawlers, such as Googlebot and Bingbot, do.

A robots.txt file plays an important role in managing crawler traffic to your site. In other words, it can prevent unwanted crawler requests from overloading the site, and it helps you communicate the correct information about your site to search engines.

Example of Robots.txt

User-agent: *  
Disallow: /
Allow: /register

Sitemap: https://www.example.com/sitemap.xml

Instructions used in Robots.txt

As I mentioned previously, not all crawlers understand robots.txt instructions, but the most popular crawlers do. The file has two important parts: the user-agent and one or more rules with an Allow or Disallow option. The following terms are used in a robots.txt file (see the short example after this list):

  • User-agent: The name of the crawler the rules apply to (crawler names can be found here), or * to address all crawlers.
  • Disallow: Blocks crawling of certain web pages, directories, or files.
  • Allow: Explicitly allows crawling of web pages, directories, or files; it overrides a matching Disallow rule.
  • *: A wildcard that matches any sequence of characters.
  • $: Marks the end of a URL.
  • Sitemap: The location (URL) of the sitemap. This directive is optional.
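
For example, the following hypothetical rules use the * and $ markers to block every URL that ends with .pdf while keeping the rest of the site crawlable:

User-agent: *
Allow: /
Disallow: /*.pdf$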

You can also check Google’s robots.txt to better understand how these directives are used.

Robots.txt Based on Environment

In this section, I will explain how we can define a robots.txt file for each environment and how to serve it from the site. I am not going to explain how to create a new project in Visual Studio; if you are not familiar with creating a project in Visual Studio, you can follow this tutorial. All the code examples included in this section are based on Visual Studio 2022 and .NET 7.0.

Add robots.txt to your project

To add a robots.txt file, follow the steps below:

  1. Right-click on the project, then choose Add > New Item.
  2. Find Text File in the new item window, select it, and click the Add button.

In my case, I have three environments, so I created three different files with the environment name as a suffix.
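
For illustration, assuming the standard environment names Development, Staging, and Production (your environment names may differ), the files would be robots.Development.txt, robots.Staging.txt, and robots.Production.txt. The non-production files might block all crawlers while the production file allows them, for example:

robots.Development.txt and robots.Staging.txt

User-agent: *
Disallow: /

robots.Production.txt

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml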

Code Explanation

Below is what my code looks like. This snippet goes in the Program.cs file.

app.Use(async (context, next) => {
    // Handle only requests for /robots.txt; everything else goes to the next middleware.
    if (context.Request.Path.StartsWithSegments("/robots.txt")) {
        // Build the file path for the current environment, e.g. robots.Development.txt.
        var robotsTxtPath = Path.Combine(app.Environment.ContentRootPath, $"robots.{app.Environment.EnvironmentName}.txt");
        // Default content served when no environment-specific file exists.
        string output = "User-agent: *  \nDisallow: /";
        if (File.Exists(robotsTxtPath)) {
            output = await File.ReadAllTextAsync(robotsTxtPath);
        }
        // robots.txt must always be served as plain text.
        context.Response.ContentType = "text/plain";
        await context.Response.WriteAsync(output);
    } else await next();
});

The line below checks whether the request path is /robots.txt:

if (context.Request.Path.StartsWithSegments("/robots.txt"))

This code builds the robots.txt file path based on the environment name.

var robotsTxtPath = Path.Combine(app.Environment.ContentRootPath, $"robots.{app.Environment.EnvironmentName}.txt");

This variable holds the default robots.txt content. If a file does not exist for the current environment, this default value is returned.

string output = "User-agent: *  \nDisallow: /";

And this code reads the content of the robots.txt file for the current environment, which is the environment set in the project properties.

if (File.Exists(robotsTxtPath)) {
    output = await File.ReadAllTextAsync(robotsTxtPath);
}

And finally, I write the content to the response. Remember that the content type of robots.txt must always be text/plain.

context.Response.ContentType = "text/plain";
await context.Response.WriteAsync(output);

Full code from my Program.cs

var builder = WebApplication.CreateBuilder(args);
// Add services to the container.
builder.Services.AddControllersWithViews();
var app = builder.Build();
// Configure the HTTP request pipeline.
if (!app.Environment.IsDevelopment()) {
    app.UseExceptionHandler("/Home/Error");
    // The default HSTS value is 30 days. You may want to change this for production scenarios, see https://aka.ms/aspnetcore-hsts.
    app.UseHsts();
}
app.Use(async (context, next) => {
    if (context.Request.Path.StartsWithSegments("/robots.txt")) {
        var robotsTxtPath = Path.Combine(app.Environment.ContentRootPath, $"robots.{app.Environment.EnvironmentName}.txt");
        string output = "User-agent: *  \nDisallow: /";
        if (File.Exists(robotsTxtPath)) {
            output = await File.ReadAllTextAsync(robotsTxtPath);
        }
        context.Response.ContentType = "text/plain";
        await context.Response.WriteAsync(output);
    } else await next();
});
app.UseHttpsRedirection();
app.UseStaticFiles();
app.UseRouting();
app.UseAuthorization();
app.MapControllerRoute(name: "default", pattern: "{controller=Home}/{action=Index}/{id?}");
app.Run();
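
To try this out locally, you can change the environment, run the application, and browse to /robots.txt; the middleware serves the matching robots.<EnvironmentName>.txt if it exists, otherwise the default disallow-all content. The environment is typically set through the ASPNETCORE_ENVIRONMENT variable, for example in launchSettings.json (the profile name below is only an illustration):

{
  "profiles": {
    "MyWebApp": {
      "commandName": "Project",
      "environmentVariables": {
        "ASPNETCORE_ENVIRONMENT": "Staging"
      }
    }
  }
}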

There are many ways you can add robots.txt to your application. If you want to try an alternative approach, you can find a good explanation by Scott Hanselman here.

Summary

In this article, I discussed robots.txt and its importance. I also explained the code that serves robots.txt content for each environment in an ASP.NET Core MVC application. I hope you find this article helpful. If you have any suggestions, please feel free to share them in the comments section.