Containers' Default Storage Behavior

In Docker, containers are stateless by default, even containers that store data used by other containers. This means a container loses its state, including the data it holds, when it is removed. That is not what we want when running containerized applications in production, and it is where we use Docker volumes for persistent storage.

Docker Volumes

Docker volumes bypass the union file system Docker uses for data storage inside a container, so data written to a volume lives outside the container's writable layer. Before a practical demo, let's understand the options. We have three kinds of Docker volumes to choose from.

  1. Docker Host Level Volumes
  2. OS Host Level Volumes
  3. Volume Plugins
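Alongside the `-v` flag shown later, Docker also ships a small volume-management CLI. As a hedged sketch (the names `demo_data` and `volume0` are placeholders of my own, not from the example), you can pre-create a named volume, see where it lives, and then hand it to a container:

```shell
# Create a named volume managed by the Docker host
docker volume create demo_data

# List volumes and show details (driver, host mountpoint) for this one
docker volume ls
docker volume inspect demo_data

# Mount the named volume into a container at /data
docker run -it -v demo_data:/data --name volume0 busybox
```

Named volumes survive container removal, which is why the Postgres example at the end of this article uses one.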

Docker Host Level Volumes

This is the simplest way to understand and get started with volumes. To keep things simple, I will demonstrate it with the small BusyBox Docker image. The first category of volumes you will encounter is Docker-Host-level volumes: the Docker host acts as the target volume store, and you can reuse the volume as long as the Docker host is there.

If the Docker host (or the container itself) dies, the volume dies with it. Volumes are mounted when we run the container and are specified with the -v flag in the run command. For example, run the BusyBox container as given below.

docker run -it -v /data --name volume1 busybox

This will use a Docker-Host-level volume and launch an interactive terminal session inside the container. List the directories and you will find /data, as we specified in the command. cd into it and create a simple text file, as given below.

touch file.txt

Now restart the container and re-attach to it, as given below.

docker restart volume1
docker attach volume1

cd into the same directory and you will still find the text file you created on the volume. You can also inspect the container to see which volumes it uses, with the command given below.

docker inspect volume1

OS-Host Level Volumes

Docker-Host-level volumes have a problem: if the Docker host is removed from the system, we lose all the data inside the container. Also, if we remove the container and re-run it, the data is gone, because the new container gets a fresh volume. To tackle this, we use OS-Host-level volumes. Instead of writing data to a Docker-managed location, the container writes directly to the file system of the host operating system on which the Docker host runs. Thus, if the Docker host dies and we re-run the container with the same volume path, our data is still there on the host OS.

To see it in action with the same example, type the following in the terminal.

docker run -it -v d:\data:/data --name volume1 busybox

It will start an interactive terminal session. cd into the /data directory and create a text file; you will then find the folder with file.txt on your host OS.

I'm on Windows, so I'm using a Windows directory structure. To enable this feature on Windows, go to Docker Engine Settings => Shared Drives and check the drive letter where you want to allow the Docker Engine to access directories.

Now remove the container and re-run it with the same volume path, as shown below.

docker rm -f volume1
docker run -it -v d:\data:/data --name volume1 busybox

cd into the same directory and you will find your text file in /data. You can also inspect the container to see the host mount path.

Using Volume Plugins

With OS-Host-level volumes, we still have a couple of problems. What happens if the host operating system dies? What if the VM we provisioned to hold the container data goes down for maintenance and is unavailable for a while? What happens if we run a Swarm cluster in production, where containers are orchestrated across different nodes, and we need our data available on different hosts? The preceding two techniques are not appropriate in these cases. This is where we use volume plugins, also called volume drivers.

Docker supports plugins, usually third-party, which add functionality to the Docker Engine. There are different types, such as volume plugins and networking plugins. Looking at volume plugins alone, there are several for dealing with persistent storage, each with different capabilities; some of them are given below.

  • Azure File Storage
    Uses Azure File Storage as the backing store for volumes mounted into containers running in a Swarm cluster.

  • Flocker
    As Swarm orchestrates containers across the different nodes of a cluster, Flocker acts as a volume orchestration plugin: if a container that writes data to a volume moves to another host in the Swarm cluster, Flocker moves the volume along with it. You can read more about Flocker here.

  • REX-Ray
    One of the most popular volume plugins. It provides a shared volume mechanism across the different nodes of a Swarm cluster. REX-Ray writes data volumes to storage providers such as Amazon EBS, Dell EMC, and others.

Data Volumes with PostgreSQL Container

Now, let's see a tiny example of data persistence on a single Docker host, using a .NET Core application that writes data to a Postgres database container. I already have the application's Docker image on my machine, with the Server section of the connection string pointing to the name of the Postgres container. So I'll run the Postgres container first, with a custom name and a volume mount, as shown below.

docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=5432 --network=isolated_network --name postgres -v data:/var/lib/postgresql/data postgres

Note the path /var/lib/postgresql/data. This is where the Postgres container writes its data. The command given above attaches the container to a separate Docker network named isolated_network and leaves it ready for use. Now let's run the application container, from the image named aspnetcoreapp, in the same network, as shown below.
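One thing the run command assumes is that isolated_network already exists; user-defined networks are not created implicitly, so if you are following along from scratch, a one-time setup step would be:

```shell
# Create the user-defined bridge network both containers will join;
# containers on it can reach each other by container name
docker network create isolated_network
```

This is also what lets the application's connection string refer to the database simply as "postgres", the container's name.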

docker run -d -p 5000:5000 --network=isolated_network aspnetcoreapp

Navigate to the application in the browser and register a user; you will see that the user is registered successfully.

Now remove the Postgres container and re-run it with the same volume mount, as shown below.

docker rm -f postgres
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=5432 --network=isolated_network --name postgres -v data:/var/lib/postgresql/data postgres

Try to log in with the user you registered previously, and you will see that the login succeeds. Hence, our Postgres data is persistent. You can list the Docker volumes on your host by name, as shown below.

docker volume ls
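To confirm where that data actually lives on the host, you can also inspect the named volume from the run command (here, data); the Mountpoint field in the output shows the backing directory:

```shell
# Show driver, mountpoint, and labels for the named volume
docker volume inspect data

# Or extract just the host path with a Go-template format string
docker volume inspect -f '{{ .Mountpoint }}' data
```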