Docker images
With its software of the same name, the Docker project has established itself as a standard for container virtualisation. A key concept of the Docker platform is the Docker image. In this article, we will explain how Docker images are built and how they work.
What is a Docker image?
You may already be familiar with the term ‘image’ in the context of virtualisation with virtual machines (VMs). Usually, a VM image is a copy of an operating system. A VM image may contain other installed components such as databases and web servers. The term comes from a time when software was distributed on optical data carriers like CD-ROMs and DVDs. If you wanted to create a local copy of the data carrier, you had to create an ‘image’ with special software.
Container virtualisation is the logical further development of VM virtualisation. Instead of virtualising an entire computer (machine) with its own operating system, a Docker image usually contains just a single application. This could be an individual binary file or a combination of several software components.
To run the application, a container is first created from the image. All containers running on a Docker host use the same operating system kernel. As a result, Docker containers and Docker images are usually significantly more lightweight than comparable virtual machines and their images.
Docker containers and Docker images are closely linked concepts. As such, not only can a Docker container be created from a Docker image, but a new image can also be created from a running container. This is why we say that Docker images and Docker containers have a chicken-and-egg relationship:
Docker command | Description | Chicken-egg analogy |
docker run <image-id> | Create a Docker container from an image | Chick hatches from an egg |
docker commit <container-id> | Create a Docker image from a container | Hen lays a new egg |
In the biological chicken-and-egg system, exactly one chick is produced from one egg. The egg is lost in the process. In contrast, a Docker image can be used to create an unlimited number of similar containers. This reproducibility makes Docker an ideal platform for scalable applications and services.
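For example, the same image can be started any number of times; here, two containers are created from the publicly available ‘busybox’ image (the container names are purely illustrative):
# Two independent containers from the same image
docker run -d --name demo1 busybox sleep 3600
docker run -d --name demo2 busybox sleep 3600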
A Docker image is an unchangeable template that can be used repeatedly to create Docker containers. The image contains all the information and dependencies needed to run a container, including all basic program libraries and user interfaces. There is usually a command-line environment (‘shell’) and an implementation of the C standard library on board. Here is an overview of the official ‘Alpine Linux’ image:
Linux kernel | C standard library | Unix commands |
From the host | musl libc | BusyBox |
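A quick way to see these components for yourself, assuming the official ‘alpine’ image from Docker Hub, is to start an interactive shell in a throwaway container:
docker run -it --rm alpine sh
The shell that opens is provided by BusyBox, and the container is removed again as soon as you exit it.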
Alongside these basic components that supplement the Linux kernel, a Docker image usually also contains additional software. Below are a few examples of software components for different areas of application. Please note that a single Docker image usually contains a small selection of the components shown:
Area of application | Software components |
Programming languages | PHP, Python, Ruby, Java, JavaScript |
Development tools | node/npm, React, Laravel |
Database systems | MySQL, Postgres, MongoDB, Redis |
Web servers | Apache, nginx, lighttpd |
Caches and proxies | Varnish, Squid |
Content management systems | WordPress, Magento, Ruby on Rails |
How does a Docker image differ from a Docker container?
As we have seen, Docker images and Docker containers are closely related. So, how do the two concepts differ?
First of all, a Docker image is inert. It takes up some storage space but does not consume any system resources. In addition, a Docker image cannot be changed after creation and as such is a ‘read-only’ medium. As a side note, it is possible to apply changes to an existing Docker image, but doing so creates a new image; the original, unmodified version of the image remains.
As we already mentioned, a Docker image can be used to create an unlimited number of similar containers. How exactly is a Docker container different from a Docker image? A Docker container is a running instance (i.e. an instance in the process of execution) of a Docker image. Like any software executed on a computer, a running Docker container uses the system resources, working memory and CPU cycles. Furthermore, the status of a container changes over its lifecycle.
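You can observe this difference directly on the command line:
docker image ls    # inert images: they only occupy disk space
docker ps          # running containers: live instances of those images
docker stats       # the CPU and memory that the running containers consume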
If this description seems too abstract, use this example from your day-to-day life to help: Think of a Docker image like a DVD. The DVD itself is inert – it sits in its case and does nothing. It permanently occupies the same limited space in the room. The content only becomes ‘alive’ when the DVD is played in a special environment (DVD player).
Like the film generated when a DVD is played, a running Docker container has a status. In the case of a film, this includes the current playback time, selected language, subtitles, etc. This status changes over time, and a playing film constantly consumes electricity. Just like how an unlimited number of similar containers can be created from a Docker image, the film on a DVD can be played over and over again. What’s more, the running film can be stopped and started, as can a Docker container.
Docker concept | Analogy | Mode | Status | Resource consumption |
Docker image | DVD | Inert | ‘Read-only’/unchangeable | Fixed |
Docker container | Playing film | ‘Living’ | Changes over time | Varies depending on use |
How and where are Docker images used?
Today, Docker is used in all phases of the software lifecycle, including during the development, testing, and operation phases. The central concept in the Docker ecosystem is the container which is always created from an image. As such, Docker images are used everywhere that Docker is used. Let’s look at a few examples.
Docker images in local development environments
If you develop software on your own device, you will want to keep the local development environment as consistent as possible. Most of the time, you’ll need precisely matching versions of the programming language, libraries, and other software components. If just one of these many interacting components is changed, it can quickly disrupt the others. The source code may then no longer compile, or the web server may no longer start. Here, the unchangeability of a Docker image is incredibly useful: as a developer, you can be sure that the environment contained in the image will remain consistent.
Large development projects can be carried out by teams. In this case, using an environment that stays stable over time is crucial for comparability and reproducibility. All developers in a team can use the same image, and when a new developer joins the team, they can find the right Docker image and start working straight away. When changes are made to the development environment, a new Docker image is created. The developers can then obtain the new image and are thus immediately up to date.
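As a sketch, such a shared environment could be defined in a Dockerfile that pins the toolchain to exact versions (the Node.js base image and project files here are purely illustrative):
# Illustrative development image with a pinned toolchain
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
CMD ["npm", "start"]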
Docker images in service-orientated architecture (SOA)
Docker images form the basis of modern service-orientated architecture. Instead of a single monolithic application, individual services with well-defined interfaces are developed. Each service is packaged into its own image. The containers launched from this communicate with each other via the network and establish the overall functionality of the application. By enclosing the services in their own individual Docker images, you can develop and maintain them independently. The individual services can even be written in different programming languages.
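As a minimal sketch on a single Docker host, two such services can be attached to a shared network and reach each other by name (the ‘my-api’ image is hypothetical; ‘postgres’ is the official database image):
docker network create app-net
docker run -d --name db --network app-net -e POSTGRES_PASSWORD=example postgres:16
docker run -d --name api --network app-net my-api:1.0    # 'my-api' is a hypothetical service image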
Docker images for hosting providers/PaaS
Docker images can also be used in data centres. Each service (e.g., load balancers, web servers, database servers, etc.) can be defined as a Docker image. The resulting containers can each support a certain load. Orchestration software monitors the container, its load, and its status. When the load increases, the orchestrator launches additional containers from the corresponding image. This approach makes it possible for you to rapidly scale services to respond to changing conditions.
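A simple sketch of this idea using Docker’s built-in Swarm mode (other orchestrators such as Kubernetes work in a similar way):
docker swarm init                                      # turn the host into a single-node swarm
docker service create --name web --replicas 2 nginx    # start two containers from the 'nginx' image
docker service scale web=5                             # launch additional containers when the load increases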
How is a Docker image built?
In contrast to images of virtual machines, a Docker image is normally not a single file. Instead, it is made up of a combination of several different components. Here is a quick overview (more details will follow later):
- Image layers contain data added by operations carried out on the file system. The layers are superimposed and merged into a consistent view by a union file system.
- A parent image provides the basic functions of the image and anchors it in the family tree of images in the Docker ecosystem.
- An image manifest describes the image composition and identifies the image layers.
What should you do if you want to convert a Docker image into a single file? You can do this with the ‘docker save’ command on the command line. This creates a .tar archive which can then easily be moved between systems. With the following command, a Docker image with the name ‘busybox’ is written into a ‘busybox.tar’ file:
docker save busybox > busybox.tar
Often, the output of the ‘docker save’ command is piped to gzip on the command line. This way, the archive is compressed as it is written to the file:
docker save myimage:latest | gzip > myimage_latest.tar.gz
An image file created via ‘docker save’ can be fed into the local Docker host as a Docker image with ‘docker load’:
docker load < busybox.tar
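Since ‘docker load’ also accepts compressed archives, a file created with the gzip pipeline shown above can be loaded in the same way:
docker load < myimage_latest.tar.gz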
Image layers
A Docker image is made up of read-only layers. Each layer describes the successive changes to the file system of the image. For each operation that changes the file system, a new layer is created. The approach used here is usually referred to as ‘copy-on-write’: a write access creates a modified copy of the affected data in a new layer, while the original data remains unchanged. If this principle sounds familiar to you, it’s because the version control software Git is built on a similar idea.
We can display the layers of a Docker image by using the ‘docker image inspect’ command on the command line. This command returns a JSON document that we can process with the standard tool jq:
docker image inspect <image-id> | jq -r '.[].RootFS.Layers[]'
A special file system is used to merge the changes in the layers back together. This union file system overlays all layers and presents a consistent folder and file structure to the outside. Historically, various technologies known as ‘storage drivers’ were used to implement the union file system. Today, the storage driver ‘overlay2’ is recommended in most cases:
Storage driver | Comment |
overlay2 | Recommended for use today |
aufs, overlay | Used in earlier versions |
It is possible to output the storage driver used for a Docker image. We can use the ‘docker image inspect’ command on the command line to do this. It returns a JSON document that we can then process with the standard tool jq:
docker image inspect <image-id> | jq -r '.[].GraphDriver.Name'
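The storage driver that the local Docker host itself uses can be queried with ‘docker info’:
docker info --format '{{ .Driver }}'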
Each image layer is identified by a unique hash which is calculated from the changes that the layer contains. If two images use the same layer, it is only stored locally once and both images share it. This ensures efficient local storage and reduces transfer volumes when obtaining images.
Parent images
A Docker image usually has an underlying ‘parent image’. In most cases, the parent image is defined by a FROM directive in the Dockerfile. The parent image provides the basis on which the derived image builds: its existing image layers are overlaid with additional layers.
By ‘inheriting’ from a parent image, a Docker image is placed in a family tree that contains all existing images. Perhaps you are wondering where this family tree begins? Its roots are formed by a few special ‘base images’. In most cases, a base image is defined with the ‘FROM scratch’ directive in the Dockerfile. There are, however, other ways to create a base image. You can find out more about this in the section ‘Where do Docker images come from?’.
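A base image of this kind could be sketched as follows, assuming a statically linked binary called ‘hello’ in the build context:
# Illustrative Dockerfile for a base image without a parent image
FROM scratch
COPY hello /
CMD ["/hello"]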
Image manifests
As we have seen, a Docker image is made up of several layers. You can use the ‘docker image pull’ command to pull a Docker image from an online registry. In this case, no single file is downloaded. Instead, the local Docker daemon downloads the individual layers and saves them. So, where does the information about the individual layers come from?
The information about which image layers a Docker image is made up of can be found in the image manifest. An image manifest is a JSON file that fully describes a Docker image and contains the following:
- Information about the version, scheme, and size
- Cryptographic hashes of the image layers used
- Information about the available processor architectures
To uniquely identify a Docker image, a cryptographic hash of the image manifest is created. When the ‘docker image pull’ command is used, the manifest file is downloaded first. The local Docker daemon then obtains the individual image layers.
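You can view the manifest of a published image with the ‘docker manifest inspect’ command; depending on your Docker version, this subcommand may need to be enabled first. For example, for the official ‘busybox’ image:
docker manifest inspect busybox:latest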
Where do Docker images come from?
As we have seen, Docker images are an important part of the Docker ecosystem. There are many different ways to obtain a Docker image. There are two basic methods that we will take a closer look at below:
- Pulling existing Docker images from a registry
- Creating new Docker images
Pulling existing Docker images from a registry
Often, a Docker project starts when an existing Docker image is pulled from a registry. A registry is a platform, accessible via the network, that provides Docker images. After a ‘docker image pull’ command has been executed, the local Docker host communicates with the registry to download the image.
There are publicly accessible online registries that offer a wide selection of existing Docker images for use. At the time that this article was written, there were more than eight million freely available Docker images on the official Docker registry ‘Docker Hub’. In addition to Docker images, Microsoft’s ‘Azure Container Registry’ includes other container images in a variety of different formats. You can also use the platform to create your own private container registries.
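For example, the official ‘alpine’ image can be pulled from Docker Hub like this:
docker image pull alpine:latest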
In addition to the online registries mentioned above, you can also host a registry locally yourself. For example, larger organisations often use this option to give their teams protected access to self-created Docker images. Docker has created the Docker Trusted Registry (DTR) for exactly this purpose. It is an on-premises solution for running an in-house registry in your own data centre.
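As a sketch, a simple registry of your own can be started from the official ‘registry’ image; an image is then tagged with the registry address and pushed to it:
# Start a local registry on port 5000, then tag and push an image to it
docker run -d -p 5000:5000 --name registry registry:2
docker image tag busybox localhost:5000/busybox
docker image push localhost:5000/busybox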
Creating new Docker images
You may sometimes want to create a specially-adapted Docker image for a specific project. Usually, you can use an existing Docker image and adapt it to meet your needs. Remember that Docker images are unchangeable and that when a change is made, a new Docker image is created. There are several different ways to create a new Docker image:
- Build on the parent image with Dockerfile
- Generate one from the running container
- Create a new base image
The most common approach to creating a new Docker image is to write a Dockerfile. A Dockerfile contains special commands which define the parent image and any changes required. Calling the ‘docker image build’ command will create a new Docker image from the Dockerfile. Here is a quick example:
# Create Dockerfile on the command line
cat <<EOF > ./Dockerfile
FROM busybox
RUN echo "hello world"
EOF
# Create a Docker image from a Dockerfile
docker image build .
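In practice, you will usually also tag the image during the build so that it can be referenced by name later (the tag ‘hello-demo’ is purely illustrative):
docker image build -t hello-demo .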
Historically, the term ‘image’ comes from the ‘imaging’ of a data carrier. In the context of virtual machines (VMs), a snapshot of a running VM can be created. A similar process is possible with Docker. With the ‘docker commit’ command, we can create a new Docker image from a running container. All modifications made to the container will be saved:
docker commit <container-id>
Furthermore, we can pass Dockerfile instructions to the ‘docker commit’ command. The modifications encoded in these instructions become part of the new Docker image:
docker commit --change <dockerfile instructions> <container-id>
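For example, a new default command could be set while committing (the target image name ‘mybusybox:patched’ is purely illustrative):
docker commit --change='CMD ["sh"]' <container-id> mybusybox:patched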
We can later use the ‘docker image history’ command to trace which modifications have been made to a Docker image:
docker image history <image-id>
As we have seen, we can base a new Docker image on a parent image or on the status of a running container. But how do you create a new Docker image from scratch? There are two different ways to do this. You can use a Dockerfile with the special ‘FROM scratch’ directive as described above. This creates a new minimal base image.
If you would prefer not to use the Docker scratch image, you can use a special tool like debootstrap to prepare a Linux distribution. This is then packaged into a tarball with the tar command and imported into the local Docker host via ‘docker image import’.
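A sketch of this approach, assuming a Debian-based host with debootstrap installed (the directory name and the image name ‘my-debian-base’ are purely illustrative):
# Build a minimal Debian root file system and import it as a base image
sudo debootstrap stable ./rootfs
sudo tar -C ./rootfs -c . | docker image import - my-debian-base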
The most important Docker image commands
Docker image command | Explanation |
docker image build | Creates a Docker image from a Dockerfile |
docker image history | Shows the steps taken to create a Docker image |
docker image import | Creates a Docker image from a tarball file |
docker image inspect | Shows detailed information for a Docker image |
docker image load | Loads an image file created with ‘docker image save’ |
docker image ls / docker images | Lists the images available on the Docker host |
docker image prune | Removes unused Docker images from the Docker host |
docker image pull | Pulls a Docker image from a registry |
docker image push | Sends a Docker image to a registry |
docker image rm | Removes a Docker image from the local Docker host |
docker image save | Creates an image file |
docker image tag | Tags a Docker image |