Top Dockerfile best practices
Docker is a miracle of modern engineering. It’s amazingly easy to get started with and simple to extend. However, perhaps precisely because of that simplicity, many of its best practices are often forgotten or overlooked. To help you take advantage of them, we’ve put together this guide.
Following the practices here will help you improve container stability, speed up deploy processes, cut down on image sizes, and tighten security. Where appropriate, we've also included links to further reading and resources to get you the most bang for your valuable time. So strap in, grab a notebook or maybe your CTO, and enjoy!
# 0) FROM: should have a tag and it shouldn't be latest
The FROM instruction sets the base image for a build stage. Although it's possible to specify an image without a tag, we suggest not doing this: an untagged image resolves to "latest", so a new upstream release can introduce breaking changes or other unexpected results the next time you build. Specifying "latest" explicitly has exactly the same problem.
Instead, find the version of the image you want to use and specify that exact version in your Dockerfile. Datree can help here as well -- you can prevent members of your organization from merging code that contains images with nonspecific tags.
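As a minimal sketch of the difference, the fragment below contrasts the unpinned forms with a pinned one. The choice of `ubuntu:22.04` is only an illustration; substitute whichever image and version your project actually depends on.

```dockerfile
# Unpinned: both of these resolve to whatever "latest" points at on build day.
# FROM ubuntu
# FROM ubuntu:latest

# Pinned to a specific release tag (ubuntu:22.04 is an illustrative choice).
FROM ubuntu:22.04
```

Keep in mind that a tag can still be re-pushed by the image's publisher. If you need a fully immutable reference, Docker also accepts an image digest in the `image@sha256:...` form.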
# 1) RUN: apt / yum: installed packages should have a version
`RUN apt-get` is the primary method of package installation in Dockerfiles built on Debian- or Ubuntu-based images (`yum` plays the same role on Red Hat-based ones). As with any other package management system, it's critical to specify the version of each package you install to promote stability. And no, using "latest" doesn't count -- it's subject to the same issues as failing to specify a version at all.
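You should also combine all of your `apt-get` statements into a single `RUN` instruction, using backslash line continuations to put each package on its own line, instead of writing a separate `RUN apt-get install` for every package.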
Doing so prevents you from creating unnecessary additional layers and speeds up the build process. The `apt` command also takes time to initialize, so the fewer times you call it, the better.
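Here is a rough sketch of what that can look like on an Ubuntu base image. The package names and the version strings in the `ARG` lines are assumptions made for illustration and may not exist in your repository; `apt-cache madison <package>` lists the versions that are actually available. Cleaning up `/var/lib/apt/lists` in the same `RUN` is an extra step, not something the text above requires.

```dockerfile
FROM ubuntu:22.04

# Instead of one RUN (and one layer) per package:
# RUN apt-get update
# RUN apt-get install -y curl
# RUN apt-get install -y ca-certificates

# Hypothetical version pins; replace them with versions your repository serves.
ARG CURL_VERSION=7.81.0-1ubuntu1.15
ARG CA_CERTIFICATES_VERSION=20230311ubuntu0.22.04.1

# One RUN, one layer, one package per line, each pinned as package=version.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      ca-certificates="${CA_CERTIFICATES_VERSION}" \
      curl="${CURL_VERSION}" \
 && rm -rf /var/lib/apt/lists/*
```

Keeping `apt-get update` in the same instruction as `apt-get install` also means a cached, stale package index is never reused when the package list changes.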
# 2) FROM: every image should be pulled from the organization's private registry
The image reference you pass to the FROM instruction can also name the registry the image should be pulled from. If you leave the registry out, Docker pulls from Docker Hub, a public registry. You may, however, wish to enforce that every image is pulled from your own registry. We highly recommend that large organizations use a private registry for security purposes.
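For illustration, a registry-qualified reference might look like the line below. The host `registry.example.com` and the `base/ubuntu` repository path are hypothetical; use the address and naming scheme of your organization's registry.

```dockerfile
# Without a registry prefix, this image would be pulled from Docker Hub:
# FROM ubuntu:22.04

# With the registry host as a prefix, it is pulled from that registry instead.
# registry.example.com and base/ubuntu are placeholders.
FROM registry.example.com/base/ubuntu:22.04
```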
# 3) USER property should be specified, and it should not be root
Any DevOps engineer worth their salt knows the security vulnerabilities created by running services as the root user. Compartmentalized containers mitigate these risks somewhat, but they are still severe enough that running services as the root user is unsafe.
In most base images (the image that you build the container on, specified by FROM), the default user is root, which means that the user will remain root in that container unless otherwise specified. The USER instruction lets you set the user, by name or by ID, at any point in the Dockerfile. We strongly suggest you enforce the use of this instruction in every Dockerfile, as early as possible once any steps that genuinely need root, such as package installation, are done.
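One way to do this on an Ubuntu base image is sketched below. The user and group name `app` is an arbitrary choice for the example, and the final `CMD` is there only to show which user the container ends up running as.

```dockerfile
FROM ubuntu:22.04

# Steps that need root (package installs, creating users) come first.
RUN groupadd --system app \
 && useradd --system --gid app --create-home app

# Every instruction after this line, and the container's main process,
# runs as the unprivileged "app" user instead of root.
USER app
WORKDIR /home/app

# Prints the identity of the running user, which should no longer be root.
CMD ["id"]
```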
You can read up on the USER instruction in the official Docker documentation (the Dockermentation?). Benjamin Cane of American Express also did a great writeup of the security vulnerabilities of running a service as a root user in a blog post here.
# 4) HEALTHCHECK property should exist
With the rise of containers and independent self-deployed and maintained microservices, it's more important than ever to have visibility into the state of every part of your infrastructure.
Fortunately, Docker provides a small but fully functional API for exposing the status of the container, so you can inspect whether or not it's ready to do work. This is more useful than simply monitoring whether the process is running, since “running” covers a range of states from “it's working”, to “still launching”, to even “stuck in a broken state”.
You can access this API via the HEALTHCHECK instruction. A Dockerfile can only contain one HEALTHCHECK instruction; if more than one is specified, only the last one is used.
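A minimal sketch of such a check is shown below. It assumes the service listens on port 8080, exposes a `/health` endpoint, and ships with `curl` in the image; all three are assumptions made for the example, and the timing values are arbitrary.

```dockerfile
# Probe the service every 30 seconds. Allow 10 seconds of start-up time,
# treat a probe that takes longer than 3 seconds as a failure, and mark the
# container unhealthy after 3 consecutive failures.
# The port and /health path are hypothetical; curl must exist in the image.
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD curl --fail http://localhost:8080/health || exit 1
```

The check command signals its result through its exit code: 0 means healthy and 1 means unhealthy. The resulting status appears in `docker ps` and can be read with `docker inspect --format '{{.State.Health.Status}}' <container>`.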
You can read up on exactly how the instruction works in the official docs here, but our recommended best practice is that every Dockerfile should contain a healthcheck. Read more about the importance of healthchecks in this article by the New Relic engineering team.
# 5) LABEL property should exist
The LABEL instruction is a feature of Docker that allows you to attach custom metadata to an image in the form of key/value pairs. In brief, after creating a label, you can look up its value by key wherever you need it, for example with `docker inspect` or in tooling that filters images by label.
Labels help you produce readable and easily maintainable Dockerfiles. A well-named key makes the purpose of its value explicit, and because each value is declared in exactly one place, it is easy to find and update. Note that labels are metadata, not variables: to substitute a value elsewhere in a Dockerfile, use ARG or ENV instead.
Another useful property of labels is that `docker inspect` can extract them from built Docker images. Images become much simpler to understand and organize when you extract their organizational metadata into labels.
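As a rough illustration, the fragment below attaches a few labels to an image. The `org.opencontainers.image.*` keys follow the OCI annotation naming convention; `com.example.team` and every value shown are hypothetical.

```dockerfile
# Several key/value pairs can be set in a single LABEL instruction.
# All values here are placeholders made up for the example.
LABEL org.opencontainers.image.title="billing-api" \
      org.opencontainers.image.version="1.4.2" \
      org.opencontainers.image.source="https://example.com/acme/billing-api" \
      com.example.team="payments"
```

After the image is built, `docker inspect --format '{{ json .Config.Labels }}' <image>` prints the labels, and `docker images --filter "label=com.example.team=payments"` lists only the images that carry that label.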
Because of these features, using labels to organize and notate your Dockerfiles is strongly recommended. Datree gives you a way to do this with the "LABEL property should exist" rule.
# 6) Specify the container's maintainer in a LABEL
One piece of metadata is important enough to warrant being discussed on its own -- the "maintainer" property.
In the early days of Docker, this property was an actual instruction called `MAINTAINER`. After labels were introduced back in 2015, `MAINTAINER` was deprecated in favor of a label with the key `maintainer`.
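A sketch of the two forms side by side, with a placeholder name and address:

```dockerfile
# Deprecated form:
# MAINTAINER Jane Doe <jane.doe@example.com>

# Current form. The name and address are placeholders.
LABEL maintainer="Jane Doe <jane.doe@example.com>"
```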
You should include maintainer metadata in every Dockerfile to indicate who is responsible for the container. Code ownership is critical, especially as organizations scale in size. As more people contribute to an un-owned piece of infrastructure, critical knowledge about how that code works (such as pitfalls, bugs, and other context) is further distributed among them until it becomes so diluted that it can be lost entirely. To avoid situations like this, it’s best to assign an owner to each piece of software within your organization. Usually, this will be the person who originally developed it, or somebody to whom they pass down the relevant context and knowledge.
# 7) Keep your image lightweight with a .dockerignore file
If your Dockerfile contains ADD or COPY instructions, make sure you keep the size of your image to a minimum by including a .dockerignore file. This will speed up your build and prevent you from doing unnecessary work. This file gives you fine-grained control over what is copied or added to the image by allowing you to specify glob patterns matching the files and directories you want to leave out of the build context.
For example, let's say we usually COPY the entire contents of the root directory into the container. There are a number of directories which we don't need at all that currently we'd just be copying and storing for no reason -- for example, we don't need the `.git/` directory in a production environment. Similarly, there's usually nothing to be gained by including your `tests/` directory, so you don't want to copy that over either.
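A minimal `.dockerignore` for the scenario above might look like the following sketch. The file sits next to the Dockerfile at the root of the build context. The first two entries come from the example in the text; the `*.log` pattern is an extra assumption added to show a glob.

```dockerignore
# Version control history is not needed inside the image.
.git

# Tests are not needed in a production image.
tests

# Hypothetical extra pattern: skip log files at the root of the context.
*.log
```

With this file in place, an instruction such as `COPY . /app` copies everything except the paths matched above.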
In smaller projects, we aren't saving that much work by excluding directories like these, but in larger ones, the differences can be in the hundreds of megabytes. You can read up on exactly how to create a .dockerignore file in the official docs here.
# (bonus) FROM: Stop using Python 2.7
The Python team have been phasing out 2.7 for a while, and support will officially cease at the end of this year, 2019. The occasional tool or process still lives in the dark ages, but the vast majority have been upgraded to 3 and would break if another version is in use. Between the upcoming lack of support and the fact that so many people are using Python 3, one of our recommended best practices is to enforce the use of 3 in your Dockerfiles. And we've created a rule in Datree to allow you to do just that.
As you can see, there are a lot of best practices you can take advantage of to improve the quality of Dockerfiles across your code. This article is by no means comprehensive, but if you follow the rules we've suggested above, container build times will decrease and your infrastructure’s stability and security will improve.
Best of all, Datree allows you to make sure they’re all enforced at pull request time with our custom Dockerfile rules. You can also perform an automatic audit of the current state of your Dockerfiles now to see if there’s low-hanging fruit you can take advantage of today.
Thanks for reading, and happy Dockering!
Special thanks go to Eran Bibi, Director of DevOps at Aqua Security, for helping with identifying the most critical security practices, and to Elliot Bonneville for helping write the post.