Reducing Docker Image Size

Posted on May 4, 2020

With Docker, it’s easy to end up with images many times larger than they need to be. Even after you remove unnecessary files and packages, the image can still be much larger than expected. Image size may not seem important to some, but there are many benefits to having smaller Docker images.

A smaller image will allow you to upload and download the images faster. When deploying your apps, the time difference between a 3GB and a 150MB image is very noticeable. Containers are meant to be as lightweight and small as possible. Along with the time saved when deploying, you are also saving disk space. Disk space is cheap but the IO cost behind it can be expensive.

Paying attention to your Docker image size forces you to include only the libraries and tools your app needs to run in isolation inside its container. This helps avoid problems down the line caused by packages, libraries, and tools that you don’t need.

What is a Docker Image

Other than being an output line in your terminal from which containers can be created, an image is a union file system built in layers, where each layer acts as a patch on the one before it. The layers are applied in order, each one adding files, removing files, or changing file permissions (that last point is very important to remember later on).

A layer is a set of changes to apply to the previous layer. Almost every directive in your Dockerfile adds an entry to the image history, but the ones to focus on are ADD, COPY, and RUN, because they are the ones that change the file system. The other directives only modify metadata, so their effect on the image size is negligible.

A simple way to see how an image is built is to run docker save and inspect the TAR file it produces. You’ll see each layer saved in a folder identified by its checksum. Those folders contain the layer’s file system and metadata such as environment variables, command, and entrypoint.

Starting Point

When creating your Dockerfile, start from the smallest image that already provides what you need instead of rebuilding it yourself. There are many existing images that will get you the software you need for your project, such as the node, php, and python images. Check the tags for the image you need, and find one that is acceptable for your usage. Alpine versions of most of these images are a really good starting point and are usually at least 3x smaller than their Debian slim counterparts.

With recommended practices, your final image can’t be smaller than your starting point, because each layer only stacks on top of the ones before it. Deleting a file in a later layer hides it from the final file system, but the layer that added it still carries its full size. When you start off with a 300MB image, your image is going to stay above 300MB even if you delete the entire file system.
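As a minimal sketch of why deleting files later doesn’t help, consider the Dockerfile below. The file name and the 50MB size are made up for illustration, and the same reasoning applies to files that come from the base image.

```dockerfile
FROM alpine:3.11

# Layer 1: writes a 50MB file of zeros into the image.
RUN dd if=/dev/zero of=/big.bin bs=1M count=50

# Layer 2: hides the file from the final file system, but layer 1 still
# carries the full 50MB, so the image does not get any smaller.
RUN rm /big.bin
```

Running docker history on the resulting image should show the dd layer at roughly 50MB and the rm layer at close to 0B.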

Reducing Image Size Tips

You’ve likely seen all of these somewhere before, but for completeness, I’m adding them to this post.

1. Clean Up Files in the Same RUN Command

When running a command, perform all cleanup in that same command; otherwise the layer with all the junk will still exist in your image. If you write a temporary file, delete that temporary file inside the same command.

RUN set -eux; \
    touch /tmp/example.txt; \
    # other commands
    rm /tmp/example.txt

2. Combine RUN Commands

Combine as many RUN commands as is practical so that fewer layers are created. It’s a lot easier to clean up all your installed development packages, temporary files, and build artifacts if they all happen together instead of having to do it at each step.

Don’t overdo condensing everything into one command though. If combining would force a slow command to run again every time a later step changes, give that slow command its own RUN so its layer can be reused from the build cache. Just be sure it cleans up after itself.
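The trade-off above could look something like the following sketch. The package names, the ./src directory, and its Makefile are assumptions for illustration, not part of any real project.

```dockerfile
FROM alpine:3.11

# Slow step kept in its own RUN so its layer can be reused from the build
# cache. --no-cache stops apk from leaving its package index in the layer.
RUN apk add --no-cache python3

# Hypothetical project files; adjust the paths to your project.
COPY ./src /usr/src/app

# Build tools are installed, used, and removed inside a single layer, so
# they never add to the image size.
RUN set -eux; \
    apk add --no-cache --virtual .build-deps gcc musl-dev make; \
    make -C /usr/src/app install; \
    apk del .build-deps
```

The --virtual flag groups the build packages under one name (.build-deps here), which makes removing them at the end of the same command a single apk del call.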

3. Only Install Required Packages

It’s tempting to install wget, vim, netcat, and other tools into your container for testing purposes, but your final image should not contain these. If you need a development version of your image, add a stage to a multi-stage Dockerfile that starts FROM your release stage and installs your developer tools.
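A rough sketch of that layout is shown here. The ./app binary and the stage names are hypothetical, and the tool packages are the Alpine names as far as I know them.

```dockerfile
# Release stage: only what the app needs at runtime.
# Build it with: docker build --target release -t myapp .
FROM alpine:3.11 AS release
COPY ./app /usr/local/bin/app
CMD ["app"]

# Development stage: starts from the release stage and adds debugging tools.
# Build it with: docker build --target dev -t myapp:dev .
FROM release AS dev
RUN apk add --no-cache wget vim netcat-openbsd
```

Without --target, docker build produces the last stage in the file, which here is the development one, so it’s worth passing the target explicitly for release builds.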

4. Watch Out for CHMOD and CHOWN

These commands will quickly increase the size of your image because every file whose permissions or ownership change is saved again, in full, to the new layer. It may not always be easy to avoid, but be sure to use the --chown argument when calling COPY so the copied files aren’t stored twice in your image.

COPY --chown=www-data:www-data ./public /var/www

A good way to reduce size and keep custom permissions is to use a multi-stage build. Create a stage that won’t be used in the final image that copies files, sets permissions and ownership. Then your final stage can copy files from that temporary stage.

FROM alpine:3.11 as files
COPY ./public /var/www
# Assumes the www-data user and group exist in this image.
RUN set -eux; \
    chown -R www-data:www-data /var/www; \
    chmod -R u=rw,g=rw,o=,u+X,g+X /var/www

FROM alpine:3.11 as release
COPY --chown=www-data:www-data --from=files /var/www /var/www

5. Use Multi-stage Builds

With a multi-stage build, you can be sloppy in previous stages as long as the final stage only copies and installs the necessary files. You would usually have a “build” stage that installs all the developer dependencies to compile your final program, then your “release” stage will simply copy the compiled program.
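As a small sketch of the build and release stages described above, assuming a hypothetical single-file C program called hello.c:

```dockerfile
# Build stage: the compiler and headers live here and are thrown away.
FROM alpine:3.11 AS build
RUN apk add --no-cache gcc musl-dev
COPY ./hello.c /src/hello.c
RUN gcc -Os -o /src/hello /src/hello.c

# Release stage: only the compiled program is copied over.
FROM alpine:3.11 AS release
COPY --from=build /src/hello /usr/local/bin/hello
CMD ["hello"]
```

The build stage can have as many layers and leftover files as it likes, since none of them are part of the release image.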

Viewing Image Layers

You can’t improve the size of your image if you don’t know what’s causing the problems. Luckily, Docker has a built-in command to help you view each layer of an image. The output will show the layer’s checksum, when it was created, how it was created (the RUN, COPY, ADD directives), along with the size of the layer.

# docker history <image>

Layers that come from your base image (everything before the first line of your Dockerfile) can’t be changed easily unless you own the image that your Dockerfile is built from.

You’ll want to find layers that have large sizes and determine why each one is so large. You can see the full command by using the --no-trunc flag with docker history, but I generally look inside the Dockerfile because it’s easier to read there.

Viewing Added Files in Layer

If looking at the command alone doesn’t show where there can be improvements, you can see what files were added in that layer. Using the IMAGE checksum for the layer you want to inspect, enter the below command.

# docker inspect --format "{{ .GraphDriver.Data.UpperDir }}" <image>

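Using the directory returned by the above command, you can see what files were added to the file system. By listing all the files, you can see what was added by mistake or not removed in certain layers. Note that this field is populated by the overlay2 storage driver, and the directory is normally only readable by root.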

Flatten / Squash Docker Image

It’s possible to remove past layers and only keep the final layer with all your changes. There are two methods to go about this and they both have their problems. I don’t recommend either of them unless you have a very specific need to do so.

Docker Squash Option

This is an experimental Docker option to “squash” the newly built layers into a single layer. Because it’s experimental, it could have unforeseen problems, which is risky in production environments. When executing docker build, add the option --squash to perform the operation. You’ll need to enable experimental features in your Docker daemon configuration for it to work.

Export / Import Image

This method works by exporting a container’s file system to a tarball (the container can be running or stopped), and then importing it as a new image to Docker. Those two steps alone will remove all layers, environment variables, entrypoint, and command data. For this to work properly in most cases, you’ll need to create another Dockerfile that sets those values again on top of your newly created image. The downside of this method is that you need additional Dockerfiles and you lose all history for the image. With a proper Dockerfile, you can avoid this trouble and have similarly sized final images.

# docker export <container> > container.tar
# cat container.tar | docker import - [REPOSITORY[:TAG]]

At this point, to set the entrypoint, environment variables, and command again, create a Dockerfile that starts FROM the repository and tag you just imported.
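Such a follow-up Dockerfile might look like the sketch below. The image name and every value in it are placeholders, so copy the real ones from your original Dockerfile.

```dockerfile
# "myapp:flat" is a placeholder for the REPOSITORY:TAG given to docker import.
FROM myapp:flat

# Hypothetical values: the imported image has none of these set.
ENV APP_ENV=production
WORKDIR /var/www
ENTRYPOINT ["/usr/local/bin/app"]
CMD ["--help"]
```

These directives only change metadata, so the image built from this file should stay about the same size as the imported one.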