
Get started with Docker multistage & distroless

Docker image size too big? Multistage builds and distroless base images can help you reduce the overall image size!

Last modified: 2025/04/02 10:30 PM

Created at: 2025/04/01

docker
linux
container
server
multistage
ci/cd

Imagine you finally finished your project, and you want to pack it into a beautiful Docker image.

You wrote a Dockerfile, ran the docker build command, and waited a minute. After the image is built, you realize that it is huge: probably around 1GB for a simple Node.js application.

You are probably wondering: “Is there a way to reduce the image size?”

The old school way: single-stage builds

Back in the old days when Docker was first introduced, we only had single-stage builds. They are straightforward and easy to work with: you only need to write a Dockerfile and build the image.

Here’s an example of a single-stage build that builds a C application:

FROM debian:12

COPY . /app
RUN apt-get update && apt-get install -y build-essential && \
  gcc -o /app/donut /app/donut.c

ENTRYPOINT ["/app/donut"]

We use this donut.c as the example C program:

           i,j,k,x,y,o,N;
        main(){float z[1760],a
      #define R(t,x,y) f=x;x-=t*y\
   ;y+=t*f;f=(3-x*x-y*y)/2;x*=f;y*=f;
  =0,e=1,c=1,d=0,f,g,h,G,H,A,t,D;char
 b[1760];for(;;){memset(b,32,1760);g=0,
h=1;memset(z,0,7040);for(j=0;j<90;j++){
G=0,H=1;for(i=0;i<314;i++){A=h+2,D=1/(G*
A*a+g*e+5);t=G*A        *e-g*a;x=40+30*D
*(H*A*d-t*c);y=          12+15*D*(H*A*c+
t*d);o=x+80*y;N          =8*((g*a-G*h*e)
*d-G*h*a-g*e-H*h        *c);if(22>y&&y>
 0&&x>0&&80>x&&D>z[o]){z[o]=D;b[o]=(N>0
  ?N:0)[".,-~:;=!*#$@"];}R(.02,H,G);}R(
  .07,h,g);}for(k=0;1761>k;k++)putchar
   (k%80?b[k]:10);R(.04,e,a);R(.02,d,
    c);usleep(15000);printf('\n'+(
      " donut.c! \x1b[23A"));}}
        /*3D-spinning-donut */

Reference: limiteci/donut.c

It looks simple, doesn’t it? This image copies the source code, installs the build-essential package (which includes the compilers and tools you need to build a C/C++ application), compiles the program, and runs the compiled binary when the container starts.

But there’s a problem. build-essential isn’t required to run the /app/donut binary. It’s only necessary during the build process. So, we are shipping unnecessary dependencies in the final image.
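One way to see where the space goes is docker history, which lists the size each Dockerfile instruction added to the image. Below is a minimal sketch; the helper name layer_sizes and the tag donut:single are made up for this example and are not part of the original setup.

```shell
# Hypothetical helper: print the size added by each Dockerfile instruction.
# Assumes the image was built first, e.g.: docker build -t donut:single .
layer_sizes() {
  docker history --format "table {{.Size}}\t{{.CreatedBy}}" "$1"
}

# Usage (the tag donut:single is an assumption for this example):
#   layer_sizes donut:single
```

The RUN line that installs build-essential should account for most of the total.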

Removing unnecessary dependencies

To solve this problem, we need to remove the unnecessary dependencies after the build process. Here’s how you can do it:

FROM debian:12

COPY . /app
RUN apt-get update && apt-get install -y build-essential && \
  gcc -o /app/donut /app/donut.c && \
  apt-get remove -y build-essential && apt-get autoremove -y && apt-get clean

ENTRYPOINT ["/app/donut"]

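In the above example, we removed the build-essential package and cleaned the apt cache after the build to reduce the image size. This only helps because everything happens in a single RUN instruction: each instruction creates its own layer, and files deleted in a later layer still count toward the image size.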

Using Alpine Linux

Another option is to use Alpine Linux as the base image. Alpine Linux is designed to be lightweight and secure. The base image is only about 5MB (including BusyBox), which is much smaller than Debian.

Here’s how you can convert the above Dockerfile to use Alpine Linux:

FROM alpine:3

COPY . /app
RUN apk add --no-cache build-base && \
  gcc -o /app/donut /app/donut.c && \
  apk del build-base

ENTRYPOINT ["/app/donut"]

The resulting image should be much smaller than the Debian image.

But here’s the catch again: Alpine Linux uses musl as the standard C library, which is different from glibc used in most Linux distributions. It may cause compatibility issues with some C/C++ applications.

Specifically, if you get a prebuilt binary from an external source that depends on glibc, good luck with that.

The new way: multistage builds

Then Docker introduced multistage builds in version 17.05. It allows you to build your application in multiple stages and copy the artifacts from one stage to another.

Think of it this way: we can have a big kitchen full of tools and ingredients to cook a meal, but we only need a small box to serve it.

Here’s how you can convert the above Dockerfile to use multistage builds:

FROM debian:12 AS builder

RUN apt-get update && apt-get install -y build-essential

COPY . /app
RUN gcc -o /app/donut /app/donut.c

FROM debian:12

COPY --from=builder /app/donut /app/donut

ENTRYPOINT ["/app/donut"]

This Dockerfile builds the application in the first stage, then copies the compiled binary into the second stage. The second stage starts again from a clean debian:12 base image, so it is smaller than the first stage because it doesn’t contain the build-essential package.
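To see how much the builder stage leaves behind, you can build it on its own with --target and compare it with the final image. This is only a sketch: the helper names image_size and compare_stages and the tags donut:builder and donut:multistage are assumptions made for this example.

```shell
# Hypothetical helper: print "name:tag  size" for one image.
image_size() {
  docker image ls --format "{{.Repository}}:{{.Tag}}  {{.Size}}" "$1"
}

# Hypothetical helper: build the builder stage and the final stage,
# then print both sizes. Run it in the directory with the Dockerfile.
compare_stages() {
  # --target stops the build at the named stage
  docker build --quiet --target builder -t donut:builder . >/dev/null || return 1
  # without --target, the last stage becomes the image
  docker build --quiet -t donut:multistage . >/dev/null || return 1
  image_size donut:builder
  image_size donut:multistage
}

# Usage:
#   compare_stages
```

Only the last stage ends up in the image you ship; the builder stage exists as a separate image only because it was tagged explicitly here.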

Using distroless images

If you tried the example above, you may have noticed that the final image is still quite big. We only got rid of the build-essential package that we installed ourselves, so the Debian base image still accounts for most of the size.

In 2017, Google introduced distroless images, which are minimal images that contain only the necessary libraries to run the application. The image only contains dependencies that are required to run the application, nothing more. Purer than pure.

Since distroless images don’t have a package manager, you can’t install additional packages in the image. You can only copy artifacts from a previous stage. This is why multistage builds are an important part of using distroless images.


To cook a distroless image, we first use debian:12 to build the binary, and then copy the binary into the gcr.io/distroless/static-debian12 image:

FROM debian:12 AS builder

RUN apt-get update && apt-get install -y build-essential

COPY . /app
RUN gcc -static -o /app/donut /app/donut.c

FROM gcr.io/distroless/static-debian12

COPY --from=builder /app/donut /app/donut

ENTRYPOINT ["/app/donut"]

Note that we added -static to the gcc command here. Without it, the binary is dynamically linked and needs glibc at runtime, which the static-debian12 image doesn’t ship.

Alternatively, you can use the gcr.io/distroless/base-debian12 image, which includes glibc but results in a larger image.
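If you are unsure how the binary ended up being linked, one option is to run ldd against it inside the builder stage, since debian:12 already ships ldd. The sketch below assumes the Dockerfile above is in the current directory; the helper name check_linkage and the tag donut:builder are made up for this example.

```shell
# Hypothetical helper: report how /app/donut in the builder stage is linked.
check_linkage() {
  docker build --quiet --target builder -t donut:builder . >/dev/null || return 1
  # ldd lists shared libraries; for a binary built with -static it is
  # expected to report no dynamic dependencies and exit non-zero.
  if docker run --rm donut:builder ldd /app/donut; then
    echo "dynamically linked: needs the libraries listed above"
  else
    echo "no dynamic dependencies found: likely statically linked"
  fi
}

# Usage:
#   check_linkage
```

A non-zero exit can also mean the container failed to start, so treat the second message as a hint rather than proof.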

The final image should be much smaller than the previous Debian-based or Alpine-based images, since it contains little more than the donut binary.

Going bare bone: from scratch

No, we aren’t talking about Scratch, the orange furry cat.

scratch is an interesting base image because it isn’t a real image. It is essentially a reserved name that tells Docker to start from an empty filesystem.

It doesn’t come with any runtime or libraries; all it can do is execute the binaries you copy into it. This makes the image extremely compact and secure.

We can use this small example to explore the scratch base image:

FROM debian:12 AS builder

RUN apt-get update && apt-get install -y build-essential

COPY . /app
RUN gcc -static -o /app/donut /app/donut.c

FROM scratch

COPY --from=builder /app/donut /app/donut

ENTRYPOINT ["/app/donut"]

A binary that runs inside the scratch base image MUST be statically linked! A dynamically linked binary won’t work here unless you copy the required libraries in manually.

This Dockerfile builds a Docker image that contains only the donut binary, nothing else.
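Because a scratch image has no shell, you can’t look around inside it with docker run. One workaround is to export the filesystem of a stopped container and list it with tar. This is a rough sketch; the helper name list_image_files and the tag donut:scratch are assumptions made for this example.

```shell
# Hypothetical helper: list every file in an image, even one without a shell.
# Assumes the image was built first, e.g.: docker build -t donut:scratch .
list_image_files() {
  cid=$(docker create "$1") || return 1   # create a stopped container
  docker export "$cid" | tar -tf -        # stream its filesystem, list names
  docker rm "$cid" >/dev/null             # remove the temporary container
}

# Usage (the tag donut:scratch is an assumption for this example):
#   list_image_files donut:scratch
```

Besides app/donut, the listing may also show a handful of entries such as etc/hosts that Docker adds when it creates the container.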

Final comparison

| Base image | Size |
| --- | --- |
| Scratch | 767kB |
| Distroless + multistage | 2.76MB |
| Alpine | 7.4MB |
| Debian with multistage | 117MB |
| Debian with build-essential removed | 140MB |
| Debian with build-essential | 534MB |

Actual sizes may vary; the numbers above are just an example. The images were built on my laptop with Docker 28.0.1.

As the data above shows, the scratch image is the smallest. Distroless ranks second, and Alpine ranks third.

Although Alpine and distroless are not the smallest in this comparison, they are still good choices considering they are only a few megabytes in size.

Conclusion

So what should you choose? Scratch or distroless or Alpine or Debian as base?

My suggestion: if you are sure your application does not rely on anything else a normal Linux system provides (e.g. common libraries and headers), consider using scratch.

Use distroless if Google already prebuilds your runtime. Currently, distroless offers python3, java-{17, 21} and nodejs-{18, 20, 22} images. These images bundle the runtime, presumably tuned for the distroless base image.

Check the available distroless images in the GitHub README.

Use Alpine if you depend on a special runtime that provides a prebuilt Alpine-based Docker image.

What about Debian? Unless you have an exceptional use case, avoid using it as a base image.

Want to chat with me?

Fire an email to me@wolf-yuan.dev!