Imagine you finally finished your project, and you want to pack it into a beautiful Docker image.
You wrote a Dockerfile, ran the docker build command, and a minute passed. After the image is built, you realize that it is huge: probably around 1GB for a simple Node.js application.
You are probably wondering: “Is there a way to reduce the image size?”
The old school way: single-stage builds
Back in the old days when Docker was first introduced, we only had single-stage builds. They are straightforward and easy to work with: you only need to write a Dockerfile and build the image.
Here’s an example of a single-stage build that builds a C application:
FROM debian:12
COPY . /app
RUN apt-get update && apt-get install -y build-essential && \
gcc -o /app/donut /app/donut.c
ENTRYPOINT ["/app/donut"]
And we are using this donut.c as the example C program:
i,j,k,x,y,o,N;
main(){float z[1760],a
#define R(t,x,y) f=x;x-=t*y\
;y+=t*f;f=(3-x*x-y*y)/2;x*=f;y*=f;
=0,e=1,c=1,d=0,f,g,h,G,H,A,t,D;char
b[1760];for(;;){memset(b,32,1760);g=0,
h=1;memset(z,0,7040);for(j=0;j<90;j++){
G=0,H=1;for(i=0;i<314;i++){A=h+2,D=1/(G*
A*a+g*e+5);t=G*A *e-g*a;x=40+30*D
*(H*A*d-t*c);y= 12+15*D*(H*A*c+
t*d);o=x+80*y;N =8*((g*a-G*h*e)
*d-G*h*a-g*e-H*h *c);if(22>y&&y>
0&&x>0&&80>x&&D>z[o]){z[o]=D;b[o]=(N>0
?N:0)[".,-~:;=!*#$@"];}R(.02,H,G);}R(
.07,h,g);}for(k=0;1761>k;k++)putchar
(k%80?b[k]:10);R(.04,e,a);R(.02,d,
c);usleep(15000);printf('\n'+(
" donut.c! \x1b[23A"));}}
/*3D-spinning-donut */
Reference: limiteci/donut.c
It looks simple, doesn’t it? The Dockerfile copies the source code into the image, installs the build-essential package (which includes the compilers and tools needed to build C and C++ applications), compiles the program, and sets the compiled binary as the entrypoint.
But there’s a problem. build-essential isn’t required to run the /app/donut binary. It’s only necessary during the build process. So, we are shipping unnecessary dependencies in the final image.
Removing unnecessary dependencies
To solve this problem, we need to remove the unnecessary dependencies after the build process. Here’s how you can do it:
FROM debian:12
COPY . /app
RUN apt-get update && apt-get install -y build-essential && \
gcc -o /app/donut /app/donut.c && \
apt-get remove -y build-essential && apt-get autoremove -y && apt-get clean
ENTRYPOINT ["/app/donut"]
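In the above example, we removed the build-essential package and cleaned the apt cache after the build process to reduce the image size. The removal happens in the same RUN instruction as the installation, because every instruction creates a new layer and files deleted in a later layer still take up space in the earlier one.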
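To see why the cleanup has to share a RUN instruction with the install, here is a sketch of the variant that does not help. Each instruction produces its own layer, and a later layer can hide files but cannot shrink the layers beneath it. This is a counterexample for illustration, not something to ship:

```dockerfile
# Counterexample: cleanup in a separate layer does not reduce the image size.
FROM debian:12
COPY . /app
# Layer 1: the toolchain is written into this layer and stays in it.
RUN apt-get update && apt-get install -y build-essential
# Layer 2: compile the program.
RUN gcc -o /app/donut /app/donut.c
# Layer 3: only records that the files were deleted; layer 1 is unchanged.
RUN apt-get remove -y build-essential && apt-get autoremove -y && apt-get clean
ENTRYPOINT ["/app/donut"]
```

If you build both variants and compare them with docker image ls, this one should come out at roughly the size of the image that never removed build-essential.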
Using Alpine Linux
Another option is to use Alpine Linux as the base image. Alpine Linux is designed to be lightweight and secure. The base image is only about 5MB (BusyBox included), which is much smaller than Debian.
Here’s how you can convert the above Dockerfile to use Alpine Linux:
FROM alpine:3
COPY . /app
RUN apk add --no-cache build-base && \
gcc -o /app/donut /app/donut.c && \
apk del build-base
ENTRYPOINT ["/app/donut"]
The resulting image should be much smaller than the Debian image.
But here’s the catch again: Alpine Linux uses musl as its standard C library, which is different from the glibc used by most Linux distributions. This may cause compatibility issues with some C/C++ applications.
Specifically, if you got a prebuilt binary from an external source and it depends on glibc, good luck with that.
The new way: multistage builds
Then Docker introduced multistage builds in version 17.05. They allow you to build your application in multiple stages and copy the artifacts from one stage to another.
Think of it this way: we can have a big kitchen full of tools and ingredients to cook a meal, but we only need a small box to serve it.
Here’s how you can convert the above Dockerfile to use multistage builds:
FROM debian:12 AS builder
RUN apt-get update && apt-get install -y build-essential
COPY . /app
RUN gcc -o /app/donut /app/donut.c
FROM debian:12
COPY --from=builder /app/donut /app/donut
ENTRYPOINT ["/app/donut"]
This Dockerfile builds the application in the first stage, and then copies the compiled binary to the second stage. The second stage starts again from a clean debian:12 base image, so it should be smaller than the first stage because it doesn’t contain the build-essential package.
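The runtime stage does not have to start from the same image as the builder. As a sketch, assuming the debian:12-slim tag (a trimmed-down variant of the official Debian image, not used in the original example) is acceptable for your application, the second stage could look like this:

```dockerfile
FROM debian:12 AS builder
RUN apt-get update && apt-get install -y build-essential
COPY . /app
RUN gcc -o /app/donut /app/donut.c

# Assumption: debian:12-slim as the runtime base. It is the same Debian release,
# so the glibc that the binary was linked against is still present.
FROM debian:12-slim
COPY --from=builder /app/donut /app/donut
ENTRYPOINT ["/app/donut"]
```

Only the last stage ends up in the final image, so the builder can stay as large as it needs to be.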
Using distroless images
If you tried the example above, you may have noticed that the final image is still quite big. We only got rid of the build-essential package that we installed ourselves, so the size of the base image is still there.
In 2017, Google introduced distroless images, which are minimal images that contain only the libraries necessary to run the application, nothing more. Purer than pure.
Since distroless images don’t have a package manager, you can’t install additional packages in the image. You can only copy the artifacts from a previous stage. This is why multistage builds are an essential part of working with distroless images.
To cook a distroless image, we use debian:12 to build the binary first, and then copy the binary to the gcr.io/distroless/static-debian12 image:
FROM debian:12 AS builder
RUN apt-get update && apt-get install -y build-essential
COPY . /app
RUN gcc -static -o /app/donut /app/donut.c
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/donut /app/donut
ENTRYPOINT ["/app/donut"]
Note that we added -static to the gcc command here. Without it, the binary is dynamically linked and requires glibc at run time, which the static image does not provide. Alternatively, you can use the gcr.io/distroless/base-debian12 image, which includes glibc, at the cost of a larger image.
The final image should be much smaller than the previous Debian-based or Alpine-based images, since it contains little more than the donut binary.
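If static linking is not an option, the alternative mentioned in the note can be sketched as follows. It assumes the binary needs nothing beyond the C library that gcr.io/distroless/base-debian12 ships, which should hold for donut.c but may not for larger programs:

```dockerfile
FROM debian:12 AS builder
RUN apt-get update && apt-get install -y build-essential
COPY . /app
# No -static here: the binary is dynamically linked against glibc.
RUN gcc -o /app/donut /app/donut.c

# base-debian12 includes glibc, so the dynamically linked binary can start.
FROM gcr.io/distroless/base-debian12
COPY --from=builder /app/donut /app/donut
ENTRYPOINT ["/app/donut"]
```

Building on debian:12 and running on a debian12 distroless image keeps the glibc versions of the two stages aligned.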
Going bare bone: from scratch
No, we aren’t talking about Scratch, the orange furry cat.
scratch is an interesting base image: it’s not actually a real image. It is essentially a special image name that tells Docker to start from a blank filesystem.
It doesn’t come with any runtime, only the ability to execute binary files. This makes the image extremely compact and secure.
We can use this small example to explore the scratch base image:
FROM debian:12 AS builder
RUN apt-get update && apt-get install -y build-essential
COPY . /app
RUN gcc -static -o /app/donut /app/donut.c
FROM scratch
COPY --from=builder /app/donut /app/donut
ENTRYPOINT ["/app/donut"]
A binary that runs inside the scratch base image MUST be statically linked! A dynamically linked binary won’t work here unless you add the required libraries manually.
This Dockerfile builds a Docker image that contains only the donut binary, nothing else.
Final comparison
Base image | Size |
---|---|
Scratch | 767kB |
Distroless + multistage | 2.76MB |
Alpine | 7.4MB |
Debian with multistage | 117MB |
Debian with build-essential removed | 140MB |
Debian with build-essential | 534MB |
Real sizes may vary; the numbers above are just an example. The images were built on my laptop with Docker 28.0.1.
As the table shows, the scratch image is the smallest. Distroless ranks second, and Alpine ranks third.
Although Alpine and distroless are not the smallest in this comparison, they are still good choices, considering they are only a few megabytes in size.
Conclusion
So what should you choose? Scratch or distroless or Alpine or Debian as base?
My suggestion: if you are sure your application does not rely on anything else that a normal Linux system provides (e.g. shared libraries and headers), consider using scratch.
Use distroless if Google already prebuilds your runtime. Currently, distroless provides python3, java-{17, 21}, and nodejs-{18, 20, 22} images. These images bundle the runtime, presumably tuned for the distroless base image.
Use Alpine if you depend on a runtime that provides a prebuilt Alpine-based Docker image.
What about Debian? Unless you have an exceptional use case, avoid using it as a base image.
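As a rough sketch of what using one of the distroless runtime images looks like, here is a multistage Dockerfile for a Node.js application. The file name server.js, the presence of a package-lock.json, and the node:22 builder image are assumptions for illustration. The distroless Node.js images set node as the entrypoint, so CMD only names the script:

```dockerfile
# Build stage: install production dependencies with a full Node.js image.
FROM node:22 AS builder
WORKDIR /app
COPY . .
# Assumes package-lock.json exists; --omit=dev skips devDependencies.
RUN npm ci --omit=dev

# Runtime stage: no shell and no package manager, only the Node.js runtime.
FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
COPY --from=builder /app /app
# The image's entrypoint is node, so this runs "node server.js".
CMD ["server.js"]
```

Because the runtime image has no shell, CMD and ENTRYPOINT need the JSON array form shown here.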