You must have read the tutorials that start with “docker pull ubuntu:14.04”, continue with apt-get update, and after a couple of apt-get installs end up with a docker image larger than a gigabyte.

There is a much leaner way – using Ubuntu as the host operating system and pulling in only the binaries and libraries that you will use.

The results are impressive: my last apache/python image was 1.6GB, but using the following method I ended up with a 0.3GB image.

The trick is that you can copy executables into a Docker image as long as you also copy the system libraries they depend on. The ldd command lists the libraries an executable depends on:

    $ ldd /usr/sbin/apache2
    linux-vdso.so.1
    libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3
    libaprutil-1.so.0 => /usr/lib/x86_64-linux-gnu/libaprutil-1.so.0
    libapr-1.so.0 => /usr/lib/x86_64-linux-gnu/libapr-1.so.0
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
    libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1
    libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1
    libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
    /lib64/ld-linux-x86-64.so.2

Apart from linux-vdso.so.1, a virtual shared object that the kernel maps into every process, all the other entries are actual files that can be copied into the Docker image. The kernel takes care of linux-vdso.so.1.
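To feed that listing into a copy loop, the filtering can be wrapped in a small helper. This is only a sketch: the function name ldd_paths is my own invention, and it assumes the usual ldd output format, where resolved libraries appear as "name => /path".

```shell
# Print the on-disk library paths found in ldd-style output read on stdin.
# Entries without a resolved path (linux-vdso, "not found", the dynamic
# loader line that has no "=>") are skipped.
ldd_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Usage on a host where apache2 is installed:
#   ldd /usr/sbin/apache2 | ldd_paths
```

Each line it prints is a file that has to end up inside the image.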

Instructions

You have to build your image on an Ubuntu host. I still run Ubuntu 14.04 LTS. Make sure you keep it up to date with apt-get update and apt-get upgrade!

Install your apache/mod_wsgi/python server on your host. If it works correctly on its own, it will most likely work correctly in your container as well.

NOTE: remember to stop your webserver on your host before starting up your docker container – you don’t want the host’s webserver to conflict with the container’s webserver!

We will use a 5MB docker image as base that supports Ubuntu glibc and conveniently has busybox and an entire init system built in: busybox:glibc

Then we will create a directory and copy all our files and dependent libraries into it from our host system.

Finally we run docker build to create our new image.

1. Create the build directory and build config

Create a directory for your new build where your Dockerfile and all other files will reside.

    touch Dockerfile
    mkdir root

Your Dockerfile will be simple:

    FROM busybox:glibc
    COPY root /
    # these are from /etc/apache2/envvars
    ENV LANG C
    ENV APACHE_RUN_USER www-data
    ENV APACHE_RUN_GROUP www-data
    ENV APACHE_LOG_DIR /var/log/apache2
    ENV APACHE_LOCK_DIR /var/run/apache2
    ENV APACHE_PID_FILE /var/run/apache2/apache2.pid
    # run apache in the foreground so the container keeps running
    CMD /usr/sbin/apache2 -D FOREGROUND

2. Copy the executables and config files into the build directory

First the executable:

    mkdir -p root/usr/sbin
    cp -a /usr/sbin/apache2 root/usr/sbin/

Then the loadable modules:

    mkdir -p root/usr/lib/apache2
    cp -a /usr/lib/apache2/modules root/usr/lib/apache2/

Then the configuration files:

    mkdir -p root/etc
    cp -a /etc/mime.types root/etc/
    cp -a /etc/apache2 root/etc/

Then the html directory:

    mkdir -p root/var/www
    cp -a /var/www/html root/var/www/
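The copies in this step all follow the same pattern: recreate the parent directory under root, then copy with cp -a. If you find yourself repeating it for more files, it can be folded into a helper. A sketch, where copy_into_root is a hypothetical name of mine and the source is assumed to be an absolute host path:

```shell
# Copy a host file or directory into the build root, recreating its
# parent directories so it lands at the same path inside the image.
# Example: copy_into_root /etc/mime.types root  ->  root/etc/mime.types
copy_into_root() {
    src=$1
    dest_root=$2
    parent=$(dirname "$src")
    mkdir -p "$dest_root$parent"
    # -a keeps permissions, timestamps and symlinks as they are on the host
    cp -a "$src" "$dest_root$parent/"
}
```

With it, the configuration step would read copy_into_root /etc/mime.types root followed by copy_into_root /etc/apache2 root.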

3. Copy the library dependencies into the build directory

This is the section that makes people stay away from hand-building Docker images: library files look arcane at first, but they are pretty straightforward.

First, a short script that finds and copies the dependencies of all executables in the build directory:

    mkdir -p root/lib
    for i in `find root -type f -executable | xargs ldd | grep -v "linux-vdso" | grep "=>" | awk ' { print $3 } '`; do
        cp -a $i* root/lib/
    done

Apache has some loadable modules that are not executable but they still pull in other libraries. A slight modification of the above script pulls those libraries in as well:

    for i in `find root/usr/lib/apache2/modules/ -type f | xargs ldd | grep -v "linux-vdso" | grep "=>" | awk ' { print $3 } '`; do
        cp -a $i* root/lib/
    done

Then make sure we copied the actual libraries not just the symlinks pointing to them:

    for i in `find root/lib -type l`; do
        if [ ! -e "$i" ]; then
           # the symlink is dangling: fetch the file it points to from the host
           missing=`basename "$(readlink "$i")"`
           cp `find /lib /usr/lib -name "$missing"` root/lib/
        fi
    done
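To confirm that no dangling symlinks are left, a check along these lines can be run afterwards. The name dangling_links is hypothetical; the sketch uses only standard find and test.

```shell
# List symlinks under a directory whose target does not exist.
# "test -e" follows the link, so it fails exactly for dangling ones.
dangling_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# Usage from the build directory:
#   dangling_links root/lib
```

If it prints nothing, every symlink in root/lib resolves to a real file.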

And finally one library that ldd does not report, because glibc's thread library loads it at run time instead of linking against it:

    cp -a /lib/x86_64-linux-gnu/libgcc_s.so.1 root/lib/
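Before building, it is worth checking that every library ldd names really landed in root/lib. The following is a sketch of such a check. missing_libs is a name I made up, and it compares file names only, which matches the flat root/lib layout used above.

```shell
# Read ldd-style output on stdin and print the file name of every
# resolved library that is not present in the given directory.
missing_libs() {
    libdir=$1
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | while read -r path; do
        name=$(basename "$path")
        [ -e "$libdir/$name" ] || echo "$name"
    done | sort -u
}

# Usage from the build directory:
#   find root -type f -executable | xargs ldd 2>/dev/null | missing_libs root/lib
```

Empty output means nothing that ldd can see is missing. Libraries loaded at run time, like libgcc_s.so.1 above, will not show up in this check either.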

4. Add a few missing directories

    mkdir -p root/var/log/apache2
    mkdir -p root/var/run/apache2

5. Build the image

This is the easiest part:

    docker build --rm --no-cache -t tiny-apache:latest .

6. Test the image

We run the image interactively to see all error messages and use --net=host so that the container shares the host's network and no port mapping is needed. If you prefer port mapping, drop --net=host and publish the port with -p instead.

    docker run -ti --net=host tiny-apache:latest

The resulting apache Docker image is 21 megabytes. The equivalent ubuntu image is 233 megabytes.

Where to go from here

I use these instructions to build and debug mysql, nginx, redis, elasticsearch and other docker images.

I prefer to combine programs that depend on each other in the same container, for example I run nginx, gunicorn, celery and cron in one container. For this I use the busybox runit init system and I start runsvdir as the main command that starts everything else.
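For the runit setup, each program gets its own service directory containing a run script, and runsvdir supervises all of them. Below is a rough sketch of how such directories could be generated in the build root. The make_service helper and the etc/service location are my assumptions for illustration, not something the busybox image prescribes.

```shell
# Create a runit service directory with an executable run script.
# Example: make_service root/etc/service nginx "nginx -g 'daemon off;'"
make_service() {
    svc_root=$1
    name=$2
    command=$3
    mkdir -p "$svc_root/$name"
    # runit expects run to exec a process that stays in the foreground
    printf '#!/bin/sh\nexec %s\n' "$command" > "$svc_root/$name/run"
    chmod +x "$svc_root/$name/run"
}
```

Under these assumptions the Dockerfile's main command would then become runsvdir /etc/service, which starts and restarts every service it finds there.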

For logging I simply map my host’s syslog socket /dev/log into /dev/log inside the container as a volume: -v /dev/log:/dev/log

If you have any questions, ping me on twitter: imreFitos

Imre Fitos

Startups, Technology and Organization

Tutorial: How to create the smallest possible Ubuntu Docker image with apache, nginx, python, php, java or anything else you want in it