Yet Another Brief History of container(d)

This story is written to myself from the past. I had some understanding of what a kernel, a process, and a Virtual Machine are. I knew a container is “just a process” — I mean, you can see a container’s process from your host. Besides that, I had been using Docker for years, but didn’t really know how we ended up here, what made it possible, and what the 🗣 containerd is. Hi!

The Primal Need

One thing that was obvious from the early days of the digital age is that we need better utilization of expensive hardware. In the beginning, programs were batched — executed one after another. We want to run things in parallel: when some program is not using the machine’s full power, others can step in, but they must not touch each other’s resources. This is best achieved by ensuring that one program does not even see another program’s resources — as if it had the whole machine to itself. This is virtualization.

The main things required for successful virtualization are:

  1. isolation — programs not seeing each other (including their resources)
  2. resource limitation — programs must not use as much of the machine’s resources as they like; there have to be limits (for CPU, network…)

From the 60s, virtualization, in the form of Virtual Machines, allowed better utilization of mainframes. Mainframes fell somewhat out of the spotlight when the PC era came.

Virtual machines still existed in the 1980s and 1990s, but garnered only a bare minimum of activity and interest. DOS, OS/2, and Windows all offered a limited form of DOS virtual machines during that time, though it might be more fair to categorize those as emulation. The rise of programming languages like Smalltalk and Java re-purposing the term “virtual machine” — to refer to an abstraction layer of a language runtime, rather than a software replication of a real hardware architecture — may be indicative of how dead the original concept of virtual machines was in that period. After a hiatus lasting nearly two decades, the late 1990s brought a resurgence of interest in virtual machines, but for a new purpose adapted to the technology of the time.

The Ideal Versus the Real: Revisiting the History of Virtual Machines and Containers, Allison Randal

Year 2000±1

The “2nd wave” of virtualization was driven by the internet and the emergence of data centers everywhere. There was an enormous need for performance and portability (the second biggest advantage of virtualization — e.g., relocating a running machine in no time).

VMware started inventing like crazy with a focus on Virtual Machines, but we will continue this story on the containers’ branch of history.

When we want to completely isolate programs from each other, we can make a Virtual Machine for each. For example, we put MySQL in one, then in another we put good old Apache, etc. At peak times, we can spin up multiple VMs with Apache to serve more users. But those VMs run full operating systems, with considerably long boot times and large disk space needs. The demanding future of the 2000s asked for something smaller and faster.

This need was recognized by many who were thinking: what would it look like if you could run processes in a single operating system and allow them to see only the resources you want them to see? We could reuse the whole OS kernel — no need for a full copy of the OS. That was the goal of FreeBSD, Virtuozzo, Linux VServer, Solaris… They all presented some kind of container. We will come back to that, but let’s first mention some inspirations for this “new wave”.

In many articles you will see chroot as the first step/inspiration toward containers, but another building block of containers had its predecessor even earlier. Just a fun fact I stumbled upon: process capabilities are something you can give a process to limit its, you guessed correctly, capabilities. And they came before chroot.

chroot is a Unix feature (available in Linux) that makes it possible to give a process only part of your filesystem tree. So, you run a process, give it a directory, and it can only access files in that directory. Environments made with chroot are sometimes called chroot jails.

Let’s call that the first move toward isolation and see what a couple of the mentioned projects did to improve isolation and make some progress on resource limitation.

In 2000, FreeBSD added Jails, which isolated filesystem namespaces (using chroot), but also isolated processes and network resources, in such a way that a process might be granted root privileges inside the jail, but blocked from performing operations that would affect anything outside the jail. In 2001, Linux VServer patched the Linux Kernel to add resource usage limits and isolation for filesystems, network addresses, and memory. Around the same time, Virtuozzo (later released as OpenVZ) also patched the Linux Kernel to add resource usage limits and isolation for filesystems, processes, users, devices, and interprocess communication (IPC).

The Ideal Versus the Real: Revisiting the History of Virtual Machines and Containers, Allison Randal

Also, later, Solaris Zones (with the ZFS filesystem) supported snapshotting of disks and thus fast creation of “containers”.

So, they all contributed and showed the opportunity and feasibility of resource limiting and isolation at the process (OS) level, but they all patched their kernels, because the official Linux kernel was not ready yet.

Until…

… something called namespaces started being added to the Linux kernel in 2002, influenced by Plan 9’s name spaces (note the chroot similarities there).

Namespaces are a kernel feature for isolation. In simple words: you create a namespace for some resource, attach processes to the namespace, and they can only see that resource.

In Linux there are several types of namespaces, one per type of resource: the mount namespace for filesystems (like chroot), the network namespace, the PID (process ID) namespace… Today there are 8 namespaces, but they were not released at once — they came in waves.

Here is an excellent explanation of namespaces with a little demo that I encourage you to add to your “watch later” list: Containers unplugged: Linux namespaces — Michael Kerrisk.

After the first types of namespaces, cgroups were developed at Google (2006) as “process containers”, and were soon merged into the kernel.

While Linux provides copious monitoring and control options for individual processes, it has less support for applying the same operations efficiently to related groups of processes. This has led to multiple proposals for subtly different mechanisms for process aggregation for resource control and isolation. Even though some of these efforts could conceptually operate well together, merging each of them in their current states would lead to duplication in core kernel data structures/routines.

Adding Generic Process Containers to the Linux Kernel, Paul B. Menage (Google)

As nicely summarized in this blog post, a control group (cgroup) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, and so on) of a collection of processes.

Cgroups provide the following features (same blog post):

  • Resource limits — You can configure a cgroup to limit how much of a particular resource (memory or CPU, for example) a process can use.
  • Prioritization — You can control how much of a resource (CPU, disk, or network) a process can use compared to processes in another cgroup when there is resource contention.
  • Accounting — Resource limits are monitored and reported at the cgroup level.
  • Control — You can change the status (frozen, stopped, or restarted) of all processes in a cgroup with a single command.

Great! Now we have almost all the features we need for today’s containers — let’s use them as developers (maybe watch later 😄):

Love this video by Liz Rice. Although the goal was to show how simple the underlying tech is, I want you to consider one thing — imagine you would like to make something that deals with containers, e.g. a “deployment manager for your data center”. A lot of work. Not so user-friendly.

One layer above

In 2008, a tool called LXC (Linux Containers) was made to facilitate creating and running containers by combining the kernel’s cgroups and support for isolated namespaces.

In contrast to OpenVZ and others, LXC works in the vanilla Linux kernel requiring no additional patches to apply to the kernel source.

Fun fact: that is true, but not from the start. From the first commit you can see it needed a patched kernel to run with full functionality:

The LXC relies on a set of functionnalies provided by the kernel which needs to be active. Depending of the missing functionnality the LXC will work with a restricted number of functionnality or will simply fails.

But they worked diligently to implement containers by contributing to the mainline Linux kernel.

What LXC provided is an easy way to use all those kernel features we talked about, by writing a simple configuration file and running CLI commands — this takes the shape we are familiar with today.
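To give a feel for it, here is a hypothetical minimal config in the style of pre-3.0 LXC (key names were renamed in later LXC releases, so treat this purely as an illustration):

```ini
# demo.conf — give the container a hostname, no network, and a memory cap
lxc.utsname = demo
lxc.network.type = empty
lxc.cgroup.memory.limit_in_bytes = 256M
```

You would then create and start it with something like lxc-create -n demo -f demo.conf followed by lxc-start -n demo — one config file and two commands instead of hand-rolling namespaces and cgroups yourself.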

With this tool for container manipulation, we could build our so-called “deployment manager for our data center” more easily. Now imagine you, a software engineer, using LXC for managing containers on some VM in the cloud, or for local development, collaboration, etc.

You can see where this is going.

The Next Big Thing

PaaS was taking over the world. PaaS (Platform as a Service) saves you from managing infrastructure. You get a platform with the environment you need — just put your code there and it will run (think Heroku, AWS Elastic Beanstalk, and, originally, Zimki).

One of those PaaS companies was dotCloud. Watch Solomon Hykes in June 2011:

While dealing with infrastructure, they had a marvellous idea around which a tool called Docker was built. They made it open source in March 2013. With Docker, they wanted to hide the complexity behind a nice HTTP API and CLI.

Take a look at its first commit.

What do you think? A journey of a thousand miles begins with a single step 😅

As you can see, this is a Moby commit. Think of Moby as the main repo of all the components that the Docker product comprises, and also something others can use to build more cool stuff. Read more.

From the first commit you can see LXC was used for actually running containers. So what’s new?

The biggest feature — huge, I can’t emphasize this enough — that Docker brought was the images and layering we know today.

To remind you: if you want to run a container, you need to give it some filesystem tree (with some rules — a rootfs). In that tree, in most cases, you would like to have:

  1. some low-level code available in the container, something that everyone would need to set up for an application like yours;
  2. then, on top of that, libraries that are used throughout the apps in your company;
  3. at the end (top), your specific app code.

If we could just put those three parts into layers and reuse them for different containers, while not allowing other containers to overwrite them… Docker enabled exactly that with union mounting technology together with copy-on-write. So, now we can have read-only layers that can be reused, and as the last layer there is a thin container layer — when a container wants to change some file, it first copies it into its top layer. This is layering and image tech in short. (confusing for the sake of brevity 😄)

Docker also gave us the ability to build our own images. As I said, huge.

In May 2013, dotCloud released Docker 0.4 along with the Docker Index, a public registry for finding, publishing, and deploying Dockerized applications.

Docker’s goal was to make containers easy. We can judge now.

Jérôme Petazzoni summarizes it on slide 66 (although the whole presentation is valuable):

In October 2013, dotCloud changed its name to Docker Inc. That was a sign of the direction the company wanted to go — full pedal on the open source project that was becoming immensely popular. The next year, they sold the PaaS part of the company so they could focus on the open source project.

So, now we know Docker used LXC with an added union file system, plus a beautiful way to create and share images. Take a look at the company milestones up to Feb 2014.

Collaboration

OK, then?

Then, as containers picked up steam, LXC development (which was pretty much dead, or at least making very slow progress) came to life, and in a few months, there were more LXC versions than in a few years before. This broke Docker a few times, and that’s what led to the development of libcontainer, allowing to directly program cgroups and namespaces without going through LXC.

From dotCloud to Docker, Jérôme Petazzoni

So that brings us to libcontainer, a library inside the Docker repository.

Here is the blog post that introduced libcontainer, but what it was, basically, was a replacement for LXC, written in Go.

The most important thing that happened with libcontainer: Docker started collaborating and standardizing things. Look at the DockerCon 2014 speakers and their companies to see what good collaboration looks like.

You can see the initial commit here. OK, now we have a nice library, written in Go, for running low-level containers. And others can use it.

Standardization

The collaboration that started around libcontainer came up with the OCI — “The Open Container Initiative is an open governance structure for the express purpose of creating open industry standards around container formats and runtimes.” It is a Linux Foundation project. For more information, you can read Docker And Broad Industry Coalition Unite To Create Open Container Project.

They first came up with a standard for the runtime and the container (runtime-spec). The runtime specification specifies:

  • how to build something that can manipulate containers (with a specified interface for creating, starting, and stopping containers)
  • what a container configuration should look like (a standard way to specify resource and performance isolation).

So, runtime-spec defined what something that operates containers at a low level should look like. As Docker already had the libcontainer library, they just made a wrapper around it that complies with the new specification. It was called runc and was donated to the OCI: https://github.com/opencontainers/runc.
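For a feel of the spec, here is a trimmed, illustrative config.json in the shape runtime-spec defines (real bundles carry many more fields):

```json
{
  "ociVersion": "1.0.2",
  "root": { "path": "rootfs" },
  "process": {
    "args": [ "sh" ],
    "cwd": "/"
  },
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "mount" },
      { "type": "network" }
    ],
    "resources": {
      "memory": { "limit": 268435456 }
    }
  }
}
```

A runtime like runc takes a bundle directory holding this file plus the rootfs, and exposes the create/start/kill/delete lifecycle the spec describes — namespaces and cgroup limits are right there in the config.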

If you peek at the initial commit of that repository, you can see that it indeed is libcontainer. Here is the commit wrapping it with runc — the same day the OCI was launched.

Now, what did Docker do when runc was created? It just continued to import libcontainer and work with it directly. Makes sense, as runc was only a wrapper.

Here is a great series of posts written on runc, by Murat Kilic.

OK, here we are. Containerd?

Fragmentation

It started in December 2015, when we introduced containerd™, a daemon to control runC. This was part of our effort to break out Docker into small reusable components.

Docker 1.11: The first runtime built on containerd and based on OCI technology, Docker Blog

Docker is a lot of things. It has that significant feature of building images and managing them, it has a very nice API, etc. On top of that (or underneath), it does some standard things that others could reuse. One of those things is pulling images and storing their layers in the union file system of your choice, and that was separated out into containerd.

So, containerd is a daemon that takes care of containers and images (except building them). That makes it a “container runtime”, and that makes runc a “low-level container runtime”.

For details, here is a blog post by Michael Crosby, whom you could see as a committer if you opened some of the links to initial commits.

OK, it is time for a couple more commits: the initial containerd commit. A couple of days later, you can see it calling exec for runc in this commit:

func (r *runcRuntime) Create(id, bundlePath string, stdio *runtime.Stdio) (runtime.Container, error) {
	cmd := exec.Command("runc", "--root", r.stateDir, "--id", id, "start")
	...

Look at the pull request that made Docker use containerd — the switch away from libcontainer.

containerd has a nice Go client, a gRPC API, and a CLI (mostly for experimenting and debugging). Ivan Velichko shows Why and How to Use containerd From Command Line. Ivan has written a lot of very useful articles about containers. Check them out.

Thank you so much for reading. I hope this article is helpful and not just confusing to myself from the past 😅

This was a history lesson. Maybe I will write about details in future articles.

Todo (for you)

  • Please DM me on Twitter (@erzeghi) or comment here if you notice a mistake.
  • If you know who first said “chroot on steroids”, please let me know in the comments. There, I used that famous syntagma.

Extra

  • Pro Tip: Google Search up to some date
  • Pro Tip: archive.org (maybe there is an unwritten rule against using it, as it could be compared to embarrassing Facebook pictures from 15 years ago)
  • New Guilty Pleasure: Initial commits of gigantic projects
