Containers

This page came about because one of the companies we collaborated with was looking into using containers to host, on premises, the backend for the devices they manufacture.

Technology

We are only going to consider Linux containers on this page, primarily focusing on Docker, since it is the industry standard these days.

To understand how to secure something, you need to understand a bit about the technology.

Containers are often explained as a form of light-weight virtual machine (VM). While this explanation is good enough for an initial understanding (assuming prior knowledge of virtualization), it is not the full story.

Cgroups & namespaces

Containers are an abstract concept rather than a concrete technology. In most Linux container runtimes, a container is built on top of two features of the Linux kernel: cgroups and namespaces. In short, namespaces isolate processes from each other and cgroups limit the resources they can use. With these runtimes, containers are just regular processes (or process trees) with isolation and resource control applied. Therefore, instead of thinking about containers as "light-weight VMs", it is more useful in a security context to think of them as process sandboxes.
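Because a container is just a process, its namespaces and cgroup membership are visible under /proc like those of any other process. The following is a minimal Python sketch of how that can be inspected. The function names and the sample cgroup path are made up for illustration, and the namespace listing only returns data on Linux with /proc mounted.

```python
import os


def parse_cgroup_line(line: str) -> dict:
    """Split one line of /proc/<pid>/cgroup into its three fields.

    The format is "hierarchy-ID:controller-list:cgroup-path". On cgroup v2
    the hierarchy ID is 0 and the controller list is empty.
    """
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    return {
        "hierarchy_id": int(hierarchy_id),
        "controllers": controllers.split(",") if controllers else [],
        "path": path,
    }


def list_namespaces(pid: str = "self") -> dict:
    """Map each namespace type to its identifier, read from /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):  # not Linux, or /proc is not mounted
        return {}
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target names the namespace.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result
```

Two processes that share a namespace report the same identifier for it, so comparing the output of list_namespaces() on the host and inside a container shows which namespaces the runtime actually separated.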

Great care should be taken when deploying containers with something like Docker Engine, because it is very easy to misconfigure. Docker is not really secure by default. Even with all the appropriate security measures in place, the isolation is not as strong as that of VMs. In fact, some cloud providers (AWS and Fly.io) "secretly" deploy containers to micro-VMs using technologies like Firecracker.

Images

Containers are commonly distributed as OCI-compliant container images. An image consists of some metadata, known as the image manifest, and one or more filesystem changesets (layers).

You have probably heard about layers in Docker. When building from a Dockerfile, each instruction that changes the filesystem (such as RUN, COPY and ADD) becomes a new layer. A layer is really a tar archive, optionally compressed with either gzip or zstd. The files in a layer archive represent changes to a filesystem (add, update and remove/whiteout). All the layers of an image, stacked on top of each other, provide the initial root filesystem for a container. A tool like dive can be used to inspect the content of each layer.

This is important because anything added in a layer can always be extracted from the image, even if the file was removed in a later layer. It might be tempting to add secrets (encryption keys etc.) in a layer, but they will then be distributed as part of the image. It is therefore strongly discouraged to reference secrets in Dockerfiles.
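To make this concrete, here is a small Python sketch that builds two in-memory layers as tar archives and stacks them. It relies on the OCI convention that a removal is recorded as a whiteout entry, a file whose name carries the .wh. prefix. The file names and contents are invented for illustration, and the stacking logic is deliberately simplified (opaque whiteouts, directories and file metadata are ignored).

```python
import io
import tarfile

WHITEOUT_PREFIX = ".wh."


def make_layer(files: dict) -> bytes:
    """Build an uncompressed tar archive from {path: content} pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, content in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def apply_layers(layers: list) -> dict:
    """Stack layers in order and return the resulting {path: content} view."""
    rootfs = {}
    for layer in layers:
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
            for member in tar.getmembers():
                directory, _, name = member.name.rpartition("/")
                if name.startswith(WHITEOUT_PREFIX):
                    # A whiteout hides the same-named file from lower layers.
                    hidden = name[len(WHITEOUT_PREFIX):]
                    target = f"{directory}/{hidden}" if directory else hidden
                    rootfs.pop(target, None)
                else:
                    rootfs[member.name] = tar.extractfile(member).read()
    return rootfs


def read_from_layer(layer: bytes, path: str) -> bytes:
    """Read a file straight out of a single layer archive."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
        return tar.extractfile(path).read()


# Hypothetical image: the first layer adds a key, the second "removes" it.
layer1 = make_layer({"app/secret.key": b"hunter2", "app/main.py": b"print('hi')"})
layer2 = make_layer({"app/.wh.secret.key": b""})
```

Here apply_layers([layer1, layer2]) no longer contains app/secret.key, which is what a running container would see. However, read_from_layer(layer1, "app/secret.key") still returns the key, because the first layer is shipped unchanged as part of the image.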

Good security requires defense in depth, meaning multiple layers of defense. When deploying with containers, it is certainly important to secure the application itself using best practices for coding and software development. Another layer of defense is securing the container image. The blast radius should be kept as small as possible in case a container gets compromised. Care should be taken not to leak secrets by accidentally baking them into container images.

Harden containers

Base image

Using something familiar (Ubuntu, Debian etc.) as the base image for containers is tempting. For a while now, the recommendation has been to use smaller base images such as Alpine. Even though the Alpine image is small, it still contains plenty of tools that attackers can use to do further damage after a compromise. For instance, it ships BusyBox, which includes nc (aka netcat), a tool commonly used by attackers to create backdoors or pivot their attack. They can also just install whatever tool they want with apk.

A base image with lots of tools is convenient for debugging, but it is just as convenient for attackers.

Compilers and SDKs are needed for building the application, but aren't needed to run it. It is therefore recommended to use a multi-stage build, so that the build tooling stays out of the final image.

A distroless base image should be used for the final stage. Distroless images don't have debugging tools, package managers, shells and other such utilities. Using a distroless base image for the final container image limits an attacker's ability to move laterally after compromising the container.

Attestation

Attestation provides proof of how an image was built and what's in it. Adding attestation as part of the image build process allows verification that the image is what it claims to be and has not been tampered with.

There are two types of attestation.

Provenance attestation: records facts about the build process, such as when the image was built and how it was produced. See Docker Docs Provenance attestations.
SBOM attestation: contains a list of the software components/artifacts inside the container image. An SBOM can be used to scan for known vulnerabilities. See Docker Docs SBOM attestations.
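As a rough illustration of how an SBOM supports vulnerability scanning, the Python sketch below reads package names and versions from an SPDX JSON document and looks them up in a table of advisories. The sample SBOM, the advisory table and the identifier EXAMPLE-0001 are all hypothetical. Real scanners match version ranges against curated vulnerability databases rather than exact versions.

```python
import json


def packages_from_spdx(sbom_json: str) -> list:
    """Return (name, version) pairs from the "packages" array of an SPDX JSON SBOM."""
    doc = json.loads(sbom_json)
    return [(p["name"], p.get("versionInfo", "")) for p in doc.get("packages", [])]


def find_known_vulnerable(packages: list, advisories: dict) -> list:
    """Return (name, version, advisory_id) for every exact match in advisories."""
    return [
        (name, version, advisories[(name, version)])
        for name, version in packages
        if (name, version) in advisories
    ]


# Hypothetical SBOM fragment and advisory table, for illustration only.
SAMPLE_SBOM = json.dumps({
    "spdxVersion": "SPDX-2.3",
    "packages": [
        {"name": "libexample", "versionInfo": "1.0.0"},
        {"name": "otherlib", "versionInfo": "2.4.1"},
    ],
})
SAMPLE_ADVISORIES = {("libexample", "1.0.0"): "EXAMPLE-0001"}
```

With these samples, find_known_vulnerable(packages_from_spdx(SAMPLE_SBOM), SAMPLE_ADVISORIES) reports only libexample 1.0.0.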

The authenticity and integrity of images and attestation can be proved with signatures.

Images should be built with attestations and signed, for example with cosign, as part of the CI pipeline.

Scanning

There are a number of tools that can do vulnerability scanning on container images. The official solution for Docker is Docker Scout (requires subscription). Some other options we've come across a lot are: Trivy, Grype and Snyk.

There are many other tools and vendors in this area, and they vary in their capabilities. It is therefore advisable to do some research. Some of the features that container scanning tools commonly provide are:

Scan images for known vulnerabilities
Detect (some) misconfigurations
Find secrets baked into images
Check software license compliance

Trivy is an open source container image scanner with all the aforementioned capabilities. It might be a good place to start to get familiar with these kinds of tools. It is worth noting, though, that Trivy has been involved in a couple of security incidents.

See our page on security testing.

Runtime hardening

Non-root user

By default, Docker runs commands inside a container as root. It is advisable to specify another user when running containers. There are a couple of different ways this can be done.

When running a container, you can specify a UID with the -u option. Example: docker run -u 4000 alpine.
When writing a Dockerfile, you can specify a non-root user with the USER instruction.
On the Docker daemon, you can enable user namespace remapping.
In Docker Desktop, you can enable Enhanced Container Isolation (requires subscription).
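As an additional safety net, the application itself can refuse to start when it finds itself running as root, which catches deployments where none of the options above were applied. The Python sketch below shows one way to do that. The function names are made up for illustration, and the check only looks at the effective UID, so it says nothing about capabilities or user namespace remapping.

```python
import os


def running_as_root(euid=None) -> bool:
    """Return True if the given (or current) effective UID is 0."""
    if euid is None:
        euid = os.geteuid()  # Unix only
    return euid == 0


def refuse_root(euid=None) -> None:
    """Raise PermissionError instead of starting the service as root."""
    if running_as_root(euid):
        raise PermissionError(
            "refusing to run as root; set USER in the Dockerfile or pass -u"
        )
```

Calling refuse_root() first thing in the entry point turns a silent misconfiguration into an immediate, visible startup failure.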

Availability

Availability is an important security goal. In addition to preventing access by unauthorized users, it is just as important to ensure access for authorized users.

For any long-lived process, there is a chance that it will at some point become unresponsive. The first step in solving this for a container is to detect when the container has become "unhealthy", meaning that it has stopped responding to requests. This is done with health check/liveness probes. A common strategy for dealing with an unhealthy container is simply to restart it.

How health check/liveness probes are declared depends a bit on how you run the container.

For Docker, it is called a healthcheck, and there are two ways to declare one: either with the HEALTHCHECK instruction in a Dockerfile, or with the healthcheck attribute in a Compose file.

It is important to carefully consider how you implement the health check, as you don't want it to become an attack vector in itself. Health checks in Docker run a command (often curl) inside the container. Including curl, nc etc. in a container image also provides tools that can be abused by attackers.
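One way to avoid shipping curl just for health checks is to reuse the language runtime that is already in the image. The Python sketch below uses only the standard library and assumes the image already contains a Python interpreter. The URL, the port and the /healthz path are hypothetical. For an image without an interpreter, the same idea would have to be built into the application binary instead.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, 4xx/5xx response or malformed URL.
        return False


def main(url: str = "http://127.0.0.1:8080/healthz") -> int:
    """Return a health check exit code: 0 means healthy, 1 means unhealthy."""
    return 0 if is_healthy(url) else 1
```

A script wrapping this would end with raise SystemExit(main()), so that the health check command exits with status 0 when healthy and 1 when unhealthy, the two exit codes Docker distinguishes.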

For Kubernetes, define a liveness probe, for example with a liveness command.

Monitoring

There is a plethora of monitoring services, such as Middleware, Sematext, Datadog and Splunk. However, these can be pricey.

For a self-hosted monitoring solution, you can use Prometheus, plus either Grafana or Perses.

Support for observability beyond simple health check probes can be added by implementing the OpenTelemetry standard. It is important, though, that you read its documentation on security very carefully, because otherwise it can introduce new attack vectors and break GDPR compliance.

Secrets

It is important to be aware that any secrets set with ENV or ARG in a Dockerfile will persist in the final image. Setting secrets that way is strongly discouraged.

Using secrets with Docker is done via mounts. See Build secrets and Secrets in Compose.
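On the application side, a mounted secret is just a file. With Compose, secrets are mounted under /run/secrets/<secret_name> inside the container. The following Python sketch reads one. The function name and the secret name db_password are hypothetical, and stripping the trailing newline is an assumption about how the secret file was written.

```python
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Read a mounted secret from a file instead of an environment variable."""
    if Path(name).name != name or name in ("", ".", ".."):
        # Reject names such as "../etc/passwd" that would escape secrets_dir.
        raise ValueError(f"invalid secret name: {name!r}")
    path = Path(secrets_dir) / name
    return path.read_text(encoding="utf-8").rstrip("\n")
```

For example, read_secret("db_password") returns the content of /run/secrets/db_password. Because the value never passes through ENV or ARG, it does not end up in the image or in its metadata.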

Use a scanning tool to check for secrets in your images. See the Scanning section above.

Cloud providers generally provide their own way of managing secrets for containers. See the documentation for your provider for details.

Secure Computing (seccomp)

Docker integrates with seccomp, a feature of the Linux kernel that restricts which system calls a process (or container) can make. See Seccomp security profiles for Docker.
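Whether a seccomp filter is actually applied to a process can be checked from inside the container, since the kernel reports the mode in the Seccomp field of /proc/<pid>/status (0 is disabled, 1 is strict, 2 is filter). The Python sketch below parses that field. The function name and the sample status text are made up for illustration.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str):
    """Return the seccomp mode named in the contents of /proc/<pid>/status.

    Returns None if the kernel does not report a Seccomp field.
    """
    for line in status_text.splitlines():
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, "unknown")
    return None


# Hypothetical excerpt of a status file, for illustration only.
SAMPLE_STATUS = "Name:\tpython3\nSeccomp:\t2\nSeccomp_filters:\t1\n"
```

On a Linux system, seccomp_mode(open("/proc/self/status").read()) gives the mode of the current process. A container started with a seccomp profile is expected to report "filter".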

Alternative container runtime

By default, Docker uses containerd, which in turn uses runc as the container runtime. runc uses the standard Linux features of cgroups and namespaces to implement containers.

Depending on the deployment scenario, it can be beneficial to change the default runtime to provide additional isolation for containers. For some alternatives, see Alternative container runtimes.