Dissecting a Docker Container Image

The Docker container format is revolutionizing how the IT world packages and ships software. But how do Docker images actually work under the hood? Let’s dissect a container image and see what makes it tick.

The Image Structure

For this exploration, I’ll use a PHP demo application I built on CentOS as an example: the jeduncan/php-demo-app image (529.3 MB). When you export a Docker image with docker save and extract the resulting archive, you’ll find it’s essentially a collection of layered tarballs.

Running docker history on this image reveals 26 layers. When you extract the image, you’ll find 26 corresponding directories, each representing one of these layers.
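To check the layer count yourself, export the image with docker save -o php-demo-app.tar jeduncan/php-demo-app and list the archive’s contents. The Python sketch below counts the layer directories in such an archive. It assumes the legacy docker save layout described in this post (one top-level directory per layer, each holding a layer.tar); the function names layer_dirs and print_layers are my own, not part of any Docker tool.

```python
import tarfile
from collections.abc import Iterable


def layer_dirs(member_names: Iterable[str]) -> list[str]:
    """Return the top-level directories that contain a layer.tar.

    Assumes the legacy `docker save` layout, where each layer is stored as
    <layer-id>/VERSION, <layer-id>/json and <layer-id>/layer.tar.
    """
    found = set()
    for name in member_names:
        # Some archives prefix entries with "./"; normalize before splitting.
        parts = name.removeprefix("./").split("/")
        if len(parts) == 2 and parts[1] == "layer.tar":
            found.add(parts[0])
    return sorted(found)


def print_layers(path: str) -> None:
    """Print how many layer directories a saved image archive contains."""
    with tarfile.open(path) as archive:
        dirs = layer_dirs(archive.getnames())
    print(f"{len(dirs)} layer directories")
    for layer_dir in dirs:
        print(layer_dir)
```

Calling print_layers("php-demo-app.tar") should report as many directories as docker history reports layers, as long as the archive follows this layout.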

What’s Inside Each Layer?

Each layer directory contains three key components:

VERSION File

A simple text file indicating the schema version being used.

JSON Metadata File

This file is where things get interesting. It stores comprehensive configuration data including:

Environment variables
Exposed ports
Labels
Parent layer IDs
The actual command that was executed to create this layer
The ID of the container used to run the command

All of this metadata is preserved in a structured format, giving you an audit trail of how the image was built, one step per layer.
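The metadata is plain JSON, so pulling out the fields listed above takes only a few lines. This sketch assumes the key names of the legacy per-layer json (id, parent, container, container_config, config, and Docker’s capitalized Cmd, Env, ExposedPorts, Labels inside them); LayerInfo and parse_layer_json are hypothetical names, and other image formats may lay things out differently.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LayerInfo:
    layer_id: str
    parent: Optional[str]
    container: Optional[str]
    created_by: list[str] = field(default_factory=list)
    env: list[str] = field(default_factory=list)
    exposed_ports: list[str] = field(default_factory=list)
    labels: dict[str, str] = field(default_factory=dict)


def parse_layer_json(raw: str) -> LayerInfo:
    """Extract the fields discussed above from one per-layer json file.

    Key names follow the legacy image JSON (an assumption for other formats).
    """
    doc = json.loads(raw)
    # container_config: settings of the build container that created the layer.
    build = doc.get("container_config") or {}
    # config: runtime settings that containers started from the image inherit.
    runtime = doc.get("config") or {}
    return LayerInfo(
        layer_id=doc["id"],
        parent=doc.get("parent"),
        container=doc.get("container"),
        created_by=build.get("Cmd") or [],
        env=runtime.get("Env") or [],
        # ExposedPorts is an object keyed by "port/proto" with empty values.
        exposed_ports=sorted(runtime.get("ExposedPorts") or {}),
        labels=runtime.get("Labels") or {},
    )
```

Printing created_by for every layer gives you roughly what docker history shows in its CREATED BY column.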

layer.tar Archive

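This archive holds the actual filesystem changes introduced by this specific layer. Here’s the key insight: each layer.tar contains only the files that were added or changed in that particular layer, not the entire filesystem. Deletions are recorded as special "whiteout" entries whose names start with .wh.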

This is what makes Docker images so efficient - you’re not duplicating entire filesystems 26 times. You’re stacking incremental changes.
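One way to picture "stacking incremental changes" is a toy union of layers. The sketch below models each layer as a mapping from path to file contents rather than a real tar stream, and applies the whiteout convention, where an entry named .wh.<name> hides <name> (and anything beneath it) in the layers below. Opaque whiteouts (.wh..wh..opq) are deliberately not handled, and the function name flatten is hypothetical.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def flatten(layers: list[dict[str, bytes]]) -> dict[str, bytes]:
    """Merge layer diffs, ordered base first, into a single file view."""
    merged: dict[str, bytes] = {}
    for layer in layers:
        # Whiteouts hide content from *lower* layers, so apply them first.
        for path in layer:
            parent, name = posixpath.split(path)
            if name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(parent, name[len(WHITEOUT_PREFIX):])
                # Drop the target file, or the whole subtree if it was a directory.
                for existing in list(merged):
                    if existing == target or existing.startswith(target + "/"):
                        del merged[existing]
        # Then add or overwrite every regular entry from this layer.
        for path, data in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                merged[path] = data
    return merged
```

Each layer only mentions what it touched; the merged view is computed, never stored in full.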

Layer Naming and Organization

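The layer directories are named using unique identifiers that match the layer IDs you see in docker history output (add --no-trunc to see the full IDs instead of the 12-character prefixes). This isn’t coincidental: together with the parent IDs recorded in each json file, it’s how Docker maintains the relationships between layers and assembles them in the correct order.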
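Recovering the assembly order from those parent references is a short walk up a linked list. In this sketch, parents is a hypothetical mapping from each layer ID to its parent ID (None for the base), as you might collect it from the json files; find_top assumes the archive holds a single image, so exactly one layer is nobody’s parent.

```python
from typing import Optional


def find_top(parents: dict[str, Optional[str]]) -> str:
    """The top layer is the one that no other layer names as its parent."""
    referenced = {p for p in parents.values() if p is not None}
    tops = [layer for layer in parents if layer not in referenced]
    if len(tops) != 1:
        raise ValueError(f"expected exactly one top layer, found {len(tops)}")
    return tops[0]


def assembly_order(parents: dict[str, Optional[str]], top: str) -> list[str]:
    """Follow parent references from the top layer down, then reverse."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = top
    while current is not None:
        if current in seen:
            raise ValueError(f"parent cycle at layer {current}")
        seen.add(current)
        chain.append(current)
        current = parents[current]  # KeyError means a layer is missing
    chain.reverse()  # base layer first, top layer last
    return chain
```

The directory names themselves carry no ordering; only the parent links do, which is why a tool must read the metadata before it can stack the layer.tar files.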

How Docker Creates Images

When you build a Docker image, here’s what happens:

1. Start with a base image
2. Execute the next instruction from your Dockerfile in a temporary container
3. Capture the filesystem changes as a tarball
4. Record the metadata in JSON format
5. Create a unique layer identifier
6. Move to the next instruction and repeat steps 2-5 until the Dockerfile is exhausted
7. Link everything together with parent references
8. Store the final manifest
9. Tag the complete image
10. Push it to a registry so it is available for docker pull

Two properties fall out of this process: each layer is individually addressable, and layers are reusable across different images.
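Here is that build loop as a toy Python model. Every name in it (Layer, make_layer_id, build) is hypothetical, a step is a plain function returning the files it changed instead of a command run in a container, and the layer ID is a SHA-256 hash of the parent ID, the instruction and the diff. That hashing is a substitution: Docker’s pre-1.10 layer IDs were random, and later versions moved to content-addressable IDs computed differently.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Optional

Diff = dict[str, bytes]  # path -> contents added or changed by one step


@dataclass
class Layer:
    layer_id: str
    parent: Optional[str]
    created_by: str
    diff: Diff

    def metadata_json(self) -> str:
        """A small subset of the legacy per-layer json (illustrative only)."""
        return json.dumps({
            "id": self.layer_id,
            "parent": self.parent,
            "container_config": {"Cmd": ["/bin/sh", "-c", self.created_by]},
        })


def make_layer_id(parent: Optional[str], created_by: str, diff: Diff) -> str:
    """Hash parent ID, instruction and diff into a 64-hex-digit ID."""
    # Canonical JSON (sorted keys, hex contents) keeps the hash input unambiguous.
    payload = json.dumps(
        {
            "parent": parent,
            "created_by": created_by,
            "diff": {path: data.hex() for path, data in diff.items()},
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def build(base_id: Optional[str],
          steps: list[tuple[str, Callable[[], Diff]]]) -> list[Layer]:
    """Run each (instruction, step) pair and stack the resulting diffs."""
    layers: list[Layer] = []
    parent = base_id
    for created_by, run_step in steps:
        diff = run_step()  # execute the instruction and capture its changes
        layer_id = make_layer_id(parent, created_by, diff)
        layers.append(Layer(layer_id, parent, created_by, diff))
        parent = layer_id  # the next layer links back to this one
    return layers
```

Because the ID depends only on the parent, the instruction and the diff, running the same steps on the same base twice produces identical IDs, which hints at why an unchanged layer can be shared rather than stored again.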

Why This Matters

Understanding this schema has practical implications. It means Docker images aren’t magic: they follow a documented format that other tools can reproduce. This enables:

Alternative tools to build Docker-compatible images
Better understanding of image size and optimization opportunities
Debugging build issues by examining specific layers
Custom tooling that works with the Docker image format
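Because each layer.tar holds one layer’s changes, the sizes of those archives point straight at the expensive build steps. This sketch ranks layers by the uncompressed size of their layer.tar; it assumes the legacy docker save layout, and layer_sizes and report are hypothetical names.

```python
import tarfile
from collections.abc import Iterable


def layer_sizes(members: Iterable[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return (layer_dir, layer.tar size in bytes) pairs, largest first."""
    sizes = []
    for name, size in members:
        parts = name.removeprefix("./").split("/")
        if len(parts) == 2 and parts[1] == "layer.tar":
            sizes.append((parts[0], size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def report(path: str, top: int = 5) -> None:
    """Print the largest layers of a `docker save` archive."""
    with tarfile.open(path) as archive:
        members = [(m.name, m.size) for m in archive.getmembers()]
    for layer_dir, size in layer_sizes(members)[:top]:
        print(f"{size / 1_000_000:8.1f} MB  {layer_dir}")
```

The diff model has a direct consequence here: deleting a file in a later layer only adds a whiteout entry, so its bytes still ship in the earlier layer.tar. Cleanup has to happen in the same instruction that created the files to actually shrink the image.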

The beauty of Docker’s approach is that it takes something complex (application packaging and distribution) and breaks it down into a simple, repeatable pattern: tarballs + metadata + unique identifiers.

The Takeaway

Docker container images are remarkably straightforward once you peek under the hood. They’re just a stack of filesystem diffs with metadata, organized by unique IDs, following a well-defined schema.

No proprietary magic. Just good engineering.

This post originally appeared on my old blog. The concepts remain relevant for understanding container internals, though the ecosystem has evolved since 2017: build tooling gained features like multi-stage builds and BuildKit, and the image format itself has been standardized through the OCI image specification.