The Docker container format is revolutionizing the IT world, and there is no shortage of evidence for that claim. But how do Docker images actually work under the hood? Let’s dissect a container image and see what makes it tick.
The Image Structure
For this exploration, I’ll use a PHP demo application I built on CentOS as an example - the jeduncan/php-demo-app image (529.3 MB). When you pull and extract a Docker image, you’ll find it’s essentially a collection of layered tarballs.
Running docker history on this image reveals 26 layers. When you extract the image, you’ll find 26 corresponding directories, each representing one of these layers.
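To check the directory count yourself, you can export the image with docker save and unpack the resulting tarball. The sketch below is only an illustration under stated assumptions: the image has already been unpacked into a directory of your choosing, it uses the per-layer directory layout described in this post, and the helper name find_layer_dirs is mine rather than anything Docker provides.

```python
from pathlib import Path


def find_layer_dirs(image_root):
    """Return the names of subdirectories that look like image layers.

    A directory is treated as a layer if it contains a layer.tar file.
    This is an assumption based on the layout described in the post.
    """
    root = Path(image_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "layer.tar").is_file()
    )
```

Calling len(find_layer_dirs(path)) on the unpacked image should give the same number that docker history reports, provided the image uses this layout.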
What’s Inside Each Layer?
Each layer directory contains three key components:
VERSION File
A simple text file indicating the schema version being used.
JSON Metadata File
This file is where things get interesting. It stores comprehensive configuration data including:
- Environment variables
- Exposed ports
- Labels
- Parent layer IDs
- The actual command that was executed to create this layer
- The UUID of the container used to run the command
All of this metadata is preserved in a structured format, creating a complete audit trail of how the image was built.
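As a rough illustration, the fields listed above can be pulled out of a layer’s JSON document with a few lines of Python. The key names used here (id, parent, container, config, container_config, Env, ExposedPorts, Labels, Cmd) are what I would expect in images from this era, but treat them as assumptions and check them against your own files. The function name summarize_layer_metadata is hypothetical.

```python
import json


def summarize_layer_metadata(raw_json):
    """Extract the fields discussed above from one layer's JSON text.

    All key names are assumptions about the older per-layer JSON format;
    missing or null keys fall back to empty values instead of failing.
    """
    meta = json.loads(raw_json)
    config = meta.get("config") or {}
    container_config = meta.get("container_config") or {}
    return {
        "id": meta.get("id"),
        "parent": meta.get("parent"),
        "container": meta.get("container"),
        "env": config.get("Env") or [],
        "exposed_ports": sorted((config.get("ExposedPorts") or {}).keys()),
        "labels": config.get("Labels") or {},
        # The command that was run to create this layer.
        "build_command": container_config.get("Cmd") or [],
    }
```

Reading the file is then a matter of passing its contents in, for example summarize_layer_metadata(open(path).read()), where path points at the JSON file inside a layer directory.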
layer.tar Archive
This archive holds the actual filesystem changes introduced by this specific layer. Here’s the key insight: each layer.tar contains only the files that changed in that particular layer, not the entire filesystem.
This is what makes Docker images so efficient - you’re not duplicating entire filesystems 26 times. You’re stacking incremental changes.
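You can confirm the incremental nature of a layer by listing what its archive contains. This is a minimal sketch using Python’s standard tarfile module; the function name list_layer_changes is my own, and the path you pass in is whichever layer.tar you want to inspect.

```python
import tarfile


def list_layer_changes(layer_tar_path):
    """Return (path, size_in_bytes) for every entry in one layer.tar."""
    with tarfile.open(layer_tar_path) as archive:
        return sorted(
            (member.name, member.size) for member in archive.getmembers()
        )
```

A layer created by a small configuration change should list only a handful of entries, while a layer created by a package installation will list many.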
Layer Naming and Organization
The layer directories are named using unique identifiers that match the layer IDs you see in docker history output. This isn’t coincidental - it’s how Docker maintains the relationships between layers and ensures the correct assembly order.
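To make the assembly order concrete, here is a small sketch that walks parent references from the base layer upward. It assumes you have already collected a mapping from each layer ID to its parent ID (for example from the JSON metadata files), that the base layer’s parent is None, and that the layers form a single linear chain. The function name assembly_order is hypothetical.

```python
def assembly_order(parent_of):
    """Order layer IDs from the base layer to the topmost layer.

    parent_of maps each layer ID to its parent ID, or to None for the
    base layer. A single linear chain of layers is assumed.
    """
    # Invert the mapping so we can step from a parent to its child.
    child_of = {parent: layer for layer, parent in parent_of.items()}
    order = []
    current = child_of.get(None)  # the base layer has no parent
    while current is not None:
        order.append(current)
        current = child_of.get(current)
    return order
```

The returned list is the order in which the layer.tar archives would be applied to rebuild the full filesystem.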
How Docker Creates Images
When you build a Docker image, here’s what happens:
- Start with a base image
- Execute the first command from your Dockerfile
- Capture the filesystem changes as a tarball
- Record the metadata in JSON format
- Create a unique layer identifier
- Move to the next command
- Repeat the previous five steps for each remaining instruction
- Link everything together with parent references
- Store the final manifest
- Tag the complete image
- Make it available for docker pull
The resulting image has two useful properties:
- Each layer is individually addressable
- Layers are reusable across different images
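The core of that build loop (run a command, capture the changes, record metadata, assign an identifier, link to the parent) can be modeled in a few lines. This is a toy simulation, not how Docker itself is implemented: filesystems are plain dictionaries of path to content, deletions are ignored, and the identifier is simply a SHA-256 hash that I chose for illustration. The names diff_snapshot and build_layers are hypothetical.

```python
import hashlib
import json


def diff_snapshot(before, after):
    """Return only the files that are new or changed in `after`."""
    return {
        path: data for path, data in after.items() if before.get(path) != data
    }


def build_layers(base_id, base_files, steps):
    """Simulate a build: one layer per (command, resulting_files) step.

    Toy model only: deletions are not tracked, and the layer ID is a
    hash of the metadata plus the changes, chosen for illustration.
    """
    layers = []
    current, parent_id = dict(base_files), base_id
    for command, resulting_files in steps:
        changes = diff_snapshot(current, resulting_files)
        metadata = {"parent": parent_id, "command": command}
        digest_input = json.dumps([metadata, changes], sort_keys=True).encode()
        layer_id = hashlib.sha256(digest_input).hexdigest()
        layers.append({"id": layer_id, "metadata": metadata, "changes": changes})
        # The next instruction starts from this layer's result.
        current, parent_id = dict(resulting_files), layer_id
    return layers
```

Each entry in the returned list carries only its own changes and a reference to its parent, which mirrors the tarball, metadata, and identifier trio found in each layer directory.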
Why This Matters
Understanding this schema has practical implications. It means Docker images aren’t magic - they follow a documented, reproducible format. This enables:
- Alternative tools to build Docker-compatible images
- Better understanding of image size and optimization opportunities
- Debugging of build issues by examining specific layers
- Custom tooling that works with the Docker image format
The beauty of Docker’s approach is that it takes something complex (application packaging and distribution) and breaks it down into a simple, repeatable pattern: tarballs + metadata + unique identifiers.
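As one example of the custom tooling this format allows, the sketch below ranks the layers of an unpacked image by the size of their layer.tar files, which is a quick way to spot optimization candidates. It assumes the per-layer directory layout described in this post; the function name layer_sizes is my own.

```python
from pathlib import Path


def layer_sizes(image_root):
    """Return (layer_dir_name, layer.tar size in bytes), largest first."""
    sizes = [
        (tar_path.parent.name, tar_path.stat().st_size)
        for tar_path in Path(image_root).glob("*/layer.tar")
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

The first few entries of the result are usually where package installs or large copied assets live, so they are a sensible place to start when trimming an image.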
The Takeaway
Docker container images are remarkably straightforward once you peek under the hood. They’re just a stack of filesystem diffs with metadata, organized by UUID, following a well-defined schema.
No proprietary magic. Just good engineering.
This post originally appeared on my old blog. The concepts remain relevant for understanding container internals, though Docker’s image format and build tooling have evolved since 2017, with additions like multi-stage builds and BuildKit.