The Docker container format is revolutionizing the IT world, and there is plenty of evidence to back that statement up. But how do Docker images actually work under the hood? Let’s dissect a container image and see what makes it tick.

The Image Structure

For this exploration, I’ll use a PHP demo application I built on CentOS as an example - the jeduncan/php-demo-app image (529.3 MB). When you export a Docker image with docker save and extract the resulting archive, you’ll find it’s essentially a collection of layered tarballs.

Running docker history on this image reveals 26 layers. When you extract the image, you’ll find 26 corresponding directories, each representing one of these layers.
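
To follow along, here is a minimal Python sketch of the extraction step. It assumes you already have an image archive on disk, and it treats any directory holding a layer.tar as a layer. The function names are mine, chosen for illustration.

```python
import tarfile
from pathlib import Path


def extract_image(archive_path, dest_dir):
    """Unpack an image archive (as written by `docker save`) into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as archive:
        archive.extractall(dest)
    return dest


def find_layer_dirs(image_dir):
    """Return the names of the directories that hold a layer.tar, sorted."""
    return sorted(
        entry.name
        for entry in Path(image_dir).iterdir()
        if entry.is_dir() and (entry / "layer.tar").is_file()
    )
```

To produce the archive in the first place, something like docker save -o php-demo-app.tar jeduncan/php-demo-app should work; the output file name is arbitrary. Only extract archives you trust, since extractall writes whatever paths the archive contains.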

What’s Inside Each Layer?

Each layer directory contains three key components:

VERSION File

A simple text file indicating the schema version being used.

JSON Metadata File

This file is where things get interesting. It stores comprehensive configuration data including:

  • Environment variables
  • Exposed ports
  • Labels
  • Parent layer IDs
  • The actual command that was executed to create this layer
  • The ID of the container used to run the command

All of this metadata is preserved in a structured format, creating a complete audit trail of how the image was built.
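
As a rough illustration, the fields listed above can be pulled out of a layer's JSON file with a few lines of Python. The key names used here (config, container_config, Env, ExposedPorts, Labels, Cmd, parent, container) are my assumption about the layout, so check them against a json file from your own extracted image before relying on them.

```python
import json


def summarize_layer_metadata(json_text):
    """Pull the fields discussed above out of one layer's JSON metadata."""
    meta = json.loads(json_text)
    # Key names below are assumed; missing keys fall back to empty values.
    config = meta.get("config") or {}
    build_step = meta.get("container_config") or {}
    return {
        "id": meta.get("id"),
        "parent": meta.get("parent"),
        "container": meta.get("container"),
        "env": config.get("Env") or [],
        "exposed_ports": sorted(config.get("ExposedPorts") or {}),
        "labels": config.get("Labels") or {},
        "command": build_step.get("Cmd") or [],
    }
```

Calling it with the contents of one layer's json file returns a small dictionary that is easier to scan than the raw metadata.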

layer.tar Archive

This archive holds the actual filesystem changes introduced by this specific layer. Here’s the key insight: each layer.tar contains only the files that changed in that particular layer, not the entire filesystem.

This is what makes Docker images so efficient - you’re not duplicating entire filesystems 26 times. You’re stacking incremental changes.
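
One way to see the stacking in action is to walk the layer.tar files from the base layer upward and record which layer last wrote each path. This is a simplified sketch: it only looks at regular files and ignores deletions, and the function name is hypothetical.

```python
import tarfile


def stack_layers(layer_tar_paths):
    """Map each file path to the index of the last layer that wrote it.

    layer_tar_paths must be ordered from the base layer to the top layer.
    """
    provided_by = {}
    for index, tar_path in enumerate(layer_tar_paths):
        with tarfile.open(tar_path) as layer:
            for member in layer.getmembers():
                if member.isfile():
                    # A later layer overrides an earlier one for the same path.
                    provided_by[member.name] = index
    return provided_by
```

A file that appears in only one layer.tar keeps that layer's index, which is a quick way to confirm that later layers carry only their own changes.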

Layer Naming and Organization

The layer directories are named using unique identifiers that match the layer IDs you see in docker history output. This isn’t coincidental - it’s how Docker maintains the relationships between layers and ensures the correct assembly order.
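
To make the assembly order concrete, here is a small sketch that rebuilds it from parent references alone. It assumes you have already collected a mapping of layer ID to parent ID, with None marking the base layer; the function name and input shape are mine.

```python
def order_layers(parents):
    """Order layer IDs from base to top.

    parents maps each layer ID to its parent ID, or to None for the base layer.
    """
    # Invert the mapping so we can walk from each parent to its child.
    children = {parent: layer for layer, parent in parents.items()}
    if None not in children:
        raise ValueError("no base layer (a layer whose parent is None)")
    ordered = []
    current = children[None]
    while current is not None:
        ordered.append(current)
        current = children.get(current)
    if len(ordered) != len(parents):
        raise ValueError("layers do not form a single chain")
    return ordered
```

The length check at the end catches broken input, such as two layers claiming the same parent.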

How Docker Creates Images

When you build a Docker image, here’s what happens:

  1. Start with a base image
  2. Execute the next instruction from your Dockerfile
  3. Capture the filesystem changes as a tarball
  4. Record the metadata in JSON format
  5. Create a unique layer identifier
  6. Link the new layer to its parent with a parent reference
  7. Repeat steps 2-6 for each remaining instruction
  8. Store the final manifest
  9. Tag the complete image
  10. Push it to a registry to make it available for docker pull

Each resulting layer is individually addressable, and layers are reusable across different images.
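
The build steps above can be sketched as a toy model in Python. This is not how Docker is implemented: the filesystem is a plain dictionary, the base image has no layers of its own, deletions are ignored, and the ID scheme (a SHA-256 hash over the parent, command, and diff) is a stand-in I picked for the sketch.

```python
import hashlib
import json


def build_image(base_files, steps):
    """Toy model of a build. Each step is (command, function that edits files)."""
    files = dict(base_files)
    layers = []
    parent_id = None
    for command, apply_step in steps:
        before = dict(files)
        apply_step(files)
        # Keep only what this step added or changed (deletions are ignored here).
        diff = {path: data for path, data in files.items() if before.get(path) != data}
        payload = json.dumps([parent_id, command, diff], sort_keys=True)
        # Hypothetical ID scheme for this sketch, not necessarily Docker's.
        layer_id = hashlib.sha256(payload.encode()).hexdigest()
        layers.append(
            {"id": layer_id, "parent": parent_id, "command": command, "diff": diff}
        )
        parent_id = layer_id
    return layers
```

Each entry in the returned list carries the same ingredients as a real layer: a diff, the command that produced it, an identifier, and a pointer to its parent.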

Why This Matters

Understanding this schema has practical implications. It means Docker images aren’t magic - they follow a documented, reproducible format. This enables:

  • Alternative tools to build Docker-compatible images
  • Better understanding of image size and optimization opportunities
  • Debugging of build issues by examining specific layers
  • Custom tooling that works with the Docker image format
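
As one example of the size point, a few lines of Python can rank the layers of an extracted image by the size of their layer.tar. This is a sketch that assumes the directory layout described earlier (one directory per layer, each with a layer.tar inside); the function name is hypothetical.

```python
from pathlib import Path


def layer_sizes(image_dir):
    """Return (directory name, layer.tar size in bytes), largest first."""
    sizes = [
        (entry.name, (entry / "layer.tar").stat().st_size)
        for entry in Path(image_dir).iterdir()
        if entry.is_dir() and (entry / "layer.tar").is_file()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

The largest entries are usually the first place to look when an image is bigger than expected.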

The beauty of Docker’s approach is that it takes something complex (application packaging and distribution) and breaks it down into a simple, repeatable pattern: tarballs + metadata + unique identifiers.

The Takeaway

Docker container images are remarkably straightforward once you peek under the hood. They’re just a stack of filesystem diffs with metadata, organized by unique IDs, following a well-defined schema.

No proprietary magic. Just good engineering.


This post originally appeared on my old blog. The concepts remain relevant for understanding container internals, though Docker’s tooling and image format have evolved since 2017, with changes such as multi-stage builds and BuildKit.