User Namespaces & ID Mapping
This is the engine room of rootless containers. It is the mechanism that allows a process to feel like root inside a container while remaining a standard, unprivileged user on the host.
The Illusion of Root
Inside the container, I want files to look like they are owned by root (UID 0), bin (UID 1), daemon (UID 2), etc.
Outside the container, on the shared host system, I cannot actually let a user own files as root. That would be a massive security hole.
User Namespaces solve this by creating a translation layer (a map) between the UIDs inside the container and the UIDs on the host.
The UID Map (/proc/self/uid_map)
When a process is in a User Namespace, it has a file /proc/self/uid_map.
This file contains three numbers per line:
[Inside-ID] [Outside-ID] [Length]
A Simple Map (Single User)
If I just map my own user:
0 1000 1
- Inside ID 0 (Container Root) maps to Outside ID 1000 (Vagmi).
- Length is 1.
This works for basic things. whoami inside says “root”. But if I try to chown a file to user nobody (UID 65534), it fails. Why? Because I only mapped ONE id. I don’t have permission to be UID 65534 on the host.
The Subuid/Subgid Mechanism
To run a full Linux distro, a container needs to own thousands of UIDs (for postgres, nginx, www-data, etc.).
Since a normal user (UID 1000) only owns one UID, system administrators grant them a range of Subordinate UIDs (subuids).
These are defined in /etc/subuid:
vagmi:100000:65536
This grants user vagmi ownership of 65,536 UIDs starting from ID 100,000.
The Complex Map (Full Distro)
To support a full container, I create a map with two ranges:
- Map Root: Container UID 0 -> Host UID 1000.
- Map Users: Container UIDs 1..65536 -> Host UIDs 100000..165535.
The command newuidmap (a setuid helper binary) writes this to the kernel:
# newuidmap <pid> 0 1000 1 1 100000 65536
Resulting uid_map:
0 1000 1
1 100000 65536
Visualizing the Mapping
| Container Reality (Inside) | Host Reality (Outside) |
|---|---|
root (UID 0) | vagmi (UID 1000) |
bin (UID 1) | 100000 |
daemon (UID 2) | 100001 |
nobody (UID 65534) | 165533 |
The newuidmap Security Gate
You might ask: “Why can’t I just write any mapping I want?”
The kernel allows a user to write to their own uid_map, BUT they can only map UIDs they actually own (which is usually just their own current UID).
To map the range 100,000+, I need privilege.
This is why newuidmap has the SUID bit set (owned by root, executable by users).
newuidmapstarts as root.- It checks
/etc/subuid. - Does
vagmiown the range 100,000-165536? - If yes, it writes the privileged mapping to
/proc/[pid]/uid_map.
This mechanism delegates the ability to isolate specific ID ranges without giving full root access.