Bubblewrap
==========

Many container runtime tools like `systemd-nspawn`, `docker`,
etc. focus on providing infrastructure for system administrators and
orchestration tools (e.g. Kubernetes) to run containers.

These tools are not suitable to give to unprivileged users, because it
is trivial to turn such access into a fully privileged root shell
on the host.

User namespaces
---------------

There is an effort in the Linux kernel called
[user namespaces](https://www.google.com/search?q=user+namespaces+site%3Ahttps%3A%2F%2Flwn.net)
which attempts to allow unprivileged users to use container features.
While significant progress has been made, there are
[still concerns](https://lwn.net/Articles/673597/) about it, and
it is not available to unprivileged users in several production distributions
such as CentOS/Red Hat Enterprise Linux 7, Debian Jessie, etc.

See for example
[CVE-2016-3135](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-3135)
which is a local root vulnerability introduced by userns.
[This March 2016 post](https://lkml.org/lkml/2016/3/9/555) has some
more discussion.

Bubblewrap could be viewed as a setuid implementation of a *subset* of
user namespaces. Emphasis on subset: specifically relevant to the
above CVE, bubblewrap does not allow control over iptables.

The original bubblewrap code existed before user namespaces; it inherits code from
[xdg-app helper](https://cgit.freedesktop.org/xdg-app/xdg-app/tree/common/xdg-app-helper.c)
which in turn distantly derives from
[linux-user-chroot](https://git.gnome.org/browse/linux-user-chroot).

Security
--------

The maintainers of this tool believe that it does not, even when used
in combination with typical software installed on a distribution,
allow privilege escalation. It may, however, increase the ability of a
logged-in user to perform denial of service attacks.

In particular, bubblewrap uses `PR_SET_NO_NEW_PRIVS` to turn off
setuid binaries, which are the [traditional way](https://en.wikipedia.org/wiki/Chroot#Limitations) to get out of things
like chroots.
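One way to observe this flag from a shell is the `NoNewPrivs` field that
Linux 4.10 and later report in `/proc/<pid>/status`. The sketch below is
only an illustration and not part of bubblewrap: the helper name
`no_new_privs_of` is hypothetical, and it assumes a kernel recent enough
to expose the field.

```sh
#!/bin/sh
# Hypothetical helper: print the NoNewPrivs value (0 or 1) found in a
# /proc/<pid>/status-style file. Prints nothing when the field is absent,
# as on kernels older than 4.10.
no_new_privs_of() {
    awk '$1 == "NoNewPrivs:" { print $2 }' "$1"
}

# On the host, for the current process:
#   no_new_privs_of /proc/self/status
# Inside a sandbox (assumes bwrap is installed; a value of 1 is expected
# if the flag was set as described above):
#   bwrap --ro-bind / / --proc /proc \
#       awk '$1 == "NoNewPrivs:" { print $2 }' /proc/self/status
```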

Users
-----

This program can be shared by all container tools which perform
non-root operation, such as:

- [Flatpak](http://www.flatpak.org)
- [rpm-ostree unprivileged](https://github.com/projectatomic/rpm-ostree/pull/209)
- [bwrap-oci](https://github.com/projectatomic/bwrap-oci)

We would also like to see this be available in Kubernetes/OpenShift
clusters. Having the ability for unprivileged users to use container
features would make it significantly easier to do interactive
debugging scenarios and the like.

Usage
-----

bubblewrap works by creating a new, completely empty, mount
namespace where the root is on a tmpfs that is invisible from the
host, and will be automatically cleaned up when the last process
exits. You can then use command-line options to construct the root
filesystem and process environment and command to run in the
namespace.

There's a larger [demo script](./demos/bubblewrap-shell.sh) in the
source code, but here's a trimmed-down version which runs
a new shell reusing the host's `/usr`.

```
bwrap --ro-bind /usr /usr --symlink usr/lib64 /lib64 --proc /proc --dev /dev --unshare-pid bash
```

This is an incomplete example, but useful for purposes of
illustration. More often, rather than creating a container using the
host's filesystem tree, you want to target a chroot. There, rather
than creating the symlink `lib64 -> usr/lib64` in the tmpfs, you might
have already created it in the target rootfs.
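As a sketch of that second case, the helper below prints a `bwrap`
command line that uses an existing root filesystem tree as `/` instead of
assembling one from the host's `/usr`. The function name
`bwrap_rootfs_cmd` and the path `/srv/rootfs` are hypothetical; `--bind`,
`--proc`, `--dev` and `--unshare-pid` are bubblewrap options, though it is
worth checking `bwrap --help` for the installed version.

```sh
#!/bin/sh
# Hypothetical helper: print a bwrap invocation that targets a prepared
# root filesystem. $1 is the rootfs directory, the remaining arguments
# are the command to run inside it.
bwrap_rootfs_cmd() {
    rootfs=$1
    shift
    # The rootfs is assumed to already contain its own lib64 symlink (or
    # whatever layout its distribution uses), so no --symlink is passed.
    echo bwrap --bind "$rootfs" / --proc /proc --dev /dev --unshare-pid "$@"
}

# Print the command line; drop the leading "echo" in the helper to run it.
bwrap_rootfs_cmd /srv/rootfs /bin/sh
```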

Sandboxing
----------

The goal of bubblewrap is to run an application in a sandbox, where it
has restricted access to parts of the operating system or user data
such as the home directory.

bubblewrap always creates a new mount namespace, and the user can specify
exactly what parts of the filesystem should be visible in the sandbox.
Any such directories you specify are mounted `nodev` by default, and can be made read-only.
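To make the read-only versus writable distinction concrete, here is a
small sketch that exposes the host's `/usr` read-only, a single working
directory read-write, and a fresh tmpfs as `/tmp`. Nothing else from the
host filesystem, including the rest of the home directory, is listed, so
it is not visible. The helper name `bwrap_fs_cmd` and the example path are
hypothetical; the options (`--ro-bind`, `--bind`, `--tmpfs`, `--symlink`,
`--proc`, `--dev`, `--chdir`) are bubblewrap's.

```sh
#!/bin/sh
# Hypothetical helper: print a bwrap invocation in which only the listed
# paths exist. $1 is the one directory to expose read-write; the
# remaining arguments are the command to run.
bwrap_fs_cmd() {
    workdir=$1
    shift
    # --ro-bind: visible but read-only; --bind: visible and writable;
    # --tmpfs: an empty in-memory directory private to the sandbox.
    echo bwrap \
        --ro-bind /usr /usr \
        --symlink usr/lib64 /lib64 \
        --proc /proc \
        --dev /dev \
        --tmpfs /tmp \
        --bind "$workdir" "$workdir" \
        --chdir "$workdir" \
        "$@"
}

# Print the command line; drop the leading "echo" in the helper to run it.
bwrap_fs_cmd /home/alice/project bash
```

Options are applied in the order given, so a `--bind` that should sit on
top of a `--tmpfs` has to come after it.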

Additionally you can use these kernel features:

User namespaces ([CLONE_NEWUSER](http://linux.die.net/man/2/clone)): This hides all but the current uid and gid from the
sandbox. You can also change what the value of uid/gid should be in the sandbox.

IPC namespaces ([CLONE_NEWIPC](http://linux.die.net/man/2/clone)): The sandbox will get its own copy of all the
different forms of IPCs, like SysV shared memory and semaphores.

PID namespaces ([CLONE_NEWPID](http://linux.die.net/man/2/clone)): The sandbox will not see any processes outside the sandbox. Additionally, bubblewrap will run a trivial pid1 inside your container to handle the requirements of reaping children in the sandbox. This avoids what is now known as the [Docker pid 1 problem](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/).

Network namespaces ([CLONE_NEWNET](http://linux.die.net/man/2/clone)): The sandbox will not see the network. Instead it will have its own network namespace with only a loopback device.

UTS namespace ([CLONE_NEWUTS](http://linux.die.net/man/2/clone)): The sandbox will have its own hostname.
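The namespaces listed above map onto bubblewrap's `--unshare-*` options.
The sketch below prints a command line that requests all of them and sets
the uid, gid and hostname seen inside the sandbox. The helper name
`bwrap_ns_cmd` and the values `1000` and `sandbox` are illustrative
assumptions, and some options (for example `--hostname`) may be missing
from older releases, so check `bwrap --help` first.

```sh
#!/bin/sh
# Hypothetical helper: print a bwrap invocation that unshares the user,
# IPC, PID, network and UTS namespaces. The arguments are the command to
# run inside the sandbox.
bwrap_ns_cmd() {
    # --uid/--gid need --unshare-user; --hostname needs --unshare-uts.
    echo bwrap \
        --ro-bind /usr /usr \
        --symlink usr/lib64 /lib64 \
        --proc /proc \
        --dev /dev \
        --unshare-user --uid 1000 --gid 1000 \
        --unshare-ipc \
        --unshare-pid \
        --unshare-net \
        --unshare-uts --hostname sandbox \
        "$@"
}

# Print the command line; drop the leading "echo" in the helper to run it.
bwrap_ns_cmd bash
```

Each namespace is opt-in, so a tool that needs the host network, for
instance, can simply leave out `--unshare-net`.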

Seccomp filters: You can pass in seccomp filters that limit which syscalls can be made in the sandbox. For more information, see [Seccomp](https://en.wikipedia.org/wiki/Seccomp).

Related project comparison: Firejail
------------------------------------

[Firejail](https://github.com/netblue30/firejail/tree/master/src/firejail)
is similar to Flatpak before bubblewrap was split out in that it combines
a setuid tool with a lot of desktop-specific sandboxing features. For
example, Firejail knows about PulseAudio, whereas bubblewrap does not.

The bubblewrap authors believe it's much easier to audit a small
setuid program, and keep features such as PulseAudio filtering as an
unprivileged process, as now occurs in Flatpak.

Also, @cgwalters thinks trying to
[whitelist file paths](https://github.com/netblue30/firejail/blob/37a5a3545ef6d8d03dad8bbd888f53e13274c9e5/src/firejail/fs_whitelist.c#L176)
is a bad idea given the myriad ways users have to manipulate paths,
and the myriad ways in which system administrators may configure a
system. The bubblewrap approach is to only retain a few specific
Linux capabilities such as `CAP_SYS_ADMIN`, but to always access the
filesystem as the invoking uid. This entirely closes off
[TOCTTOU attacks](https://cwe.mitre.org/data/definitions/367.html) and
the like.

Related project comparison: Sandstorm.io
----------------------------------------

[Sandstorm.io](https://sandstorm.io/) requires unprivileged user
namespaces to set up its sandbox, though it could easily be adapted
to operate in a setuid mode as well. @cgwalters believes their code is
fairly good, but it could still make sense to unify on bubblewrap.
However, @kentonv (of Sandstorm) feels that while this makes sense
in principle, the switching cost outweighs the practical benefits for
now. This decision could be re-evaluated in the future, but it is not
being actively pursued today.

Related project comparison: runc/binctr
----------------------------------------

[runC](https://github.com/opencontainers/runc) is currently working on
supporting [rootless containers](https://github.com/opencontainers/runc/pull/774),
which need neither `setuid` nor any other privileges during installation
of runC, creation of containers, or management of containers; they rely
on unprivileged user namespaces rather than `setuid`. However, the
standard mode of using runC is similar to
[systemd nspawn](https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html)
in that it is tooling intended to be invoked by root.

The bubblewrap authors believe that runC and systemd-nspawn are not
designed to be made setuid, and are distant from supporting such a mode.
However, with rootless containers, runC will be able to fulfill certain use cases
that bubblewrap supports (with the added benefit of being a standardised and
complete OCI runtime).

[binctr](https://github.com/jfrazelle/binctr) is just a wrapper for
runC, so it inherits all of its design tradeoffs.

What's with the name?!
----------------------

The name bubblewrap was chosen to convey that this
tool runs as the parent of the application (so wraps it in some sense) and creates
a protective layer (the sandbox) around it.

![](bubblewrap.jpg)

(Bubblewrap cat by [dancing_stupidity](https://www.flickr.com/photos/27549668@N03/))