Setting up Systemd Nspawn Container

Peter Boy, Jan Kuparinen Version F34-F36 Last review: 2022-07-05

The systemd-nspawn container runtime is part of the systemd system software. Some time ago it was split out into its own package, systemd-container.

The prerequisite is a fully installed basic system. A standard interface of the host to the public network is assumed, via which the container receives independent access (its own IP address). In addition, an interface for an internal, protected network between containers and host is assumed, usually a bridge. It may be a virtual network within the host, e.g. libvirt’s virbr0, or a physical interface connecting multiple hosts.

A container can also be operated with other network setups, or without any network connection at all.

1. Setting up the nspawn container infrastructure

  1. Create a container storage area

    The systemd-nspawn tools like machinectl look for containers in /var/lib/machines first. This directory is also created during the installation of the programs if it does not exist.

    Following the Fedora Server storage scheme, create a logical volume, create a file system on it, and mount it at /var/lib/machines. The tools can make use of BTRFS features, so BTRFS is a suitable file system here. If you don’t want to follow the Fedora Server rationale, skip this step.

     […]# dnf install btrfs-progs
     […]# lvcreate -L 20G -n machines  {VGNAME}
     […]# mkfs.btrfs -L machines /dev/mapper/{VGNAME}-machines
     […]# mkdir /var/lib/machines
     […]# vim /etc/fstab
     (insert)
     /dev/mapper/{VGNAME}-machines   /var/lib/machines  auto  defaults  0 0
    
     […]# mount -a
  2. Check and, if necessary, correct the SELinux labels

    Ensure that the directory belongs to root and can only be accessed by root (should be done by the installer).

    […]# restorecon  -vFr /var/lib/machines
    […]# chown root:root /var/lib/machines
    […]# chmod 700 /var/lib/machines
  3. Adding configuration for nspawn to the /etc/systemd directory

     […]# mkdir /etc/systemd/nspawn
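
The ownership and permission requirements from step 2 can be verified with a small helper. The following is only a minimal sketch: the function name check_storage_dir is hypothetical and not part of any systemd tool, and it assumes GNU stat (coreutils) is available.

```shell
# Hypothetical helper: verify owner (numeric uid:gid) and mode of a directory.
# Defaults match the requirements above: owned by root (0:0), mode 700.
check_storage_dir() {
    local dir="$1" want_owner="${2:-0:0}" want_mode="${3:-700}"
    local owner mode
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    owner="$(stat -c '%u:%g' "$dir")"   # GNU stat: numeric uid:gid
    mode="$(stat -c '%a' "$dir")"       # GNU stat: octal permission bits
    if [ "$owner" = "$want_owner" ] && [ "$mode" = "$want_mode" ]; then
        echo "ok: $dir"
    else
        echo "mismatch: $dir is $owner mode $mode, expected $want_owner mode $want_mode"
        return 1
    fi
}
```

Called as check_storage_dir /var/lib/machines, it prints "ok: /var/lib/machines" when the directory belongs to root and has mode 700, and returns a non-zero status otherwise.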

2. Creating a nspawn container

2.1 Creating a container directory tree

The creation of a container filesystem or the provision of a corresponding image is treated as "out of scope" by systemd-nspawn. There are a number of alternative options. By far the easiest and most efficient way is simply to use the distribution-specific bootstrap tool, DNF in the case of Fedora, in the container’s directory. This is the recommended procedure.

  1. Creating a BTRFS subvolume with the name of the container

     […]# cd /var/lib/machines
     […]# btrfs subvolume create {ctname}
  2. Creating a minimal container directory tree

    Fedora 34 / 35

     […]# dnf --releasever=35 --best --setopt=install_weak_deps=False --installroot=/var/lib/machines/{CTNAME}/ \
    install dhcp-client dnf fedora-release glibc glibc-langpack-en glibc-langpack-de iputils less ncurses passwd systemd systemd-networkd systemd-resolved vim-default-editor

    F34 installs 165 packages (247M) and allocates 557M in the file system. F35 installs 174 packages (270M) and allocates 527M in the file system.

    Fedora 36

     […]# dnf --releasever=36 --best --setopt=install_weak_deps=False --installroot=/var/lib/machines/{CTNAME}/ \
    install dhcp-client dnf fedora-release glibc glibc-langpack-en glibc-langpack-de iputils less ncurses passwd systemd systemd-networkd systemd-resolved util-linux vim-default-editor

    F36 installs 171 packages (247M) and allocates 550M in the file system.

    CentOS 8-stream

    First create a separate CentOS repository file (e.g. /root/centos8.repo) and import the CentOS keys. On this basis, perform a standard installation using DNF.

      […]# vim  /root/centos8.repo
       <insert>
       [centos8-chroot-base]
       name=CentOS-8-Base
       baseurl=http://mirror.centos.org/centos/8/BaseOS/x86_64/os/
       gpgcheck=1
       gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
       #
       [centos8-chroot-appstream]
       name=CentOS-8-stream-AppStream
       #baseurl=http://mirror.centos.org/$contentdir/$stream/AppStream/$basearch/os/
       baseurl=http://mirror.centos.org/centos/8-stream/AppStream/x86_64/os/
       gpgcheck=1
       gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
       #
       [epel8-chroot]
       name=Epel-8
       baseurl=https://ftp.halifax.rwth-aachen.de/fedora-epel/8/Everything/x86_64/
       gpgcheck=1
       gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-8
    
      […]# dnf install http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/centos-gpg-keys-8-2.el8.noarch.rpm
    
      […]# rpm -Uvh --nodeps https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
    
      […]# dnf -c /root/centos8.repo --releasever=8-stream --best --disablerepo='*' --setopt=install_weak_deps=False --enablerepo=centos8-chroot-base --enablerepo=centos8-chroot-appstream --enablerepo=epel8-chroot --installroot=/var/lib/machines/{CTNAME} install centos-release dhcp-client dnf glibc-langpack-en glibc-langpack-de iproute iputils less passwd systemd systemd-networkd vim-enhanced

    This installs 165 packages that occupy 435M in the file system. The message "install-info: File or directory not found for /dev/null" appears several times. The cause is that the /dev/ file system is not yet initialized at this point. You may safely ignore the message.
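
When several Fedora containers are to be created, the bootstrap command can be wrapped in a function that only prints the command line for review. This is a sketch under assumptions: the function name fedora_bootstrap_cmd is hypothetical, and the package list is the Fedora 36 list from above (the Fedora 34/35 list lacks util-linux).

```shell
# Hypothetical helper: print (not run) the DNF bootstrap command
# for a Fedora container.
# $1 = Fedora release number, $2 = container name
fedora_bootstrap_cmd() {
    local release="$1" ctname="$2"
    # Package list as given above for Fedora 36
    local pkgs="dhcp-client dnf fedora-release glibc glibc-langpack-en glibc-langpack-de iputils less ncurses passwd systemd systemd-networkd systemd-resolved util-linux vim-default-editor"
    echo "dnf --releasever=${release} --best --setopt=install_weak_deps=False" \
         "--installroot=/var/lib/machines/${ctname}/ install ${pkgs}"
}
```

Printing instead of executing keeps the helper harmless; after checking the output of, for example, fedora_bootstrap_cmd 36 web1 (web1 is a placeholder name), run the printed command as root.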

2.2 Configuration and commissioning of a system container

  1. Setting the password for root

    This requires temporarily setting SELinux to permissive, otherwise passwd will not make any changes.

      […]# setenforce 0
      […]# systemd-nspawn -D /var/lib/machines/{ctname}   passwd
      […]# setenforce 1
  2. Provision of network interfaces for the container within the host

    If only a connection to an internal, protected network is needed (replace the host bridge interface name accordingly):

      […]# vim /etc/systemd/nspawn/{ctname}.nspawn
      (insert)
      [Network]
      Bridge=vbr6s0

    If a connection to the external, public network is also required, two corresponding interfaces must be provided, whereby a mac-vlan is used on the interface of the host for the external connection (again, replace the host interface names accordingly).

      […]# vim /etc/systemd/nspawn/{ctname}.nspawn
      (insert)
      [Network]
      MACVLAN=enp4s0
      Bridge=vbr6s0
  3. Configuration of the connection to the internal network within the container

    […]# vim /var/lib/machines/{ctname}/etc/systemd/network/20-host0.network
     (insert)
     # {ctname}.localnet
     # internal network interface via bridge
     # static configuration, no dhcp defined
     [Match]
     Name=host0*
    
     [Network]
     DHCP=no
     Address=10.10.10.yy/24
     #Gateway=10.10.10.10
    
     LinkLocalAddressing=no
     IPv6AcceptRA=no

    If the internal network is also to be used for external access via NAT, the Gateway entry must be uncommented. Otherwise leave it commented out!

  4. Optionally, configure an additional connection to the public network via MACVLAN

    In this case, the gateway entry must be commented out in the configuration of the internal network, as mentioned in item 3.

     […]# vim /var/lib/machines/{ctname}/etc/systemd/network/10-mv.network
      (insert)
      # {ctname}.sowi.uni-bremen.de
      # public interface via mac-vlan
      # static configuration, no dhcp available
      [Match]
      Name=mv-enp*
    
      [Link]
      ARP=True
    
      [Network]
      DHCP=no
    
      # IPv4 static configuration, no DHCP configured!
      Address=134.102.3.zz/27
      Gateway=134.102.3.30
      # without Destination specification
      # treated as default!
      #Destination=
    
      # IPv6 static configuration
      Address=2001:638:708:f010::zzz/64
      IPv6AcceptRA=True
      Gateway=2001:638:708:f010::1
      # in case of issues possible workaround
      # cf https://github.com/systemd/systemd/issues/1850
      #GatewayOnlink=yes
    
      [IPv6AcceptRA]
      UseOnLinkPrefix=False
      UseAutonomousPrefix=False

    Don’t forget to adjust interface names and IP addresses accordingly!

  5. Boot the container and log in

    Check if container boots without error messages

     […]# systemd-nspawn -D /var/lib/machines/{ctname}  -b
     OK Spawning container {ctname} on /var/l…01.
     OK …
     {ctname} login:
  6. Checking the status of systemd-networkd

    If inactive, activate and start the service.

     […]# systemctl  status  systemd-networkd
     …
     […]# systemctl  enable  systemd-networkd
     […]# systemctl  start   systemd-networkd
     […]# systemctl  status  systemd-networkd
  7. Check if all network interfaces are available

     […]# ip a
  8. Check for correct routing

     […]# ip route show
  9. Configure default DNS search path

    Specify a search domain to be appended to a bare hostname without a domain part, usually the internal network domain name, e.g. example.lan. Adjust the config file according to the pattern below:

     […]# vim /etc/systemd/resolved.conf
    
     [Resolve]
     ...
     #dns.quad9.net
     #DNS=
     #FallbackDNS=
     #Domains=
     Domains=example.lan
     #DNSSEC=no
     ...
  10. Check if name resolution is configured correctly

     […]# ls  -al  /etc/resolv.conf
     lrwxrwxrwx. 1 root root 39 29. Dez 12:15 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

    If the file is missing or is a text file, correct it.

     […]# cd  /etc
     […]# rm  -f  resolv.conf
     […]# ln  -s  ../run/systemd/resolve/stub-resolv.conf  resolv.conf
     […]# ls  -al  /etc/resolv.conf
     […]# cd

    Ensure that systemd-resolved service is enabled.

      […]# systemctl status systemd-resolved

    Activate the service if necessary.

      […]# systemctl enable systemd-resolved
  11. Set the intended hostname

     […]# hostnamectl
     […]# hostnamectl set-hostname <FQDN>
  12. Terminate the container

     […]# <CTRL>+]]]
     Container <CTNAME> terminated by signal KILL.
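
The two configuration files from steps 2 and 3 follow a fixed pattern, so they can be generated instead of typed. The following is a minimal sketch, not a definitive implementation: both function names are hypothetical, and only the settings shown in the steps above are emitted.

```shell
# Hypothetical helpers that print the configuration files used above.

# Print the [Network] section of a {ctname}.nspawn file.
# $1 = host bridge name, $2 = optional host interface for MACVLAN
nspawn_network_section() {
    echo "[Network]"
    if [ -n "${2:-}" ]; then
        echo "MACVLAN=$2"
    fi
    echo "Bridge=$1"
}

# Print a static configuration for the container's host0 interface.
# $1 = address with prefix length
# $2 = optional gateway (only if the internal network provides NAT)
host0_network_file() {
    echo "[Match]"
    echo "Name=host0*"
    echo
    echo "[Network]"
    echo "DHCP=no"
    echo "Address=$1"
    if [ -n "${2:-}" ]; then
        echo "Gateway=$2"
    fi
    echo "LinkLocalAddressing=no"
    echo "IPv6AcceptRA=no"
}
```

For example, nspawn_network_section vbr6s0 enp4s0 reproduces the second variant from step 2; redirect the output to /etc/systemd/nspawn/{ctname}.nspawn on the host. The output of host0_network_file belongs in /var/lib/machines/{ctname}/etc/systemd/network/20-host0.network.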

2.3 Configuration and commissioning of an application container

  1. Setting the password for root

    This requires temporarily setting SELinux to permissive, otherwise passwd will not make any changes.

      […]# setenforce 0
      […]# systemd-nspawn -D /var/lib/machines/{ctname}   passwd
      […]# setenforce 1
  2. Configuration of container properties

    Specifying private user configuration and shared network access.

      […]# vim /etc/systemd/nspawn/{ctname}.nspawn
      (insert)
      [Exec]
      PrivateUsers=false
      [Network]
      Private=off
      VirtualEthernet=false
  3. Boot the container and log in

    Check if container boots without error messages

     […]# systemd-nspawn -b -D /var/lib/machines/{ctname}
     OK Spawning container {ctname} on /var/l…01.
     OK …
     {ctname} login:
  4. Checking the status of systemd-networkd and systemd-resolved

    If the services are active, disable and stop them.

     […]# systemctl  status  systemd-networkd
     …
     […]# systemctl  disable  systemd-networkd
     […]# systemctl  stop   systemd-networkd
     […]# systemctl  status  systemd-networkd
     […]# systemctl  status  systemd-resolved
     …
     […]# systemctl  disable  systemd-resolved
     […]# systemctl  stop   systemd-resolved
     […]# systemctl  status  systemd-resolved

    If file /etc/resolv.conf is a link, remove it.

     […]# rm /etc/resolv.conf

    Create (or edit an existing) file /etc/resolv.conf

     […]# vim /etc/resolv.conf
    
    nameserver 127.0.0.53
    options edns0 trust-ad
    search <YOUR_DOMAIN>
  5. Check if all network interfaces are available

     […]# ip a

    You should see the same interfaces and IP addresses as on the host system.

  6. Check if name resolution is working correctly

     […]# ping spiegel.de
     PING spiegel.de (128.65.210.8) 56(84) bytes of data.
     64 bytes from 128.65.210.8 (128.65.210.8): icmp_seq=1 ttl=59 time=19.8 ms
     ...
  7. Set the intended hostname

     […]# hostnamectl
     […]# hostnamectl set-hostname <FQDN>
  8. Terminate the container

     […]# <CTRL>+]]]
     Container <CTNAME> terminated by signal KILL.
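
Both container types depend on the state of /etc/resolv.conf: a system container expects a symlink to the stub file of systemd-resolved, an application container a regular file. A small check can make the current state visible before anything is changed. This is a sketch only; the function name resolv_conf_kind is hypothetical.

```shell
# Hypothetical helper: report whether a resolv.conf path is a symlink,
# a regular file, or missing.
# $1 = path to check (default /etc/resolv.conf)
resolv_conf_kind() {
    local path="${1:-/etc/resolv.conf}"
    if [ -L "$path" ]; then
        # Test for a symlink first, because -f follows symlinks
        echo "symlink -> $(readlink "$path")"
    elif [ -f "$path" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

Run inside the container without arguments. For an application container configured as described above, the expected answer is "regular file".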

3. Starting the container as a system service for production use

  1. Booting the container using systemctl

    In this step, a separate UID/GID range is automatically created for the container.

     […]# systemctl  enable systemd-nspawn@{ctname}
     […]# systemctl  start  systemd-nspawn@{ctname}
     […]# systemctl  status systemd-nspawn@{ctname}

    On first boot after installing systemd-container, an SELinux bug currently (Fedora 34/35) blocks execution. The solution is to fix the SELinux label(s).

    • Select the SELinux tab in Cockpit, preferably before booting the container for the first time.

    • There, the AVCs are listed and solutions are offered, such as:

      type=AVC msg=audit(1602592088.91:50075): avc: denied { search } for pid=35673 comm="systemd-machine" name="48865" dev="proc" ino=1070782 scontext=system_u:system_r:systemd_machined_t:s0 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=dir permissive=0

      The proposed solution is roughly as follows:

      […]# ausearch -c 'systemd-machine' --raw | audit2allow -M my-systemdmachine
      […]# semodule -i my-systemdmachine.pp
    • The operation must be repeated until no SELinux error is reported and the container starts as a service.

      Alternatively, the SELinux CLI tool can be used, which also suggests these solutions.

  2. Enable automatic start of the container at system startup

     […]# systemctl enable systemd-nspawn@{ctname}
     […]# systemctl status systemd-nspawn@{ctname}
  3. Log in to the container

     […]# setenforce 0
     […]# machinectl  login  {ctname}

    When machinectl is called with parameters for the first time, an SELinux bug (Fedora 34/35) also blocks execution. The correction is done in the same way as for the container start.

  4. Completing and finalizing the container configuration

    Within the container, perform other designated software installations and customizations.

    In case of a CentOS 8-stream container, the EPEL repository should be installed (dnf install epel-release-latest-8) so that systemd-networkd is provided with updates.

  5. Logging off from the container

    After finishing all further work inside the container, press <ctrl>]]] (Mac: <ctrl><alt>666) to exit the container, then set SELinux back to enforcing on the host.

     […]# setenforce 1
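
Because the AVC fix in step 1 may have to be repeated several times, it helps to see at a glance which command and permission each denial concerns. The following is a minimal sketch that condenses raw AVC lines of the form shown above; the function name avc_summary is hypothetical, and the pattern is written only against that sample line.

```shell
# Hypothetical helper: condense raw AVC denial lines read from standard input.
# Prints "comm=<command> perm=<permissions> class=<object class>" per denial;
# lines that do not look like an AVC denial are dropped.
avc_summary() {
    sed -n 's/.*avc: denied { \([^}]*\) } .* comm="\([^"]*\)".* tclass=\([^ ]*\).*/comm=\2 perm=\1 class=\3/p'
}
```

Fed with the sample message from step 1, it prints "comm=systemd-machine perm=search class=dir". On the host, the raw lines can be taken from ausearch with the --raw option, as in the proposed solution above.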

3.1 Autostart of the container on reboot of the host

An autostart of the container in the "enabled" state fails on Fedora 35 and older. The cause can be seen in a status query after rebooting the host, which issues an error message according to the following example:

 […]# systemctl status systemd-nspawn@CT_NAME
 systemd-nspawn[802]: Failed to add interface vb-{CT_NAME} to bridge vbr6s0: No such device

This means that systemd starts the container before all required network interfaces are available.

Resolution for (physical) interfaces managed by NetworkManager

  1. The service file requires an amendment (Bug #2001631). In section [Unit], for the Wants= and After= configurations, add a target network-online.target at the end of each line. The file must then look like this (ignore the commented out marker rows):

     […]# systemctl  edit systemd-nspawn@  --full
     ...
     [Unit]
     Description=Container %i
     Documentation=man:systemd-nspawn(1)
     Wants=modprobe@tun.service modprobe@loop.service modprobe@dm-mod.service network-online.target
     ###                                                                      ^^^^^^^^^^^^^^^^^^^^^
     PartOf=machines.target
     Before=machines.target
     After=network.target systemd-resolved.service modprobe@tun.service  modprobe@loop.service  modprobe@dm-mod.service network-online.target
     ###                                                                                                                ^^^^^^^^^^^^^^^^^^^^^
     RequiresMountsFor=/var/lib/machines/%i
     ...

    Note the "@" character after nspawn; it is essential! Make the insertions in the editor that opens and save the file.

  2. Then execute

     […]# systemctl daemon-reload

At the next reboot the containers will be started automatically.
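
As an alternative to editing the complete unit file, the same two additions could be placed in a drop-in file. Note that this is a different technique from the one described above, sketched under the assumption that systemd merges Wants= and After= entries from drop-ins with those of the main unit file. The function name and the file name network-online.conf are hypothetical.

```shell
# Hypothetical helper: write a drop-in that adds network-online.target
# to Wants= and After= of the systemd-nspawn@ template unit.
# $1 = systemd unit directory (default /etc/systemd/system)
write_nspawn_online_dropin() {
    local unitdir="${1:-/etc/systemd/system}"
    local dropdir="$unitdir/systemd-nspawn@.service.d"
    mkdir -p "$dropdir" || return 1
    cat > "$dropdir/network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
    # Print the path of the file that was written
    echo "$dropdir/network-online.conf"
}
```

A drop-in leaves the packaged unit file untouched, so later updates of systemd-container are not shadowed by a full copy. As with the full edit, run systemctl daemon-reload afterwards.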

Resolution for virtual interfaces managed by libvirt

For such interfaces (usually the bridge virbr0) the addition mentioned above does not help. The container must be started by script in an extra step after Libvirt initialization is complete. For this you can use a hook that Libvirt provides.

[…]# mkdir -p /etc/libvirt/hooks/network.d/
[…]# vim /etc/libvirt/hooks/network.d/50-start-nspawn-container.sh
(INSERT)
#!/bin/bash
# Check defined nspawn container in /var/lib/machines and
# start every container that is enabled.
# The network-online.target in systemd-nspawn@ service file
# does not (yet) wait for libvirt managed interfaces.
# We need to start it explicitly when the libvirt bridge is ready.

# $1 : network name, e.g. default
# $2 : one of "start" | "started" | "port-created"
# $3 : always "begin"
# see  https://libvirt.org/hooks.html

set -o nounset

network="$1"
operation="$2"
suboperation="$3"

ctdir="/var/lib/machines/"
ctstartlog="/var/log/nspawn-ct-startup.log"

echo " P1: $1 - P2: $2 - P3: $3   @  $(date)  "
echo "     "                                    >  $ctstartlog
echo "======================================="  >>  $ctstartlog
echo " Begin  $(date)  "                        >>  $ctstartlog
echo " P1: $1 - P2: $2 - P3: $3 "               >> $ctstartlog

if [ "$network" == "default" ]; then
  if [ "$operation" == "started" ] && [ "$suboperation" == "begin" ]; then
    for file in  $ctdir/* ; do
      echo "Checking: $file "  >> $ctstartlog
      echo " Filename: $(basename  $file)  "   >> $ctstartlog
      echo " Status: $(systemctl is-enabled systemd-nspawn@$(basename $file) ) "  >> $ctstartlog

      if [ "$(systemctl is-enabled systemd-nspawn@$(basename $file) )" == "enabled" ]; then
        echo " Starting Container $(basename  $file) ...  "  >> $ctstartlog
        systemctl  start  systemd-nspawn@$(basename $file)
        echo "Container $(basename  $file) started"  >> $ctstartlog
      fi
    done
  fi
fi

[…]# chmod +x /etc/libvirt/hooks/network.d/50-start-nspawn-container.sh

You may also use the attached script instead of typing.
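
The decision logic of the hook can be separated from the logging, which makes it easier to test outside of libvirt. The following is a sketch of such a refactoring, not a replacement for the script above; both function names are hypothetical.

```shell
# Hypothetical refactoring of the hook's guard condition.
# Returns 0 only for the event on which containers should be started.
# $1 = network name, $2 = operation, $3 = sub-operation
is_start_event() {
    [ "${1:-}" = "default" ] && [ "${2:-}" = "started" ] && [ "${3:-}" = "begin" ]
}

# Print the names of all entries below a container directory, one per line.
# $1 = container directory (default /var/lib/machines)
list_container_names() {
    local dir="${1:-/var/lib/machines}" entry
    for entry in "$dir"/*; do
        # In an empty directory the glob stays unexpanded; skip it
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}
```

Inside the hook, is_start_event "$1" "$2" "$3" would replace the two nested if conditions, and the loop would iterate over the output of list_container_names. Unlike the original loop, the sketch skips an empty container directory.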

4. Troubleshooting

4.1 RPM DB problem in a CentOS 8-stream container on Fedora host

For dnf / rpm queries the error message is displayed: warning: Found SQLITE rpmdb.sqlite database while attempting bdb backend: using sqlite backend

The cause is that Fedora’s dnf, which is used for the installation, uses the sqlite format while CentOS/RHEL use the Berkeley DB (bdb) format.

Check configuration within the running container:

 […]# rpm -E "%{_db_backend}"

The output must be bdb. If it is, rebuild the database by executing

 […]# rpmdb  --rebuilddb

4.2 Error message dev-hugepages

You will find messages such as

dev-hugepages.mount: Mount process exited, code=exited, status=32/n/a
dev-hugepages.mount: Failed with result 'exit-code'.
[FAILED] Failed to mount Huge Pages File System.
See 'systemctl status dev-hugepages.mount' for details.

DNF installs this by default, but it is not applicable inside a container. It is a general kernel configuration that cannot be changed by a container (at least as long as it is not configurable within namespaces).

The messages can be safely ignored.

4.3 Package update may fail

Some packages, e.g. the filesystem package, may not get updated in a container (error message "Error: Transaction failed"), see also https://bugzilla.redhat.com/show_bug.cgi?id=1548403 and https://bugzilla.redhat.com/show_bug.cgi?id=1912155.

Workaround: Run before update:

 […]# echo '%_netsharedpath /sys:/proc' > /etc/rpm/macros.netshared

If an update has already been attempted and failed, execute this command and update the package again.

As of Fedora 35, the bug should be fixed.
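
If the workaround is applied from a provisioning script, it is convenient to make it safe to repeat. The following is a minimal sketch with a hypothetical function name. Unlike the command above, it appends to the macro file instead of overwriting it, and only when the line is not yet present.

```shell
# Hypothetical helper: add the workaround macro only if it is missing.
# $1 = macro file (default /etc/rpm/macros.netshared)
ensure_netsharedpath() {
    local file="${1:-/etc/rpm/macros.netshared}"
    local line='%_netsharedpath /sys:/proc'
    # -x matches whole lines only, -F treats the pattern as a fixed string
    if [ -f "$file" ] && grep -qxF "$line" "$file"; then
        echo "already set"
    else
        echo "$line" >> "$file"
        echo "added"
    fi
}
```

Run inside the container as root without arguments before the update. A second call prints "already set" and leaves the file unchanged.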