Fedora Minimization Objective

Objective lead: Adam Samalik (asamalik)

Vision

Thousands of individual and corporate contributors collaborate in the Fedora community to explore new problems and to build a fast-moving modern OS with a rich ecosystem, allowing them to experiment with modernising their infrastructure.

The Fedora community produces the number one Linux OS for modern deployments such as containers and IoT. Fedora enables users to build small container and other images with the desired functionality by providing optimised dependency trees of its rich ecosystem, and the ability to fine-tune various installations to achieve the right balance between features and size.

The problem

While Fedora is well suited for traditional physical/virtual workstations and servers, it is often overlooked for use cases beyond traditional installs.

Patching footprint

There is a direct relationship between the installation footprint on one side and the attack surface and number of relevant CVEs on the other. Many users and use cases benefit from reducing the number of patches needed to keep a system or image up to date. Reboots are also expensive in certain environments, which therefore benefit from fewer patches as well.

Container Image Size

Image size is certainly not the only factor that matters in the container space, but it does matter, especially as the number of images being maintained and regularly respun grows. The Fedora base image has grown to over 300 MB, roughly three times the size of the base images of other mainstream distributions such as Ubuntu, Debian, and openSUSE, which range from 91 to 113 MB.

Package Dependencies

RPMs in Fedora often use hard dependencies where they are not necessary, which makes the resulting installation bigger.

Goals

Problem exploration: Explore why dependency trees become too big over time, and minimize them. The ultimate goal is to make the things built on top of our images (containers, ISOs, etc.) smaller.

Packaging optimization: Optimize the dependencies of selected use cases to minimize their installation footprints, making the maintenance of their production deployments simpler (fewer things to worry about) and their development faster (fewer things for CI to test).

Data and feedback for smaller images: Provide feedback backed by data to other teams responsible for creating images (containers, ISOs, etc.) to help them maintain the right balance between the image size and including useful content.

Mindshare: Write blog posts, present at conferences, and use other communication channels to show that Fedora cares about minimal but usable installations.

Outcomes

Fedora could become even more popular: By targeting emerging use cases such as containers and IoT, and communicating that through different channels, Fedora could gain popularity for these use cases.

Patching footprint minimized: Running Fedora in production will mean having fewer dependencies present on a running system.

Small yet useful container base and application images: Optimized dependency trees and data generated by this objective will help define and build smaller images that still contain useful components that are generally expected to be there.

Better packaging experience: Tooling, services, and potential infrastructure improvements will help packagers make the right decisions about dependencies with less effort.

Faster Fedora CI: By focusing on the minimized installations of the most popular use cases optimized by this objective, Fedora CI will be able to target those specific use cases and test them faster, because there will be fewer packages in total.

Strategy

Technical Strategy

Step zero: focus on the most popular use cases: We will focus on the most popular use cases instead of the distribution in general. Defining these use cases is part of this objective. We will then optimize those use cases. As a bonus, the area where they overlap will very roughly define a good starting point for the content of our images, which we may look into as part of one of our stretch goals.
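
The overlap between use cases can be sketched as a set intersection. This is a minimal illustration assuming each use case has already been reduced to a set of installed package names; the use-case names and package sets below are hypothetical examples, not measured data.

```python
from collections import Counter

# Hypothetical install sets per use case; real sets would come from the dependency analysis.
USE_CASES = {
    "python-web": {"glibc", "openssl-libs", "ca-certificates", "python3", "python3-libs"},
    "nodejs": {"glibc", "openssl-libs", "ca-certificates", "nodejs", "libstdc++"},
    "httpd": {"glibc", "openssl-libs", "ca-certificates", "httpd", "apr"},
}


def common_core(use_cases: dict[str, set[str]]) -> set[str]:
    """Packages present in every use case: a rough starting point for base image content."""
    return set.intersection(*use_cases.values())


def popularity(use_cases: dict[str, set[str]]) -> Counter:
    """Count in how many use cases each package appears."""
    return Counter(pkg for pkgs in use_cases.values() for pkg in pkgs)


print(sorted(common_core(USE_CASES)))  # → ['ca-certificates', 'glibc', 'openssl-libs']
```

Packages that appear in most, but not all, use cases (a `popularity` count below the number of use cases) are the borderline candidates where the balance between usefulness and size has to be discussed.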

Step one: packages and their dependencies: Inspect various dependency trees of the most popular use cases. Because multiple packages can potentially provide a certain dependency, exploring multiple installations will be necessary. There are multiple ways to achieve different results: pre-installing certain packages on the target system, or providing additional parameters to libsolv regarding the weights of individual packages or the criteria for the overall transaction in terms of size and other aspects. This will be explored more deeply.
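
A brute-force way to explore alternative installations is to enumerate every combination of providers for ambiguous capabilities and compare the resulting install sizes. The sketch below assumes a tiny hypothetical repository (`myapp`, `server-a`, `server-b`, `libtls` and their sizes are made up). It is not how libsolv resolves dependencies, since libsolv uses a SAT solver and scales far beyond what enumeration can handle; it only illustrates why a single resolution result may not be the smallest one.

```python
from itertools import product

# Hypothetical repository: name -> (installed size in MB, required capabilities).
PACKAGES = {
    "myapp": (5, ["python3", "webserver"]),
    "python3": (30, ["libc"]),
    "server-a": (10, ["libc"]),
    "server-b": (4, ["libc", "libtls"]),
    "libtls": (3, ["libc"]),
    "libc": (8, []),
}

# Capabilities satisfied by more than one package (similar to a virtual Provides).
PROVIDERS = {
    "webserver": ["server-a", "server-b"],
}


def closure(root: str, choice: dict[str, str]) -> set[str]:
    """Return every package installed for `root` under one provider choice."""
    installed, stack = set(), [root]
    while stack:
        pkg = stack.pop()
        if pkg in installed:
            continue
        installed.add(pkg)
        for cap in PACKAGES[pkg][1]:
            # Ambiguous capabilities map to the chosen provider; others are package names.
            stack.append(choice.get(cap, cap))
    return installed


def explore(root: str):
    """Yield (total size, installed set, choice) for every provider combination."""
    caps = sorted(PROVIDERS)
    for picks in product(*(PROVIDERS[c] for c in caps)):
        choice = dict(zip(caps, picks))
        pkgs = closure(root, choice)
        yield sum(PACKAGES[p][0] for p in pkgs), pkgs, choice


for size, pkgs, choice in sorted(explore("myapp"), key=lambda r: r[0]):
    print(size, choice, sorted(pkgs))
```

In this toy data the `server-b` variant (50 MB) is smaller than the `server-a` one (53 MB) even though it pulls in an extra library, which is the kind of non-obvious result that exploring multiple installations is meant to surface.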

Step two: files and packaging optimization: Optimize the most viable dependency trees by looking at relations at the filesystem level, and look for ways to use weak dependencies, %doc macros, and other mechanisms, such as splitting packages into smaller units, to make the final installation smaller while keeping features installable as options.
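
The effect of turning hard dependencies into weak ones can be modelled in a few lines. In RPM, weak dependencies are declared with tags such as `Recommends:`, and DNF skips them when `install_weak_deps` is set to `False`. The sketch below mimics only that behaviour with a `weak` flag; the packages (`editor`, `editor-doc`, `spell-plugin`, `libc`) and their sizes are hypothetical, and this is not DNF code.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    size_mb: int
    requires: list[str] = field(default_factory=list)    # hard dependencies
    recommends: list[str] = field(default_factory=list)  # weak dependencies


def install_set(repo: dict[str, Package], root: str, weak: bool = True) -> set[str]:
    """Collect packages pulled in by `root`, following weak deps only if `weak` is True."""
    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(repo[name].requires)
        if weak:
            stack.extend(repo[name].recommends)
    return seen


def total_size(repo: dict[str, Package], names: set[str]) -> int:
    return sum(repo[n].size_mb for n in names)


# Before: documentation and a plugin are hard requirements of the main package.
before = {
    "editor": Package(6, requires=["libc", "editor-doc", "spell-plugin"]),
    "editor-doc": Package(12),
    "spell-plugin": Package(9, requires=["libc"]),
    "libc": Package(8),
}

# After: the same pieces are split out and demoted to weak dependencies.
after = {**before, "editor": Package(6, requires=["libc"], recommends=["editor-doc", "spell-plugin"])}

for label, repo in (("before", before), ("after", after)):
    full = total_size(repo, install_set(repo, "editor"))
    minimal = total_size(repo, install_set(repo, "editor", weak=False))
    print(f"{label}: default {full} MB, without weak deps {minimal} MB")
```

The default installation keeps the same size (35 MB), so features remain available out of the box, while users who opt out of weak dependencies get a 14 MB installation instead.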

Usefulness over size: Regarding images, we believe they should be useful first, and minimal second. Producing tiny images without the expected functionality would not be helpful. However, delivering the expected functionality in a minimal image is definitely beneficial.

Images are not our direct goal: We will not create smaller base images directly as part of this objective. Instead, we will use the data we collect while optimizing the apps and runtimes we focus on, and make data-backed suggestions to the people maintaining the images. However, we might build some preview images on the side for demonstration purposes.

Using RPM: We’re doing this with RPM, using features such as --nodocs, potentially creating alternative module streams with slimmed-down versions if that makes sense, etc. We’re not achieving minimization by deleting files after installation. This might be obvious, but it is still worth mentioning.

Execution Strategy

Scripts over manual labour: Write scripts that will perform dependency analysis rather than doing the analysis manually. Even though this means more work short-term, the ability to inspect new versions of content as it appears (and potentially use this script in CI) is more beneficial long-term.
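
A script of this kind could start as simply as diffing two snapshots of a use case's installed packages. This sketch assumes the snapshots have already been collected by some other means as `{package: installed size in MB}` dictionaries; the `compare` function, its `max_growth_mb` threshold, and the data are hypothetical.

```python
from typing import NamedTuple


class Report(NamedTuple):
    added: list[str]
    removed: list[str]
    size_delta_mb: float
    ok: bool


def compare(old: dict[str, float], new: dict[str, float], max_growth_mb: float = 0.0) -> Report:
    """Compare {package: installed size in MB} snapshots of one use case."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    delta = sum(new.values()) - sum(old.values())
    # The check passes only if the footprint grew by at most `max_growth_mb`.
    return Report(added, removed, delta, delta <= max_growth_mb)


# Hypothetical snapshots taken before and after a package update.
old = {"myapp": 5.0, "libc": 8.0}
new = {"myapp": 5.5, "libc": 8.0, "libtls": 3.0}

report = compare(old, new, max_growth_mb=1.0)
print(report)
```

In a CI job, `report.ok` could be turned into the job's exit status so that an update which unexpectedly grows a use case is flagged for review.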

Large changes on the side first: If a substantial change to the infrastructure or elsewhere is required, we will deploy our own infrastructure using other resources rather than testing in the production environment. We will also run large experiments on the side. We believe that staging is not for experiments; it is for validating production-ready deployments before they actually go into production. We need a space to fail fast. After we test and validate, we will present a change proposal to the community.

Building an environment first: Our primary output is vision, tooling, services, and people being on board. We explore and demonstrate what is possible, socialise our vision and strategy, and develop tooling and services that help packagers make the right decisions. The team will use those tools to achieve results, of course, but enabling members of the community to help with this objective at their own pace is the primary goal.

Mentoring and consulting: Whatever we come up with, we will actively help Fedora packagers to make sure they get the full benefit of our tooling, services, and other results. We will then write testimonials together explaining what packagers have achieved and how.

Do not reinvent the wheel: This may seem obvious, but in our experience, keeping it in mind helps us actively search for existing resources before creating something new.

Deliverables & Timeline

Phase 1: Discovery (summer)

Preliminary discovery and analysis of the most popular use cases (a very basic analysis has already started in [asamalik/container-randomness on GitHub](https://github.com/asamalik/container-randomness), producing a [report](https://asamalik.fedorapeople.org/container-randomness/report.html)).

Published blog posts (and maybe videos) about the process, about what is possible, and about our intentions. This will likely result in feedback and engagement from the community.

Talk at Flock 2019 about the objective, featuring some of the discoveries and plans for what’s next.

Specific plans for the next phase.

Phase 2: Experiments (fall)

Blog posts and conference talks about what works and what doesn’t.

A set of use cases with optimized dependencies (potentially built on the side) available.

Initial versions of tooling and services, and a set of guiding principles for packagers to help them make the right decisions.

Specific plans for the next phase.

Phase 3: Stabilization (winter)

From the experiments in the previous phase, we will choose the ones to focus on and formulate a plan.

Improved tooling and services that are usable by packagers.

Even more blog posts and conference talks (DevConf.cz and FOSDEM?).

Specific plans for the next phase.

Phase 4: Integration (spring/f32)

A set of requirements for the Fedora Infra and RelEng teams to implement changes that have been proven to work and to be needed.

Potential changes to the Packaging Guidelines applied.

Tooling and services available in Fedora.

The dependency chains of the selected apps and runtimes in Fedora should become smaller at this point.

Active help to packagers in the form of mentorship, workshops, and other means.

Testimonials that we write together with packagers that will demonstrate what they have achieved and how.