Fedora Minimization Objective

Phase 2 Proposal

Objective lead: Adam Samalik (asamalik)

Problem Statement

While Fedora runs well on traditional physical or virtual workstations and servers, it is much less competitive in environments beyond traditional installations.

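Some modern deployment types, such as IoT and containers, are quite sensitive to size. In IoT environments, data connections (used for updates or management) usually have limited bandwidth, while cloud and container environments exchange data at a massive scale.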

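A practical example is Systemd: it is useful and common on physical systems, but rarely used in containers. The problem is packages that require Systemd only to create users with its systemd-sysusers tool. In containers, that means a noticeable increase in size.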

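Beyond that, practically every deployment benefits from a smaller size, because installation size is directly related to the attack surface and to the number of relevant critical vulnerabilities.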

Vision

Thousands of individual and corporate contributors collaborate in the Fedora community to explore new problems and to build a fast-moving modern OS with a rich ecosystem, allowing them to experiment with modernising their infrastructure.

Mission

Help open source developers, sysadmins, and Linux distribution maintainers focus on what matters to them.

Outcomes

Fedora is a popular platform because its ecosystem is both cutting-edge and well optimized for modern deployments such as IoT and containers. That makes many people use Fedora rather than build and assemble their own artifacts directly from upstream projects. This, in turn, relieves the pressure on open source developers from users who would otherwise ask for their specific security and other issues to be fixed quickly.

As a result:

  • Open source developers can focus on feature development

  • Sysadmins can easily consume pre-built bits that also get regular updates

  • Fedora contributors (vendors and individuals) can collaborate within the Fedora community on exploring and developing open source solutions to problems of the future

Outputs

Specific use cases are defined in Fedora. The community then focuses on those use cases with development and maintenance, optimisation (like minimisation), and testing (like CI and gating). These use cases can be transparently prioritized for infrastructure resources based on community interests.

Feedback Pipeline actively monitors each use case and records its size and the dependencies it requires to run. The data history is kept and displayed so that changes can be seen over time. To keep things small over time, Feedback Pipeline also automatically detects size increases and could automatically open Bugzilla bugs to track, fix, or justify such increases transparently.
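As a rough illustration of the size-increase detection, here is a minimal Python sketch. It assumes each use case's history is a list of dated total-size measurements; the names `SizeSample` and `find_size_increases`, both thresholds, and the sample numbers are hypothetical and not taken from Feedback Pipeline itself. Filing the Bugzilla bug is left out; the flagged pairs are what such a bug would report.

```python
from dataclasses import dataclass


@dataclass
class SizeSample:
    """One recorded measurement of a use case's total installed size."""
    date: str         # ISO date of the measurement
    size_bytes: int   # total installed size of the resolved package set


def find_size_increases(history, min_ratio=0.05, min_bytes=1_000_000):
    """Return (previous, current) pairs whose size grew noticeably.

    A step is flagged only if it exceeds BOTH the relative threshold
    (min_ratio) and the absolute one (min_bytes). Both defaults are
    arbitrary placeholders, not values used by Feedback Pipeline.
    """
    flagged = []
    for prev, cur in zip(history, history[1:]):
        delta = cur.size_bytes - prev.size_bytes
        if delta >= min_bytes and delta >= prev.size_bytes * min_ratio:
            flagged.append((prev, cur))
    return flagged


if __name__ == "__main__":
    # Made-up history for a single use case.
    history = [
        SizeSample("2020-03-01", 120_000_000),
        SizeSample("2020-03-08", 121_000_000),  # +1 MB, below 5 %: not flagged
        SizeSample("2020-03-15", 151_000_000),  # +30 MB: flagged
    ]
    for prev, cur in find_size_increases(history):
        print(f"{prev.date} -> {cur.date}: +{cur.size_bytes - prev.size_bytes} bytes")
```

Requiring both a relative and an absolute threshold is one way to keep the noise down: a tiny package doubling from 10 KB to 20 KB is rarely interesting, while a small growth of a large image may just be normal churn.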

An active focus on minimization means that our maintainers produce size-optimised content with the same or less effort. Tooling, services, and data help them make the right decisions about dependencies easily, and keep things smaller over time.

Actions

Identify relevant use cases and allow the community (meaning not just the Minimization Team) to define their own. We think of a use case as a set of packages installed in a specific context, having a specific purpose, such as an Apache HTTP Server container. Define use cases at least for:

  • httpd

  • nginx

  • MariaDB

  • PostgreSQL

  • Fedora IoT

  • Python 3

Also, consider looking at container-native use cases, such as:

  • Go for container apps

  • Rust for container apps

  • Quarkus
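To make the notion of a use case concrete, the following Python sketch models it as a named set of root packages in a given context and computes its total size by following hard dependencies transitively. This is only an assumption about how such data could be structured: the class and function names, package names, and sizes are made up, and the plain graph traversal stands in for a real depsolver such as DNF, which also handles versions, provides, and conflicts.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """A set of packages installed in a specific context for a specific purpose."""
    name: str
    context: str      # free-form label, e.g. "container" or "bare metal"
    packages: tuple   # root packages that define the use case


def resolve(roots, requires):
    """Return every package reachable from `roots` via hard dependencies.

    `requires` maps a package name to the names it requires. A plain
    breadth-first traversal stands in for a real depsolver here.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        for dep in requires.get(pkg, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def total_size(use_case, requires, sizes):
    """Sum the sizes of all packages in the use case's resolved set."""
    return sum(sizes[pkg] for pkg in resolve(use_case.packages, requires))


if __name__ == "__main__":
    # Hypothetical packages and sizes (MB), for illustration only.
    requires = {
        "web-server": ["web-server-core", "init-system"],
        "web-server-core": ["libc"],
        "init-system": ["libc"],
    }
    sizes = {"web-server": 1, "web-server-core": 5, "init-system": 12, "libc": 6}
    uc = UseCase("web server container", "container", ("web-server",))
    print(sorted(resolve(uc.packages, requires)))
    print(total_size(uc, requires, sizes), "MB")  # 1 + 5 + 12 + 6 = 24 MB
```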

Collect specific use cases by talking to people at tech events, internet forums, and any other viable venues.

Extend monitoring services (Feedback Pipeline) that:

  • Visualize dependencies and a total size for each use case

  • Monitor size changes over time

  • Auto-detect large size changes

  • Notify maintainers about unexpected size increases
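For the dependency visualization, one low-effort option is to emit Graphviz DOT text and let Graphviz draw it. The sketch below assumes a hypothetical `requires` mapping (package to list of required packages) and a `sizes` mapping (package to size in MB); the function name `to_dot` is also hypothetical. The Polkit and mozjs60 sizes in the example come from the Polkit note in this proposal.

```python
def to_dot(name, requires, sizes):
    """Render a dependency graph as Graphviz DOT text.

    Each node label includes the package size so large packages stand out.
    `requires`: package -> list of required packages.
    `sizes`:    package -> size in MB.
    """
    lines = [f'digraph "{name}" {{']
    for pkg in sorted(sizes):
        # "\n" inside a DOT label is rendered as a line break.
        lines.append(f'  "{pkg}" [label="{pkg}\\n{sizes[pkg]} MB"];')
    for pkg in sorted(requires):
        for dep in requires[pkg]:
            lines.append(f'  "{pkg}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot("polkit", {"polkit": ["mozjs60"]}, {"polkit": 0.5, "mozjs60": 30}))
```

Saving the output as `graph.dot` and running `dot -Tsvg graph.dot -o graph.svg` renders it with Graphviz.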

In addition to new features, we also need to:

  • write tests to make contributing significantly easier

  • optimize the service's performance so it scales well

  • explore the use of CI and Rawhide Gating

Being able to see what’s going on is a prerequisite for implementing any changes. Seeing all the relevant opportunities helps us focus on the ones with the most impact, and transparent tracking helps us prove the usefulness of our work and keep focusing on the most impactful activities.

Minimize the installation size of the use cases by optimizing RPM dependencies, features, software architecture, and other factors. Specifically, look for:

  • Unnecessary RPM dependencies (although there are probably not many of these)

  • Multiple implementations of the same functionality required by various packages — try to make them use the same one

  • Context-specific requirements: for example, requiring Systemd is fine on traditional deployments, but in containers it means a significant size increase. Leverage weak dependencies in such cases (this might require code changes).

  • Dependencies on large components when only a fraction of their functionality is used, such as requiring the whole Perl stack to run a single script. Such a script could be rewritten in Python, which is present almost everywhere, mostly because of DNF.
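The weak-dependency approach can be illustrated with a simplified resolver. In a spec file it means declaring `Recommends: systemd` instead of `Requires: systemd`; DNF installs weak dependencies by default, but container builds can turn that off with the `install_weak_deps=False` option. The Python sketch below uses hypothetical package names and made-up sizes to show how the same package yields a smaller set when Recommends are skipped. It is not how DNF resolves dependencies internally.

```python
from collections import deque


def resolve_with_weak_deps(roots, requires, recommends, install_weak_deps=True):
    """Resolve a package set; follow Recommends only if install_weak_deps is True.

    Hard Requires are always followed. This loosely mirrors what DNF's
    install_weak_deps option controls, without versions, provides or conflicts.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        deps = list(requires.get(pkg, ()))
        if install_weak_deps:
            deps.extend(recommends.get(pkg, ()))
        for dep in deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


if __name__ == "__main__":
    # Hypothetical daemon that only Recommends systemd instead of Requiring it.
    requires = {"my-daemon": ["libc"], "systemd": ["libc"]}
    recommends = {"my-daemon": ["systemd"]}
    sizes = {"my-daemon": 2, "libc": 6, "systemd": 12}  # MB, made up

    full = resolve_with_weak_deps(["my-daemon"], requires, recommends)
    slim = resolve_with_weak_deps(["my-daemon"], requires, recommends,
                                  install_weak_deps=False)
    print("traditional:", sum(sizes[p] for p in full), "MB")  # 20 MB
    print("container:  ", sum(sizes[p] for p in slim), "MB")  # 8 MB
```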

Engage with upstream developers regarding bigger changes in packaging and architecture. An example is Systemd, and splitting systemd-sysusers out into its own subpackage.

Implement process and policy changes reflecting bigger, more general changes. Again, a good example is using Systemd in containers, or the general issue of creating users in containers.

Provide guidance for the Fedora community in the form of blog posts, videos, and conference talks. Even though we might have guidelines and policies in place, spreading the word is always important.

Resources and Inputs

Cloud resources to prototype services. We are not going to change the existing Fedora infrastructure in any way before whatever we develop proves useful and worth the hassle of stabilization and changing production.

No existing Fedora Infra or Release Engineering resources are needed at the moment. However, we might need help with setting up (or getting access to) the cloud resources.

Active support from our maintainers, the FPC, and other community members is definitely needed. This is obviously not something we can "request", but it’s still a necessary input.

Guiding Principles

Usefulness over size: There is a balance between usefulness and size. We keep that in mind and will not implement drastic changes that would prevent our users from using Fedora. However, nothing prevents us from producing additional, very specific and minimal artifacts.

Using RPM: We’re doing this with RPM. We’re not achieving minimization by deleting files after installation. This might be obvious, but still worth mentioning.

First Phase Accomplishments

See the status page for detailed info and historic weekly updates. Summary below.

Better understanding — Yes, we now have a much better understanding of the problem and a better, more specific idea of the next steps.

Feedback Pipeline — A service that monitors use cases for size and dependencies. Includes various views in tables and interactive dependency graphs.

Systemd and containers — We dug into the issue of Systemd vs. containers, especially for packages that require it just to create users in containers using systemd-sysusers. We are working with upstream on splitting it out into a separate subpackage. We have thought about, but not yet proposed, a wider policy around this.

Policy thinking:

  • A - If a package only uses systemd to start a service, it should only "Recommend" systemd. This allows the package to be installed in containers without systemd.

  • B - If a program only uses systemd's libraries, it should only require systemd-libs. Example: libusb

  • C - If a package wants to use systemd-sysusers to create users/groups, only require systemd-sysusers.  (NOTE:  This subpackage isn’t implemented yet)
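The three policy ideas can be summarized as a small decision function. This Python sketch only restates the rules as written: the `SystemdUsage` fields and the `systemd_deps` name are hypothetical, detecting how a package actually uses systemd is out of scope, and rule C refers to a systemd-sysusers subpackage that does not exist yet.

```python
from dataclasses import dataclass


@dataclass
class SystemdUsage:
    """How a package uses systemd (hypothetical fields, for illustration)."""
    starts_service: bool = False   # only needs systemd to start a service
    uses_libraries: bool = False   # only links against systemd's libraries
    creates_users: bool = False    # uses systemd-sysusers for users/groups


def systemd_deps(usage):
    """Return (RPM tag, package) pairs suggested by policy ideas A, B and C."""
    deps = []
    if usage.starts_service:
        deps.append(("Recommends", "systemd"))       # A
    if usage.uses_libraries:
        deps.append(("Requires", "systemd-libs"))    # B
    if usage.creates_users:
        # C: the systemd-sysusers subpackage is not implemented yet.
        deps.append(("Requires", "systemd-sysusers"))
    return deps


if __name__ == "__main__":
    # libusb, the example given for rule B, only uses systemd's libraries.
    print(systemd_deps(SystemdUsage(uses_libraries=True)))
    # -> [('Requires', 'systemd-libs')]
```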

initial-setup — If an image is built without users, there needs to be some way to add a user at startup. initial-setup does a good job of that, but at the expense of size: it pulls in anaconda-tui and anaconda-core, and those two packages in turn pull in a lot of other, rather large, packages. This affects the IoT images as well as others. We currently do not have a recommendation, but one is being worked on.

Use pcre2 instead of pcre — The minimization effort is trying to trim things down to just one pcre, and that is pcre2.

Polkit and mozjs60 — Let’s explain this one with a terrible analogy! Polkit is this lovely person (0.5 MB) who rings your doorbell and says they will wash the windows of your house. After you agree, they bring out their elephant (mozjs60, 30 MB) and use it to spray your windows with water. Polkit pulls in mozjs60, which is a rather large package, so we’re trying to sort this one out, too.
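Questions such as "why does Polkit drag in mozjs60?" or "why does initial-setup pull in anaconda-core?" come down to finding a dependency chain from the use case to the large package. The Python sketch below finds one shortest chain with a breadth-first search that records each package's parent. The root package `my-service`, the function name `why_installed`, and the dependency map are hypothetical, apart from the Polkit to mozjs60 relationship described above.

```python
from collections import deque


def why_installed(roots, requires, target):
    """Return one shortest dependency chain from a root to `target`, or None.

    Breadth-first search that remembers each package's parent, so the
    chain can be walked back once the target is reached.
    """
    parent = {r: None for r in roots}
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        if pkg == target:
            chain = []
            while pkg is not None:
                chain.append(pkg)
                pkg = parent[pkg]
            return chain[::-1]
        for dep in requires.get(pkg, ()):
            if dep not in parent:
                parent[dep] = pkg
                queue.append(dep)
    return None


if __name__ == "__main__":
    # Hypothetical root; polkit -> mozjs60 follows the Polkit example.
    requires = {"my-service": ["polkit"], "polkit": ["mozjs60"]}
    print(" -> ".join(why_installed(["my-service"], requires, "mozjs60")))
    # my-service -> polkit -> mozjs60
```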