We use DNF Counting to get statistics about the number of Fedora installations.
Fedora Infrastructure Team
#fedora-admin, #fedora-noc, firstname.lastname@example.org
Give interested parties information about the number of Fedora installations.
DNF Counting is a way for us to gather statistics about the number of Fedora
installations, differentiated by version, spin, etc. On the infrastructure
side this is implemented by a number of scripts and a Python package (mirrors-countme).
This SOP concerns itself with the infrastructure side of the equation. For any issues with the various frontends logging in to be counted (DNF, PackageKit, …), contact their respective maintainers or upstreams.
Clients (DNF, PackageKit, …) have been modified so that they add a
countme variable to their requests to
mirrors.fedoraproject.org once a week. This
ends up in our webserver log data, which lets us generate usage statistics.
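For illustration, this is roughly how such a request shows up in the log data and how the variable could be pulled out of it; a minimal sketch with an invented log line, not the actual mirrors-countme parser:

import re

# Hypothetical log line; the exact combined-log format is whatever our proxies write.
line = ('203.0.113.5 - - [15/Aug/2021:06:25:07 +0000] '
        '"GET /metalink?repo=fedora-34&arch=x86_64&countme=1 HTTP/1.1" 200 5042')

# Pull the countme value (a small integer describing the system's age bucket)
# out of the request's query string.
match = re.search(r'[?&]countme=(\d+)', line)
if match:
    print("countme:", match.group(1))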
Cron jobs are set up on
log01 which collect HTTP log files from the various
web proxies, combine them (accesses to different backend services including
mirrors.fedoraproject.org are scattered across the proxy logs), and produce
statistics from them. The various pieces live in a) the mirrors-countme
project (a Python package and related scripts to generate statistics from the
log data) and b) shell scripts in the
web-data-analysis role in Ansible:
sync-http-logs.py (Ansible) syncs individual log files from various hosts, including the proxies, to log01.
combineHttpLogs.sh (Ansible) combines the logs for the different web sites which are scattered across the proxy hosts.
mirrorlist.py (Ansible) extracts hosts from the combined log data.
The tools shipped with mirrors-countme, which generate the statistics.
countme-trim-raw (mirrors-countme), which trims the intermediary database file (raw.db).
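Put together, the regular processing boils down to running these pieces in order. A conceptual sketch only; paths, options and the exact mirrors-countme invocations live in the web-data-analysis role and the mirrors-countme tools:

import subprocess

# Conceptual order of the pipeline; arguments are intentionally left out here.
subprocess.run(["sync-http-logs.py"], check=True)    # pull raw proxy logs onto log01
subprocess.run(["combineHttpLogs.sh"], check=True)   # merge per-proxy logs per web site
subprocess.run(["mirrorlist.py"], check=True)        # extract hosts from the combined logs
# ... followed by the mirrors-countme tooling, which turns the extracted data
# into statistics, and countme-trim-raw, which prunes raw.db.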
The “traditional” statistics which were done before DNF learned about the
countme variable were reimplemented: count any individual IP sighted, regardless of whether it sends
countme or not. This is necessary to count systems which don’t have that feature in their DNF or YUM, and, while giving different numbers, gives us an idea how things develop when compared to the same numbers for more modern OSes.
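As a rough sketch of the difference between the two counts (invented file name and naive field splitting, not the actual implementation):

# Count every client IP seen vs. only those whose requests carry countme.
all_ips, countme_ips = set(), set()

with open("combined-access.log") as log:    # invented file name
    for line in log:
        ip = line.split()[0]                # assumes the client IP is the first field
        all_ips.add(ip)
        if "countme=" in line:
            countme_ips.add(ip)

print("unique IPs, traditional count:", len(all_ips))
print("unique IPs with countme:", len(countme_ips))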
The countme-trim-raw tool was implemented to trim the intermediary database
raw.db, which contains the information gleaned from parsing the merged log files. This database grows steadily (and, with the reinstated counting of any individual IP sighted, quickly), so once these data have been safely turned into the final statistics, we want to be able to remove them so that the local volume where the database is stored doesn’t fill up completely.
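Conceptually, trimming amounts to deleting rows that have already been folded into the final statistics and reclaiming the space. A sketch with an assumed table name, column and retention period; the real logic is in countme-trim-raw:

import sqlite3
import time

WEEK = 7 * 24 * 3600
cutoff = int(time.time()) - 4 * WEEK    # assumed retention period

conn = sqlite3.connect("raw.db")
# Table and column names here are assumptions about the schema.
conn.execute("DELETE FROM countme_raw WHERE timestamp < ?", (cutoff,))
conn.commit()
conn.execute("VACUUM")                  # actually return the freed space to the filesystem
conn.close()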
The project repository was cleaned up, i.e. large data files used in integration tests were removed because they made cloning the repository unnecessarily slow: for a couple hundred KB of code, the repository was more than 300 MB in size. In this context, the repository was moved from Pagure to GitHub.
Unused code was removed, the remaining code was refactored and condensed to remove redundancies, and comprehensive unit tests were added so that the barrier to contributing is lower and changes are less risky.
During the Q3/2021 DNF Counting Initiative, a number of changes were implemented which improved the DNF Counting backend in the areas of monitoring and debugging, as well as performance and robustness.
The involved scripts send messages about state changes and errors to the fedora-messaging bus. State changes are e.g. start and finish of a complete script or of its individual steps.
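For illustration, publishing such a state-change message could look roughly like this; the topic suffix and body fields are assumptions, not the exact schema the scripts use:

from fedora_messaging import api, message

# Hypothetical "step started" notification under the logging.stats topic prefix.
msg = message.Message(
    topic="logging.stats.sync-http-logs.start",    # assumed topic suffix
    body={"host": "log01", "date": "2021-08-15"},  # assumed body fields
)
api.publish(msg)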
The shell script which synced log files from various hosts to log01
(syncHttpLogs.sh) was reimplemented in Python (as
sync-http-logs.py), with several improvements which reduced the time it takes for syncing from 6-7 hours to little more than 30 minutes per day:
All log files for one date of one host are synced in one call to
rsync. This greatly reduces overhead.
The reason these files were previously synced one by one is that
rsync only allows differing file names when syncing single files, and the names do differ: the log files on the hosts contain their date in the name, while on
log01 they don’t, but are stored in directories for each date.
To overcome this limitation,
sync-http-logs.py maintains a shadow structure of hard links with dates in their names, and
rsync operates on this structure instead; the synced files are linked back to "date-less" file names afterwards for further processing (see the sketch below).
Because syncing log files from some hosts is pretty slow, several hosts are synced in parallel.
Previously, the sync and combine scripts (syncHttpLogs.sh and combineHttpLogs.sh) were run from individual cron jobs which were set to run a couple of hours apart. Sometimes this caused problems because the former wasn’t finished when the latter started to run (i.e. a race condition). Now,
sync-http-logs.py and combineHttpLogs.sh are run from one cron job to avoid this.
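A minimal sketch of the batching and hard-link idea described above; the host names, paths and file patterns are invented, and the real logic (including error handling) lives in sync-http-logs.py:

import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

HOSTS = ["proxy01", "proxy02"]                      # invented host names
DATE = "2021-08-15"

def sync_host(host):
    shadow = f"/var/tmp/log-shadow/{host}"          # invented shadow location
    target = f"/var/log/hosts/{host}/{DATE}"        # invented per-date directory
    os.makedirs(shadow, exist_ok=True)
    os.makedirs(target, exist_ok=True)
    # One rsync call per host and date instead of one call per file.
    subprocess.run(
        ["rsync", "-a", f"{host}:/var/log/httpd/*-{DATE}*", f"{shadow}/"],
        check=True,
    )
    # Hard-link the date-stamped copies back to "date-less" names for further processing.
    for name in os.listdir(shadow):
        dateless = name.replace(f"-{DATE}", "")
        dest = os.path.join(target, dateless)
        if not os.path.exists(dest):
            os.link(os.path.join(shadow, name), dest)

# Several hosts are synced in parallel because syncing from some of them is slow.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(sync_host, HOSTS))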
Previously, the scripts were scattered across the
base role and others. All of the deployment has been consolidated into the
web-data-analysis role, and the no longer used awstats setup has been removed.
The mirrors-countme Python package and scripts are packaged as RPM packages in Fedora; previously, they were deployed from a local clone of the upstream git repository.
Yes, just reboot. Or don’t. There are no continuously running services, everything is regularly run as cronjobs.
The sync-http-logs.py script sends relatively verbose output to syslog.
Other than that, the closest anything comes to logs are mails sent if cronjobs
produce (error) output and messages sent to the bus.
The scripts send messages with a topic prefix of
logging.stats to the bus
in various stages of their operation. If anything doesn’t work as it should,
review whether every step that started also finished, and compare run times between days.
If anything crashes, cron should have sent mails to the configured recipients
(email@example.com), which could also contain valuable information.
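Given a day's worth of collected messages, a quick check along these lines verifies that every step that started also finished, and how long it took; the .start/.finish topic suffixes and the sample data are assumptions:

# (topic, timestamp) pairs as they might be collected from the bus.
messages = [
    ("logging.stats.sync-http-logs.start", 1629000000),     # made-up sample data
    ("logging.stats.sync-http-logs.finish", 1629002000),
    ("logging.stats.combine-http-logs.start", 1629002100),
]

started = {t.rsplit(".", 1)[0]: ts for t, ts in messages if t.endswith(".start")}
finished = {t.rsplit(".", 1)[0]: ts for t, ts in messages if t.endswith(".finish")}

for step, ts in started.items():
    if step in finished:
        print(f"{step} took {finished[step] - ts} seconds")
    else:
        print(f"{step} started but never finished")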
Generated CSV reports and images are in
/var/www/html/csv-reports, which is
exposed at https://data-analysis.fedoraproject.org/, but they get regenerated
with every run of the scripts.
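To pull one of the generated reports for your own analysis, something along these lines works; the path and file name below are made up, browse https://data-analysis.fedoraproject.org/ for the real ones:

import csv
import io
import urllib.request

# Hypothetical report URL; check the directory listing for actual file names.
url = "https://data-analysis.fedoraproject.org/csv-reports/some-report.csv"

with urllib.request.urlopen(url) as response:
    for row in csv.reader(io.TextIOWrapper(response, encoding="utf-8")):
        print(row)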
All combined HTTP log data is kept on the
/fedora_stats NFS share. Log
files from the proxy hosts are synced to log01,
but these are just copies of what exists elsewhere already.
The scripts only process data from the previous three days (roughly). If they don’t run for an extended period of time, there might be gaps in the generated statistics, which can be plugged by temporarily adjusting the respective settings in the scripts and re-running them.
Here :) and at https://github.com/fedora-infra/mirrors-countme
Yes, but it’s on the
/fedora_stats file share, so it’s assumed to get backed
up regularly already.
The mirrors-countme shell and Python scripts create statistics from the
already combined log data.
Prerequisites: a change (bug fix or feature) is available in the upstream repository.
Publish an upstream release
From a clone of the upstream repository:
Bump the version in pyproject.toml (e.g. to 0.1.2) and commit the change, e.g.:
git commit -s -m "Version 0.1.2" -- pyproject.toml
Tag the previous change with a GPG-signed tag:
git tag -s 0.1.2
Push both the change and the tag:
git push origin main 0.1.2
Create a source tarball and wheel (the tarball will be created as e.g. mirrors_countme-0.1.2.tar.gz).
From the list of tags, select “Create release” in the menu for the respective tag, and attach the created tarball and wheel files to the created release.
Update and Build the Fedora/EPEL Packages
From a clone of the Fedora package repository, in the rawhide branch:
Bump the version in
python-mirrors-countme.spec. No other changes are necessary; the package uses automatic release fields and changelog.
Download the source tarball, either manually or with one of:
spectool -g python-mirrors-countme.spec
rpmspectool get python-mirrors-countme.spec
Upload the source tarball to the lookaside cache:
fedpkg new-sources mirrors_countme-0.1.2.tar.gz
Commit the changes to the repository, e.g.:
git commit -s -m "Version 0.1.2" -- python-mirrors-countme.spec
Push the changes and build:
git push && fedpkg build
For any other active Fedora and EPEL branches, fast-forward them to the state of the
rawhide branch, push and build, e.g.:
git checkout epel8 \
  && git merge --ff-only rawhide \
  && git push \
  && fedpkg build
Submit Fedora/EPEL Package Updates
Either submit the update via the Bodhi web interface, or from the command line in the respective checked out Fedora or EPEL branch, e.g.:
fedpkg update --type bugfix --notes 'Put in some notes!'
Tag with Infra-Tags in Koji
Tag the build into the respective infra candidate tag in Koji, e.g.:
koji tag-build epel8-infra-candidate python-mirrors-countme-0.1.2-1.el8
Check that the build was picked up and signed (this should take no more than a few minutes), e.g.:
koji buildinfo python-mirrors-countme-0.1.2-1.el8
The build must be tagged with the corresponding infra production tag before it becomes available for our infrastructure hosts.
Tag the build into the respective infra production tag in Koji, e.g.:
koji tag-build epel8-infra
When the respective infra tag repository is updated, the new version should be ready to be installed/updated in our infrastructure.
Scripts other than what is contained in
mirrors-countme live in the
web-data-analysis role in Ansible. Simply "upgrade" them in place.
The scripts send out status messages over
fedora-messaging with a topic prefix of logging.stats.
All of this runs on
log01.iad2.fedoraproject.org and is deployed through the
web-data-analysis role and the python-mirrors-countme RPM package.
The mirrors-countme upstream project publishes source tarballs to their
corresponding releases in the repository on GitHub:
https://github.com/fedora-infra/mirrors-countme
These are packaged in Fedora as the
python-mirrors-countme (SRPM) and
python3-mirrors-countme (RPM) packages.
Other scripts are located directly in the Fedora Infrastructure Ansible
repository, in the web-data-analysis role.
Report bugs with
mirrors-countme at its upstream project: https://github.com/fedora-infra/mirrors-countme
Anything concerning the cron jobs or other scripts should probably go into our Infrastructure tracker: https://pagure.io/fedora-infrastructure/issues
The same as anything else that deals with log data.