License audit tooling for Fedora packages

This page describes some of the tools that have been used to audit licensing of packages in Fedora Linux.

Packaging tools

The following tools are specifically designed for use with Fedora Linux packages. They use the Fedora License Data as a source of data on valid licenses.

rpmlint

rpmlint is the standard tool for checking Fedora Linux packages for well-known issues that packagers should fix. In the context of licensing, rpmlint evaluates the License: field in the spec file and checks that its values conform to the known set of allowed licenses.

This is packaged in Fedora Linux as rpmlint.
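To illustrate the kind of check described above, here is a minimal Python sketch that pulls the License: value out of spec file text and reports identifiers missing from an allow-list. It is not rpmlint's implementation. The `ALLOWED` set and both function names are hypothetical stand-ins; the real list of allowed licenses comes from the Fedora License Data.

```python
import re

# Hypothetical allow-list for illustration only; the real data comes from
# the Fedora License Data.
ALLOWED = {"MIT", "Apache-2.0", "GPL-2.0-or-later"}

# Operators that can appear in an SPDX-style license expression.
_OPERATORS = {"AND", "OR", "WITH"}


def license_field(spec_text):
    """Return the value of the first License: tag in spec text, or None."""
    match = re.search(r"^License:[ \t]*(.+)$", spec_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def unknown_identifiers(expression, allowed=ALLOWED):
    """Return the identifiers in the expression that are not in the allow-list.

    Exception identifiers following WITH are treated like any other
    identifier, so they are reported unless they appear in `allowed`.
    """
    tokens = re.findall(r"[^\s()]+", expression)
    return [t for t in tokens if t not in _OPERATORS and t not in allowed]


if __name__ == "__main__":
    spec = "Name: demo\nLicense: MIT AND (Apache-2.0 OR Foo-1.0)\n"
    print(unknown_identifiers(license_field(spec)))
```

The sketch only splits the expression into tokens. It does not validate operator placement or parenthesis nesting.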

rpminspect

rpminspect is the tool used to evaluate Fedora Linux packages for policy compliance, differences as compared to previous builds, and common packaging errors as they are built in the Fedora Build System. In the context of licensing, rpminspect evaluates the License: field in RPMs and ensures the values conform to the known set of allowed licenses.

This is packaged in Fedora Linux as rpminspect. To use it, you need both rpminspect and rpminspect-data-fedora.

License and source inspection tools

The following tools have been used by Fedora Project contributors to analyze the licensing of current and proposed Fedora Linux packages. All of these tools are distribution-agnostic.

Licensecheck

Licensecheck is a tool used to analyze the licensing of source files. This tool is principally used in the Fedora context for the initial package review for packages proposed for inclusion in Fedora Linux. Licensecheck is run automatically as part of FedoraReview.

By default, licensecheck provides license reports with full license names, but can be used to produce output using any number of license identifier schemes.

This is packaged in Fedora Linux as licensecheck.

SPDX-license-diff

SPDX-license-diff is a Firefox and Chromium/Chrome plugin that takes license text you highlight on a web page and attempts to find close matches to license identifiers or exception identifiers on the SPDX License List. If a match to an SPDX identifier is presented as less than 100%, SPDX-license-diff will display differences between your highlighted text and SPDX’s plain text rendition of the identifier.
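The idea of a percentage match plus a displayed difference can be sketched with Python's standard difflib module. This is only an illustration of the concept, not the algorithm SPDX-license-diff uses; the function name `match_report` and the choice of a character-level similarity ratio are assumptions.

```python
import difflib


def match_report(candidate, reference):
    """Return (similarity percentage, unified diff lines) for two texts.

    `candidate` is the text under review (for example, highlighted text);
    `reference` is the plain-text license it is compared against.
    """
    ratio = difflib.SequenceMatcher(None, candidate, reference).ratio()
    diff = list(
        difflib.unified_diff(
            reference.splitlines(),
            candidate.splitlines(),
            fromfile="reference",
            tofile="candidate",
            lineterm="",
        )
    )
    return round(ratio * 100, 1), diff


if __name__ == "__main__":
    reference = "Permission is hereby granted,\nfree of charge."
    candidate = "Permission is granted,\nfree of charge."
    percent, diff = match_report(candidate, reference)
    print(percent)
    print("\n".join(diff))
```

When the two texts are identical, the diff is empty and the percentage is 100.0.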

SPDX-license-diff will obviously be inconvenient if there is no web interface to the upstream source repository of your package, or your workflow does not involve use of a web browser.

Another limitation of SPDX-license-diff is that it does not fully implement the SPDX matching guidelines. As a result, SPDX-license-diff will typically show textual differences in cases where the highlighted text actually is a match to the SPDX identifier. In cases of close matches, it is generally useful and often necessary to check the XML file for the SPDX identifier in the SPDX license-list-XML repository. For example, many SPDX identifier XML files make use of regular expressions. Bear in mind that the SPDX matching guidelines include rules which are not necessarily reflected in these XML files.
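A small Python sketch shows why a plain textual diff can report differences for texts that count as a match. It applies just two rules found in the SPDX matching guidelines, treating runs of whitespace as a single space and ignoring letter case, and omits all the others. The function names are hypothetical.

```python
import re


def normalize(text):
    """Collapse runs of whitespace to one space and lowercase the text.

    This covers only two of the SPDX matching guideline rules; the
    guidelines contain further rules that are not implemented here.
    """
    return re.sub(r"\s+", " ", text).strip().lower()


def same_after_normalizing(a, b):
    """Return True if the texts are equal once normalized."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    upstream = "Permission is hereby\n   GRANTED, free of charge"
    reference = "Permission is hereby granted, free of charge"
    print(upstream == reference)
    print(same_after_normalizing(upstream, reference))
```

The raw comparison fails while the normalized one succeeds, which is the situation where a tool that skips normalization would show a spurious difference.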

If SPDX-license-diff identifies a license or exception text as a match to an SPDX identifier, you can then use the SPDX identifier to search in the allowed and not-allowed license lists for Fedora.

SPDX Check License

SPDX Check License is a web application (source code) that displays SPDX License List matches to a license or exception text pasted into a text box. As with SPDX-license-diff, the tool does not fully implement the SPDX matching guidelines. This tool may take more time to give an answer than SPDX-license-diff. It will say whether there is a match, or a close match, to an identifier, but it doesn’t display a diff.

askalono

Askalono, packaged in Fedora as askalono-cli, is a simple license scanning tool written in Rust. It is most useful for quick analysis of packages coming out of ecosystems featuring projects known to have (1) highly standardized approaches to layout of license information (it specifically looks only for files that are named LICENSE or COPYING or some obvious variant on those), (2) generally simple license makeup, and (3) cultural preferences for a highly limited set of licenses (for example, Rust crates that don’t bundle legacy C code, Go modules, or Node.js npm packages).

Askalono has some significant shortcomings. It can’t recognize or understand: (1) license notices/license texts that are comments in source files, (2) license notices/license texts in README files, (3) license files that contain multiple license texts (it will recognize only the first of them), and (4) nonstandard/archaic/legacy licenses (which covers most of the licenses being reviewed in issues in fedora-license-data).
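The file-name-based approach, and the first two shortcomings that follow from it, can be illustrated with a short Python sketch. This is not Askalono's code. The `LICENSE_NAME` pattern is an assumed approximation of "LICENSE or COPYING or some obvious variant", and the function name is hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern: LICENSE, LICENCE, or COPYING, optionally followed by a
# suffix such as ".txt" or "-MIT". Askalono's real rules may differ.
LICENSE_NAME = re.compile(r"^(LICEN[SC]E|COPYING)([.\-_].*)?$", re.IGNORECASE)


def candidate_license_files(root):
    """Return files under root whose names look like license files, sorted."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and LICENSE_NAME.match(p.name)
    )


if __name__ == "__main__":
    for path in candidate_license_files("."):
        print(path)
```

A search like this never opens README files or source files, so a license notice that appears only in a source comment or a README goes unnoticed.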

FOSSology

FOSSology is a license compliance software system and toolkit that includes license scanning. It can be run locally and also can be set up as a hosted service. The FOSSology website provides information on using a test instance hosted by the OSU Open Source Lab.

FOSSology is good for scanning an entire package for licenses (or texts that look like licenses). Files can be viewed easily in the FOSSology interface. FOSSology has the ability to remember past license inspection decisions.

Some tips on using FOSSology:

* In Options:
** #5 - check "Ignore SCM files"
** #7 - check Monk, Nomos, Ojo License Analysis and Package Analysis
** #8 - check first two options re: "Scanners matches…​"
* Go to License Browser view. Look for license matches that are suspicious or unexpected, such as things that are not an SPDX identifier or are ambiguous. You can then view the files with those matches and inspect what was found to determine if there is a license that needs to be recorded or if it is a false match.

Basic Workflow has some helpful information.

FOSSology is not packaged in Fedora.

ScanCode toolkit and ScanCode.io

ScanCode toolkit is a command-line Python tool and library for detecting licenses, copyrights, package metadata and related information in source and binary code. ScanCode detects licenses by doing a diff against a database of 37,000 license texts and notices.

ScanCode output includes SPDX documents and a variety of other formats including JSON, YAML, HTML, CSV, CycloneDX and Debian machine readable copyright files.

ScanCode reports detected license information using valid SPDX license expressions (and also ScanCode’s own extended list of license keys that includes licenses that are not included in the SPDX License List). It applies the SPDX matching guidelines when possible and provides detailed information on the matched texts and detected licenses.

ScanCode toolkit is also embedded in other tools such as FOSSology, tern, ORT, and FOSSLight.

ScanCode toolkit packaging in Fedora is in progress. It can be installed using pip or an app tarball.

ScanCode.io is a web application to run ScanCode toolkit pipelines on multiple projects. A pipeline is a script to organize the code analysis of large codebases, binaries, containers (or single packages) with optional code matching against an index of pre-existing FOSS code. ScanCode.io is useful for scanning an entire package for license notices, license texts and "license clues". Files can be viewed easily in the ScanCode.io interface. See Documentation to get started.

ScanCode.io is not yet packaged in Fedora. It can be installed using containers and podman.