
Chapter 2. RPM Overview

2.1. Understanding the Package File
2.1.1. RPM file format
2.1.2. Binary RPMs and Source RPMs
2.1.3. Source RPMs
2.2. Querying the RPM Database
2.3. Running RPM Commands
2.3.1. Working with the rpm command
2.3.2. Other RPM commands
2.4. Summary
Working with RPM packages, files, commands, and databases can be complicated. There are thousands of files, for hundreds if not thousands of packages, installed on your system. You need some way to manage it all. The RPM system can help you do that.
This chapter provides an overview of the components that make up the RPM system for package management: package files, databases, and RPM commands.

2.1. Understanding the Package File

RPM provides for installing, upgrading and removing packages. Typically, each package is an application and all the necessary files associated with that application. For example, the Apache Web server comes with a number of configuration files, a large set of documentation files, and the Apache server itself. All of this fits into one RPM package.
One of the main advantages of the RPM system is that each .rpm file holds a complete package. For example, the following file holds the mlocate package:
mlocate-0.22.2-2.i686.rpm
Based on the naming conventions discussed in Chapter 1, Introduction to RPM, this file holds the mlocate package, version 0.22.2, in its second build (release) as an RPM package, for i686 (Intel) architecture systems.
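The naming convention can be illustrated with a short Python sketch that splits a file name into its parts. The function name parse_rpm_filename and the RpmFileName type are hypothetical names invented for this example, and the sketch assumes the plain name-version-release.arch.rpm pattern described above; it is not how the rpm command itself reads packages.

```python
from typing import NamedTuple


class RpmFileName(NamedTuple):
    """The four parts encoded in a conventional RPM file name."""
    name: str
    version: str
    release: str
    arch: str


def parse_rpm_filename(filename: str) -> RpmFileName:
    """Split 'name-version-release.arch.rpm' into its parts (hypothetical helper)."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an .rpm file name: {filename!r}")
    stem = filename[: -len(".rpm")]
    # The architecture follows the last dot. Release and version are the
    # last two hyphen-separated fields; the name itself may contain hyphens.
    rest, _, arch = stem.rpartition(".")
    rest, _, release = rest.rpartition("-")
    name, _, version = rest.rpartition("-")
    if not (name and version and release and arch):
        raise ValueError(f"file name does not follow the convention: {filename!r}")
    return RpmFileName(name, version, release, arch)


print(parse_rpm_filename("mlocate-0.22.2-2.i686.rpm"))
```

Splitting from the right is a deliberate choice here: package names may contain hyphens, while the release and architecture fields sit at fixed positions at the end of the file name.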
You can copy an .rpm file to another Linux system and install it with a single command, getting the complete contents of the package. Other commands let you remove or update the package.

2.1.1. RPM file format

RPM files hold a number of tagged data items and a payload, the files to install on your system. The tagged data items describe the package and can contain optional features. For example, the NAME tag holds the package name. The optional PRE tag holds a pre-installation script, a script that the rpm command runs prior to installing the files in the package payload.
Under the covers, RPM package files contain four sections. The first is a leading identification area that marks the file as an RPM package (created with a particular version of the RPM system). The remaining sections are the signature, the tagged data (called the header), and the payload. Each of these sections has important information about the package, although the payload section contains the actual content of the package.
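As a rough illustration of the leading identification area, the following Python sketch checks whether a block of bytes begins like an RPM package. The byte layout is not given in this chapter; the sketch assumes the commonly documented 96-byte lead that starts with the magic bytes ED AB EE DB, followed by a format version, a package type (0 for binary, 1 for source), and a name field. The names RPM_LEAD_MAGIC, LEAD_STRUCT, and read_lead are invented for this example.

```python
import struct

# Assumed layout of the 96-byte lead (big-endian), not taken from this chapter:
# magic, major, minor, type, archnum, name, osnum, signature_type, reserved
RPM_LEAD_MAGIC = b"\xed\xab\xee\xdb"
LEAD_STRUCT = struct.Struct(">4sBBhh66shh16s")


def read_lead(data: bytes) -> dict:
    """Return a few fields from the lead, or raise ValueError if it is not an RPM."""
    if len(data) < LEAD_STRUCT.size:
        raise ValueError("too short to hold an RPM lead")
    magic, major, minor, pkg_type, _arch, name, _os, _sig, _reserved = (
        LEAD_STRUCT.unpack_from(data)
    )
    if magic != RPM_LEAD_MAGIC:
        raise ValueError("not an RPM package: wrong magic bytes")
    return {
        "format_version": f"{major}.{minor}",
        "kind": "source" if pkg_type == 1 else "binary",
        # The name field is NUL-padded to its fixed width.
        "name": name.split(b"\0", 1)[0].decode("ascii", "replace"),
    }


# Build an artificial lead in memory so the sketch runs without a real package.
sample = LEAD_STRUCT.pack(RPM_LEAD_MAGIC, 3, 0, 0, 1, b"mlocate-0.22.2-2", 1, 5, b"")
print(read_lead(sample))
```

To try this on a real package, read the first 96 bytes of the file in binary mode and pass them to read_lead.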
Signature
The signature appears after the lead or identifier section, which marks the file as an RPM file. Like your signature when you sign a check, the RPM signature helps verify the integrity of the package. No, the signature doesn’t check for bugs in software applications. Instead, it ensures that you have downloaded a valid RPM archive.
The signature works by performing a mathematical function on the header and payload (archive) sections of the file. The function can be a cryptographic signature, such as one made with PGP (Pretty Good Privacy), or a message digest such as MD5.
Header
The header contains zero or more tagged blocks of data that pertain to the package. These blocks hold information such as copyright messages, version numbers, and package summaries.
Payload
The payload section contains the actual files used in the package. These files are installed when you install the package. To save space, data in the payload section is compressed, traditionally in GNU gzip format; newer versions of RPM can also use other compressors, such as bzip2 or xz.
Once uncompressed, the data is in cpio format, which is how the rpm2cpio command (introduced in Section 2.3.2, “Other RPM commands” later in this chapter) can do its work.
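The relationship between the compressed payload and cpio can be sketched in Python. This example assumes the payload bytes have already been separated from the rest of the package (skipping the lead, signature, and header is not shown), that they are gzip-compressed, and that the archive uses the ASCII "newc" cpio variant whose entries start with the magic string 070701. The function name list_payload is hypothetical; in practice the rpm2cpio command does the real work.

```python
import gzip
from typing import List

CPIO_NEWC_MAGIC = b"070701"
HEADER_LEN = 110  # 6-byte magic plus 13 fields of 8 hexadecimal digits


def _pad4(offset: int) -> int:
    """Round an offset up to the next multiple of four."""
    return (offset + 3) & ~3


def list_payload(compressed: bytes) -> List[str]:
    """List the file names in a gzip-compressed newc cpio archive (sketch)."""
    archive = gzip.decompress(compressed)
    names = []
    offset = 0
    while offset + HEADER_LEN <= len(archive):
        header = archive[offset:offset + HEADER_LEN]
        if header[:6] != CPIO_NEWC_MAGIC:
            raise ValueError("unexpected data: not a newc cpio entry")
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        file_size, name_size = fields[6], fields[11]
        name_start = offset + HEADER_LEN
        # name_size counts the terminating NUL byte, which is dropped here.
        name = archive[name_start:name_start + name_size - 1].decode()
        if name == "TRAILER!!!":  # marks the end of the archive
            break
        names.append(name)
        # Both the name and the file data are padded to four-byte boundaries.
        data_start = _pad4(name_start + name_size)
        offset = _pad4(data_start + file_size)
    return names
```

The sketch only lists names. Extracting file contents would mean slicing file_size bytes starting at data_start for each entry.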

2.1.2. Binary RPMs and Source RPMs

There are two main types of RPM packages: binary (or applications) and source. A binary RPM has been compiled for a particular architecture. For example, the Apache Web server compiled for an Intel Pentium, or i686, architecture won’t work on a Sharp Zaurus, which runs an Intel ARM processor. To run on both systems, you would need two separate packages: one for the Pentium i686 and one for the ARM.
In addition to binary RPMs, you can get source code RPMs. These RPMs are packages that provide the source code for other packages.

2.1.2.1. Binary RPMs

Binary RPMs hold complete applications or libraries of functions compiled for a particular architecture. Most binary RPMs contain complete applications, such as the Apache Web server or the AbiWord word processor. These application binary RPMs usually depend on a number of system libraries which are, in turn, also provided by binary RPMs.

Finding More Software

Chapter 7, RPM Management Software covers a number of locations where you can find RPM applications galore. Your Linux installation CDs or DVDs are also a great source for applications. Most Linux distributions come with more applications than you can imagine using.
Although most binary RPMs are complete applications, others provide libraries. For example, the Simple DirectMedia Layer library (SDL), which provides graphics for many games, can be packaged as an RPM file. A number of programs, mostly games, use this library for enhanced multimedia such as rich graphics. RPMs that provide libraries allow multiple applications to share the same library. Typically, the libraries are packaged into separate RPMs from the applications.
In addition to binary RPMs that hold applications or libraries compiled for a particular architecture, RPM supports the concept of platform-independent binary RPMs. These platform-independent RPMs, called noarch as a shortened form of “no architecture” dependencies, provide applications or libraries that are not dependent on any platform. Applications written in Perl, Python, or other scripting languages often do not depend on code compiled for a particular architecture. In addition, compiled Java applications are usually free of platform dependencies.
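The difference between architecture-specific and noarch packages can be sketched as a simple check. This is an illustrative simplification, not the logic the rpm command uses: the function name arch_matches and the tiny COMPATIBLE table are assumptions made for this example, and real systems recognize many more architectures and compatibility rules.

```python
import platform
from typing import Optional

# Hypothetical, deliberately small table: a machine architecture mapped to
# the package architectures it can run. Real compatibility rules are richer.
COMPATIBLE = {
    "i686": {"i686", "i586", "i486", "i386"},
}


def arch_matches(package_arch: str, machine_arch: Optional[str] = None) -> bool:
    """Return True if a package built for package_arch suits the machine (sketch)."""
    machine = machine_arch or platform.machine()
    if package_arch == "noarch":
        # Platform-independent packages fit every architecture.
        return True
    # Fall back to requiring an exact match for machines not in the table.
    return package_arch in COMPATIBLE.get(machine, {machine})


print(arch_matches("noarch"))           # fits whatever machine runs this
print(arch_matches("i686", "armv7l"))   # an i686 build does not fit an ARM machine
```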

2.1.3. Source RPMs

The mlocate package, mentioned previously, contains the mlocate application used to search for files on the system. The source code used to create this application is stored in an mlocate source RPM, for example:
mlocate-0.22.2-2.src.rpm
By convention, source RPMs have a file name ending in .src.rpm.
Source RPMs should contain all the commands, usually in scripts, necessary to recreate the binary RPM. Having a source RPM means that you can recreate the binary RPM at any time. This is a very important goal of the RPM system.

Source RPMs and Open Source Licensing

Source RPMs have nothing to do with open-source software licenses. Linux is famous for being an open-source operating system. In RPM terms, that means the source code for the Linux kernel and most Linux applications is freely available as source RPMs. But you can also make source RPMs for proprietary programs. The key difference is that you are unlikely to distribute the source RPMs for proprietary packages.
Furthermore, a number of open-source applications are not available as source RPMs. That's a shame, since source RPMs would make these applications easier to install.
While source RPMs hold the commands necessary to create the binary RPM, differences in your Linux environment may cause a rebuilt binary RPM to differ from the original binary RPM. For example, the compile scripts for some packages may add in optional code depending on which libraries or which versions of libraries are found on your system. Chapter 13, Packaging Guidelines covers many issues in creating RPMs, and Chapter 18, Using RPM on Non-Red Hat Linuxes and Chapter 19, RPM on Other Operating Systems cover issues related to other versions of Linux and other operating systems, respectively. If you follow the guidelines when making your own RPMs, you should end up with source RPMs that reproduce binary RPMs as consistently as possible.