RabbitMQ SOP

RabbitMQ is the message broker Fedora uses to let applications send messages to each other (or to themselves).

Contact Information

Owner

Fedora Infrastructure Team

Contact

#fedora-admin

Servers

  • rabbitmq0[1-3].iad2.fedoraproject.org

  • rabbitmq0[1-3].stg.iad2.fedoraproject.org

Purpose

General-purpose publish-subscribe message broker, as well as application-specific messaging.

Description

RabbitMQ is a message broker written in Erlang that offers a number of interfaces including AMQP 0.9.1, AMQP 1.0, STOMP, and MQTT. At this time only AMQP 0.9.1 is made available to clients.
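Because only AMQP 0.9.1 is offered, the plugins for the other protocols should not be enabled on any node. The sketch below filters a plugin list for them. It assumes the standard upstream plugin names (rabbitmq_amqp1_0, rabbitmq_stomp, rabbitmq_mqtt and their Web variants); the helper name check_amqp091_only is hypothetical.

```shell
# Hypothetical helper: read plugin names (one per line) on stdin and print
# any that expose a protocol other than AMQP 0.9.1.
# Returns 1 if such a plugin is found, 0 otherwise.
check_amqp091_only() {
    found=0
    while read -r plugin; do
        case "$plugin" in
            rabbitmq_amqp1_0|rabbitmq_stomp|rabbitmq_mqtt|rabbitmq_web_stomp|rabbitmq_web_mqtt)
                echo "unexpected protocol plugin enabled: $plugin"
                found=1
                ;;
        esac
    done
    return "$found"
}

# Usage on a broker node (-e: enabled plugins, -m: plugin names only):
#   rabbitmq-plugins list -e -m | check_amqp091_only
```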

Fedora uses the RabbitMQ packages provided by the Red Hat OpenStack repository because they are more up to date.

The Cluster

RabbitMQ supports clustering a set of hosts into a single logical message broker. The Fedora cluster is composed of three nodes, rabbitmq01 through rabbitmq03, in both staging and production. The cluster is deployed by the groups/rabbitmq.yml playbook.

Virtual Hosts

The cluster contains a number of virtual hosts. Each virtual host has its own set of resources (exchanges, bindings, and queues), and users are granted permissions per virtual host.
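To see what a given virtual host actually contains, its resources and per-user permissions can be listed with rabbitmqctl. This is a minimal sketch; the helper name show_vhost is hypothetical, and the column names passed to the list commands are standard rabbitmqctl info items.

```shell
# Hypothetical helper: print the exchanges, queues, bindings and user
# permissions that belong to one virtual host.
show_vhost() {
    vhost="$1"
    echo "== exchanges in $vhost"
    rabbitmqctl list_exchanges -p "$vhost" name type
    echo "== queues in $vhost"
    rabbitmqctl list_queues -p "$vhost" name messages consumers
    echo "== bindings in $vhost"
    rabbitmqctl list_bindings -p "$vhost"
    echo "== permissions in $vhost (user, configure, write, read)"
    rabbitmqctl list_permissions -p "$vhost"
}

# Usage:
#   rabbitmqctl list_vhosts
#   show_vhost /pubsub
```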

/pubsub

The /pubsub virtual host is the generic publish-subscribe virtual host used by most applications. Messages published via AMQP are sent to the "amq.topic" exchange, while messages bridged from fedmsg into AMQP are sent to the "zmq.topic" exchange.
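To check which queues receive messages from these two exchanges, the bindings in /pubsub can be filtered by source. This sketch relies on rabbitmqctl printing tab-separated columns; the helper name topic_bindings is hypothetical.

```shell
# Hypothetical helper: list the bindings in /pubsub whose source is one of
# the two topic exchanges, as "exchange -> destination (routing key)".
# Header or info lines never have amq.topic/zmq.topic in the first column,
# so they are dropped by the filter.
topic_bindings() {
    rabbitmqctl list_bindings -p /pubsub source_name destination_name routing_key |
        awk -F '\t' '$1 == "amq.topic" || $1 == "zmq.topic" {
            printf "%s -> %s (%s)\n", $1, $2, $3
        }'
}
```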

/public_pubsub

This virtual host has the "amq.topic" and "zmq.topic" exchanges from /pubsub federated to it, and anyone on the Internet is allowed to connect to it. For the moment it runs on the same broker cluster, but if it is abused, it can be moved to a separate cluster.
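The federation that feeds /public_pubsub is configured by a policy plus a federation upstream stored as a runtime parameter in that virtual host. A sketch for displaying both; the helper name show_federation_config is hypothetical.

```shell
# Hypothetical helper: show how /public_pubsub is federated from /pubsub.
show_federation_config() {
    # Policies in the vhost: the federation policy selects the exchanges
    # to federate and names the upstream to pull from.
    rabbitmqctl list_policies -p /public_pubsub
    # Runtime parameters (component, name, value), where federation
    # upstream definitions live.
    rabbitmqctl list_parameters -p /public_pubsub
}
```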

Authentication

Clients authenticate to the broker using x509 certificates. The common name of the certificate needs to match the username of a user in RabbitMQ.
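When a client fails to authenticate, a useful first check is whether the certificate's common name matches an existing RabbitMQ user. This sketch assumes openssl is available on the broker node and that the client certificate is readable there; the helper names cn_from_subject and check_cert_user are hypothetical, and escaped commas inside subject values are not handled.

```shell
# Hypothetical helper: extract the CN from an RFC 2253 subject string,
# e.g. "subject=CN=myapp,O=Example" -> "myapp".
cn_from_subject() {
    printf '%s\n' "${1#subject=}" | tr ',' '\n' | sed -n 's/^ *CN *= *//p' | head -n 1
}

# Hypothetical helper: check that the CN of a client certificate matches
# a RabbitMQ user. Run on a broker node; $1 is the certificate path.
check_cert_user() {
    cn="$(cn_from_subject "$(openssl x509 -in "$1" -noout -subject -nameopt RFC2253)")"
    # list_users prints "user<TAB>[tags]"; exit 0 only if the CN is listed.
    if rabbitmqctl list_users | awk -F '\t' -v u="$cn" '$1 == u { found = 1 } END { exit !found }'; then
        echo "user '$cn' exists"
    else
        echo "no RabbitMQ user named '$cn'" >&2
        return 1
    fi
}
```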

Troubleshooting

RabbitMQ offers a CLI, rabbitmqctl, which you can use on any node in the cluster. It also offers a web interface for management and monitoring, but that is not currently configured.
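Since rabbitmqctl can target other nodes with -n, all three cluster members can be checked from a single node. This sketch assumes the nodes are named rabbit@&lt;fqdn&gt;, as in the federation output shown later in this SOP, and that the shared Erlang cookie lets rabbitmqctl reach the other nodes; the helper name check_nodes and the staging node names are assumptions.

```shell
# Hypothetical helper: ask each cluster node for its status from one node.
check_nodes() {
    domain="${1:-iad2.fedoraproject.org}"   # stg.iad2.fedoraproject.org for staging
    rc=0
    for host in rabbitmq01 rabbitmq02 rabbitmq03; do
        node="rabbit@$host.$domain"
        if rabbitmqctl -n "$node" status >/dev/null 2>&1; then
            echo "$node: up"
        else
            echo "$node: NOT RESPONDING"
            rc=1
        fi
    done
    return "$rc"
}
```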

Network Partition

In case of network partitions, the RabbitMQ cluster should recover on its own. If it has not recovered once the network issue is fixed, diagnose the partition with rabbitmqctl cluster_status. Its output should include the line {partitions,[]} (an empty array).
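The check for an empty partitions array can be scripted. This sketch only understands the Erlang-term output containing {partitions,[]}; newer RabbitMQ releases print a human-readable report instead, which it does not parse and will flag for manual inspection. The helper name partition_check is hypothetical.

```shell
# Hypothetical helper: read `rabbitmqctl cluster_status` output on stdin
# and report whether the cluster sees a network partition. Spaces and
# newlines are stripped first so a term split across lines still matches.
partition_check() {
    if tr -d ' \n' | grep -q '{partitions,\[\]}'; then
        echo "no partitions"
    else
        echo "PARTITIONED (or unrecognised output): inspect cluster_status"
        return 1
    fi
}

# Usage:
#   rabbitmqctl cluster_status | partition_check
```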

If the array is not empty, the first nodes in the array can be restarted one by one. Make sure to give each node plenty of time to sync messages after it restarts; progress can be watched in the /var/log/rabbitmq/rabbit.log file.
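A per-node restart step might look like the sketch below, run on the node being restarted before moving to the next one. It assumes the systemd unit is named rabbitmq-server; the helper name restart_and_wait and the five-minute limit are hypothetical. A node answering rabbitmqctl status only means it is running again, not that message sync has finished, so the log should still be watched.

```shell
# Hypothetical helper: restart the local broker, then wait (up to about
# 5 minutes) until the local node answers `rabbitmqctl status`.
restart_and_wait() {
    systemctl restart rabbitmq-server || return 1
    tries=0
    until rabbitmqctl status >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            echo "node did not come back; check /var/log/rabbitmq/rabbit.log" >&2
            return 1
        fi
        sleep 5
    done
    echo "node is up; follow sync progress with: tail -f /var/log/rabbitmq/rabbit.log"
}
```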

Federation Status

Federation is the process of copying messages from the internal /pubsub vhost to the external /public_pubsub vhost. During network partitions, the federation relaying process has been seen not to come back up. To check the federation status, run rabbitmqctl eval 'rabbit_federation_status:status().' on rabbitmq01. It should not return an empty array ([]), but something like:

[[{exchange,<<"amq.topic">>},
  {upstream_exchange,<<"amq.topic">>},
  {type,exchange},
  {vhost,<<"/public_pubsub">>},
  {upstream,<<"pubsub-to-public_pubsub">>},
  {id,<<"b40208be0a999cc93a78eb9e41531618f96d4cb2">>},
  {status,running},
  {local_connection,<<"<rabbit@rabbitmq01.iad2.fedoraproject.org.2.8709.481>">>},
  {uri,<<"amqps://rabbitmq01.iad2.fedoraproject.org/%2Fpubsub">>},
  {timestamp,{{2020,3,11},{16,45,18}}}],
 [{exchange,<<"zmq.topic">>},
  {upstream_exchange,<<"zmq.topic">>},
  {type,exchange},
  {vhost,<<"/public_pubsub">>},
  {upstream,<<"pubsub-to-public_pubsub">>},
  {id,<<"c1e7747425938349520c60dda5671b2758e210b8">>},
  {status,running},
  {local_connection,<<"<rabbit@rabbitmq01.iad2.fedoraproject.org.2.8718.481>">>},
  {uri,<<"amqps://rabbitmq01.iad2.fedoraproject.org/%2Fpubsub">>},
  {timestamp,{{2020,3,11},{16,45,17}}}]]
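Since a healthy result contains one entry per federated exchange, each with {status,running}, the check can be reduced to counting those terms. The expected count of 2 (amq.topic and zmq.topic) comes from the sample output; the helper name federation_ok is hypothetical.

```shell
# Hypothetical helper: read the output of
#   rabbitmqctl eval 'rabbit_federation_status:status().'
# on stdin and check that both federated exchanges report {status,running}.
federation_ok() {
    running="$(awk '{ n += gsub(/[{]status,running[}]/, "") } END { print n + 0 }')"
    if [ "$running" -eq 2 ]; then
        echo "federation links running: 2/2"
    else
        echo "federation links running: $running/2; federation needs a restart" >&2
        return 1
    fi
}

# Usage on rabbitmq01:
#   rabbitmqctl eval 'rabbit_federation_status:status().' | federation_ok
```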

If the empty array is returned, the following commands restart the federation (again on rabbitmq01):

rabbitmqctl clear_policy -p /public_pubsub pubsub-to-public_pubsub
rabbitmqctl set_policy -p /public_pubsub --apply-to exchanges pubsub-to-public_pubsub "^(amq|zmq)\.topic$" '{"federation-upstream":"pubsub-to-public_pubsub"}'

Afterwards, check the federation link status again with the same command as before.
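The restart and the follow-up check can be wrapped together, using the same policy commands as above. This is a sketch: the helper name restart_federation and the FEDERATION_WAIT settle time are hypothetical. A failing clear_policy is tolerated, since the policy may already be absent and still needs to be set.

```shell
# Hypothetical helper, run on rabbitmq01: re-create the federation policy,
# wait briefly for the links to start, then print the federation status.
restart_federation() {
    rabbitmqctl clear_policy -p /public_pubsub pubsub-to-public_pubsub ||
        echo "clear_policy failed (policy may already be absent); continuing" >&2
    rabbitmqctl set_policy -p /public_pubsub --apply-to exchanges \
        pubsub-to-public_pubsub '^(amq|zmq)\.topic$' \
        '{"federation-upstream":"pubsub-to-public_pubsub"}' || return 1
    sleep "${FEDERATION_WAIT:-10}"   # hypothetical settle time, in seconds
    rabbitmqctl eval 'rabbit_federation_status:status().'
}
```

The printed status should again list both exchanges with {status,running}.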