Ansible infrastructure SOP/Information.
Fedora infrastructure used to use func and puppet for system change management. We now use ansible for all system change management and ad-hoc tasks.
Ansible runs from batcave01 or backup01. These hosts run an ssh-agent that has unlocked the ansible root ssh private key. (A human unlocks it manually with the passphrase after each reboot; the passphrase itself is not stored anywhere on the machines.) Using 'sudo -i', sysadmin-main members can use this agent to access any machine that has the ansible root ssh public key set up, either with 'ansible' for one-off commands or 'ansible-playbook' to run playbooks.
Playbooks are idempotent (or should be): you should be able to re-run the same playbook over and over, and it should reach a state where 0 items are changing.
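One way to check the "0 items are changing" condition is to look at the PLAY RECAP that ansible-playbook prints at the end of a run. The following is a minimal sketch, not part of the Fedora tooling: the function names and the sample host name are hypothetical, and it assumes the recap lines use the usual "host : ok=N changed=N ..." layout.

```python
import re

# Matches one host line of a PLAY RECAP section, for example:
# "web01.example.org : ok=12 changed=0 unreachable=0 failed=0"
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def changed_counts(recap_text):
    """Return {host: changed_count} for every host line in the recap text."""
    counts = {}
    for line in recap_text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if not match:
            continue  # banner lines, blank lines, task output
        counters = dict(pair.split("=") for pair in match.group("counters").split())
        counts[match.group("host")] = int(counters.get("changed", 0))
    return counts


def is_idempotent(recap_text):
    """True when at least one host is listed and none of them reports changes."""
    counts = changed_counts(recap_text)
    return bool(counts) and all(n == 0 for n in counts.values())
```

Running a playbook twice and passing the second run's recap to is_idempotent() should return True for a well-behaved playbook.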
Additionally, an rbac wrapper (described below) allows members of some other groups to run playbooks against specific hosts.
There are 2 git repositories associated with Ansible:
The Fedora Infrastructure Ansible repository and replicas.
This is a public repository. Never commit private data to this repo.
This repository exists as several copies or replicas:
The "upstream" repository on Pagure.
This repository is the public facing place where people can contribute
(e.g. pull requests) as well as the authoritative source. Members of the
sysadmin FAS group or the
fedora-infra Pagure group have commit
access to this repository.
To contribute changes, fork the repository on Pagure and submit a Pull Request. Someone from the aforementioned groups can then review and merge it.
It is recommended that you configure git to use
pull --rebase by
default by running
git config --bool pull.rebase true in your ansible
clone directory. This configuration prevents unneeded merges which can
occur if someone else pushes changes to the remote repository while you
are working on your own local changes.
Two bare mirrors on batcave01,
These are public repositories. Never commit private data to these repositories. Don’t commit or push to these repos directly, unless Pagure is unavailable.
The mirror_pagure_ansible service on batcave01 receives
bus messages about changes in the repository on Pagure, fetches these
into /srv/git/mirrors/ansible.git and pushes from there to
/srv/git/ansible.git. When this happens, various actions are triggered
via git hooks:
The working copy at
A mail about the changes is sent to sysadmin-members.
The changes are announced on the message bus, which in turn triggers announcements on IRC.
You can check out the repo locally on batcave01 with:
git clone /srv/git/ansible.git
If the Ansible repository on Pagure is unavailable, members of the sysadmin group may commit directly, provided this procedure is followed:
The synchronization service is stopped and disabled:
sudo systemctl disable --now mirror_pagure_ansible.service
Changes are applied to the repository on batcave01.
After Pagure is available again, the changes are pushed to the repository there.
The synchronization service is enabled and started:
sudo systemctl enable --now mirror_pagure_ansible.service
/srv/web/infra/ansible on batcave01, the working copy from which playbooks are run.
This is a public repository. Never commit private data to this repo. Don’t commit or push to this repo directly, unless Pagure is unavailable.
You can also access it via a cgit web interface at: https://pagure.io/fedora-infra/ansible/
This is a private repository for passwords and other sensitive data. It is not available in cgit, nor should it be cloned or copied remotely.
This repository is only accessible to members of 'sysadmin-main'.
The run_ansible-playbook_cron.py script runs daily via cron. It walks through the playbooks and runs them with the --check --diff parameters to perform a dry run.
This way we make sure all the playbooks are idempotent and there are no unexpected changes on servers (or in playbooks).
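The dry-run walk can be pictured as building one command line per playbook. This is only an illustrative sketch and does not reproduce run_ansible-playbook_cron.py: the function names and the example paths are hypothetical. The --check and --diff options are the ones named above.

```python
def dry_run_command(playbook):
    """Build the argument list for a dry run of a single playbook."""
    return ["ansible-playbook", "--check", "--diff", str(playbook)]


def dry_run_commands(playbooks):
    """Build dry-run commands for all given playbooks, in a stable order."""
    return [dry_run_command(p) for p in sorted(playbooks)]
```

Each returned list can be passed to subprocess.run() unchanged. Keeping the arguments as a list avoids shell quoting problems with unusual file names.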
A callback plugin stores the history of every ansible-playbook run and sends a daily report to sysadmin-logs-members listing any CHANGED or FAILED actions. Additionally, a fedmsg plugin reports the start and end of ansible playbook runs to the fedmsg bus. Ansible also logs verbosely to syslog, recording when and which commands and playbooks were run.
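To make the daily report idea concrete, here is a rough sketch of filtering stored task results down to CHANGED and FAILED entries. It is not the actual callback plugin: the record layout (dictionaries with "host", "task" and "status" keys) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Only these statuses are worth a line in the report.
REPORTED_STATUSES = {"CHANGED", "FAILED"}


def daily_report(results):
    """Group CHANGED/FAILED results by status and render a plain-text report."""
    by_status = defaultdict(list)
    for result in results:
        if result["status"] in REPORTED_STATUSES:
            by_status[result["status"]].append(f'{result["host"]}: {result["task"]}')
    lines = []
    for status in sorted(by_status):
        lines.append(f"{status} ({len(by_status[status])})")
        lines.extend(f"  {entry}" for entry in by_status[status])
    return "\n".join(lines)
```

An empty string means nothing changed or failed that day.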
There’s a wrapper script on batcave01 called 'rbac-playbook' that allows non sysadmin-main members to run specific playbooks against specific groups of hosts. This is part of the ansible_utils package. The upstream for ansible_utils is: https://bitbucket.org/tflink/ansible_utils
To add a new group:
add the playbook name and sysadmin group to the rbac-playbook (ansible-private repo)
add that sysadmin group to sudoers on batcave01 (also in ansible-private repo)
To use the wrapper:
sudo rbac-playbook playbook.yml
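Conceptually, the wrapper decides whether a user may run a playbook by looking up which groups are allowed for it. The sketch below illustrates that lookup only. It is not the rbac-playbook implementation: the mapping format, the function name and the group and playbook names are all hypothetical.

```python
def may_run(playbook, user_groups, acl):
    """Return True if any of the user's groups is allowed to run the playbook.

    acl maps a playbook name to the groups permitted to run it. A playbook
    that is missing from acl is denied for everyone.
    """
    allowed = set(acl.get(playbook, ()))
    return bool(allowed & set(user_groups))


# Hypothetical mapping, for illustration only.
EXAMPLE_ACL = {"groups/example.yml": ["sysadmin-example"]}
```

Denying unknown playbooks by default keeps a missing entry from silently granting access.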
The inventory directory tells ansible all the hosts that are managed by it and the groups they are in. All files in this directory are concatenated together, so you can split out groups/hosts into separate files for readability. They are in ini file format.
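The effect of splitting groups across several files can be sketched with a small parser that merges the group membership found in each file. This is a simplified illustration and not how ansible itself parses inventories: the function name and host names are hypothetical, and sections such as [group:vars] or [group:children] are skipped.

```python
def parse_inventory(*file_contents):
    """Merge [group] sections from several ini-style inventory texts."""
    groups = {}
    for text in file_contents:
        current = "ungrouped"  # hosts listed before any [group] header
        for raw in text.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            if line.startswith("[") and line.endswith("]"):
                name = line[1:-1]
                # Skip ':vars' and ':children' sections in this sketch.
                current = None if ":" in name else name
                if current is not None:
                    groups.setdefault(current, [])
            elif current is not None:
                # First token is the host name; the rest are host variables.
                groups.setdefault(current, []).append(line.split()[0])
    return groups
```

A group that appears in two files ends up with the hosts from both, which is why hosts can be spread over separate files without changing group membership.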
Additionally, the inventory directory contains host_vars and group_vars subdirectories. They hold files named for a host or group, each containing the variables to set for that host or group. You should strive to set variables at the highest level possible; precedence is in global, group, host order, so a host value overrides a group value, which overrides a global one.
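The global, group, host ordering can be illustrated by merging three dictionaries in that order, with later levels overriding earlier ones. This is a simplified model under stated assumptions, not ansible's full variable precedence logic; the function and variable names are hypothetical.

```python
def effective_vars(global_vars, group_vars, host_vars):
    """Merge variables so that host overrides group, which overrides global."""
    merged = {}
    for level in (global_vars, group_vars, host_vars):
        merged.update(level)  # later levels win on key collisions
    return merged
```

A value set only at the global level is visible everywhere, which is why setting variables at the highest level possible keeps host and group files small.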
This directory contains global variables as well as OS specific variables. Note that in order to use the OS specific ones you must set 'gather_facts' to 'True', or ansible will not have the facts it needs to determine the OS.
Roles are collections of tasks/files/templates that can be used on any host or group of hosts that shares that role. In other words, roles should be used except in cases where configuration only applies to a single host. Roles can be reused between hosts and groups and are more portable/flexible than tasks or specific plays.
In the ansible git repo there’s a directory for playbooks. The top level contains utility playbooks for sysadmins. These playbooks perform one-off functions or gather information. Under this directory are hosts and groups playbooks. These playbooks are for specific hosts and groups of hosts, from provision to fully configured. You should only use a host playbook in cases where there will never be more than one of that thing.
This directory contains one-off tasks that are used in playbooks. Some of these should be migrated to roles (we had this set up before roles existed in ansible). Those that are truly only used on one host/group could stay as isolated tasks.
TODO: add steps to make new libvirt virtuals in staging and production
TODO: merge in new-hosts.txt
See: rdiff-backup SOP