Ansible: A Rollback Scenario

This is a post about making Ansible more declarative, one step at a time.

I find Ansible good enough for configuration management¹, but I have reservations. For instance, it is supposed to be declarative, so that a developer only defines the desired state of the configuration, yet that is not always the case. Here are some examples:

When you rename a file that Ansible created to define a configuration in a conf.d type of directory, the old file is not deleted unless you remove it explicitly.
When a line in a configuration file is no longer required, it’s not enough just to un-declare it. You need to make sure it is deleted.
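
As a minimal sketch of the second case, removing a line has to be declared as explicitly as adding it once was. The file path and the setting below are hypothetical, chosen only for illustration:

```yaml
# Hypothetical example: "proxy_buffering off;" was once added to this file
# by a lineinfile task and is no longer wanted. Dropping that task from the
# role would leave the line in place, so its absence is declared instead.
- name: remove obsolete proxy_buffering setting
  lineinfile:
    path: /etc/nginx/conf.d/00-tuning.conf   # assumed path
    regexp: '^\s*proxy_buffering\s'          # matches the unwanted line
    state: absent
  notify: restart nginx
```

A cleanup task like this has to stay in the role until every managed host has run it at least once.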

Therefore, even if you manage the whole configuration with Ansible, a host can drift away from the desired configuration over time: old configuration files, packages, and open network ports may linger. I acknowledge that hiding this from the user of a configuration management tool requires quite a bit of abstraction and complexity, but there are ways to be more confident that configuration changes don’t leave old configuration unmanaged. Unfortunately, they only work on a case-by-case basis.

An example: Consider an Nginx reverse proxy installation used as an SSL accelerator. Initially, there are a bunch of virtual hosts to add, so that each request is passed to its backend without any alteration other than HTTPS becoming plain HTTP.

I start with a playbook that contains something like this:

- hosts: nginx
  roles:
    - role: ssl_accelerator
      vars:
        virtual_hosts:
          - name: host1
            domain: host1.localdomain
            destination: http://10.0.0.1
          - name: host2
            domain: host2.localdomain
            destination: http://10.0.0.2:8080

The relevant parts of the role are:

- name: virtual hosts
  template:
    src: virtual.conf.j2
    dest: "/etc/nginx/conf.d/99-virtual-{{ item.name }}.conf"
  loop: "{{ virtual_hosts }}"
  notify: restart nginx

The virtual.conf.j2 template is:

server {
  listen 443 ssl;
  server_name {{ item.domain }};

  location / {
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_pass {{ item.destination }};
    proxy_read_timeout 90;
  }
}

In a CentOS environment, this creates one file per virtual host under /etc/nginx/conf.d, which Nginx picks up as additions to its configuration.

Adding new virtual hosts is straightforward: Just add a new item inside the playbook, re-run the playbook, and you’re done.
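
For instance, serving a third backend would only need one more list entry. The name, domain, and address of host3 below are made up for illustration:

```yaml
- hosts: nginx
  roles:
    - role: ssl_accelerator
      vars:
        virtual_hosts:
          - name: host1
            domain: host1.localdomain
            destination: http://10.0.0.1
          - name: host2
            domain: host2.localdomain
            destination: http://10.0.0.2:8080
          # New entry (hypothetical values): the role renders it to
          # /etc/nginx/conf.d/99-virtual-host3.conf on the next run.
          - name: host3
            domain: host3.localdomain
            destination: http://10.0.0.3
```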

How about removing an item? Let’s imagine we don’t want to serve host1 anymore. Removing it from the playbook will not change anything, since its file will still be there on the target host. One solution is to add another task that deletes files before the required ones are created, something like this:

- name: delete virtual hosts
  file:
    path: "{{ item }}"
    state: absent
  loop: "{{ virtual_hosts_to_delete }}"
  notify: restart nginx

This should work, but it will (1) destroy the declarative nature of your playbook and (2) force you to maintain a separate list of deleted hosts, which you shouldn’t have to maintain at all.

The solution I found is something like this:

- name: find old virtual hosts
  find:
    paths: /etc/nginx/conf.d
    file_type: file
    patterns:
      - "99-virtual-*"
    excludes: "{{ virtual_hosts | map(attribute='name') | map('regex_replace', '^(.*)$', '99-virtual-\\1.conf') | list }}"
  register: find_output

- name: remove old virtual hosts
  file:
    path: "{{ item.path }}"
    state: absent
  loop: "{{ find_output.files }}"
  notify: restart nginx

This also removes any matching files that were created outside of Ansible, preventing configuration drift.
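
To see what the excludes expression produces, a debug task like the one below can be dropped into the role temporarily. It is only a sketch for inspection and is not part of the original role:

```yaml
# Prints the file names that the find task is told to skip,
# i.e. the files that belong to currently declared virtual hosts.
- name: show virtual host files that will be kept
  debug:
    msg: "{{ virtual_hosts | map(attribute='name') | map('regex_replace', '^(.*)$', '99-virtual-\\1.conf') | list }}"
```

With the two hosts from the playbook above, this should list 99-virtual-host1.conf and 99-virtual-host2.conf. Any other file matching 99-virtual-* is returned by find and then removed.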

Some guidelines for finding similar solutions to other problems like this:

Try to confine the solution to files that are identifiable. In the example above, we made sure all the files created by this playbook start with 99-virtual- and end with .conf, so that other configuration files are not affected. There may still be exceptions (other files that act as virtual host definitions but don’t conform to the naming convention we invented), but I leave this edge case out for the sake of the example.
Try to make it so that failure doesn’t result in chaos. It’s perfectly OK to delete all files that start with 99-virtual-, re-create the required files, and restart Nginx at the end. But because these steps are not atomic, you may end up with no virtual hosts at all if one of the steps goes wrong. The tasks I described above work whether they run before or after creating the virtual hosts, and won’t leave the host in an intermediate invalid state.
The excludes line sure looks scary. Get proficient in data manipulation in Ansible. A good place to start is here.
When writing tasks or roles, consider rollback scenarios. This way of thinking will result in more maintainable recipes.
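
One way to apply the “failure shouldn’t result in chaos” advice is to check the configuration before the restart handler fires. The sketch below is an assumption about how the role could be laid out: the original does not show its handler, and the validation task is an addition. It relies on Ansible's default behavior of not running notified handlers on a host where a task has failed.

```yaml
# tasks/main.yml (placed after the template and cleanup tasks)
# "nginx -t" exits non-zero on an invalid configuration, which fails
# the task and keeps the restart handler from running on that host.
- name: validate nginx configuration
  command: nginx -t
  changed_when: false   # a syntax check never changes the host

# handlers/main.yml (assumed definition of the handler the tasks notify)
- name: restart nginx
  service:
    name: nginx
    state: restarted
```

Note that this only prevents the restart. The invalid files are still on disk, so they need to be fixed before Nginx is restarted by any other means.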

¹ I find that CFEngine has a more holistic idea of configuration management. The price to pay is a steeper learning curve.