Agiloft High Availability Configuration

This article describes a high availability (HA) configuration for those who want to host Agiloft internally. A simple HA configuration features a pair of synced machines with an Nginx Proxy.

The Nginx Proxy discussed here is neither provided nor maintained by Agiloft. It is included in this example configuration for clarity only, as an example of how you might configure your own setup if you host Agiloft internally.

In this configuration, a primary server runs Agiloft as the default source. When all systems are working, it answers all queries and receives all traffic.

This server uses SQL replication (e.g., MS SQL Replication or MySQL Replication) and a filesystem sync (e.g., MS Robocopy on Windows or lsyncd on Linux) to synchronize the database and attached files, respectively, to a secondary replica server. The secondary should be in a different physical location from the primary so that events like earthquakes, fires, or other disasters don't destroy both servers.

A third server acts as a proxy to direct traffic to the appropriate server, the primary by default and the replica when the primary is not available.

Diagram showing the three servers and their relationship, as described

Scope

This setup will protect against failure cases where the primary server is rendered inoperable, such as:

  • Hardware failure
  • Network, internet connectivity, or ISP issues
  • Earthquakes, flooding or other natural disasters
  • Scheduled maintenance where downtime isn’t a possibility

Of course, this doesn't protect against data loss due to human error, so this solution should be used in conjunction with a backup system, such as snapshots, database and file system backups, or the Agiloft built-in KB backup system.

Failover and Restoration

A script on the proxy server monitors the primary server for unscheduled downtime. If unscheduled downtime is detected, it triggers a failover condition where the script performs the following actions:

  • Agiloft application is started on the secondary server
  • Incoming traffic from clients is redirected to the secondary server
  • Replication is halted
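The failover actions above can be sketched as a small Python routine. This is a minimal illustration, not a script supplied by Agiloft: the step names and the callables passed in are hypothetical placeholders for whatever site-specific tooling starts the application, updates the proxy, and stops replication. Running the steps in the listed order is also an assumption.

```python
from typing import Callable, Dict, List

# Step names are hypothetical labels for the three actions listed above.
FAILOVER_STEPS = (
    "start_agiloft_on_secondary",
    "redirect_traffic_to_secondary",
    "halt_replication",
)


def run_failover(actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each failover step in order and return the names of completed steps.

    Each callable stands in for site-specific tooling. If a step raises,
    the exception propagates and the remaining steps are not attempted.
    """
    completed: List[str] = []
    for name in FAILOVER_STEPS:
        actions[name]()
        completed.append(name)
    return completed
```

Passing the actions in as callables keeps the sequence easy to rehearse with stand-in functions before it is wired to real servers.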

For example, unscheduled downtime could be defined as the primary server being inaccessible over HTTPS for five minutes while it is not in a planned downtime state (such as scheduled maintenance), but this definition should be fine-tuned for your needs.
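One way to express such a rule is sketched below in Python. It assumes the monitoring script probes the primary periodically and knows, from some site-specific source, whether a planned downtime is in effect. The names `https_probe` and `DowntimeDetector`, the 300-second default, and the 10-second probe timeout are illustrative assumptions and not part of Agiloft.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def https_probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers over HTTPS without an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


class DowntimeDetector:
    """Signal failover once the primary has been unreachable long enough."""

    def __init__(self, threshold_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_seconds = threshold_seconds
        self.clock = clock
        self.first_failure: Optional[float] = None

    def observe(self, reachable: bool, planned_downtime: bool = False) -> bool:
        """Record one probe result; return True when failover should start."""
        if reachable or planned_downtime:
            # A successful probe or a planned outage resets the timer.
            self.first_failure = None
            return False
        now = self.clock()
        if self.first_failure is None:
            self.first_failure = now
        return now - self.first_failure >= self.threshold_seconds
```

A monitoring loop would call `observe(https_probe(url), planned_downtime=...)` at a fixed interval and start the failover the first time it returns True. Requiring continuous failure for the whole window avoids failing over on a single dropped request.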

Connected users will notice a brief interruption while Agiloft starts, and they will need to re-enter unsaved data.

During this time, system administrators can work on fixing or replacing the primary server or restoring network connectivity.

Once the primary server is restored, the SQL replication and filesystem sync must be reconfigured to replicate in reverse, with data from the secondary server being synced to the primary server so that no work is lost.

Once the primary server has caught up, there are two options:

  1. Leave the servers as they are, with the roles reversed. The secondary server becomes the primary and the former primary becomes the new secondary server.
  2. Schedule a brief downtime to simulate another failover, returning the original primary server to the active role.

Configuration Steps

Setting up this configuration involves the following considerations.

Licenses & SSO

The Agiloft installation on the primary server must include license keys for both the primary and replica servers. Agiloft will automatically activate and use the licenses for the URL and IP Address it is active on. When the primary server is being accessed, that set of licenses is active; when the primary server is inaccessible and traffic is routed to the secondary server, the second set of licenses becomes active automatically.

Likewise, SSO has to be configured to work for either IP address. Any external connections, such as an SFTP server, will have similar requirements.

Filesystem Sync & SQL Replication

The filesystem of the Agiloft installation should be synchronized from the primary server to the secondary, but it is not necessary to synchronize the operating system itself. See Directory Structure for details on which parts of the directory structure must be replicated.

The filesystem sync should be monitored to make sure that it is not falling behind due to bandwidth or other issues. For SQL replication, MS Replication Monitor can be used to warn of sync issues with an MS SQL database; for a MySQL database, monitoring the "Seconds behind master" field in the MySQL replication status can warn of sync issues.
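As a rough illustration of the MySQL check, the Python sketch below classifies one replication status row. It assumes the row has already been fetched (for example from `SHOW SLAVE STATUS`, which reports the field as `Seconds_Behind_Master`) and converted to a dictionary. The function name, the 60-second threshold, and the three result labels are assumptions chosen for the example.

```python
from typing import Any, Mapping


def classify_replication_lag(status: Mapping[str, Any],
                             warn_after_seconds: int = 60,
                             field: str = "Seconds_Behind_Master") -> str:
    """Classify a replication status row as 'ok', 'lagging', or 'stopped'.

    MySQL reports NULL (None here) for this field when replication is not
    running, so a missing or None value is treated as 'stopped'.
    """
    lag = status.get(field)
    if lag is None:
        return "stopped"
    if int(lag) > warn_after_seconds:
        return "lagging"
    return "ok"
```

The `field` parameter is there because newer MySQL releases expose the same value under a different name; adjust it to match what your server reports.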

Nginx Proxy

To route traffic properly, clients connect to a proxy running a separate instance of Nginx. This proxy redirects clients to the appropriate server depending on the status of the primary server. A separate Nginx instance also provides other benefits, such as being able to display a notice when maintenance is being performed. The Nginx proxy can itself be clustered for further failure resilience.
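The routing decision might look like the following Python sketch, which picks a target and renders a matching Nginx `location` block for the monitoring script to write out before reloading the proxy. The host names are hypothetical, answering with a 503 during maintenance is an assumption, and a real configuration would also need the surrounding `server` block and TLS settings, which are omitted here.

```python
from typing import Optional

# Hypothetical host names; substitute your own servers.
PRIMARY = "primary.example.internal"
SECONDARY = "secondary.example.internal"


def choose_target(primary_available: bool, maintenance: bool = False) -> Optional[str]:
    """Return the host that should receive traffic, or None during maintenance."""
    if maintenance:
        return None
    return PRIMARY if primary_available else SECONDARY


def render_location(target: Optional[str]) -> str:
    """Render an Nginx location block for the chosen target.

    With no target, every request is answered with a 503 so that a
    maintenance notice can be shown instead of proxying.
    """
    if target is None:
        return "location / {\n    return 503;\n}\n"
    return (
        "location / {\n"
        f"    proxy_pass https://{target};\n"
        "    proxy_set_header Host $host;\n"
        "}\n"
    )
```

For example, `render_location(choose_target(primary_available=False))` produces a block that proxies to the secondary host.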

 
