Process
  • 10 Jan 2022

Probench is currently hosted by OVH, whose data centres have fully redundant power and network infrastructure. In addition, the service is covered by a 100% hardware and network availability agreement (excluding scheduled maintenance). Should any failures occur, OVH is to rectify them as soon as possible at no additional charge. We mitigate the risk of hardware failure by maintaining multiple configured servers in different data centres.

Backup Plan

Click here to view the complete backup plan.

Scenario 1 – Database Corruption

In the event that the database becomes corrupted or unreadable, the steps to recover are as follows:

• Take the site offline and put a holding page up
• Restore the latest full backup copy with No Recovery
• Restore all log file backups up to the point of corruption
• Ensure the database has been successfully restored before bringing the site back online

Potential data loss: any data collected after the logs were corrupted

Estimated downtime: 1-4 hours (depending on difficulty in restoring data)
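
The restore sequence above can be sketched as a small script generator. This is a minimal sketch, not the team's actual procedure: the database name, the backup file paths and the `build_restore_script` helper are hypothetical, and the `REPLACE` option and the final `RECOVERY` step are assumptions about how the damaged database would be overwritten and brought back online. It only builds the T-SQL text; running it against SQL Server is left to the operator.

```python
from typing import List, Optional


def sql_string(value: str) -> str:
    """Return value as a T-SQL Unicode string literal, doubling single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def build_restore_script(
    database: str,
    full_backup: str,
    log_backups: List[str],
    stop_at: Optional[str] = None,
) -> List[str]:
    """Build RESTORE statements for a full backup followed by its log chain."""
    db = "[" + database.replace("]", "]]") + "]"
    # Full backup first, left in the restoring state so logs can be applied.
    # REPLACE (overwrite the damaged database) is an assumption.
    statements = [
        f"RESTORE DATABASE {db} FROM DISK = {sql_string(full_backup)} "
        "WITH NORECOVERY, REPLACE;"
    ]
    last = len(log_backups) - 1
    for index, log_file in enumerate(log_backups):
        if index < last:
            # Intermediate logs keep the database in the restoring state.
            options = "NORECOVERY"
        elif stop_at is not None:
            # Stop just before the point of corruption, then bring it online.
            options = f"STOPAT = {sql_string(stop_at)}, RECOVERY"
        else:
            options = "RECOVERY"
        statements.append(
            f"RESTORE LOG {db} FROM DISK = {sql_string(log_file)} WITH {options};"
        )
    if not log_backups:
        # No logs to apply: bring the database online directly.
        statements.append(f"RESTORE DATABASE {db} WITH RECOVERY;")
    return statements


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    for line in build_restore_script(
        "Probench",
        r"D:\Backups\Probench_full.bak",
        [r"D:\Backups\Probench_log1.trn", r"D:\Backups\Probench_log2.trn"],
        stop_at="2022-01-10T09:30:00",
    ):
        print(line)
```

Generating the statements rather than typing them by hand keeps the log backups in order and makes it harder to forget that only the last restore should bring the database online.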

Scenario 2 – Primary Server Failure / Data centre failure

We minimise our exposure to a single point of failure by having a secondary server in another data centre. If the primary server or its data centre fails, the steps are as follows:

  • Prepare an email to clients informing them of the downtime and that we are investigating
  • Raise a ticket with OVH, the server provider, reporting the inaccessible server
  • After an hour or two, prepare an email to clients describing the action we have taken and confirming that we will keep them posted
  • If we receive a reply from OVH, or an update through Twitter, prepare an email to clients explaining the cause of the outage
  • Point the IP address to the secondary server
  • Host the site on the secondary server's IIS
  • Put a holding page on the secondary server
  • Create a restore plan in CloudBerry to recover the full database and file backups from Amazon S3:
      1. Sync CloudBerry for the latest data (see the image attached to this article)
         !cb_sync.png!
      2. First, restore the latest database backup
      3. Second, restore the files from the latest year's folder (so that clients can start working)
      4. Restore from the folders of older years
      5. Restore the log folders
  • Restore the latest full backup copy with No Recovery
  • Ensure the database has been successfully restored before bringing the site back online:
      1. Do not set the database file location to a path on the C drive
      2. Enable Service Broker
  • Point the hosted site to the live Probench code
  • Add the new database connection to the Connection config on the secondary server
  • Change the FileStorageRoot to point to the new folder location
  • Ensure SSL certificates are installed on the secondary server
  • If clients are using custom domain names, inform their IT teams so that they can point those domains to the new server
  • Once recovered, the secondary server becomes the primary server
  • Modify the database backup maintenance plan and the CloudBerry backup on the secondary server to include the newly added database
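
The restore order in the CloudBerry steps above (database first, then the latest year's files, then older years, then logs) can be sketched as a simple ordering function. This is an illustrative sketch only: it assumes the file folders are named by four-digit year (for example `2022`), which the plan does not state, and `plan_restore_order` is a hypothetical helper, not part of CloudBerry.

```python
from typing import List


def plan_restore_order(
    database_backup: str,
    year_folders: List[str],
    log_folders: List[str],
) -> List[str]:
    """Return restore targets in priority order.

    Order: latest database backup, then file folders from the newest
    year to the oldest, then the log folders.
    """
    for name in year_folders:
        # Assumption: folders are named by four-digit year.
        if not (name.isdigit() and len(name) == 4):
            raise ValueError(f"not a year folder: {name!r}")
    newest_first = sorted(year_folders, key=int, reverse=True)
    return [database_backup] + newest_first + list(log_folders)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    order = plan_restore_order(
        "Probench_full.bak", ["2020", "2022", "2021"], ["Logs"]
    )
    for step, target in enumerate(order, start=1):
        print(step, target)
```

Restoring the newest year first reflects the stated goal of letting clients start working before the older files have finished copying.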

Potential data loss: any data collected on the day of the event
Estimated downtime: 2-6 hours (depending on difficulty in restoring data)
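
The two database checks in the steps above (keep the database files off the C drive, and enable Service Broker) lend themselves to a short sketch that builds the corresponding T-SQL. Everything named here is an assumption for illustration: the database name, the logical file names, the target paths and the `build_secondary_restore` helper are hypothetical, and the sketch assumes no log backups remain to be applied after the full restore. It produces text only and does not connect to SQL Server.

```python
from pathlib import PureWindowsPath
from typing import Dict, List


def sql_string(value: str) -> str:
    """Return value as a T-SQL Unicode string literal, doubling single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def build_secondary_restore(
    database: str,
    backup_file: str,
    file_targets: Dict[str, str],
) -> List[str]:
    """Build T-SQL to restore onto the secondary server.

    file_targets maps each logical file name to its new physical path.
    """
    for logical_name, target in file_targets.items():
        # Refuse any target on the C drive, as the plan requires.
        if PureWindowsPath(target).drive.upper() == "C:":
            raise ValueError(f"{logical_name} must not be placed on the C drive")
    db = "[" + database.replace("]", "]]") + "]"
    moves = ", ".join(
        f"MOVE {sql_string(name)} TO {sql_string(path)}"
        for name, path in file_targets.items()
    )
    return [
        # Full backup restored with NORECOVERY, relocating the files.
        f"RESTORE DATABASE {db} FROM DISK = {sql_string(backup_file)} "
        f"WITH {moves}, NORECOVERY;",
        # Assumption: nothing further to apply, so bring the database online.
        f"RESTORE DATABASE {db} WITH RECOVERY;",
        # Service Broker can only be enabled once the database is online.
        f"ALTER DATABASE {db} SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;",
    ]


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    for line in build_secondary_restore(
        "Probench",
        r"E:\Restore\Probench_full.bak",
        {
            "Probench_Data": r"D:\SQLData\Probench.mdf",
            "Probench_Log": r"D:\SQLLogs\Probench.ldf",
        },
    ):
        print(line)
```

Rejecting C drive paths before any statement is produced turns the first checklist item into a guard rather than something to remember under pressure.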

Scenario 3 – Primary Server completely unavailable (“irrevocably destroyed”)

We cannot use our failed server anymore, so once the site is back up, provision a new server as soon as possible:

  • Follow Scenario 2
  • Provision a new server – in a different data centre to the current primary server – to become the new secondary server

Potential data loss: as for Scenario 2

Estimated downtime: as for Scenario 2

Scenario 4 – The hosting provider disappears

Should OVH become unavailable (e.g. “bankruptcy”, “legal intervention”, “terrorism”), we will restore our data on the Amazon Web Services platform:

  • Create an elastic IP at Amazon
  • Immediately update the DNS to point to the new IP (will take time to propagate)
  • Create a VM with Windows and SQL Server on Amazon Web Services
  • Put up a holding page to inform users once the DNS change has propagated
  • Copy the Full database backup from the encrypted S3 store
  • Restore the database
  • Copy the application data from the encrypted S3 store
  • Ensure the database has been successfully restored before bringing the site back online

Potential data loss: any data collected on the day of the event
Estimated downtime: 4-8 hours (due to the time required to copy the data and provision the new servers)
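
Because the DNS change takes time to propagate, it can help to check which hostnames already resolve to the new IP before announcing that the site is back. The sketch below is one way to do that with the Python standard library. The hostnames and IP address are placeholders, `check_dns_cutover` is a hypothetical helper, and the result only reflects the resolver of the machine running it, so other networks may still see the old address.

```python
import socket
from typing import Callable, Dict, Iterable


def check_dns_cutover(
    hostnames: Iterable[str],
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Report, per hostname, whether it currently resolves to expected_ip."""
    results: Dict[str, bool] = {}
    for host in hostnames:
        try:
            results[host] = resolve(host) == expected_ip
        except OSError:
            # Resolution failures (socket.gaierror is an OSError) count
            # as "not yet pointing at the new server".
            results[host] = False
    return results


if __name__ == "__main__":
    # Placeholder hostnames and a documentation-range IP, for illustration.
    status = check_dns_cutover(
        ["probench.example.com", "client.example.org"], "203.0.113.10"
    )
    for host, ok in status.items():
        print(host, "OK" if ok else "still pending")
```

The same check applies to Scenario 2, where clients with custom domain names have to repoint them at the new server. The `resolve` parameter is injectable so the function can be exercised without network access.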

