Disaster Recovery; The Pain Of Doing It Yourself (DIY)

From our previous articles, we can conclude that disaster recovery is for organizations looking to go beyond protecting their bottom lines with on-premise or offsite backup. It is for smart businesses that understand IT resilience: the ability of their networks, systems and applications to remain available through and beyond severe, unforeseen disruptions to critical business processes and the systems that support them.

Organizations that run mission-critical applications with low to zero recovery time and recovery point objectives (RTO and RPO) choose disaster recovery to ensure service availability and business continuity. Such organizations must then choose between doing it themselves, as in traditional disaster recovery, and outsourcing to a cloud service provider for cloud-based disaster recovery (DRaaS).

Traditional Disaster Recovery

Today, businesses do not really practice traditional disaster recovery but merely mimic it. Real traditional disaster recovery is an expensive and rarely practiced strategy for high-availability, mission-critical applications. In traditional disaster recovery planning, an organization operates a secondary physical disaster recovery site, which could be a colocation facility or a data center.

This means that in the secondary disaster recovery site, the organization would have a similar infrastructure, that is, servers, applications and supporting systems as in their primary production site. Depending on the needs of the organization and cost considerations, the secondary disaster recovery site can be hot, warm or cold.

Types of secondary disaster recovery sites

A cold secondary disaster recovery site is essentially an office space with basic utilities such as power, cooling and communication equipment. It does not have any pre-installed IT equipment. In case of a failure at the primary site, it takes a lot of time to set the cold site up properly and bring it live so that business operations can fully resume. The organization would need IT personnel to migrate the necessary servers and make them functional before the site can take on the workload of the primary site.

A hot site is a backup facility where a mirrored copy of the primary production site exists. It is equipped with all the necessary hardware, software, and network connectivity, which allows you to perform near real-time backup or replication of critical data. This way, the production workload can be failed over to the DR site within a few minutes or hours, which ensures minimal downtime and data loss. A hot site is expected to be always online and running without disruption so as to keep data synchronized between the sites.

A warm site is a backup facility that has network connectivity and the necessary hardware already pre-installed. It is not equipped in the same way as the production center, so it has less operational capacity than the primary site. Moreover, data synchronization between the primary and secondary sites is performed daily or weekly, which means any data written since the last synchronization can be lost. A warm site suits organizations that handle less critical data and can tolerate a short period of downtime.
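To put rough numbers on that data-loss window: with periodic synchronization, the worst case is a failure just before the next sync, so everything written since the last sync is at risk. The sketch below is illustrative only. The function names and the rate of 500 transactions per hour are assumptions made for the example, not figures from a real deployment.

```python
from datetime import timedelta


def worst_case_loss_window(sync_interval: timedelta) -> timedelta:
    """Worst case: the primary fails just before the next scheduled sync,
    so the whole interval since the last sync is lost."""
    return sync_interval


def transactions_at_risk(sync_interval: timedelta, tx_per_hour: float) -> int:
    """Estimate how many transactions fall inside the worst-case window.

    tx_per_hour is a hypothetical, constant transaction rate.
    """
    hours = worst_case_loss_window(sync_interval).total_seconds() / 3600
    return round(hours * tx_per_hour)


if __name__ == "__main__":
    schedules = [("daily", timedelta(days=1)), ("weekly", timedelta(weeks=1))]
    for label, interval in schedules:
        print(label, transactions_at_risk(interval, tx_per_hour=500))
```

At the assumed rate, a daily sync puts up to 12,000 transactions at risk and a weekly sync up to 84,000, which is why the sync schedule should be set against the data loss the business can actually tolerate.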

Organizations running the traditional type of DR setup will choose one of the above secondary DR sites based on costs, recovery objectives, the distance between the two sites and the organization’s needs in terms of availability of different processes, systems, applications, and data.

Modernized methods of traditional disaster recovery

As mentioned above, traditional DR is an expensive strategy that is not common among businesses today. However, some organizations that would rather not use cloud-based disaster recovery run various setups that borrow from traditional DR and its types of secondary DR sites. They include:

1. ‘Real-time’ database replication

This setup is mainly adopted by organizations that need zero RTO and RPO, such as banks, SACCOs and hospitals that run critical units. Such organizations handle critical data and run mission-critical applications such as FOSA. The term ‘real-time’ is in quotes because it is relative: truly real-time replication is not entirely achievable under this type of setup. Replication is affected by the distance between the primary and secondary sites and by the delay between the moment data is generated and the moment it is copied, also known as data latency.
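The effect of distance can be estimated with simple arithmetic. Light in optical fiber travels at roughly 200,000 km per second, so every kilometre between the sites adds delay that no hardware can remove. The sketch below computes only this physical lower bound. The function name and the sample distances are assumptions for illustration, and real links add routing, queuing and disk-write time on top.

```python
# Light in optical fiber covers roughly 200,000 km per second
# (about two thirds of its speed in a vacuum).
FIBER_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on the time for a change to reach the secondary site
    and for the acknowledgement to come back. Routing, queuing and disk
    writes are ignored; they only add to this figure."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    for km in (10, 100, 1000):  # hypothetical distances between sites
        print(f"{km} km: at least {min_round_trip_ms(km):.2f} ms per acknowledged write")
```

Even at 100 km the floor is about 1 ms per acknowledged write, so the secondary database always trails the primary by some margin.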

In this setup, the business will have a hot secondary disaster recovery site. The organization puts in place infrastructure similar to that of the primary production site, including applications, servers, licenses and supporting systems such as network connectivity, and replicates only its databases in ‘real-time’.

Just like a hot secondary site, the secondary physical infrastructure is live and is always up and running. ‘Real-time’ database replication is achieved by keeping the databases in the primary and secondary infrastructure synchronized. After an initial full replication of the entire database housed in the primary infrastructure, replication is done incrementally in ‘real-time’: changes to the database are replicated as they occur. The aim is instant failover when the primary site fails, with zero downtime and data loss. Remember, however, that the replication process may not be as instant as it is hoped to be.
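The full-copy-then-incremental pattern can be sketched with a toy model. This is a simplified illustration, not how any particular database product implements replication. The class names, the change log and the sample account keys are all hypothetical.

```python
class Primary:
    """Toy primary database that records every change in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Toy secondary database fed from the primary's change log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # how many log entries have been applied so far

    def initial_full_copy(self, primary):
        """One-off copy of the whole database."""
        self.data = dict(primary.data)
        self.applied = len(primary.log)

    def apply_changes(self, primary, max_changes):
        """Apply up to max_changes pending log entries, in order."""
        pending = primary.log[self.applied:self.applied + max_changes]
        for key, value in pending:
            self.data[key] = value
            self.applied += 1

    def lag(self, primary):
        """Number of changes the replica has not yet received."""
        return len(primary.log) - self.applied


if __name__ == "__main__":
    primary, replica = Primary(), Replica()
    primary.write("acct-1", 100)
    replica.initial_full_copy(primary)

    primary.write("acct-1", 80)
    primary.write("acct-2", 50)
    # Assume the link delivers only one change before the primary fails.
    replica.apply_changes(primary, max_changes=1)
    print("lag:", replica.lag(primary), "replica:", replica.data)
```

In this run the replica is one change behind when the primary fails, so the write to acct-2 never reaches the DR site. That gap is the reason ‘real-time’ stays in quotes.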

2. On-premise disaster recovery

Sounds contradictory, right? For reasons such as ignorance, misinformation or a lack of information, some organizations operate on-premise disaster recovery setups. In this setup, an organization running mission-critical applications that cannot afford downtime keeps an additional backup system (server) similar to its primary system.

For this setup to work, the organization must have a valid backup of its critical data that can be restored on the backup (secondary) server infrastructure during recovery. The restore takes a few hours, which means more downtime for the organization.

The secondary or backup system (server) operates only during data replication or when there is a disaster. The processes involved in making such a site live when needed will also involve downtime and may have a considerable risk of data loss. It will require configuration in order to function at full capacity.

3. Secondary disaster recovery site at other organization branches

This setup is common in organizations with more than one branch. Such an organization may have a better understanding of the need for offsite backup and therefore chooses to have a physical secondary disaster recovery system at one of its branches away from its premises. The system is an exact replica of its physical primary system but may not function at the same capacity. The organization will then facilitate the copying of critical data from the primary system to the secondary system daily or weekly.

In case of failure of the primary system, we have heard of cases where the business would migrate the secondary system to the primary production environment, configure it and then continue with business operations, a process that could take days. In another case, the organization planned to move its production environment to the secondary location at another branch. Once again, this means days of downtime and high capital expenditure, not to mention the losses that result from the downtime.

The pains of modernized methods of traditional disaster recovery

The above are just some of the setups we have come across over our years as a cloud service provider. At the end of the day, organizations with these setups end up not achieving their main goals: to minimize downtime, to eliminate data loss and, most importantly, to reduce capital expenditure while achieving higher operational efficiency. Instead, they are faced with:

High CAPEX: organizations are forced to duplicate infrastructure by purchasing additional servers, installing applications and paying for supporting resources such as bandwidth, storage, CPU, operating systems and licenses. They must also acquire technical expertise to maintain the secondary DR site or system. In the long run, costs include system upgrades and replacing the system at the end of its useful life. Some organizations go as far as having a backup for their secondary DR systems.

Higher risk of data loss: organizations choose DIY disaster recovery out of fear of losing data and experiencing downtime, yet these setups increase the chances of both. Did you know that upgrading critical systems can increase the chances of software or hardware failure and reduce their functionality in the long run?

On-premise disaster recovery puts your organization at risk since both your primary and secondary DR systems are exposed to natural disasters, human error or system failure, just to name a few.

In real-time database replication where databases are synchronized, any occurrence at the primary location such as deletion or corruption is replicated to the secondary site as well. This is where organizations go as far as having a backup for the secondary DR site just to sleep better at night.
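A toy example makes the difference between a synchronized copy and a backup concrete. It is a deliberately simplified sketch. The function names and the sample records are hypothetical, and real replication works on change streams rather than whole dictionaries.

```python
import copy


def replicate(primary: dict, secondary: dict) -> None:
    """Synchronization makes the secondary match the primary exactly,
    including deletions and corrupted values."""
    secondary.clear()
    secondary.update(primary)


def take_backup(database: dict) -> dict:
    """A point-in-time copy that later changes cannot touch."""
    return copy.deepcopy(database)


if __name__ == "__main__":
    primary = {"members": ["Amina", "Brian"], "balances": {"Amina": 1200}}
    secondary = {}
    replicate(primary, secondary)
    backup = take_backup(secondary)

    del primary["balances"]          # accidental deletion at the primary
    replicate(primary, secondary)    # ...is faithfully copied to the DR site

    print("balances" in secondary)   # False
    print("balances" in backup)      # True
```

The replica protects against losing the primary site, not against bad changes made at it. Only the separate point-in-time copy still holds the deleted records.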

Increased IT complexity: even with qualified expertise, managing physical disaster recovery sites is a complex endeavor, which in itself led to the development of cloud-based disaster recovery. Your IT personnel can attest to the fact that infrastructure sometimes does not function as it is supposed to.

Setting up and configuring a physical server, for example, should take only a few hours but can end up taking days or weeks. Your IT department then spends too much time maintaining your disaster recovery system in addition to your primary systems, which takes time away from other important tasks such as support.

Long recovery hours/downtime: if your DR site is warm or cold, you are guaranteed at least a few hours to a day or more of downtime. In case of failure at your primary location, how long will it take to mobilize funds, then set up, configure and bring your DR site online? Even with a live DR site, have you considered the possibility of a failed failover caused by network disruptions, power failures, lack of prior testing, hardware or software failures, system crashes and so forth?

Failover testing is complex and can affect current operations. To properly and safely test the secondary DR systems, you will have to shut down your primary production site. This will affect your daily operations and possibly result in losses. DIY setups do not offer the isolated test environments that cloud-based disaster recovery can provide.

What modern disaster recovery setups do you have? How well are they working for you?

Click on the link below to learn more about how cloud-based disaster recovery solves the pains of traditional DR. Get to understand how Pepea Disaster Recovery as a Service and Pepea Hosted Disaster Recovery are the best solutions for your business continuity plan.