Disaster recovery
What is an IT disaster recovery plan?
IT disaster recovery plans provide step-by-step procedures for recovering disrupted systems and networks and returning them to normal operation. The goal of these procedures is to minimize negative impacts on company operations. The IT disaster recovery process identifies critical IT systems and networks, prioritizes them by recovery time objective (RTO), and delineates the steps needed to restart, reconfigure, and recover them. A comprehensive IT DR plan also includes all relevant supplier contacts, sources of expertise for recovering disrupted systems, and a logical sequence of action steps for a smooth recovery.
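The prioritization step described above can be sketched as ordering systems by RTO, shortest first. This is a minimal illustration, not part of any Backup Exec feature. It assumes a hypothetical inventory model (`System`, `recovery_order`, and the example system names are all invented here) and adds one assumption the text does not state: a system cannot be recovered before the systems it depends on.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class System:
    """A system in the DR inventory (hypothetical model for illustration)."""
    name: str
    rto_hours: float  # recovery time objective: maximum tolerable downtime
    depends_on: list[str] = field(default_factory=list)  # assumed dependency info


def recovery_order(systems: list[System]) -> list[str]:
    """Return system names in recovery order.

    Systems with the shortest RTO go first, but a system is only
    scheduled once every system it depends on has been scheduled.
    """
    by_name = {s.name: s for s in systems}
    pending = {s.name: len(s.depends_on) for s in systems}  # unmet dependencies
    dependents: dict[str, list[str]] = {s.name: [] for s in systems}
    for s in systems:
        for dep in s.depends_on:
            if dep not in by_name:
                raise ValueError(f"{s.name} depends on unknown system {dep}")
            dependents[dep].append(s.name)

    # Min-heap keyed on RTO; ties are broken by name for a stable order.
    ready = [(s.rto_hours, s.name) for s in systems if pending[s.name] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for child in dependents[name]:
            pending[child] -= 1
            if pending[child] == 0:
                heapq.heappush(ready, (by_name[child].rto_hours, child))

    if len(order) != len(systems):
        raise ValueError("dependency cycle detected")
    return order


systems = [
    System("file-server", 24),
    System("domain-controller", 2),
    System("erp-db", 4, ["domain-controller"]),
    System("erp-app", 4, ["erp-db"]),
    System("mail", 8, ["domain-controller"]),
]
print(recovery_order(systems))
# ['domain-controller', 'erp-db', 'erp-app', 'mail', 'file-server']
```

A plain sort by RTO would put `erp-app` ahead of `mail` here, which is fine. It would break, however, if a short-RTO system depended on a long-RTO one. The heap-based topological sort handles both cases.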
What Does Simplified Disaster Recovery (SDR) Technology Do?
– Protects key system components required for full server recovery
– Examples include the boot volume and system state components, among others
- How Does SDR Technology Add Value?
– Enables bare-metal and dissimilar hardware recovery of Windows servers
– Recovery is an automated process that uses a WinPE-based recovery disk
- How Do I Know If Backups Are Enabled With SDR?
– Backup Exec 2014 jobs are enabled for SDR by default
– The backup job selections screen shows a green SDR “ribbon” when SDR is enabled
- Provides Base Operating Environment for Server Recovery
– Powerful operating system environment based on Microsoft WinPE
– Full networking and Windows command prompt capabilities
– Graphical, automated, wizard-driven recovery experience
– Included with Backup Exec at no additional charge
- Performs Bare Metal and Dissimilar Hardware Restore Tasks
– Enables the user to connect to a Backup Exec server or a local storage device
– Automatically reconstructs the server according to the restore selections
Benefits of SDR
■ Minimize downtime: The consequences of extended downtime can be severe, not only in lost business and lost productivity but, for small organizations, even in terms of survival.
■ Minimize risk: Not having a disaster recovery plan often constitutes an unacceptable level of risk, but simply having a plan in place does not eliminate risk if its reliability is uncertain.
■ Control costs: Traditional disaster recovery plans are often limited in scope because of the costs associated with building and maintaining a recovery site, training staff members in disaster recovery processes, testing those processes, and so on.
What You Need When Backing Up with SDR
- All physical servers, or all critical physical servers, should be protected using SDR-enabled backup jobs
- Ensure the recovery disk contains all drivers required for a recovery event; customize it as needed
- Create USB versions of the recovery media (a manual process) for optimal boot performance in recovery mode
- Perform SDR recovery tests periodically, either to standby physical hardware or to an isolated virtual machine lab
- Virtual machines are generally not protected with agent-driven SDR backups; VADP (vStorage APIs for Data Protection) backups are preferred
- SDR support (bare-metal recovery and dissimilar hardware recovery) is currently limited to Windows Server 2003, Windows Server 2008/R2, and Windows 7/8
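The guidelines above amount to a simple decision rule for each server in an inventory. The sketch below encodes it for planning purposes only. `Server`, `protection_method`, and the inventory entries are hypothetical. The OS set is copied from the list above and may be out of date, so it should be checked against the current Backup Exec compatibility list.

```python
from dataclasses import dataclass

# Operating systems listed above as supported for SDR (bare-metal and
# dissimilar hardware recovery). Copied from this document; confirm against
# the current Backup Exec compatibility list before relying on it.
SDR_SUPPORTED_OS = {
    "Windows Server 2003",
    "Windows Server 2008",
    "Windows Server 2008 R2",
    "Windows 7",
    "Windows 8",
}


@dataclass
class Server:
    """Minimal inventory record (hypothetical, for illustration)."""
    name: str
    is_virtual: bool
    os: str


def protection_method(server: Server) -> str:
    """Suggest a protection method following the guidelines above."""
    if server.is_virtual:
        return "VADP"    # VMs: prefer VADP backups over agent-driven SDR
    if server.os in SDR_SUPPORTED_OS:
        return "SDR"     # physical server on a listed OS: SDR-enabled job
    return "REVIEW"      # physical server outside the listed SDR support


inventory = [
    Server("dc01", False, "Windows Server 2008 R2"),
    Server("web-vm01", True, "Windows Server 2008 R2"),
    Server("app01", False, "Red Hat Enterprise Linux 6"),
]
for s in inventory:
    print(s.name, protection_method(s))
```

Servers flagged `REVIEW` still need a recovery plan. They simply cannot rely on SDR as described here.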
Causes of Server Failure and Downtime
The danger of server failure is a reality for all IT professionals. A variety of events can cause server failure, and natural disasters are only one example. The list of possible causes of server failure includes the following:
- User Error – The most common cause of server failure is user error. Users are people, and people make mistakes. Whether it’s an end user installing the wrong application or visiting the wrong websites, or an IT administrator setting down a cup of coffee in the wrong place at the wrong time, the human element consistently leads the causes of server failure.
- Planned Downtime – Planned downtime is another common cause of server downtime. Servers require maintenance to perform at an optimal level over a long period of time. Sometimes planned maintenance can inadvertently lead to server failure when maintenance tasks, for whatever reason, prevent a server from coming back online and operating correctly, or from coming back online at all.
- Hardware Failures – When it comes to hardware failures, it’s not a question of if, but how often. Hardware failures happen frequently. They can be caused by defective hardware, equipment maintenance problems, power-related issues, accidents, and other factors. The risk of hardware failure grows as the size and complexity of a data center increase.
- Viruses and Malware – Other potential causes of system failure include malicious code designed specifically to exploit security vulnerabilities in IT infrastructure. Both viruses and malware can put servers at risk, even if security software is present and up to date. Some malicious code is designed to destroy data, some to steal data, and still other code to secretly take control of systems and compromise security over a long period of time.
- Natural Disasters – Natural disasters can also cause system failure, although they are among the least likely threats. Hurricanes, floods, fires, tornadoes, and other natural events can certainly bring servers down, and can even physically destroy them.
Cost of Server Downtime
Tangible Costs
Lost revenues
Idle employees
Loss of productivity
Fines and penalties
Legal costs
Employees working on the outage instead of their regular jobs
Additional vendor technical support, consulting, or on-site engineer costs (unless covered by a fixed support contract)
Intangible Costs
Potential lost revenue
Loss of contact data
Inventory data, system and data recovery costs
Failed Service Level Agreements
Lost opportunities
Lost potential customers
Loss of existing customer loyalty
Reputation and brand damage
Loss of goodwill
Share price depreciation
Loss of supplier confidence
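The tangible costs listed above can be roughly estimated with simple arithmetic. The sketch below is a back-of-the-envelope model with hypothetical parameter names and example figures. It treats idle employees and recovery staff as hourly labor costs and adds fixed vendor fees and fines. The intangible costs are deliberately left out because they do not reduce to a simple formula.

```python
def tangible_downtime_cost(
    hours: float,
    revenue_per_hour: float,
    idle_staff: int,
    idle_hourly_cost: float,
    recovery_staff: int,
    recovery_hourly_cost: float,
    vendor_fees: float = 0.0,
    fines: float = 0.0,
) -> float:
    """Rough tangible cost of an outage (hypothetical model for illustration)."""
    lost_revenue = hours * revenue_per_hour
    idle_cost = hours * idle_staff * idle_hourly_cost              # employees who cannot work
    recovery_labor = hours * recovery_staff * recovery_hourly_cost  # staff fixing the outage
    return lost_revenue + idle_cost + recovery_labor + vendor_fees + fines


# Example: a 6-hour outage with made-up figures.
cost = tangible_downtime_cost(
    hours=6,
    revenue_per_hour=2000,
    idle_staff=20,
    idle_hourly_cost=40,
    recovery_staff=2,
    recovery_hourly_cost=60,
    vendor_fees=1500,
)
print(cost)  # 19020.0
```

Because every hourly term scales linearly with `hours`, shortening recovery time has a direct, proportional effect on all but the fixed costs.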
Server Recovery Problems and Obstacles
Complexity of Manual Server Recovery
Manual server recovery can be a time-consuming and tedious process. Typically, manual recovery includes rebuilding a server by reinstalling the operating system, rebooting several times throughout the recovery process, reconfiguring the system, loading backup software, and hoping that no errors have occurred along the way. This process, which can take hours or even days, generally exceeds the capabilities of the average small business.
For larger organizations, server recovery becomes even more complex when servers are located at one or more remote sites.
The Dissimilar Hardware Problem
Recovering to dissimilar hardware is also essential to effective server protection. It is cost-prohibitive for companies to maintain standby replicas of production server configurations for recovery purposes. Even in situations where standby hardware is available, small variations in hardware builds can cause problems for full server recovery solutions that are not equipped to deal with dissimilar hardware.