
Using AWS for On-Premises Backup & Disaster Recovery

Introduction

With an on-premises backup solution in your own data center, it’s critical to build a disaster recovery plan into your business continuity plans, so that you can respond should a disaster affect the operation of the business. The same is true when you start to leverage the cloud for storing your backed-up data.

This course explains how cloud storage fits in with DR and the different considerations when preparing to design a solution to back up your on-premises data to AWS. It will explain how Amazon S3, AWS Snowball, and AWS Storage Gateway can all be used to help with the transfer and storage of your backup data.

This course focuses on how some of the AWS storage services can help you with disaster recovery and data backup of your on-premises production resources, resulting in an effective business continuity plan by preventing data loss whilst at the same time reducing your RTO and RPO.

Course agenda


Disaster Recovery

Where does Cloud Storage fit in DR?

The sooner your systems are operational and readily available to resume services, the better for you as a business and for your customers.

Now herein lies the problem with traditional data backup solutions. The data you need might not be available to you:

Issues with traditional Backup methods

It’s no secret that using cloud storage services can be considerably cheaper as a backup solution than running your own on-premises solution. But cost aside, the speed with which you can launch an environment within AWS to replicate your on-premises solution, with easy access to production backup data, is of significant value to many organizations.

Benefits of Cloud Storage

Example

Now let’s imagine this production environment, in your local data center, experienced an outage. You could perform the following steps to quickly recover from the disaster. You could launch a new environment with new instances, based on your own custom AMIs, complete with custom applications within a VPC. Then you could create EBS storage volumes, based on your backed-up data, from within AWS, and attach these volumes to those instances. You now have your application servers up and running, with the latest production data. Couple this with some minor networking components, and you could then communicate with other sites you may have running.
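The recovery steps above can be sketched as the request parameters you would pass to boto3’s EC2 client. This is a minimal sketch only: the AMI, subnet, snapshot, Availability Zone, and instance type values below are all hypothetical placeholders, not values from the course.

```python
# Sketch of the recovery steps as EC2 request parameters (the kwargs you
# would pass to boto3's ec2.run_instances and ec2.create_volume).
# All IDs below are hypothetical placeholders.

def run_instances_params(ami_id, subnet_id, count=1):
    """Parameters to launch application servers from a custom AMI inside a VPC."""
    return {
        "ImageId": ami_id,
        "InstanceType": "m5.large",   # assumed instance size
        "MinCount": count,
        "MaxCount": count,
        "SubnetId": subnet_id,
    }

def create_volume_params(snapshot_id, az):
    """Parameters to rebuild an EBS volume from a backed-up snapshot."""
    return {
        "SnapshotId": snapshot_id,
        "AvailabilityZone": az,       # must match the instance's AZ
        "VolumeType": "gp3",
    }

launch = run_instances_params("ami-0123456789abcdef0", "subnet-0123456789abcdef0")
volume = create_volume_params("snap-0123456789abcdef0", "eu-west-1a")
```

After creating the volume, you would attach it to the launched instance (for example with `ec2.attach_volume`), completing the restore of production data.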


Considerations when planning an AWS DR Storage Solution

Balance

There is a fine line in how you architect your data storage: it must be fit for purpose for the data it holds, but it may also have to conform to specific governance and compliance regulations for DR.

So determining which solution or service to use to store the data, and which to use to ensure you can recover your data effectively in the event of a disaster, is a balancing act.

RTO and RPO

From a DR perspective, this is largely down to the particular RTO and RPO for the environment you are designing.

These values can help you select the most appropriate storage method. For example, if your RTO were an hour, then restoring data from Amazon Glacier may not be effective, as retrieval can take a number of hours depending on your retrieval method.
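A quick way to apply this is to compare each retrieval option’s worst-case time against the RTO. The figures below are indicative of Glacier’s three retrieval options (expedited: roughly 1-5 minutes, standard: 3-5 hours, bulk: 5-12 hours):

```python
# Which Glacier retrieval options fit within a given RTO?
# Worst-case retrieval times (hours) for each option, as indicative figures.
GLACIER_RETRIEVAL_HOURS = {"expedited": 5 / 60, "standard": 5, "bulk": 12}

def options_within_rto(rto_hours):
    """Return Glacier retrieval options whose worst-case time fits the RTO,
    fastest first."""
    ranked = sorted(GLACIER_RETRIEVAL_HOURS.items(), key=lambda kv: kv[1])
    return [name for name, hours in ranked if hours <= rto_hours]

print(options_within_rto(1))   # a one-hour RTO leaves only expedited retrieval
```

With a one-hour RTO, only expedited retrieval qualifies; with a 6-hour RTO, standard retrieval becomes viable as well.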

How will you get data in/out of AWS?

Direct Connect

If you have a Direct Connect connection to AWS, then you can use this to move data in and out of the environment; it can support connectivity of up to 10 gigabits per second.

VPN Connection

If you don’t have a Direct Connect link between your data center and AWS, then you may have a hardware or software VPN connection, which could also be used.

Internet Connection

If you have neither of these connectivity options, then you can use your data center’s own internet connection to transfer the data to AWS. Depending on how much data you need to move or copy to AWS, these lines of connectivity may not have the required bandwidth to cope with the amount of data transferred.
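As a rough sketch of why bandwidth matters, the time to move a given volume of data can be estimated as size divided by effective throughput. The link speeds and the 80% efficiency factor below are illustrative assumptions; real-world throughput varies.

```python
# Back-of-the-envelope transfer-time estimate for each connectivity option.
# Link speeds and the 0.8 efficiency factor are illustrative assumptions.

def transfer_hours(data_tb, link_gbps, efficiency=0.8):
    """Hours to move data_tb terabytes over a link_gbps link."""
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

for name, gbps in [("Direct Connect (10 Gbps)", 10),
                   ("VPN (1 Gbps)", 1),
                   ("Internet (100 Mbps)", 0.1)]:
    print(f"{name}: {transfer_hours(50, gbps):.0f} hours for 50 TB")
```

Moving 50 TB takes well under a day over 10 Gbps Direct Connect, but roughly two months over a 100 Mbps internet line, which is where the large-transfer options below come in.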

Large Data Transfer

AWS Snowball

For extreme situations, AWS offers an even larger solution:

Snowmobile
Storage Gateway

Acts as a gateway between your on-premises storage and the AWS environment.

How quickly do you need your data back?

How much data do you need to import and export?

Durability

When looking at the durability of a data backup, you’ll need to ascertain the criticality of that data to ensure your chosen storage offers suitable resiliency and redundancy. For example, the Amazon S3 service has the following classes available:

  - S3 Standard: eleven nines (99.999999999%) durability, 99.99% availability
  - S3 Standard - Infrequent Access: eleven nines durability, 99.9% availability
  - Amazon Glacier: eleven nines durability, with retrieval times from minutes to hours
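To put the published "eleven nines" durability figure into concrete terms, here is a quick expected-loss calculation (the object count is an arbitrary example):

```python
# Expected annual object loss at S3's published 99.999999999% durability.
durability = 0.99999999999
objects_stored = 10_000_000          # example: ten million backup objects

expected_annual_loss = objects_stored * (1 - durability)
print(expected_annual_loss)          # ~0.0001 objects/year: one loss every ~10,000 years
```

This is why S3 is generally considered durable enough for backup data on its own, with Cross Region Replication adding geographic redundancy rather than raw durability.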

Security

A key focus for any data you store in the Cloud is security. Ensuring that your data has the right level of security safeguarding it from unauthorized access is fundamental, especially if it contains sensitive information such as customer data.

You may need to abide by specific governance and compliance controls, and so you need to ensure that where you store your data in the Cloud is able to offer the correct functionality to ensure your data remains compliant.

When working with sensitive information, you must ensure that you have a means of encryption both in transit and at rest. You should understand how your selected storage method operates and manages data encryption if this level of security is required.

A sound understanding of Cloud storage access security is a must for your support engineers, who will be maintaining the environment. If security is not configured and implemented correctly at this stage, it could have devastating and damaging effects on you as a business should the data be compromised and exposed in any way, which has already happened to many organizations who failed to understand the implications of their security controls.

Compliance

Compliance comes into play specifically when looking at the security of your data. There are a number of different certifications, attestations, regulations, laws, and frameworks that you may need to remain compliant with.

To check how AWS storage services stack up against this governance, AWS has released a service called AWS Artifact, which allows customers to view and access AWS compliance reports. These are freely available to issue to your own auditors, to help you meet your controls.

The service itself is accessed through the AWS Management Console. All of the reports available are issued to AWS by external auditors, and each report contains a scope indicating which services and region it is associated with.


AWS Storage Services

Using Amazon S3 as a Data Backup Solution

As you probably know, Amazon S3 is a highly available and durable service, with huge capacity for scaling. It can store objects from 1 byte up to 5 TB in size, with numerous security features to maintain a tightly secure environment.

This makes S3 an ideal storage solution for static content, and therefore well suited as a backup solution.

Amazon S3 provides three different classes of storage, each designed to provide a different level of service and benefit.

S3: Standard

S3: Standard Infrequent Access

S3: Amazon Glacier

Amazon Glacier: Data Retrieval

Costs

Of Standard, Standard-IA, and Glacier, Glacier is the cheapest per gigabyte stored.
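One common way to exploit this price difference is a lifecycle policy that tiers backups down through the classes over time. Below is a sketch of the configuration structure accepted by boto3’s `put_bucket_lifecycle_configuration`; the prefix, transition days, and retention period are hypothetical choices, not values from the course.

```python
# Sketch of an S3 lifecycle configuration: move backups to Standard-IA
# after 30 days, to Glacier after 90, and expire them after ~7 years.
# The prefix and day counts are example choices.

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "backup-tiering",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequently accessed
                {"Days": 90, "StorageClass": "GLACIER"},      # long-term archive
            ],
            "Expiration": {"Days": 2555},                     # ~7-year retention
        }
    ]
}
```

You would pass this dict as the `LifecycleConfiguration` argument when calling the S3 client, so recent backups stay quickly restorable while older ones move to the cheapest class.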

S3: Cross Region Replication (CRR)

Depending on specific compliance controls, there may be a requirement to store backup data a specific distance from the source. By enabling Cross Region Replication, you can maintain compliance whilst still having the data in the local region for optimum data retrieval latency.
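As a sketch of what enabling CRR looks like, below is the replication configuration structure used by boto3’s `put_bucket_replication`. The IAM role ARN, bucket names, and prefix are placeholders.

```python
# Sketch of an S3 ReplicationConfiguration: copy objects under backups/
# to a bucket in another region. Role ARN and bucket names are placeholders.

replication_configuration = {
    "Role": "arn:aws:iam::111122223333:role/s3-crr-role",  # hypothetical role
    "Rules": [
        {
            "ID": "dr-copy",
            "Status": "Enabled",
            "Prefix": "backups/",
            "Destination": {
                "Bucket": "arn:aws:s3:::example-backups-eu-west-1",
                "StorageClass": "STANDARD_IA",  # replicas can use a cheaper class
            },
        }
    ],
}
```

Note that CRR requires versioning to be enabled on both the source and destination buckets, and the role must grant S3 permission to replicate on your behalf.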

Enable it for the following reasons:

S3: Performance

From a performance perspective, S3 is able to handle multiple concurrent uploads. As a result, Amazon recommends that for any file larger than 100 MB that you’re uploading to S3, you should implement multipart upload. This feature helps to increase the performance of the backup process.
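To see how multipart upload splits a file, here is a sketch of the part-count arithmetic, using S3’s actual limits of a 5 MB minimum part size and 10,000 parts maximum (the 8 MB default part size below is an arbitrary choice):

```python
# How multipart upload splits a file: parts must be at least 5 MB
# (except the last) and there can be at most 10,000 parts per upload.
import math

MIN_PART_SIZE = 5 * 1024 ** 2   # 5 MB minimum part size (S3 limit)
MAX_PARTS = 10_000              # maximum parts per upload (S3 limit)

def plan_parts(file_size, part_size=8 * 1024 ** 2):
    """Return the number of parts an upload of file_size bytes would use,
    after clamping part_size to S3's limits."""
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(file_size / MAX_PARTS))
    return math.ceil(file_size / part_size)

print(plan_parts(500 * 1024 ** 2))  # a 500 MB file in 8 MB parts -> 63 parts
```

Each of those parts can be uploaded in parallel and retried independently, which is where the performance gain comes from.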

There are a number of benefits to multipart upload. These being:

  - Increased throughput, as parts are uploaded in parallel
  - Quick recovery from network issues, as only a failed part needs to be retransmitted
  - The ability to pause and resume an upload
  - The ability to begin an upload before the final object size is known

Security

Getting your data into AWS and onto S3 is one thing, but ensuring that it can’t be accessed by or exposed to unauthorized personnel is another. Increasingly, the news reports data being exposed or leaked to the public due to incorrect security configurations on S3 buckets, inadvertently exposing what is often very sensitive information.

Some of these security features, which can help you maintain a level of data protection, are:

  - IAM policies, bucket policies, and access control lists to govern access
  - Server-side encryption (SSE-S3, SSE-KMS, or SSE-C) for data at rest
  - HTTPS endpoints for encryption in transit
  - Versioning and MFA Delete to protect against accidental or malicious deletion
  - Server access logging to audit requests
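As one concrete example of these controls, a bucket policy can deny any request not made over TLS, enforcing encryption in transit. The bucket name below is a placeholder; the `aws:SecureTransport` condition key is the standard mechanism for this.

```python
# Sketch of a bucket policy that denies any non-TLS request to the bucket.
# The bucket name is a placeholder.
import json

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::example-backup-bucket",
                "arn:aws:s3:::example-backup-bucket/*",   # objects as well as the bucket
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

policy_json = json.dumps(bucket_policy)  # the JSON string you would attach to the bucket
```

You would attach `policy_json` to the bucket (for example with `put_bucket_policy`), after which any plain-HTTP request is rejected.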


Using AWS Snowball for Data Transfer

Encryption & Tracking

Compliance

Data Aggregation

When sending or retrieving data, Snowball appliances can be aggregated together. For example, if you need to retrieve 400 terabytes of data from S3, then your data will be sent by five 80 terabyte Snowball appliances.

So from a disaster recovery perspective, when might you need to use AWS Snowball? Well, it all depends on how much data you need to get back from S3 to your own corporate data center and how quickly you can do that. On the other hand, how much data do you need to get into S3?

This will depend on the connection you have to AWS from your data center. You may have Direct Connect connections, a VPN, or just an internet connection. And if you need to restore multiple petabytes of data, this could take weeks or even months to complete.

As a general rule, if your data retrieval will take longer than a week using your existing connection method, then you should consider using AWS Snowball.
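The week-long rule of thumb, and the appliance aggregation described earlier, can both be sketched as simple arithmetic (the 1 Gbps link and 80% efficiency figure are illustrative assumptions):

```python
# Deciding between online transfer and Snowball: estimate the transfer
# time over the existing link and the number of 80 TB appliances needed.
import math

def appliances_needed(data_tb, appliance_tb=80):
    """Number of 80 TB Snowball appliances to hold data_tb terabytes."""
    return math.ceil(data_tb / appliance_tb)

def days_over_link(data_tb, link_gbps, efficiency=0.8):
    """Days to move data_tb terabytes over a link_gbps link."""
    seconds = (data_tb * 1e12 * 8) / (link_gbps * 1e9 * efficiency)
    return seconds / 86400

data_tb = 400                              # the 400 TB example from above
days = days_over_link(data_tb, 1)          # over an assumed 1 Gbps link
use_snowball = days > 7                    # the "longer than a week" rule
print(appliances_needed(data_tb), round(days, 1), use_snowball)
```

For the 400 TB example, a 1 Gbps link needs over six weeks, comfortably past the one-week threshold, so AWS would ship the data on five 80 TB appliances instead.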

AWS Snowball Process

  1. Create an export job
  2. Receive delivery of your appliance
  3. Connect the appliance to your network while it is powered off
  4. Power on the appliance and configure its network settings
  5. You are now ready to transfer the data
  6. Access the required credentials; install the Snowball client; transfer the data using the client; disconnect the appliance when the data transfer is complete
  7. Return the Snowball appliance to AWS

Using AWS Storage Gateway for on-premises data backup

Storage Gateway allows you to provide a gateway between your own data center’s storage systems, such as SAN, NAS, or DAS, and Amazon S3 and Glacier on AWS.

File Gateways

Volume Gateways

Tape Gateway - Gateway VTL - Virtual Tape Library

This again allows you to back up your data to S3 from your own corporate data center, but also to leverage Amazon Glacier for data archiving. A Virtual Tape Library is essentially a cloud-based tape backup solution, replacing physical components with virtual ones.

VTL Components