How often do EC2 instances fail? (Explained)

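The Amazon Compute SLA commits to a monthly uptime of 99.99% at the Region level, which applies when instances are deployed across two or more Availability Zones in the same Region. A single EC2 instance on its own carries a lower commitment (99.5% at the time of writing; check the current SLA text).

That commitment covers the infrastructure AWS operates and is backed by service credits. It is not a promise that an individual instance will never fail.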
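To make the percentage concrete, here is a minimal sketch that converts an uptime figure into a monthly downtime budget. The function name and the 30-day month are assumptions made for illustration, and the 99.5% value is included only as a lower single-instance figure for comparison.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 99.99 is the figure quoted in the text; 99.5 is an assumed comparison value.
    for pct in (99.5, 99.99):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% uptime allows about {minutes:.2f} minutes down per 30-day month")
```

At 99.99% the budget is a little over four minutes a month, which is why a single instance with no failover is rarely enough to meet it.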

Customers need to follow the AWS shared responsibility model and best practices, which means architecting their EC2 instances and other AWS resources across two or more Availability Zones within a single Region.

This provides fault tolerance and high availability, because each Availability Zone is a physically separate location within the Region.

Specifically, it protects against localized natural disasters, power failures, and other outages that affect a specific data center.

Below is a well-architected example of an AWS deployment that can tolerate a single instance failure:

The figure depicts the AWS Cloud, an AWS Region (us-east-1), and a VPC.

Within the VPC, two EC2 instances sit behind a load balancer, one in each of two Availability Zones.

Traffic enters the VPC through the Internet Gateway and reaches the Application Load Balancer, which distributes it to the two EC2 instances in their separate Availability Zones.

If either EC2 instance fails, the load balancer's health checks mark it as unhealthy and all incoming traffic is routed to the remaining healthy instance in the other Availability Zone.
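The failover behaviour described above can be sketched as a toy round-robin balancer. This is a simplified model and not how the Application Load Balancer is implemented; the `Target` and `RoundRobinBalancer` names and the instance IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Target:
    instance_id: str       # hypothetical instance ID
    zone: str              # Availability Zone the instance runs in
    healthy: bool = True   # set to False when a health check fails


class RoundRobinBalancer:
    """Toy load balancer that only routes to healthy targets."""

    def __init__(self, targets: Iterable[Target]) -> None:
        self.targets = list(targets)
        self._next = 0

    def route(self) -> Target:
        """Return the next healthy target, skipping unhealthy ones."""
        for _ in range(len(self.targets)):
            target = self.targets[self._next]
            self._next = (self._next + 1) % len(self.targets)
            if target.healthy:
                return target
        raise RuntimeError("no healthy targets available")


if __name__ == "__main__":
    a = Target("i-aaaa", "us-east-1a")
    b = Target("i-bbbb", "us-east-1b")
    lb = RoundRobinBalancer([a, b])
    print([lb.route().zone for _ in range(4)])  # traffic alternates between zones
    a.healthy = False                            # simulate an instance failure
    print([lb.route().zone for _ in range(4)])  # all traffic goes to us-east-1b
```

The point of the sketch is that clients keep being served as long as one target is healthy. If both fail, nothing is left to route to, which is the gap an Auto Scaling group is meant to close.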

This is a very basic architecture, but it lays out the concept clearly.


What happens when an EC2 host fails?

AWS recommends, as part of the shared responsibility model and the Well-Architected Framework, that users take ownership of their AWS environment by designing for failure.

“everything fails, all the time” – Werner Vogels – Amazon CTO

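To automate the replacement of a failed EC2 instance, it is recommended to use an Auto Scaling group in conjunction with the above architecture. The group launches a new instance whenever one fails its health checks, so the desired capacity is maintained without manual intervention.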
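As a rough sketch of what such a group could look like with boto3, the function below builds the request for `create_auto_scaling_group`. The group name, launch template name, subnet IDs, target group ARN, and capacity numbers are all placeholders assumed for illustration. A launch template and a load balancer target group are assumed to exist already.

```python
from typing import Dict, List


def auto_scaling_group_request(
    name: str,
    launch_template_name: str,
    subnet_ids: List[str],
    target_group_arn: str,
    desired: int = 2,
) -> Dict[str, object]:
    """Build keyword arguments for the boto3 create_auto_scaling_group call."""
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        "MinSize": desired,
        "MaxSize": desired * 2,                     # assumed headroom, adjust as needed
        "DesiredCapacity": desired,
        # One subnet per Availability Zone spreads instances across zones.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "TargetGroupARNs": [target_group_arn],
        # Use the load balancer's health checks to decide when to replace an instance.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 300,              # seconds; an assumed value
    }


def create_group(request: Dict[str, object], region: str = "us-east-1") -> None:
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # third-party; imported here so the builder works without it

    client = boto3.client("autoscaling", region_name=region)
    client.create_auto_scaling_group(**request)
```

Keeping the request builder separate from the API call makes the configuration easy to review or test before anything is created in an account.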

See below for a well-architected example of a fault-tolerant and highly available AWS deployment:


Can An AWS Instance Fail? Do I need Backups?

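You absolutely need to have backups implemented as part of your AWS environment.

Backups are not enabled by default for EC2 instances. You have to configure them yourself, for example with scheduled EBS snapshots or the AWS Backup service.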

You also need to ensure that the EC2 instances are architected in accordance with your uptime requirements.

If you need a highly available architecture, it is best practice to design for it from the start by spreading the EC2 instances across multiple Availability Zones.

How many instances you run across those Availability Zones should follow from the amount of uptime you are trying to achieve.
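A back-of-the-envelope way to relate instance count to an uptime target is sketched below. It assumes instance failures are independent and that one healthy instance is enough to serve traffic. Both are simplifications, and the 0.995 per-instance availability is an assumed input, not a measured one.

```python
def combined_availability(instance_availability: float, instances: int) -> float:
    """Probability that at least one of `instances` independent instances is up."""
    return 1 - (1 - instance_availability) ** instances


def min_instances(instance_availability: float, target: float, limit: int = 20) -> int:
    """Smallest instance count whose combined availability meets `target`."""
    for n in range(1, limit + 1):
        if combined_availability(instance_availability, n) >= target:
            return n
    raise ValueError("target not reachable within the instance limit")


if __name__ == "__main__":
    per_instance = 0.995  # assumed availability of a single instance
    for target in (0.999, 0.9999, 0.99999):
        n = min_instances(per_instance, target)
        print(f"target {target}: at least {n} instance(s)")
```

Real failures are often correlated, for example by a bad deployment or a zone-wide outage, so treat the result as a lower bound on redundancy and not as a guarantee.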

Doing this not only achieves high availability and fault tolerance, it also keeps the user experience of your applications consistent.

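Furthermore, EC2 instance store volumes are ephemeral: data on them is lost when the instance is stopped, hibernated, or terminated, or when the underlying hardware fails. EBS volumes persist independently of the instance, but the root EBS volume is deleted on termination by default, so terminating an instance can still mean losing everything stored on it.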

As per AWS's best practices for backup and recovery of EC2 instances, it is recommended to take EBS snapshots regularly to preserve the data on your volumes, and to create an AMI to capture the instance configuration, so that new EC2 instances can be launched with the same profile.
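A minimal sketch of taking a tagged EBS snapshot with boto3 is shown below. The volume ID, tag key, and description format are assumptions for illustration. In practice a schedule (a cron job, Amazon Data Lifecycle Manager, or AWS Backup) would trigger something like this regularly.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def snapshot_request(volume_id: str, now: Optional[datetime] = None) -> Dict[str, object]:
    """Build keyword arguments for the boto3 EC2 create_snapshot call."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%MZ")
    return {
        "VolumeId": volume_id,
        "Description": f"Scheduled backup of {volume_id} at {stamp}",
        "TagSpecifications": [
            {
                "ResourceType": "snapshot",
                # Hypothetical tag so backups can be found and pruned later.
                "Tags": [{"Key": "backup", "Value": "scheduled"}],
            }
        ],
    }


def take_snapshot(volume_id: str, region: str = "us-east-1") -> str:
    """Create the snapshot and return its ID. Requires boto3 and credentials."""
    import boto3  # third-party; imported here so the builder works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_snapshot(**snapshot_request(volume_id))
    return response["SnapshotId"]
```

Snapshots cover the data on a volume. Pairing them with an AMI is what allows a replacement instance to come up with the same configuration.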