The most important data in your game deployment is your game database and your game configuration data. Rebuilding and redeploying your infrastructure stacks is easy, but reconstructing your data is much harder. We therefore recommend implementing a backup system that lets you roll your database back to an earlier state in the event of a critical failure that causes data loss.
Here's a quick comparison between the different backup types available for databases:
| | Automated RDS snapshots | Manual RDS snapshots | Aurora Backtrack | AWS Backup |
|---|---|---|---|---|
| Trigger | Daily | Manual | Continuous | Periodic/Continuous |
| Retention period | Up to 35 days | Indefinite | Up to 72 hours | Indefinite |
| Recovery granularity | Daily | Whenever snapshot was triggered | Up-to-second | Daily |
| Use case | Standard daily backups with minimal need for configuration | Whenever you need a guaranteed, long-term snapshot for a given point of time (e.g., before a major update) | Easy rollback for near-term issues | Long-term, infrastructure stack agnostic storage of backups |
| Support for cross-account snapshots | No | Yes (but you must manually move the snapshots) | No | Yes |
| Gotchas | Snapshots are associated with a cluster; deleting or recreating the cluster will cause a loss of automated snapshots. | By default, an AWS account only supports 200 manual snapshots. Request quota increases if more are needed. | Enabling Backtrack incurs an extra cost, even if you do not use it. Backtracks are associated with clusters; destroying a cluster will also remove its Backtrack history. | Cross-account use cases require extra care in configuration. |
Metaplay supports AWS RDS Aurora MySQL for your game databases and AWS S3 buckets for your game configs and other assets. Aurora supports both MySQL and PostgreSQL, but Metaplay uses the MySQL variant for infrastructure stacks.
DANGER
AWS Aurora Backtrack is currently only supported for Aurora MySQL 2. At present, Aurora MySQL 3 is the default database backend, so the Backtrack examples below will not work until AWS releases Backtrack for version 3.
The Metaplay SDK ships with built-in support for configuring backups. You can copy the Terraform configuration snippet below as a starting point for protecting your data in a production environment. The snippet sets up the environments/aws-region module from infra-modules.
module "infra" {
source = "git@github.com:metaplay/infra-modules.git//environments/aws-region?ref=v0.1.8"
# other configurations...
# Enable production safeguards
production = true
# AWS Aurora Backtrack
db_backtrack_enabled = true
db_backtrack_window = 72 * 60 * 60 # 72 hours in seconds (the maximum Backtrack window)
# AWS Backup
aws_backup_enabled = true
aws_backup_vault_name = aws_backup_vault.backups.name
aws_backup_schedule = "cron(0 0 * * ? *)" # optional
}
resource "aws_backup_vault" "backups" {
name = "metaplay-d1-backups"
}
The above sets up the following protections for your infrastructure stack:
- production = true configures sane default values for your infrastructure stack. These include deletion protection on essential assets (such as your database) so that you cannot accidentally delete them, even if you run terraform destroy commands. Additionally, default AWS configurations, such as automated database snapshots, are configured to retain data for the maximum amount of time.
- db_backtrack_enabled = true enables AWS Aurora Backtrack for your database. This functionality enables easy rollbacks of the database within the given db_backtrack_window.
- aws_backup_enabled = true configures the stack to use AWS Backup as a destination for persisting longer-term backups of databases and S3 buckets. You can also use these for cross-account backups (see the Cross-account backups with AWS Backup section later on this page).

If you have AWS Aurora Backtrack enabled, rolling back the database to an earlier point in time is straightforward and quick. This makes it the most convenient option in cases like recovering from issues that occurred while updating the game server.
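Backtrack can only rewind within the configured window, so it is worth double-checking the value you pass as db_backtrack_window. The following is a minimal sketch of a helper that converts hours into seconds and rejects values above the 72-hour Backtrack limit; the function name backtrack_window_seconds is hypothetical and not part of the Metaplay SDK or infra-modules.

```shell
# Hypothetical helper (not part of infra-modules): convert a window given in
# hours into the seconds expected by db_backtrack_window, refusing anything
# outside the 0-72 hour range that Aurora Backtrack accepts.
backtrack_window_seconds() {
  hours="$1"
  max_hours=72
  if [ "$hours" -lt 0 ] || [ "$hours" -gt "$max_hours" ]; then
    echo "backtrack window must be between 0 and ${max_hours} hours" >&2
    return 1
  fi
  echo $((hours * 60 * 60))
}

# Usage:
#   backtrack_window_seconds 24    # prints 86400
```
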
Rolling back with Backtrack can be accomplished via the AWS Web Console or the AWS CLI. Before rolling back a database, it is important to place your game servers in maintenance mode and, if possible, remove the game server installation using helm delete. Deleting the server installation ensures that it will not attempt to interact with the database during the rollback.
To carry out a rollback with the AWS CLI, we first need the identifiers of the database clusters we wish to backtrack. You can either look them up from the AWS Web Console or, if you have a functioning Terraform setup for your stack, you can use Terraform and jq to extract your cluster identifiers:
$ terraform state pull | jq -r '.resources[] | select(.type == "aws_rds_cluster") | .instances[].attributes.cluster_identifier'
metaplay-d1-eu-west-1-rds-0
metaplay-d1-eu-west-1-rds-1
With the cluster identifiers available, you can initiate the rollbacks:
$ aws rds backtrack-db-cluster \
--db-cluster-identifier metaplay-d1-eu-west-1-rds-0 \
--backtrack-to 2022-04-05T09:00:00+00:00
{
"DBClusterIdentifier": "metaplay-d1-eu-west-1-rds-0",
"BacktrackIdentifier": "ece93696-3248-43a3-8a2e-70e327d70020",
"BacktrackTo": "2022-04-05T09:00:00+00:00",
"BacktrackRequestCreationTime": "2022-04-05T10:19:25.838000+00:00",
"Status": "PENDING"
}
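When your stack has several database shards, every cluster should be backtracked to the same point in time. As a rough sketch, the command above can be wrapped in a loop that reads cluster identifiers from standard input; the function name backtrack_all_clusters is an assumption made for illustration, and the script assumes the AWS CLI is installed and configured.

```shell
# Backtrack every cluster whose identifier arrives on stdin (one per line)
# to the same point in time. Stops at the first failing request.
backtrack_all_clusters() {
  target_time="$1"   # ISO 8601 timestamp, e.g. 2022-04-05T09:00:00+00:00
  while IFS= read -r cluster_id; do
    [ -n "$cluster_id" ] || continue
    echo "Backtracking ${cluster_id} to ${target_time}" >&2
    # Redirect stdin so the CLI cannot consume the remaining identifiers.
    aws rds backtrack-db-cluster \
      --db-cluster-identifier "$cluster_id" \
      --backtrack-to "$target_time" < /dev/null || return 1
  done
}

# Usage (reusing the jq filter shown earlier):
#   terraform state pull \
#     | jq -r '.resources[] | select(.type == "aws_rds_cluster") | .instances[].attributes.cluster_identifier' \
#     | backtrack_all_clusters 2022-04-05T09:00:00+00:00
```
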
You can, in turn, observe the status of the Backtracks using the describe-db-cluster-backtracks command:
$ aws rds describe-db-cluster-backtracks \
--db-cluster-identifier metaplay-d1-eu-west-1-rds-0
{
"DBClusterBacktracks": [
{
"DBClusterIdentifier": "metaplay-d1-eu-west-1-rds-0",
"BacktrackIdentifier": "ece93696-3248-43a3-8a2e-70e327d70020",
"BacktrackTo": "2022-04-05T08:59:59+00:00",
"BacktrackedFrom": "2022-04-05T10:19:25.964000+00:00",
"BacktrackRequestCreationTime": "2022-04-05T10:19:25.838000+00:00",
"Status": "COMPLETED"
}
]
}
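Since game servers should only be brought back once the rollback has finished, it can be handy to poll the status instead of checking it by hand. The sketch below assumes the AWS CLI is configured; the function name wait_for_backtrack and the default 10-second polling interval are illustrative choices, not anything prescribed by AWS or Metaplay.

```shell
# Poll a Backtrack request until it completes or fails.
wait_for_backtrack() {
  cluster_id="$1"
  backtrack_id="$2"       # the BacktrackIdentifier returned by backtrack-db-cluster
  interval="${3:-10}"     # seconds between polls (assumed default)
  while :; do
    status=$(aws rds describe-db-cluster-backtracks \
      --db-cluster-identifier "$cluster_id" \
      --backtrack-identifier "$backtrack_id" \
      --query 'DBClusterBacktracks[0].Status' \
      --output text) || return 1
    # Compare case-insensitively so the check does not depend on status casing.
    case "$(printf '%s' "$status" | tr '[:lower:]' '[:upper:]')" in
      COMPLETED) echo "completed"; return 0 ;;
      FAILED)    echo "failed" >&2; return 1 ;;
      *)         sleep "$interval" ;;
    esac
  done
}

# Usage:
#   wait_for_backtrack metaplay-d1-eu-west-1-rds-0 ece93696-3248-43a3-8a2e-70e327d70020
```
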
A slightly beefier option compared to AWS Aurora Backtrack comes in the form of AWS Backup. AWS Backup is also used for persisting the S3 buckets of game servers.
Recovering a database from AWS Backup requires recreating the entire database and is thus a much bigger operation than rolling back with Backtrack. The steps below walk through reverting to a backup snapshot using the AWS CLI and Terraform. First, we identify the AWS Backup vault and the recovery points it holds:
$ aws backup list-backup-vaults
{
"BackupVaultList": [
{
"BackupVaultName": "metaplay-d1-backups",
"BackupVaultArn": "arn:aws:backup:eu-west-1:000011112222:backup-vault:metaplay-d1-backups",
"CreationDate": "2022-02-18T11:37:07.445000+02:00",
"EncryptionKeyArn": "arn:aws:kms:eu-west-1:000011112222:key/01234567-0123-0123-0123-0123456789ABC",
"NumberOfRecoveryPoints": 4,
"Locked": false
}
]
}
$ aws backup list-recovery-points-by-backup-vault \
--backup-vault-name metaplay-d1-backups \
--by-resource-type Aurora \
--by-created-after 2022-04-05T00:00:00+00:00
{
"RecoveryPoints": [
{
"RecoveryPointArn": "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:awsbackup:job-6e18c935-11a0-c63c-d96b-9b35b7c3504e",
"BackupVaultName": "metaplay-d1-backups",
"BackupVaultArn": "arn:aws:backup:eu-west-1:000011112222:backup-vault:metaplay-d1-backups",
"ResourceArn": "arn:aws:rds:eu-west-1:000011112222:cluster:metaplay-d1-eu-west-1-rds-0",
"ResourceType": "Aurora",
"CreatedBy": {
"BackupPlanId": "dfef0d3e-1abb-47d5-a71b-6c9a2c4824b2",
"BackupPlanArn": "arn:aws:backup:eu-west-1:000011112222:backup-plan:dfef0d3e-1abb-47d5-a71b-6c9a2c4824b2",
"BackupPlanVersion": "ZDMxNmNlMTMtODQwNy00OTkxLTg4NTktYmQ5MWJiZDYzMWU1",
"BackupRuleId": "bb8ce8ce-0b24-49f8-ae26-5c0928775919"
},
"IamRoleArn": "arn:aws:iam::000011112222:role/metaplay-d1-aws-backup",
"Status": "COMPLETED",
"CreationDate": "2022-04-05T03:00:00+03:00",
"CompletionDate": "2022-04-05T03:23:52.592000+03:00",
"BackupSizeInBytes": 0,
"EncryptionKeyArn": "arn:aws:kms:eu-west-1:000011112222:key/48fa73e9-c248-4807-9047-050886fc9c38",
"IsEncrypted": true
},
{
"RecoveryPointArn": "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:awsbackup:job-4b348307-c146-1c92-e9b9-05b1e79e9b06",
"BackupVaultName": "metaplay-d1-backups",
"BackupVaultArn": "arn:aws:backup:eu-west-1:000011112222:backup-vault:metaplay-d1-backups",
"ResourceArn": "arn:aws:rds:eu-west-1:000011112222:cluster:metaplay-d1-eu-west-1-rds-1",
"ResourceType": "Aurora",
"CreatedBy": {
"BackupPlanId": "dfef0d3e-1abb-47d5-a71b-6c9a2c4824b2",
"BackupPlanArn": "arn:aws:backup:eu-west-1:000011112222:backup-plan:dfef0d3e-1abb-47d5-a71b-6c9a2c4824b2",
"BackupPlanVersion": "ZDMxNmNlMTMtODQwNy00OTkxLTg4NTktYmQ5MWJiZDYzMWU1",
"BackupRuleId": "bb8ce8ce-0b24-49f8-ae26-5c0928775919"
},
"IamRoleArn": "arn:aws:iam::000011112222:role/metaplay-d1-aws-backup",
"Status": "COMPLETED",
"CreationDate": "2022-04-05T03:00:00+03:00",
"CompletionDate": "2022-04-05T03:26:39.869000+03:00",
"BackupSizeInBytes": 0,
"EncryptionKeyArn": "arn:aws:kms:eu-west-1:000011112222:key/48fa73e9-c248-4807-9047-050886fc9c38",
"IsEncrypted": true
}
]
}
Using the above commands, we have identified two RDS cluster snapshots by their RecoveryPointArn values. We can confirm from the ResourceArn values that these correspond to the two database shards that our stack has running and that the creation timestamps align, giving us snapshots that were initiated simultaneously.
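Picking the ARNs out of the JSON by hand gets error-prone as the number of shards grows. As a sketch under stated assumptions, the AWS CLI's --query option can sort the recovery points by their source cluster and print only the ARNs. The helper names below are hypothetical, the sort is lexical (so it only matches shard order for up to ten shards named as in the example), and the time filter is assumed to be narrow enough to return one recovery point per shard.

```shell
# Print the recovery point ARNs in a vault, one per line, ordered by the ARN
# of the cluster each one was taken from.
list_shard_recovery_points() {
  vault_name="$1"
  created_after="$2"   # e.g. 2022-04-05T00:00:00+00:00
  aws backup list-recovery-points-by-backup-vault \
    --backup-vault-name "$vault_name" \
    --by-resource-type Aurora \
    --by-created-after "$created_after" \
    --query 'sort_by(RecoveryPoints, &ResourceArn)[].RecoveryPointArn' \
    --output text | tr '\t' '\n'
}

# Turn ARNs on stdin into quoted, comma-terminated entries that can be pasted
# into a Terraform list.
format_snapshot_identifiers() {
  while IFS= read -r arn; do
    [ -n "$arn" ] && printf '  "%s",\n' "$arn"
  done
  return 0
}

# Usage:
#   list_shard_recovery_points metaplay-d1-backups 2022-04-05T00:00:00+00:00 \
#     | format_snapshot_identifiers
```

Always compare the output against the ResourceArn and CreationDate values before using it, as described above.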
The Terraform module managing the infrastructure stack can be updated using the db_shard_initial_snapshot_identifiers parameter to seed these two snapshots:
module "infra" {
source = "git@github.com:metaplay/infra-modules.git//environments/aws-region?ref=v0.1.9"
# ... snip
db_shard_count = 2
db_shard_initial_snapshot_identifiers = [
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:awsbackup:job-6e18c935-11a0-c63c-d96b-9b35b7c3504e",
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:awsbackup:job-4b348307-c146-1c92-e9b9-05b1e79e9b06",
]
}
After seeding the snapshot ARNs, we taint the RDS clusters with Terraform to force new clusters to be created from the snapshots, and then run a targeted apply against the database resources to carry out the change:
$ terraform taint 'module.infra.module.database[0].module.aurora.aws_rds_cluster.this[0]'
Resource instance module.infra.module.database[0].module.aurora.aws_rds_cluster.this[0] has been marked as tainted.
$ terraform taint 'module.infra.module.database[1].module.aurora.aws_rds_cluster.this[0]'
Resource instance module.infra.module.database[1].module.aurora.aws_rds_cluster.this[0] has been marked as tainted.
$ terraform apply -target=module.infra.module.database
... regular Terraform cycle recreating database resources...
$ terraform apply
... untargeted Terraform cycle to ensure all resources are brought up to date after database recovery...
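Terraform 0.15.2 and later deprecate terraform taint in favour of the -replace planning option, which plans the recreation without modifying state first. The sketch below generates the -replace flags for a given shard count; it assumes the resource addresses match the ones used in the taint commands above, and the helper name replace_args_for_shards is hypothetical.

```shell
# Print one -replace flag per database shard, using the same resource
# addresses as the `terraform taint` commands above.
replace_args_for_shards() {
  shard_count="$1"
  i=0
  while [ "$i" -lt "$shard_count" ]; do
    printf '%s[%d]%s\n' \
      "-replace=module.infra.module.database" \
      "$i" \
      ".module.aurora.aws_rds_cluster.this[0]"
    i=$((i + 1))
  done
}

# Usage: review the generated flags, then pass them to a targeted apply, e.g.
#   terraform apply -target=module.infra.module.database \
#     '-replace=module.infra.module.database[0].module.aurora.aws_rds_cluster.this[0]' \
#     '-replace=module.infra.module.database[1].module.aurora.aws_rds_cluster.this[0]'
```
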
If you are running a multi-account AWS setup, a best practice is to store long-term backups in an account separate from the one where your infrastructure stack is running. This design restricts the blast radius: your backups stay protected even if something happens within the account where the infrastructure stack exists.
Recovering database shards from RDS snapshots follows the same process as outlined above for AWS Backup snapshots. The only difference is that the snapshot ARNs are different, as the snapshots were created not by AWS Backup but by some other mechanism (e.g., manually triggered snapshots).
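For reference, a manually triggered snapshot of every shard (for example, before a major update) could be taken along these lines. This is a sketch that assumes the AWS CLI is configured; the function name snapshot_all_clusters and the label-based snapshot naming scheme are illustrative assumptions.

```shell
# Create a manual cluster snapshot for every cluster identifier on stdin and
# print the ARN of each snapshot as it is requested.
snapshot_all_clusters() {
  label="$1"   # suffix for the snapshot name, e.g. pre-update-2022-04-05
  while IFS= read -r cluster_id; do
    [ -n "$cluster_id" ] || continue
    # Redirect stdin so the CLI cannot consume the remaining identifiers.
    aws rds create-db-cluster-snapshot \
      --db-cluster-identifier "$cluster_id" \
      --db-cluster-snapshot-identifier "${cluster_id}-${label}" \
      --query 'DBClusterSnapshot.DBClusterSnapshotArn' \
      --output text < /dev/null || return 1
  done
}

# Usage:
#   printf '%s\n' metaplay-d1-eu-west-1-rds-0 metaplay-d1-eu-west-1-rds-1 \
#     | snapshot_all_clusters pre-update-2022-04-05
```

The printed ARNs are the values you would seed into db_shard_initial_snapshot_identifiers once the snapshots have finished.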