Target Audience
This page is primarily intended for users on Private Cloud plans. If you are using Metaplay Cloud with a Pre-launch/Production plan, information from this page may not be directly relevant to your needs.
This chapter covers the most common operations for interacting with databases and the related infrastructure. It is assumed that you are familiar with Deploying Infrastructure, and it is beneficial to have a development infrastructure stack available to you.
As described in Deploying Infrastructure, the aws-region Terraform environment module encapsulates the logic for creating RDS database infrastructure on the AWS cloud platform.
Using the default settings, you are provided with:
This is enough for a development environment and to get going, but when running production workloads, it is advisable to improve the setup a bit. Giving the aws-region module a parameter named production with a value of true will enhance the setup with the following changes:
As such, it is advisable to always run production environments with the production switch mentioned above. The minor increases in duplicated instance costs and snapshot storage costs are greatly offset by the increased safety against any hiccups that may occur.
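For example, enabling production mode is a one-line change in the module configuration. The sketch below follows the same module "infra" block convention used in the other examples on this page:
module "infra" {
  # snip...
  production = true
}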
The aws-region module allows you to configure the specific AWS RDS instance types that you wish to use for your RDS database cluster instances using the db_instance_type parameter.
module "infra" {
# snip...
db_instance_type = "db.t5.large"
}
Rightsizing the instances is often a mix of art and science, but in practice, one of the best ways to build confidence in capacity planning is to run load tests using estimated player volumes with the bot clients (see Load Testing for more information).
As a rule of thumb, it is advisable not to run database servers at more than 50-70% utilization, so plan capacity accordingly. Note, however, that the Metaplay game servers are designed to reduce database actions by persisting player states on the game servers themselves. This effectively allows for a lower ratio of database server capacity to game server node capacity, which means you should not shy away from using sufficiently large instance sizes.
INFO
AWS Backups require infra-modules version v0.1.4 or later.
The AWS RDS automated snapshots are retained for at most 35 days. This period is often adequate for day-to-day operations, but for longer-term backups we can leverage AWS Backup. aws-region offers configuration options to enable this. By default, if enabled, the module will also create an AWS Backup vault for you in the same AWS account and set up daily backups with an infinite retention period. These can be enabled as follows:
module "infra" {
# snip...
aws_backup_enabled = true
}
You can optionally use an existing AWS Backup vault by providing a valid vault name in the aws_backup_vault_name parameter. The backup schedule can be adjusted using the aws_backup_schedule parameter. The schedule is defined in the same format as AWS CloudWatch cron expressions; further information is available on the Schedule Expressions for Rules page.
resource "aws_backup_vault" "my-vault" {
name = "my-backup-vault"
kms_key_arn = aws_kms_key.my-backup-key.arn
}
resource "aws_kms_key" "my-backup-key" {
description = "My custom KMS key for backups"
}
module "infra" {
# snip...
aws_backup_enabled = true
aws_backup_vault_name = aws_backup_vault.my-vault.name
aws_backup_schedule = "cron(0 0 * * * *)" # run backups every midnight
}
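To sanity-check that backups are actually accumulating, you can, for example, list the recovery points in the vault with the AWS CLI. This is just a quick verification sketch; the vault name below matches the example above:
$ aws backup list-recovery-points-by-backup-vault \
    --backup-vault-name my-backup-vault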
Alongside the regular RDS cluster snapshots that are taken, we also provide additional tooling for you to take periodic backups of your database contents into an S3 bucket. These are controlled with the db_backups_enabled parameter and can be scheduled with finer granularity using the db_backups_schedule parameter (which follows a standard cron format).
module "infra" {
db_backups_enabled = true
db_backups_schedule = "0 0 * * *"
}
The underlying routine will snapshot and export your database to an S3 bucket as Parquet files. Unlike conventional RDS snapshots, which need to be instantiated into a running RDS cluster to be operated against, Parquet files in S3 are easier to interact with, which makes them more suitable if you just wish to quickly dig into an old version of a table without the overhead of setting up new infrastructure.
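As a rough sketch of what digging into such a backup might look like, you could list the exported objects and read one of the Parquet files locally. The bucket name and object path below are placeholders for your actual backup bucket, and reading the file with pandas assumes you have pandas and pyarrow installed:
$ aws s3 ls s3://<your-db-backup-bucket>/ --recursive | head
$ aws s3 cp s3://<your-db-backup-bucket>/<path-to-export>/part-00000.parquet .
$ python3 -c "import pandas as pd; print(pd.read_parquet('part-00000.parquet').head())"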
By default, we do not configure any retention periods for S3-based backups. If you enable this feature, you should keep tabs on your backup bucket and perform periodic data pruning to keep costs in check.
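One way to keep the pruning automatic is an S3 lifecycle rule that expires old backup objects. The sketch below is not part of the aws-region module; the bucket name is a placeholder you would replace with your actual backup bucket, and the 180-day retention is just an example:
resource "aws_s3_bucket_lifecycle_configuration" "db_backup_pruning" {
  bucket = "<your-db-backup-bucket>" # placeholder: the bucket your backups land in

  rule {
    id     = "expire-old-db-backups"
    status = "Enabled"

    filter {} # apply the rule to all objects in the bucket

    expiration {
      days = 180 # example retention period; adjust to your needs
    }
  }
}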
INFO
Database sharding is an advanced topic, and the risk of getting your infrastructure into trouble is higher than with basic setups. If in doubt, test in your development environments first or ask for help!
By default, our setup is configured to provision a single database shard. For development environments and smaller games, this is often adequate, and by sizing the database instances appropriately, even relatively large games can be run with a single shard.
Games that become very large or have large quantities of records can put extra strain on a single-shard database cluster. This can manifest as slow table scans or schema migrations. To address these types of issues, we offer the possibility to set up sharded databases. To enable sharding on the infrastructure level, you can define the number of shards required using the db_shard_count parameter.
module "infra" {
db_shard_count = 2
}
Increasing the db_shard_count will create new RDS clusters in parallel and provide the game servers with a list of the clusters and their endpoints as shards to operate against.
To move to a higher number of shards, the process is logically as follows:
1. Take snapshots of the existing database shard clusters.
2. Increase the db_shard_count parameter and use the db_shard_initial_snapshot_identifiers list to provide the snapshot identifiers to seed all the database shards from.
3. Apply the Terraform changes so that the existing shards are recreated and the new shards are created from the snapshots.
4. Re-deploy the game servers so that they reconcile the shards and logically remove the duplicated data.

Overall, the process of sharding up can take a not-insignificant amount of time as it involves creating new infrastructure and logically removing duplicate data.
To execute the upsharding, we first take the snapshots:
$ DATABASE_NAME_PREFIX="metaplay-d1-eu-west-1-rds"
$ DATE=$(date +%Y%m%d) # e.g. 20201020 for our example
$ CURRENT_SHARDS=2
$ for i in $(seq 0 $((CURRENT_SHARDS - 1))); do
    aws rds create-db-cluster-snapshot \
      --db-cluster-identifier ${DATABASE_NAME_PREFIX}-${i} \
      --db-cluster-snapshot-identifier ${DATE}-${DATABASE_NAME_PREFIX}-${i} &
  done
$ for i in $(seq 0 $((CURRENT_SHARDS - 1))); do
    aws rds wait db-cluster-snapshot-available \
      --db-cluster-snapshot-identifier ${DATE}-${DATABASE_NAME_PREFIX}-${i}
  done
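If you want to double-check a snapshot manually before proceeding, you can also describe it directly, for example for shard 0 (reusing the variables set above):
$ aws rds describe-db-cluster-snapshots \
    --db-cluster-snapshot-identifier ${DATE}-${DATABASE_NAME_PREFIX}-0 \
    --query 'DBClusterSnapshots[].Status'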
Once all snapshots are taken and available, we can update our Terraform code and tell Terraform to utilize our new snapshots as we double our shard count:
module "infra" {
db_shard_count = 4 # up from earlier 2
# we can define the two snapshots from above; they will be striped so that the
# first snapshot will be used for shards 0 and 2 while the second snapshot will
# be used for shards 1 and 3
db_shard_initial_snapshot_identifiers = [
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-0",
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-1",
]
# alternatively we could specify explicitly all the four snapshots:
db_shard_initial_snapshot_identifiers = [
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-0",
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-1",
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-0",
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-1",
]
# if we wanted to skip the tainting and recreating of the existing shards, we
# could also just explicitly tell to skip the snapshots for the first two shards:
db_shard_initial_snapshot_identifiers = [
null,
null,
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-0",
"arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-1",
]
}
In our case, let's go with the first option as it's pretty straightforward. If we want to ensure that we refresh the entire database setup, we can run targeted Terraform taint and apply loops for the existing database shards:
$ for i in $(seq 0 $((CURRENT_SHARDS - 1))); do
    terraform taint "module.infra.module.database[${i}].module.aurora.aws_rds_cluster_instance.this"
    terraform taint "module.infra.module.database[${i}].module.aurora.aws_rds_cluster.this"
    terraform apply \
      -auto-approve \
      -target="module.infra.module.database[${i}].module.aurora.aws_rds_cluster_instance.this" \
      -target="module.infra.module.database[${i}].module.aurora.aws_rds_cluster.this"
  done
$ terraform apply
The above will first mark the existing shard nodes as tainted, which will tell Terraform that the nodes should be fully recreated. Then a targeted Terraform apply will allow us to provision those modules from scratch. Finally, a broader Terraform apply should bring the new shards up as well. This broader apply will also update the Kubernetes secrets for the game servers, which store the details of the database shards to use.
Finally, re-deploying the game servers allows them to run through the new shards and reconcile the overall database so that it is ready to use the additional shards.
INFO
This section lacks proper documentation. Please talk to us if you are planning to shard down.
Logically, sharding down is the opposite of sharding up. We are presently working on functionality to allow game servers to compact data logically down from a larger number of shards. Using this functionality, the game servers would initially do the heavy lifting, and this would then allow you to update the Terraform db_shard_count and carry out targeted Terraform destroys to remove the unneeded shards.
Prior to infra-modules v0.1.0, the infrastructure was deployed as purely a single RDS cluster setup. v0.1.0 introduces the sharded database infrastructure. For infrastructure deployments that are still on an earlier version, the migration will require some additional steps to switch to the new infrastructure setup. Logically, the process is:
1. Take a snapshot of the existing database cluster.
2. Populate db_shard_initial_snapshot_identifiers with the snapshot ARN from step 1.
3. Run terraform apply -target=module.infra.module.database. This will destroy the old database and recreate a new single-shard database.
4. Run terraform apply to bring the infrastructure up to date with the new database configurations.
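As a rough command-level sketch of the steps above; the old cluster identifier and snapshot name are placeholders for your actual environment:
$ aws rds create-db-cluster-snapshot \
    --db-cluster-identifier <old-rds-cluster-identifier> \
    --db-cluster-snapshot-identifier pre-sharding-migration
$ aws rds wait db-cluster-snapshot-available \
    --db-cluster-snapshot-identifier pre-sharding-migration
# ...update db_shard_initial_snapshot_identifiers in your Terraform code...
$ terraform apply -target=module.infra.module.database
$ terraform apply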
To recover to an earlier snapshot of a database, the process of reloading a database snapshot is very similar to the case of sharding up, with the exception that the shard count remains intact and the db_shard_initial_snapshot_identifiers list is updated with the specific snapshots to reload. This is then followed by the same Terraform taint and apply cycle as above, which will recreate the databases from the specified snapshots.
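For example, reloading both shards of a two-shard setup from specific snapshots could look like the following sketch (the snapshot ARNs are illustrative, following the same naming as the upsharding example), after which you would run the same taint and targeted apply loop as above:
module "infra" {
  db_shard_count = 2 # unchanged

  # reload both existing shards from the specified snapshots
  db_shard_initial_snapshot_identifiers = [
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-0",
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-1",
  ]
}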