Database Infrastructure Operations

This chapter introduces you to managing databases for your infrastructure as well as the basic operations that are common for interacting with databases.

Key Concepts

Database is a logical construct that consists of underlying infrastructure and which is utilized by the game server to persist the state. A database consists of one or more database shards.
Database shards from the game server's perspective are functional MySQL endpoints, which provide a horizontal partition of data and are implemented typically as a MySQL cluster, which can contain one or more database servers configured to create a database cluster. In practice a database shard equates to a database cluster.
Database cluster is an implementation of an AWS RDS cluster setup.
Database snapshot is a logical collection of database shard snapshots.
Database shard snapshot is an actual AWS RDS cluster snapshot, which persists the state and contents of a database cluster at a given point of time.

Introduction

This chapter covers the most common operations related to interacting with databases and related infrastructure. It is assumed that you are familiar with Deploying Infrastructure and it is beneficial to have a development infrastructure stack available to you.

Configuring the basic database

As described in the Deploying Infrastructure, the aws-region Terraform environment module encapsulates also the logic for creating RDS database infrastructure on the AWS cloud platform.

Using the default settings, you are provided with:

A single RDS database cluster with a single read-write database instance
Network configurations to enable connectivity from the Kubernetes cluster

This is enough for a development environment and to get going, but when running production workloads, it is advisable to improve the setup a bit. Giving the aws-region module a parameter named production with a value of true will enhance the setup with the following changes:

The RDS database cluster will be provisioned an additional read-only database instance (this reduces the amount of time for the automated failover if the primary RDS instance fails, as well as enhances performance by allowing game servers to offload more read-intensive queries to read-only replicas).
RDS database cluster nightly snapshots are enabled and persisted for 35 days.
Delete protection is enabled for the RDS database cluster to protect the infrastructure from accidental database deletions.
The RDS database cluster is configured to perform a final database snapshot on deletion.

As such, it is advisable to always run production environments with the production switch mentioned above. The minor increases in duplicated instance costs and snapshot storage costs are greatly offset by the increased safety against any hiccups that may occur.

RDS database instance sizing

The aws-region module allows you to configure the specific AWS RDS instance types that you wish to use for your RDS database cluster instances using the db_instance_type parameter.

bash

module "infra" {
  # snip...

  db_instance_type = "db.t5.large"
}

Rightsizing the instances is often a mix of art and science, but in practice one of the best ways to build confidence in the capacity planning is to run load tests using estimated player volumes using the bot clients (see Load Testing for more information).

As a rule of thumb, it is often advisable to not run database servers at more than 50-70% utilization at most, so prepare accordingly. One thing to note, however, is that the way how the Metaplay game servers are designed allows the servers to reduce database actions by persisting player states on the game servers themselves, which effectively means that the design allows for a lower ratio of database server capacity to game server node capacity, which means that you should not shy away from using instance sizes which are sufficiently large.

Database backups to AWS Backup

INFO

AWS Backups require infra-modules version v0.1.4 or later.

The AWS RDS automated snapshots are retained for at most 35 days. This period of time is often adequate for day-to-day operations, but for longer term backing up of data we can leverage AWS Backup. aws-region offers configuration options to enable this. By default if enabled, the module will also create an AWS Backup vault for you in the same AWS account and set up daily backups with an infinite retention period. These can be enabled as follows:

bash

module "infra" {
  # snip...

  aws_backup_enabled = true
}

You can optionally also use an existing AWS Backup vault by using the aws_backup_vault_name parameter and providing a valid vault name to use. The backup schedule can be adjusted using the aws_backup_schedule parameter. The schedule is defined in the same format as AWS CloudWatch Cron expressions; further information is available on the Schedule Expressions for Rules page.

bash

resource "aws_backup_vault" "my-vault" {
  name        = "my-backup-vault"
  kms_key_arn = aws_kms_key.my-backup-key.arn
}

resource "aws_kms_key" "my-backup-key" {
  description = "My custom KMS key for backups"
}

module "infra" {
  # snip...

  aws_backup_enabled    = true
  aws_backup_vault_name = aws_backup_vault.my-vault.name
  aws_backup_schedule   = "cron(0 0 * * * *)" # run backups every midnight
}

Database backups to S3

Alongside the regular RDS cluster snapshots that are taken, we also provide additional tooling for you to take periodic backups of your database contents into an S3 bucket. These are controlled with the db_backups_enabled parameter and can be scheduled with finer granularity using the db_backups_schedule parameter (which follows a standard cron format).

bash

module "infra" {
  db_backups_enabled  = true
  db_backups_schedule = "0 0 * * *"
}

The underlying routine will snapshot and export your database to an S3 bucket as Parquet files. Unlike conventional RDS snapshots, which need to be instantiated into a running RDS cluster to be operated against, Parquet files in S3 are easier to interact with, which make them more suitable if you just wish to quickly dig into an old version of a table without the overhead of setting up new infrastructure.

By default, we do not configure any retention periods for S3-based backups. If you enable this feature, you should keep tabs on your backup bucket and perform periodic data pruning to keep costs in check.

Database sharding

INFO

Database sharding is an advanced topic and the risks of getting yourself into trouble with your infrastructure increase compared to basic setups. If in doubt, test in your development environments first or ask for help!

By default, our setup is configured to provision a single database shard. For development environments and smaller games, this is often adequate enough, and by sizing the database instances appropriately, even relatively large games can be run with a single shard.

Games that become very large or have large quantities of records, can put extra strain on the single-shard database cluster. These can manifest as slow table scans or schema migrations. To address these types of issues, we offer the possibility to set up sharded databases. To enable sharding on the infrastructure level, you can define the number of shards required using the db_shard_count parameter.

bash

module "infra" {
  db_shard_count = 2
}

Increasing the db_shard_count will create new RDS clusters in parallel and provide the game servers with a list of the clusters and their endpoints as shards to operate against.

Sharding up

To move to a higher number of shards, logically the process is as follows:

Turn off all game servers on the infrastructure at minimum by setting on the maintenance mode and ideally by removing the game server deployments altogether.
Snapshot the database shards individually and record the identifiers for the snapshots.
Increase the shard count using the db_shard_count parameter and use the db_shard_initial_snapshot_identifiers list to provide the snapshot identifiers to seed all the database shards from.
Taint and recreate the database shards using Terraform to ensure that all databases are reloaded appropriately from the snapshots.
Re-deploy the game servers and allow the game servers to discover the database shards and carry out a process of de-duplicating the data to finish the sharding.

Overall the process of sharding up can take a not-insignificant amount of time as it involves creating new infrastructure and logically removing duplicate data.

To execute the upsharding, we can take first the snapshots:

bash

$ DATABASE_NAME_PREFIX="metaplay-d1-eu-west-1-rds"
$ DATE=$(date %Y%m%d) # e.g. 20201020 for our example
$ CURRENT_SHARDS=2
$ for i in {0...${CURRENT_SHARDS-1}}; do
    aws rds create-db-cluster-snapshot \
      --db-cluster-identifier ${DATABASE_NAME_PREFIX}-${i} \
      --db-cluster-snapshot-identifier ${DATE}-${DATABASE_NAME_PREFIX}-${i} &
done
$ for i in {0...${CURRENT_SHARDS-1}}; do
    aws rds wait db-cluster-snapshot-available \
      --db-cluster-snapshot-identifier ${DATE}-${DATABASE_NAME_PREFIX}-${i}
done

Once all snapshots are taken and available, we can update our Terraform code and tell Terraform to utilize our new snapshots as we double our shard count:

bash

module "infra" {
  db_shard_count = 4 # up from earlier 2

  # we can define the two snapshots from above; they will be striped so that the
  # first snapshot will be used for shards 0 and 2 while the second snapshot will
  # be used for shards 1 and 3
  db_shard_initial_snapshot_identifiers = [
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-0",
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-1",
  ]

  # alternatively we could specify explicitly all the four snapshots:
  db_shard_initial_snapshot_identifiers = [
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-0",
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-1",
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-0",
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-1",
  ]

  # if we wanted to skip the tainting and recreating of the existing shards, we
  # could also just explicitly tell to skip the snapshots for the first two shards:
  db_shard_initial_snapshot_identifiers = [
    null,
    null,
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-0",
    "arn:aws:rds:eu-west-1:000011112222:cluster-snapshot:20201020-metaplay-d1-eu-west-1-rds-1",
  ]
}

In our case, let's go with the first option as it's pretty straightforward. If we wanted to ensure that we refresh the entire database setup, we could run targeted taint and apply Terraform loops for the new databases:

bash

$ for i in {0...${CURRENT_SHARDS-1}}; do
    terraform taint module.infra.module.database[${i}].module.aurora.aws_rds_cluster_instance.this
    terraform taint module.infra.module.database[${i}].module.aurora.aws_rds_cluster.this
    terraform apply \
      -auto-approve \
      -target=module.infra.module.database[${i}].module.aurora.aws_rds_cluster_instance.this \
      -target=module.infra.module.databsae[${i}].module.aurora.aws_rds_cluster.this
done
$ terraform apply

The above will first mark the existing shard nodes as tainted, which will tell Terraform that the nodes should be fully recreated. Then a targeted Terraform apply will allow us to provision those modules from scratch. Finally, a broader Terraform apply should bring the new shards up as well. This broader apply will also update the Kubernetes secrets for the game servers, which store the details of the database shards to use.

Finally, re-deploying the game servers should allow the game servers to run through the new shards and reconcile the overall database to be ready to use the additional shards.

Sharding down

INFO

This section lacks proper documentation. Please talk to us if you are planning to shard down.

Logically sharding down is the opposite of sharding up. We are presently working on functionality to allow game servers to compact data logically down from a larger number of shards. Using this functionality, the game servers would initially do the heavy lifting and this would then allow you to update the Terraform db_shard_count and carry out targeted Terraform destroys to remove the unneeded shards.

Migrating from infra-modules v0.0.x to v0.1.0

Prior to infra-modules v0.1.0, the infrastructure was deployed as purely a single RDS cluster setup. v0.1.0 introduces the sharded database infrastructure. For infrastructure deployments that are still in an earlier version, the migration will require some additional steps to switch to the new infrastructure setup. Logically the process is:

Snapshot the RDS database
Update the Terraform configurations to use newer infra-modules version and set db_shard_initial_snapshot_identifiers with the snapshot ARN from step 1
Manually disassociate the RDS access security group from the Kubernetes cluster EC2 network interfaces. The security group must be recreated, but cannot be done if it is still associated with ENIs and in use.
Carry out a targeted Terraform apply to recreate the database using the snapshot with terraform apply -target=module.infra.module.database. This will destroy the old database and recreate a new single-shard database.
Carry out a broad Terraform apply with terraform apply to bring the infrastructure up to date with the new database configurations.

Database backup operations

To recover to an earlier snapshot of a database, the process of reloading a database snapshot is very similar to the case of sharding up, with the exception that the shard count remains intact and the db_shard_initial_snapshot_identifiers list is updated with the specific snapshots to reload. This is then followed by the same Terraform taint and apply cycle as above, which will recreate the databases from the specified snapshots.

Getting Started

Wordle Tutorial

SDK Updates

Release Notes

Release 30

Release 29

Release 27

Release 25

Open Source Software Licenses

Database Infrastructure Operations

Key Concepts

Introduction

Configuring the basic database

RDS database instance sizing

Database backups to AWS Backup

Database backups to S3

Database sharding

Sharding up

Sharding down

Migrating from infra-modules v0.0.x to v0.1.0

Database backup operations

Release Notes

Release 30

Release 29

Release 27

Release 25

Database Infrastructure Operations

Key Concepts ​

Introduction ​

Configuring the basic database ​

RDS database instance sizing ​

Database backups to AWS Backup ​

Database backups to S3 ​

Database sharding ​

Sharding up ​

Sharding down ​

Migrating from infra-modules v0.0.x to v0.1.0 ​

Database backup operations ​

Key Concepts

Introduction

Configuring the basic database

RDS database instance sizing

Database backups to AWS Backup

Database backups to S3

Database sharding

Sharding up

Sharding down

Migrating from infra-modules v0.0.x to v0.1.0

Database backup operations