Working familiarity with public cloud providers - Metaplay’s infrastructure stack is built on top of AWS cloud services. Prior experience operating public cloud resources will greatly help you get up to speed with our stack!
Working familiarity with Docker and Kubernetes - While not strictly necessary, familiarity with Docker containers and Kubernetes clusters will help you get the most out of this article!
Metaplay SDK ships with a pre-configured cloud infrastructure stack. It is the same stack we use internally to host and manage your development environments, and it allows you to customize and deploy as many environments as your game team needs. Building on top of our infrastructure stack lets you benefit from all the hand-tuning and best practices we’ve spent years perfecting, and it also makes future SDK updates easy.
Metaplay SDK’s cloud infrastructure runs in the AWS cloud. We leverage many of AWS’ managed services to keep operational complexity low. For example, the game servers and their associated services are deployed as containers on EKS, AWS’ managed Kubernetes platform.
Metaplay SDK’s cloud infrastructure follows a two-tier architecture that is highly available, self-healing, and secure. The core game services are complemented by a selection of specialized services, such as file storage, load balancers, content delivery, authentication, observability, backups and more.
Each part of the service stack is designed to be highly efficient and scalable. The load balancer, CDN, and file storage are already inherently scalable, managed AWS services. The Kubernetes and RDS clusters can be scaled both vertically (more powerful VMs) and horizontally (more VMs).
The Metaplay game server’s software architecture is distributed and scales extremely well with both more and faster cores. This gives you flexibility in picking the best suited (and cost optimized) instance types for your various deployments!
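The following minimal Python sketch makes the instance-selection trade-off concrete. It sizes a single-type fleet for a given vCPU demand and picks the cheapest option. The sketch assumes that vCPUs are interchangeable across instance types, so it ignores per-core speed and memory. The `InstanceType` names and prices are made-up placeholders, not real AWS instance types or pricing.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    hourly_price: float  # placeholder USD/hour, NOT real AWS pricing


def cheapest_fleet(candidates: List[InstanceType],
                   required_vcpus: int) -> Tuple[InstanceType, int, float]:
    """Return (instance type, node count, hourly cost) for the cheapest
    single-type fleet that provides at least `required_vcpus` vCPUs."""
    best = None
    for it in candidates:
        nodes = math.ceil(required_vcpus / it.vcpus)  # scale out to cover demand
        cost = nodes * it.hourly_price
        if best is None or cost < best[2]:
            best = (it, nodes, cost)
    if best is None:
        raise ValueError("no candidate instance types given")
    return best


# Hypothetical catalog: names and prices are placeholders for illustration.
CATALOG = [
    InstanceType("small", vcpus=2, hourly_price=0.10),
    InstanceType("large", vcpus=8, hourly_price=0.36),
    InstanceType("xlarge", vcpus=16, hourly_price=0.80),
]

if __name__ == "__main__":
    for demand in (20, 24):
        it, nodes, cost = cheapest_fleet(CATALOG, demand)
        print(f"{demand} vCPUs -> {nodes} x {it.name} at ${cost:.2f}/h")
```

With these placeholder numbers, ten `small` nodes are the cheapest way to provide 20 vCPUs. At 24 vCPUs, three `large` nodes become cheaper, because rounding up to whole nodes changes the total cost. A real comparison would also weigh memory, per-core performance and spot availability.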
Effective daily operations require the right tools. We’ve built and tuned a full stack of industry-standard “DevOps” tools that cover all the necessary use cases, from initial provisioning, configuration and deployments to the daily monitoring of games with 100+ million DAU.
Automated builds - Bring your own! We recommend (and have pre-configured scripts for) GitHub Actions and TeamCity.
Metaplay CLI - A utility to easily execute the most common manual commands you need to interact with your live deployments, such as uploading a new game server build.
Backups - We support AWS Backup for backing up the game databases and file storage buckets through snapshots. Snapshots allow you to revert if errors occur, or even to rebuild your entire stack if needed.
Metrics - We facilitate automatic collection of a wide array of game server metrics, as well as metrics of other supporting systems such as the underlying infrastructure.
Logs - We collect and persist logs of game servers and other systems to allow for efficient debugging and tracing of issues across distributed systems.
Monitoring dashboard - We have created a curated set of monitoring dashboards, which give you an at-a-glance view into how your game servers and infrastructure are operating.
Alerts - We provide an extendable set of alerting rules that notify you if the health of the game servers or the underlying infrastructure degrades. Alerts can be delivered to the most common IM and paging platforms, giving you flexibility in choosing who should be contacted, when, and how.
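Alerting rules of this kind usually fire only after a condition has held for some time, so nobody gets paged over a brief spike. The sketch below mimics, in simplified form, the pending/firing behaviour of a Prometheus alerting rule's `for` clause. The `AlertRule` and `AlertEvaluator` names and the CPU threshold are hypothetical and are not Metaplay's shipped rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float     # condition: metric value > threshold
    for_seconds: float   # condition must hold this long before firing


class AlertEvaluator:
    """Tracks a single rule through the inactive -> pending -> firing states."""

    def __init__(self, rule: AlertRule) -> None:
        self.rule = rule
        self.pending_since: Optional[float] = None

    def evaluate(self, timestamp: float, value: float) -> str:
        if value <= self.rule.threshold:
            self.pending_since = None  # condition cleared, start over
            return "inactive"
        if self.pending_since is None:
            self.pending_since = timestamp  # condition just became true
        if timestamp - self.pending_since >= self.rule.for_seconds:
            return "firing"
        return "pending"


if __name__ == "__main__":
    # Hypothetical rule: CPU utilization above 90% for two minutes.
    evaluator = AlertEvaluator(AlertRule("GameServerHighCpu", 0.9, 120))
    for t, cpu in [(0, 0.95), (60, 0.97), (120, 0.93), (180, 0.50)]:
        print(t, evaluator.evaluate(t, cpu))
```

In the sample run, the condition first becomes true at t=0 and is still pending at t=60. The rule fires at t=120, once two minutes have elapsed, and resets as soon as the value drops back under the threshold.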
The Metaplay SDK’s application layer runs as containers within Kubernetes. The application containers themselves are stateless: all persistent data is stored outside of Kubernetes, so the containers (or even the whole cluster) can be destroyed without losing data.
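Here is a sketch of the stateless-container idea. The container keeps at most a cache and persists every change to a store that lives outside the cluster, so a replacement container picks up where the old one left off. This simplified illustration assumes write-through persistence, and `ExternalDatabase` and `GameServerContainer` are hypothetical names. The real game server's persistence strategy may differ, for example by batching writes.

```python
from typing import Dict, Optional


class ExternalDatabase:
    """Stand-in for a database that lives outside the cluster (e.g. RDS).
    An in-memory dict is used here purely for illustration."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def load(self, key: str) -> Optional[str]:
        return self._rows.get(key)

    def save(self, key: str, value: str) -> None:
        self._rows[key] = value


class GameServerContainer:
    """Hypothetical stateless container: it only keeps a cache, and every
    update is written through to the external database."""

    def __init__(self, db: ExternalDatabase) -> None:
        self.db = db
        self.cache: Dict[str, str] = {}

    def get_player(self, player_id: str) -> Optional[str]:
        if player_id not in self.cache:
            stored = self.db.load(player_id)  # warm the cache from the database
            if stored is not None:
                self.cache[player_id] = stored
        return self.cache.get(player_id)

    def update_player(self, player_id: str, state: str) -> None:
        self.cache[player_id] = state
        self.db.save(player_id, state)  # persist outside the container


if __name__ == "__main__":
    db = ExternalDatabase()
    first = GameServerContainer(db)
    first.update_player("player-1", "level=5")
    del first  # simulate the pod being destroyed
    replacement = GameServerContainer(db)
    print(replacement.get_player("player-1"))  # state survives the restart
```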
We have extended the base Kubernetes resource types with Metaplay resource types and an operator, which makes it easier to control the life cycle of game payloads. All Kubernetes payloads can be split into three main categories. The applications in each category are segregated so they can be managed on separate life cycles and cadences:
Game server resources are services directly related to the game server application.
Supporting service resources are services that help observe and manage the game server.
Cluster tooling contains the “necessary evil” services of running a Kubernetes cluster securely and efficiently.
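Kubernetes operators generally work by reconciliation. They repeatedly compare the desired state declared in custom resources with what is actually running, and then act on the difference. The sketch below shows that pattern for replica counts only. It is a generic illustration with hypothetical shard names, not the Metaplay operator's actual logic.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], observed: Dict[str, int]) -> List[str]:
    """Compare desired and observed replica counts per workload and return
    the actions needed to converge; an operator would run this repeatedly."""
    actions: List[str] = []
    for name, want in sorted(desired.items()):
        have = observed.get(name, 0)
        if have < want:
            actions.append(f"scale-up {name} {have}->{want}")
        elif have > want:
            actions.append(f"scale-down {name} {have}->{want}")
    # Anything running that is no longer declared gets removed.
    for name in sorted(set(observed) - set(desired)):
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    # Hypothetical shard names, for illustration only.
    desired = {"logic": 3, "service": 1}
    observed = {"logic": 2, "service": 1, "legacy": 1}
    print(reconcile(desired, observed))
```

The function only acts on differences, so running it again on a converged state returns an empty list. This is what makes a reconcile loop safe to repeat.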
Our provisioning scripts also set up tightly controlled service accounts, secrets management and environment-specific runtime configuration for the cluster’s services.
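Environment-specific runtime configuration is commonly built by overlaying a small per-environment file on shared defaults. The sketch below shows a recursive merge under that assumption. The configuration keys are hypothetical. The secret is referenced by name, as a Kubernetes Secret would be, rather than embedded in the config.

```python
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `override` onto `base`, returning a new dict.
    Nested dicts are merged key by key; any other value replaces the base."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_config(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical keys; secrets are referenced by name, never stored inline.
BASE = {
    "gameServer": {"logLevel": "info", "shards": {"logic": 1}},
    "database": {"passwordSecretRef": "game-db-credentials"},
}
PRODUCTION = {"gameServer": {"shards": {"logic": 4}}}

if __name__ == "__main__":
    print(merge_config(BASE, PRODUCTION))
```

Only the values that differ per environment need to appear in the override. Everything else, including the secret reference, is inherited from the shared base.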
Game server resources are services directly related to the game server application.
The game server cluster has dedicated endpoints for the various sub-services.
Game server - Docker containerized game server builds running in Kubernetes pods. The specific configuration of different types of shards and their workloads can be fine-tuned for your game’s specific needs.
LiveOps Dashboard - Web-based tools for the game development, support and operations teams to manage the game. Hosted by ASP.NET, running on the game server cluster.
Ingress & endpoints
Game endpoint - Kubernetes load balancer service that provides an endpoint for game client communication.
Public HTTP endpoint - Support for setting up public HTTP endpoints as needed. A typical use case is a webhook endpoint that other services can send calls to.
LiveOps Dashboard and the game server API - Protected private HTTP endpoint for providing administrators access to the LiveOps Dashboard, observability tools and for making calls to the game server APIs.
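Conceptually, the HTTP side of the ingress can be pictured as a host-based routing table in which each endpoint is marked as public or protected. The sketch below uses hypothetical hostnames and backend names. The game endpoint is left out because it is served by its own load balancer service rather than by HTTP routing. In the real stack, authentication is handled by the cluster tooling rather than by a simple user check like this one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HttpEndpoint:
    host: str
    backend: str
    public: bool  # False: caller must be authenticated


# Hypothetical hostnames and backend names, for illustration only.
ENDPOINTS = [
    HttpEndpoint("webhooks.example.com", "game-server-public-http", public=True),
    HttpEndpoint("admin.example.com", "liveops-dashboard", public=False),
]


def route(host: str, authenticated_user: Optional[str]) -> str:
    """Pick the backend for a request, rejecting anonymous callers of
    protected endpoints and unknown hosts."""
    for endpoint in ENDPOINTS:
        if endpoint.host == host:
            if not endpoint.public and authenticated_user is None:
                return "401 Unauthorized"
            return f"forward to {endpoint.backend}"
    return "404 Not Found"


if __name__ == "__main__":
    print(route("webhooks.example.com", None))
    print(route("admin.example.com", None))
    print(route("admin.example.com", "ops@example.com"))
```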
Cluster tooling contains the “necessary evil” services of running a Kubernetes cluster securely and efficiently. The services in this category generally require the least customization for Metaplay’s needs.
Networking and connectivity services
DNS - Managing external DNS resources on behalf of the cluster.
Reverse proxy - Handling of all incoming HTTP traffic and routing it securely within the cluster.
Authentication - Resolving the roles & permissions of connecting users.
Network security - Enforcing network policies between different systems within the Kubernetes cluster.
Autoscaler - Allows configuring the Kubernetes cluster to automatically scale compute nodes up and down depending on resource requests.
Spot termination handler - Allows Kubernetes nodes to run more safely as spot instances, which can reduce compute costs by up to 70%.
Certificate manager - Allows in-cluster systems to request certificates for securing communication.
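The autoscaler's scale-up decision boils down to asking how many nodes it would take to fit the pods that cannot currently be scheduled. The sketch below estimates that with first-fit-decreasing bin packing on CPU requests only. It is a deliberate simplification: the real Kubernetes Cluster Autoscaler simulates the scheduler, including memory, affinities and other constraints. The request values are arbitrary.

```python
from typing import List


def nodes_needed(pod_cpu_requests: List[float], node_cpu: float) -> int:
    """First-fit-decreasing estimate of how many nodes with `node_cpu`
    capacity are needed to schedule pods with the given CPU requests."""
    free: List[float] = []  # remaining CPU capacity on each planned node
    for req in sorted(pod_cpu_requests, reverse=True):
        if req > node_cpu:
            raise ValueError(f"pod request {req} exceeds node capacity {node_cpu}")
        for i, cap in enumerate(free):
            if cap >= req:
                free[i] = cap - req  # place pod on the first node that fits
                break
        else:
            free.append(node_cpu - req)  # no room anywhere: add a node
    return len(free)


if __name__ == "__main__":
    print(nodes_needed([3.0, 1.5, 1.5, 1.0, 0.5], node_cpu=4.0))
```

In the sample call, 7.5 vCPUs of requests fit on two 4-vCPU nodes, so the estimate is two nodes.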
Something missing? The whole Metaplay cloud infrastructure stack is extendable by design, meaning that you can replace our “sensible defaults” with your own solutions in specific areas, or easily add more services to the cluster!
The Metaplay SDK implements modern cloud security best practices:
Identity and access control - Users and their roles are managed through a combination of AWS IAM for infrastructure access management, and game server accounts with RBAC for application-level security. Additionally, external identity providers are supported through the industry-standard OAuth2/OIDC protocols.
Transport-level security - Data in transit between endpoints is encrypted using Transport Layer Security (TLS), with AWS-managed certificates from AWS Certificate Manager (ACM).
Network-level security - Metaplay’s infrastructure is designed to run in private VPCs using security groups and a public/private subnet split to enhance security. In-cluster networking is secured with Calico and configured to restrict cross-namespace connectivity.
Data security - All game data is encrypted at rest in the AWS RDS and AWS S3 services. Additionally, access to these data sources is controlled via security groups and credentials or role-based access.
Backups - Backups are taken using AWS Backup. You can customize the backup routines and destinations based on your project’s specific needs.
Monitoring and alerting - Metrics and logs are collected both in-cluster, via Prometheus and Loki, and at the cloud level, via CloudWatch metrics and logs and AWS CloudTrail. Grafana provides holistic visibility across these data sources. Customizable alerting is managed with Prometheus’ Alertmanager.
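At the application level, RBAC reduces to mapping roles to permissions and denying anything not explicitly granted. The sketch below uses hypothetical role and permission names. With an external OAuth2/OIDC provider, the user's roles would typically be derived from claims in their token, which is outside the scope of this example.

```python
from typing import Dict, Iterable, Set

# Hypothetical role and permission names, for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "customer-support": {"players.view"},
    "game-admin": {"players.view", "players.edit", "broadcasts.send"},
}


def has_permission(user_roles: Iterable[str], permission: str) -> bool:
    """Grant access if any of the user's roles includes the permission.
    Unknown roles grant nothing, so the default is deny."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


if __name__ == "__main__":
    print(has_permission(["customer-support"], "players.view"))
    print(has_permission(["customer-support"], "players.edit"))
```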
No amount of best practices can replace continuous monitoring of the online security landscape and frequent updates to keep all systems up to date. We ship regular updates to our infrastructure and help you roll them out to your self-hosted environments!
We also understand that in some cases games may prefer to stick with older Metaplay versions; in these cases we can work with you to prevent possible security vulnerabilities by backporting critical security fixes as needed.