AWS for Games Blog
Level-Up Player Retention with No-Code Machine Learning Using Amazon SageMaker Canvas

Free-to-play (F2P) games have become a dominant force and model in the gaming industry. Their revenue relies primarily on in-game ads and transactions driven by a large, engaged, and consistent player base. Besides requiring robust infrastructure and backend systems to handle the scale of incoming players, F2P games also require a continual funnel of new players, along with engaging gameplay and experiences that entice players to return regularly.
And in today’s world, having excellent gameplay and experiences is no longer enough.
Importance of player retention
To get players coming back for more, successful games go beyond pure analytics. They deploy live operations that surface real-time data and make accurate predictions on players’ actions. This can be the difference between failure and success. Knowing whether a player is on the verge of leaving the game before they do so provides the opportunity to take action ahead of the player churning. The key is having that knowledge ahead of the event. Traditionally, a machine learning practice that accurately predicts player churn required a team of data scientists with a supporting team to manage the infrastructure to run those ML predictions—all of which takes cycles away from building and improving the actual game.
Enter: SageMaker Canvas! Fight!
Luckily, there’s a simpler way. You can get an accurate prediction of player churn without hiring an entire data science team or taking precious time away from your game developers. In this post, we’ll show you how to create a player churn ML model with Amazon SageMaker Canvas, no code required. SageMaker Canvas provides a visual point-and-click interface for building models and generating accurate ML predictions on your own, without any ML experience and without writing a single line of code.
Let’s dive in.
Prerequisites
To complete the prerequisites, follow the steps in this Amazon SageMaker Canvas Workshop. They will help you quickly onboard a SageMaker domain, create a user profile, and launch your SageMaker Canvas application. Make sure to open Canvas using the sagemakeruser profile that you create. You only need to complete the prerequisite steps from this workshop before moving on.
Exploring churn factors within the data
The starting point of any machine learning journey is a dataset. To build a churn prediction model, we first need a relevant dataset derived from the game events that players emit while playing. At its core, a game’s churn rate is the share of players that leave the game in a given time frame.
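To make the definition concrete, here is a minimal sketch of a churn rate calculation. It assumes the rate is measured against the players who were active at the start of the time frame; the function name and the sample numbers are illustrative and are not taken from the dataset used in this post.

```python
def churn_rate(players_at_start: int, players_lost: int) -> float:
    """Share of the players active at the start of a period who left during it."""
    if players_at_start <= 0:
        raise ValueError("players_at_start must be positive")
    if not 0 <= players_lost <= players_at_start:
        raise ValueError("players_lost must be between 0 and players_at_start")
    return players_lost / players_at_start


# Illustrative numbers only: 250 of 1,000 players left during the period.
rate = churn_rate(players_at_start=1000, players_lost=250)
print(f"{rate:.1%}")
```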
We’re using a publicly available player behavior dataset generator, which you can try on your own for experiments. To synthesize the events dataset, we simulated a game feature launch campaign lasting 90 days, acquiring 150 players daily. In the simulation, players emit two types of events: “begin_session” and “end_session”. From these events, one can deduce session duration, frequencies, and more. Based on this data, we can categorize players into three profiles: hardcore, casual, and churner, which will help get us closer to the data needed for our ML model.
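As a rough illustration of how session durations can be deduced from these two event types, the sketch below pairs each “begin_session” event with the next “end_session” event from the same player. The tuple layout, player IDs, and timestamps are hypothetical; the real generator’s output format may differ.

```python
from datetime import datetime

# Hypothetical event records: (player_id, event_type, ISO 8601 timestamp).
events = [
    ("p1", "begin_session", "2023-01-01T10:00:00"),
    ("p1", "end_session", "2023-01-01T10:25:00"),
    ("p2", "begin_session", "2023-01-01T11:00:00"),
    ("p1", "begin_session", "2023-01-02T09:00:00"),
    ("p2", "end_session", "2023-01-01T11:05:00"),
    ("p1", "end_session", "2023-01-02T09:40:00"),
]


def session_durations(events):
    """Return {player_id: [session length in minutes, ...]}."""
    open_sessions = {}  # player_id -> start time of the session in progress
    durations = {}
    # Same-format ISO timestamps sort chronologically as plain strings.
    for player_id, event_type, timestamp in sorted(events, key=lambda e: e[2]):
        when = datetime.fromisoformat(timestamp)
        if event_type == "begin_session":
            open_sessions[player_id] = when
        elif event_type == "end_session" and player_id in open_sessions:
            start = open_sessions.pop(player_id)
            minutes = (when - start).total_seconds() / 60
            durations.setdefault(player_id, []).append(minutes)
    return durations


durations = session_durations(events)
print(durations)
```

Session frequency follows from the same structure: the length of each player’s list is the number of completed sessions.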
Game events form the foundation for predicting player churn. From our game events dataset, we need to extract more specific data (also known as “features”) that might be most useful to a machine learning algorithm. These player features will be used to build our model, tailoring them to suit the machine learning algorithm of choice.
Below are the player features we will use to build our model and predictions:
| Event Field Name Range | Description | Total Fields |
| --- | --- | --- |
| Begin Session Count Last Day 1 to Last Day 7 | Count of ‘begin_session’ events 1 to 7 days before the player’s last event | 7 | 
| Begin Session Time of Day Mean Last Day 1 to Last Day 7 | Mean event timestamp 1 to 7 days before the player’s last event | 7 | 
| Begin Session Time of Day Standard Last Day 1 to Last Day 7 | Standard deviation of event timestamp 1 to 7 days before the player’s last event | 7 | 
| Cohort Day of Week | The day of the week the player was acquired | 1 | 
| End Session Time of Day Mean Last Day 1 to Last Day 7 | Mean event timestamp 1 to 7 days before the player’s last event for ‘end_session’ events | 7 | 
| End Session Time of Day Standard Last Day 1 to Last Day 7 | Standard deviation of event timestamp 1 to 7 days before the player’s last event for ‘end_session’ events | 7 | 
| Player Churn | Indicates whether the player has sent no events for the last 5 days (True means the player has churned) | 1 | 
| Player ID | Unique player identifier | 1 | 
| Player Type | Player type: hardcore, casual, or churner | 1 | 
| Player Lifetime | The time difference between the last and the first event timestamp sent by a player | 1 | 
| Session Count | Total number of sessions launched by a player | 1 | 
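To give a feel for how fields like these can be derived, here is a simplified sketch that computes a few of them for a single player. It rests on several assumptions that are ours, not the dataset’s: “last day N” is treated as the N-th 24-hour window counting back from the player’s last event, a player counts as churned when their last event is at least 5 days older than a hypothetical `observed_until` cutoff, and the output key names are made up for illustration.

```python
from datetime import datetime, timedelta


def player_features(events, observed_until, churn_days=5, window_days=7):
    """events: list of (event_type, datetime) tuples for ONE player."""
    timestamps = [ts for _, ts in events]
    first_event, last_event = min(timestamps), max(timestamps)
    begins = [ts for kind, ts in events if kind == "begin_session"]

    features = {
        "session_count": len(begins),
        "player_lifetime_days": (last_event - first_event).total_seconds() / 86400,
        # Assumption: churned means no event in the churn_days before the cutoff.
        "player_churn": (observed_until - last_event) >= timedelta(days=churn_days),
    }
    for day in range(1, window_days + 1):
        # Day 1 is the 24 hours ending at the last event, day 2 the 24 hours before that.
        window_end = last_event - timedelta(days=day - 1)
        window_start = last_event - timedelta(days=day)
        features[f"begin_session_count_last_day_{day}"] = sum(
            window_start < ts <= window_end for ts in begins
        )
    return features


sample_events = [
    ("begin_session", datetime(2023, 1, 1, 10, 0)),
    ("end_session", datetime(2023, 1, 1, 10, 30)),
    ("begin_session", datetime(2023, 1, 3, 9, 0)),
    ("end_session", datetime(2023, 1, 3, 9, 45)),
]
features = player_features(sample_events, observed_until=datetime(2023, 1, 10))
print(features)
```

The time-of-day mean and standard deviation fields would follow the same windowing pattern, aggregating the hour of each event instead of counting events.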
The pivotal attribute, Player Churn, is what our ML model aims to predict. When building a model, only certain fields of the dataset are needed. In this example, we have done the feature engineering work for you. The result is a balanced dataset, in which churned and retained players are represented in similar proportions. This matters when building machine learning models because it helps prevent the model from being biased toward the more common class. For more info on feature engineering, you can view this AWS for Games Workshop, under the “Prepare Data” section.
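This post does not describe how the balanced dataset was produced. One common approach is to downsample the larger class until both classes are the same size, sketched below under that assumption. The row layout and the `balance_by_downsampling` helper are hypothetical.

```python
import random


def balance_by_downsampling(rows, label_key="player_churn", seed=42):
    """Randomly drop rows from larger classes until every class has the same size."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Illustrative unbalanced data: 25 churned players out of 100.
rows = [{"player_id": i, "player_churn": i % 4 == 0} for i in range(100)]
balanced = balance_by_downsampling(rows)
churned = sum(row["player_churn"] for row in balanced)
print(len(balanced), churned)
```

Downsampling discards data, so with a small dataset it may be preferable to oversample the minority class or weight the classes instead.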
Import the dataset into SageMaker Canvas
1. Download the game-features-balanced dataset to build the model. You will also need to download the unbalanced game-features dataset to generate predictions once we have trained the model.
2. Go back to the SageMaker Canvas tab created in the Prerequisites section.
3. On the left menu, select the second icon to head to the Datasets section, then select Create, then choose Tabular from the dropdown.

4. Name the file game-features-balanced.
5. Then, select the Local upload option and browse to the game-features-balanced.csv file that you downloaded previously.
6. Then, select Create dataset.

Repeat this process for the game-features.csv file as well. After that, the datasets tab should look like the following:
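Because the second dataset is later used for predictions, it can be worth confirming that both files share the same columns before uploading them. The sketch below compares CSV headers. It uses small in-memory samples with made-up column subsets; for the real files you would pass handles from `open("game-features-balanced.csv", newline="")` and `open("game-features.csv", newline="")`.

```python
import csv
import io


def header_of(csv_file):
    """Read only the first row (the column names) of a CSV file object."""
    return next(csv.reader(csv_file))


def schema_differences(train_header, predict_header):
    """Report columns that appear in one header but not the other."""
    train_cols, predict_cols = set(train_header), set(predict_header)
    return {
        "missing_from_prediction_data": sorted(train_cols - predict_cols),
        "unexpected_in_prediction_data": sorted(predict_cols - train_cols),
    }


# Hypothetical stand-ins for the two downloaded files.
balanced_csv = io.StringIO("player_id,cohort_day_of_week,player_churn\np1,3,True\n")
unbalanced_csv = io.StringIO("player_id,player_churn\np9,False\n")

diff = schema_differences(header_of(balanced_csv), header_of(unbalanced_csv))
print(diff)
```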

Building and training the ML model
You will now build the machine learning churn prediction model using the prepared dataset extracted from the game events.
1. Confirm that the dataset is in the Ready state, then select the checkbox next to the game-features-balanced dataset.
2. Select Create a model.

3. Enter a name such as Players Churn – Day 7.
4. Select Create.

5. Wait for the dataset to be imported. Once imported, choose player_churn as the target column. As mentioned, this is the pivotal attribute our ML model aims to predict once it is trained on our first dataset.
6. Uncheck all columns selected in game-features-balanced. You may click on Column name twice to sort in ascending order.

7. Now that the columns are sorted, you can select the needed columns with ease.
8. Select the column names listed below. Note: you can hover the cursor over a column to see its full name.
Begin Session Count Last Day 1 to Last Day 7 (7 Columns)
Begin Session Time of Day Mean Last Day 1 to Last Day 7 (7 Columns)
Begin Session Time of Day Std Last Day 1 to Last Day 7 (7 Columns)
Cohort Day of Week (1 Column)
End Session Time of Day Mean Last Day 1 to Last Day 7 (7 Columns)
End Session Time of Day Std Last Day 1 to Last Day 7 (7 Columns)
9. To confirm the correct columns are selected, uncheck the Show dropped columns check box. You should see 37 total columns.
10. Select the Standard Build button to start the build process.
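The total of 37 columns in step 9 can be sanity-checked with a little arithmetic: five groups of 7 daily columns, plus Cohort Day of Week, plus the player_churn target. The snippet below spells this out. Apart from player_churn, the snake_case column names are our guess at a naming scheme and may not match the CSV exactly.

```python
# Hypothetical column name prefixes, one per 7-day feature group in step 8.
FEATURE_GROUPS = [
    "begin_session_count",
    "begin_session_time_of_day_mean",
    "begin_session_time_of_day_std",
    "end_session_time_of_day_mean",
    "end_session_time_of_day_std",
]
TARGET = "player_churn"

selected = [
    f"{group}_last_day_{day}"
    for group in FEATURE_GROUPS
    for day in range(1, 8)  # Last Day 1 through Last Day 7
]
selected.append("cohort_day_of_week")

total_columns = len(selected) + 1  # selected features plus the target column
print(len(selected), total_columns)
```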

The standard build uses an optimized process powered by AutoML to produce the most accurate model it can. The build process can take up to one hour. During this time, Canvas tests hundreds of candidate pipelines and selects the best model to present to us. In the following screenshot, we see the expected build time and progress.

Evaluate the model performance
When the model building process is complete, we see that the model correctly predicted player churn 85.537% of the time. This seems realistic, but as analysts, we want to dive deeper and see whether we can trust the model enough to base decisions on it. Proceed by selecting the Scoring tab to review a visual plot of our predictions mapped to their outcomes. This gives us deeper insight into our model.
SageMaker Canvas separates the dataset into training and test sets. The training dataset is the data Canvas uses to build the model. The test set is used to see if the model performs well with new data. The following image depicts what is called a Sankey diagram, which shows how the model performed on the test set. To learn more, refer to Evaluating Your Model’s Performance in Amazon SageMaker Canvas.
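Canvas handles this split for you. Purely to illustrate the idea, here is a minimal hold-out split in plain Python. The 80/20 ratio, the fixed seed, and the function name are assumptions for the sketch and are not a description of how Canvas implements it.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows, then hold out a fraction of them as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for 1,000 player rows; only the row indices matter here.
train, test = train_test_split(range(1000))
print(len(train), len(test))
```

The model never sees the held-out rows during training, which is why its score on them is a fair estimate of how it will behave on new players.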

To get more detailed insights beyond what is displayed in the Sankey diagram, we can use a confusion matrix analysis. For example, we want to better understand how likely the model is to make false predictions. The Sankey diagram shows this at a glance, but for more detail we choose Advanced metrics. We’re presented with a confusion matrix, which displays the model’s performance in a visual format using the following values, each defined relative to the positive class.
True Positive (TP) – The number of True results that were correctly predicted as True
True Negative (TN) – The number of False results that were correctly predicted as False
False Positive (FP) – The number of False results that were wrongly predicted as True
False Negative (FN) – The number of True results that were wrongly predicted as False
We’re predicting whether a player will in fact churn, so the positive class in this example is True.
We can use this matrix chart to determine not only how accurate our model is, but when it is wrong, how often it might be wrong, and how it’s wrong.

The advanced metrics look good. We can trust the model’s results because both false positives and false negatives are very low. A false positive means the model predicts that a player will churn and they actually don’t; a false negative means the model predicts that a player will not churn and they actually do. High numbers for either might make us hesitant to use the model to make decisions.
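To show how the four confusion matrix values relate to headline metrics, the sketch below counts them from a list of actual and predicted labels and derives accuracy, precision, and recall. The labels are invented for illustration and are unrelated to the model built in this post.

```python
def confusion_counts(actual, predicted, positive=True):
    """Count TP, TN, FP, and FN with respect to the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall from the four counts."""
    tp, tn, fp, fn = (counts[k] for k in ("TP", "TN", "FP", "FN"))
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the players flagged as churners, how many really churned?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the players who really churned, how many were flagged?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Invented labels: True means the player churned.
actual = [True, True, True, False, False, False, True, False]
predicted = [True, True, False, False, False, True, True, False]

counts = confusion_counts(actual, predicted)
metrics = summarize(counts)
print(counts, metrics)
```

For churn, a false negative (a missed churner) is often costlier than a false positive (an unneeded retention offer), so recall is worth watching alongside overall accuracy.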
Let’s go back to the Overview tab to review the Column impact. This information can help game developers gain insights that lead to actions that reduce player churn. At a quick glance, the metrics from the fourth (-4) day, followed by the first (-1) and sixth (-6) days, highlight a critical period predictive of churn. In response, game developers can use this period to introduce themed in-game events, offer player-specific discounts based on behavior, and promote collaborative global missions, in hopes of fostering sustained interest and a deeper connection to the gaming experience.

Using the model to generate predictions
Our model is now trained! And, since our model looks accurate, we can directly perform an interactive prediction using our unbalanced features dataset, or any new player dataset with the same schema.
1. From the Analyze interface, select Predict.

2. Select Manual.

3. On the Select dataset for predictions page, choose the game-features dataset and then select Generate predictions.

4. Wait for the predictions to be generated. Once the predictions are generated, you will see a small pop-up appearing on the bottom of the window. Select the View link in the pop-up. You can now see all the predictions for the full dataset.

5. Select Download to download the predictions.
For any row where the Prediction (player_churn) column is True, we can take action to retain that player. We can also start to identify the player types and the factors that explain churn, and then take corrective action to change the predicted behavior, such as running targeted retention campaigns or changing aspects of the game that might contribute to churn.
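Once the predictions are downloaded, pulling out the at-risk players can be scripted. The sketch below reads a CSV and returns the IDs of players predicted to churn. It assumes the file has a `player_id` column and a `Prediction (player_churn)` column holding True or False; check the header of your downloaded file, since the exact column names may differ. An in-memory sample stands in for the real download.

```python
import csv
import io

# Hypothetical sample in the assumed layout of the downloaded predictions file.
SAMPLE = """player_id,Prediction (player_churn)
p1,True
p2,False
p3,True
"""


def players_predicted_to_churn(csv_file, prediction_column="Prediction (player_churn)"):
    """Return the IDs of all players whose predicted churn value is True."""
    reader = csv.DictReader(csv_file)
    return [
        row["player_id"]
        for row in reader
        if row[prediction_column].strip().lower() == "true"
    ]


at_risk = players_predicted_to_churn(io.StringIO(SAMPLE))
print(at_risk)
```

The resulting list could then feed a retention campaign, for example as the audience for a targeted offer.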

Clean up
To avoid incurring future session charges, log out of SageMaker Canvas using the icon on the lower left of the taskbar. Logging out ends the workspace instance used to build and run your ML model. You are only billed for the duration that you were logged in.
Conclusion
As our model has shown us, the dynamic world of gaming can be visualized as a multi-layered vortex. The outermost spirals represent the initial layers of excitement. However, as players delve deeper, reaching the critical third and fourth days, they find themselves at the vortex’s more challenging depths where engagement can wane. Leveraging no-code machine learning solutions provided by Amazon SageMaker Canvas, game developers don’t just aim to attract players, but to better understand and even predict their behavior. Much like the intricate layers of the vortex, live operations strategies are meticulously crafted to continually reignite the spark, emphasizing the paramount importance of creating long-term relationships with the players.
Visit the SageMaker Canvas homepage to learn more about no-code machine learning possibilities.