At Raptor, we have made a big effort to simplify the rather complex calculations that run behind the scenes when the CLV model is trained and its expected performance on your data is validated. The output of the validation displayed in the interface is a simple, binary High or Low indication.
While this is useful in practice, you might be interested in how that result is reached. This article presents the methods used to evaluate the CLV model's ability to make precise predictions of your customers' future behavior.
Go to this article to see how to apply a CLV model to your data in the Customer Data Platform.
Training & Validation
When setting up a CLV model, the algorithm learns a mathematical representation of the customer base. This representation is used to predict the CLV and Churn scores for each customer. During setup, and again once a day after the model has been pushed into production, training and validation are performed using four different evaluation methods, each of which can be expressed as a graph. All four methods must show sufficiently strong results for the performance to be rated High.
The model is trained on the data available in the data set, and the result of the training is then validated against the actual data. By comparing the model's representation to the actual data, we can determine whether the model is precise.
As a general rule, the CLV model requires at least four months of data to be able to perform training and validation.
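As an illustration of what this training step can look like, the sketch below fits a purchase-frequency model with the open-source Python library lifetimes (a BG/NBD-type model). This is an assumption chosen for illustration only; it is not necessarily the model Raptor runs, and the file name and column names are placeholders.

```python
# Minimal sketch (not Raptor's actual implementation): fit a purchase-frequency
# model with the open-source `lifetimes` library. The file 'orders.csv' and the
# columns 'customer_id' / 'order_date' are illustrative assumptions.
import pandas as pd
from lifetimes import BetaGeoFitter
from lifetimes.utils import summary_data_from_transaction_data

transactions = pd.read_csv("orders.csv", parse_dates=["order_date"])

# Condense each customer's order history into frequency, recency and age (T).
summary = summary_data_from_transaction_data(
    transactions, customer_id_col="customer_id", datetime_col="order_date", freq="D"
)

# Learn a mathematical representation of the customer base from the full data set.
model = BetaGeoFitter(penalizer_coef=0.001)
model.fit(summary["frequency"], summary["recency"], summary["T"])
```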
Graph no. 1: Number of customers placing a given number of orders
In this evaluation we look at how many customers placed any given number of orders in the full available data set, and we also determine how many customers the model's mathematical representation expects to have placed that number of orders. If the model's representation is sufficiently close to the actual customer distribution, we can expect good predictions from the model.
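A hedged sketch of this comparison, continuing the illustrative lifetimes setup from the sketch above: the library's plot_period_transactions draws the actual and model-implied number of customers per repeat-order count, and the actual distribution can also be read directly from the raw orders.

```python
# Sketch of graph no. 1 with the illustrative model fitted above: number of
# customers per repeat-order count, actual vs. the model's representation.
import matplotlib.pyplot as plt
from lifetimes.plotting import plot_period_transactions

plot_period_transactions(model, max_frequency=9)
plt.show()

# The "actual" side of the graph computed directly from the raw orders:
orders_per_customer = transactions.groupby("customer_id")["order_date"].count()
actual_distribution = orders_per_customer.value_counts().sort_index()
print(actual_distribution)
```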
Graph no. 2: Accumulated number of orders
Here we evaluate the model's ability to predict the number of orders into the future. Since no one knows the future, this evaluation is done by splitting the data into two parts: a training data set and a validation data set. We train the model on the training data set and use that model to predict the number of orders in the validation data set.
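A sketch of this split-and-predict procedure under the same illustrative assumptions; the calibration (midpoint) and observation end dates below are placeholders, not values taken from your data.

```python
# Sketch of graph no. 2: split the data at an assumed midpoint, fit the model on
# the first half and compare the predicted cumulative number of orders to the
# actual cumulative number of orders in the second half.
import pandas as pd
import matplotlib.pyplot as plt
from lifetimes import BetaGeoFitter
from lifetimes.utils import calibration_and_holdout_data
from lifetimes.plotting import plot_cumulative_transactions

cal_end, obs_end = "2023-06-30", "2023-12-31"  # assumed midpoint and end of the data

cal_holdout = calibration_and_holdout_data(
    transactions, customer_id_col="customer_id", datetime_col="order_date",
    calibration_period_end=cal_end, observation_period_end=obs_end, freq="D",
)

cal_model = BetaGeoFitter(penalizer_coef=0.001)
cal_model.fit(cal_holdout["frequency_cal"], cal_holdout["recency_cal"], cal_holdout["T_cal"])

first_order = transactions["order_date"].min()
t_cal = (pd.Timestamp(cal_end) - first_order).days    # length of the training period
t_total = (pd.Timestamp(obs_end) - first_order).days  # training + validation period

plot_cumulative_transactions(
    cal_model, transactions, "order_date", "customer_id", t=t_total, t_cal=t_cal
)
plt.show()
```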
Graph no. 3: Predicted number of orders per customer
In this test we split the available data into two equal parts: a training set covering the period from the first data point to the midpoint, and a holdout set covering the period from the midpoint to the last data point. We train the CLV model on the training set. We then group all customers by the number of orders they placed in the training set, and use the model to predict the average number of orders the customers in each group will place in the holdout period. If this estimate is close to the number of orders the customers actually placed, the estimate is good. If the model underestimates the number of orders, you may need data covering a longer time span before the model can accurately estimate the customer return rate.
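A sketch of this grouping under the same assumptions, reusing the cal_model and cal_holdout objects from the previous sketch; the by-hand version at the end makes the grouping explicit.

```python
# Sketch of graph no. 3: group customers by their number of orders in the training
# half and compare actual vs. predicted average orders in the holdout half.
import matplotlib.pyplot as plt
from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases

plot_calibration_purchases_vs_holdout_purchases(cal_model, cal_holdout, n=9)
plt.show()

# The same comparison computed by hand:
cal_holdout["predicted_holdout"] = cal_model.conditional_expected_number_of_purchases_up_to_time(
    cal_holdout["duration_holdout"], cal_holdout["frequency_cal"],
    cal_holdout["recency_cal"], cal_holdout["T_cal"],
)
print(
    cal_holdout.groupby("frequency_cal")[["frequency_holdout", "predicted_holdout"]].mean()
)
```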
Graph no. 4: Predicted number of orders per customer when grouped by months since last order
In this test, we group customers by the number of months since their last order and then compare the actual number of orders placed by the customers in each group to the number of orders predicted for that group.
If you know your sales are distributed very unevenly over the year, you might see that the bars representing the actual sales fluctuate. A good model should give a good overall representation of the sales distribution, averaging out variations between the months.
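A sketch of this comparison under the same assumptions, approximating a month as 30 days; it again reuses the cal_model and cal_holdout objects from the graph no. 2 sketch.

```python
# Sketch of graph no. 4: group customers by (approximate) months since their last
# order in the training half and compare the average actual vs. predicted number
# of orders in the holdout half. The 30-day month is an assumption.
import pandas as pd

months_since_last_order = (
    (cal_holdout["T_cal"] - cal_holdout["recency_cal"]) // 30
).astype(int)

predicted = cal_model.conditional_expected_number_of_purchases_up_to_time(
    cal_holdout["duration_holdout"], cal_holdout["frequency_cal"],
    cal_holdout["recency_cal"], cal_holdout["T_cal"],
)

comparison = pd.DataFrame({
    "months_since_last_order": months_since_last_order,
    "actual_holdout_orders": cal_holdout["frequency_holdout"],
    "predicted_holdout_orders": predicted,
}).groupby("months_since_last_order").mean()
print(comparison)
```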
What if the CLV Model indicates low performance?
In general, the CLV Model gets better as more data streams into your CDP.
The model is trained on data from the first data point to the midpoint of the data, and validated on data from the midpoint to the last data point. If you know that your sales differed significantly between these two periods, that might explain the discrepancy between predicted and actual values, and might give you confidence in using the model despite it. Otherwise, you should expect the model to become more precise as it receives more data.
If not enough data is provided, the model cannot determine to what extent your customers return, and you will need to either wait for more data to stream in or provide more historical data for the model.
If the model overestimates the number of orders, it might be because the holdout data set represents a period with a significantly lower number of orders than the training data set. You can still use the model, but you need to compensate for this when interpreting attributes like 'predicted number of future orders' and 'lifetime value'.
The model used in production is trained on the full data set (not the data set split into training and validation), and is therefore expected to be more accurate than shown in this test.