Join Us for the MMA webinar on IDFA featuring Charles Manning

Kochava brings advertisers the best tools and intel. The following post from Aarki dives deep into methodologies and benefits for CTR Prediction.

Aarki is a Kochava-certified network. Click here to learn how to create an install campaign or reengagement campaign within Kochava.


The success of a mobile ad is determined by its ability to drive audience action, typically measured by the click-through rate (CTR). Accurately predicting the CTR of a campaign is critical to the success of any mobile app advertising campaign.

An ad can be deemed successful if it piques audience interest enough to incite them to interact with the call-to-action. To measure how well the ad does in capturing interest, ad click-through rate (CTR) is typically used. The higher the CTR, the more successful the ad is in generating interest amongst the target audience. In addition, CTR prediction can be helpful in setting campaign goals. The more accurate the prediction is, the better it can help advertisers set realistic expectations. This prediction can also be used to make better media buying decisions. Thus, the ability to accurately predict ad CTR is essential in mobile app advertising.

Benefits of Accurate Click Prediction

Accurate click prediction matters most in a real-time bidding (RTB) situation, where the bid amount is derived from the predicted probability of a click and it is not sufficient to simply rank ads ordinally. The predicted click probability is used to determine both which ad impressions to bid on and the bid amount. The accuracy of CTR prediction therefore determines not only the placement of the ad but also the ad performance.
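As a minimal illustration of how a predicted click probability can drive a bid, consider the simple linear bid rule below. The function name, the rule itself, and all numbers are assumptions for the sketch, not Aarki's actual bidding logic:

```python
def bid_amount(predicted_ctr: float, target_cpc: float, max_bid: float) -> float:
    """Derive a CPM bid from a predicted click probability.

    Assumed rule: expected value per impression = predicted_ctr * target_cpc;
    multiplying by 1000 converts it to a CPM bid, capped at max_bid.
    """
    bid_cpm = predicted_ctr * target_cpc * 1000.0
    return min(bid_cpm, max_bid)

# An impression with a 0.5% predicted CTR and a $2.00 target CPC
# is worth roughly a $10.00 CPM bid before capping.
print(bid_amount(0.005, 2.00, max_bid=15.0))
```

Under a rule like this, an over-predicted CTR means systematically overbidding, which is why calibration of the predicted probabilities matters as much as ranking.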

There are several methods that can be used to predict click probability.

Logistic Regression

A logistic regression model, and variants thereof, is commonly used to analyze the performance of ad campaigns. It is a natural choice for predicting the probability of a binary (yes/no) outcome based on a set of independent variables X.

This model assumes that for a set of coefficients β, the dependent variable y takes on the value 1 with probability given by

p(y = 1 | β) = σ(βᵀX)

Represented differently, y can be specified as a Bernoulli random variable with distribution

p(y | β) = Bernoulli(σ(βᵀX))
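The model above amounts to a dot product pushed through the logistic function. A self-contained sketch (feature values and coefficients are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_ctr(beta, X):
    """p(y = 1 | beta) = sigma(beta^T X) for each row of feature matrix X."""
    return sigmoid(X @ beta)

# Two impressions described by three features each (illustrative values).
X = np.array([[1.0, 0.2, 0.0],
              [1.0, 0.8, 1.0]])     # first column is an intercept term
beta = np.array([-3.0, 1.0, 0.5])
print(predict_ctr(beta, X))
```

Each output is a probability in (0, 1), so it can be consumed directly as a predicted CTR.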

Naive Maximum Likelihood Estimation

In the simplest case, β is chosen to maximize p(y | β), the probability of observing the training data, using the maximum-likelihood estimate (MLE). This estimation approach assumes each observation is an independent event and the model parameters are constants.

Anything we know about the model parameters a priori is ignored. While it is tempting to assume that we know nothing and let the model do the work, feeding prior knowledge into the model helps to minimize the generalization error.

One consequence of naive MLE is a tendency to overfit, i.e., to exaggerate relatively small fluctuations in the observed data. This can be mitigated by techniques such as regularization, but regularization in turn biases the predicted probabilities (miscalibration), which requires an additional adjustment step.

It is also important to remember that the probability of success is a random variable and is influenced by various exogenous factors not included in the model. For example, the basic model excludes randomness in usage patterns that may have occurred over time. MLE only gives an average point estimate of this random variable. While this estimate is a useful measure of the central tendency, we cannot be certain that it is representative of the entire distribution for prediction purposes.
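As a concrete sketch of naive MLE for this model, the log-likelihood can be maximized by plain gradient ascent. The synthetic data, function names, and hyperparameters below are illustrative assumptions, not the production estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic impressions: 1,000 rows, an intercept plus two features.
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
beta_true = np.array([-2.0, 1.0, -0.5])
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

def mle_fit(X, y, lr=0.5, steps=2000):
    """Maximize log p(y | beta) by gradient ascent; no prior on beta."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # sigma(beta^T X)
        beta += lr * X.T @ (y - p) / len(y)   # average log-likelihood gradient
    return beta

beta_hat = mle_fit(X, y)
```

Because nothing constrains β, rare feature combinations in the training data can pull individual coefficients to extreme values, which is exactly the overfitting behavior described above.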

Maximum A Posteriori Estimation

A somewhat more sophisticated model estimation approach is to choose the coefficient vector β that maximizes the posterior probability p(β | y), the maximum a posteriori (MAP) estimate. This posterior probability is proportional to p(y | β)p(β).

This approach does incorporate our prior knowledge about the model parameters, p(β), and mitigates the overfitting issue. The result, however, is still a point estimate of the probability, albeit a more informed one that includes state dependence.
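Under a zero-mean Gaussian prior β ~ N(0, τ²I), the MAP objective is the MLE log-likelihood plus an L2 penalty. A minimal, self-contained sketch on synthetic data (all names and hyperparameters are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic impressions for illustration: an intercept plus two features.
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
beta_true = np.array([-2.0, 1.0, -0.5])
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

def map_fit(X, y, tau=1.0, lr=0.5, steps=2000):
    """Maximize log p(y | beta) + log p(beta) with prior beta ~ N(0, tau^2 I).

    The Gaussian prior adds a -beta / tau^2 term to the gradient, i.e.
    classic L2 (ridge) shrinkage toward zero; smaller tau shrinks harder.
    """
    beta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p) / n - beta / (tau**2 * n)
        beta += lr * grad
    return beta

beta_map = map_fit(X, y)
```

Shrinking the coefficients toward zero is what makes the MAP estimate generalize better than naive MLE, at the cost of still producing only a single point estimate.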

Bayesian Inversion Estimation

Ideally, we would like to get a picture of the full posterior distribution, one that is aware of both state dependence and random effects. This posterior distribution is given by Bayes’ theorem

p(β | y) = p(y | β)p(β) / p(y)

However, in all but the simplest of cases, this posterior cannot be computed analytically and must be approximated numerically, for example with the Markov Chain Monte Carlo (MCMC) method. This model estimation is able to capture both prior knowledge and random changes to the system in a robust manner. As a result, the team can make campaign decisions that are statistically sound and more trackable.
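A toy random-walk Metropolis sampler illustrates the MCMC idea for this posterior. This is a sketch of the technique in general, under assumed synthetic data and a Gaussian prior, not the estimation procedure actually used:

```python
import numpy as np

rng = np.random.default_rng(1)

# Small synthetic dataset (illustrative only): intercept plus one feature.
X = np.column_stack([np.ones(500), rng.normal(size=(500, 1))])
beta_true = np.array([-1.0, 0.8])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

def log_posterior(beta, X, y, tau=5.0):
    """log p(y | beta) + log p(beta), beta ~ N(0, tau^2 I), up to a constant."""
    z = X @ beta
    log_lik = np.sum(y * z - np.log1p(np.exp(z)))   # Bernoulli log-likelihood
    log_prior = -np.sum(beta**2) / (2 * tau**2)
    return log_lik + log_prior

def metropolis(X, y, steps=5000, scale=0.1):
    """Random-walk Metropolis: propose beta' = beta + eps and accept
    with probability min(1, p(beta' | y) / p(beta | y))."""
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y)
    samples = []
    for _ in range(steps):
        proposal = beta + scale * rng.normal(size=beta.shape)
        cand = log_posterior(proposal, X, y)
        if np.log(rng.random()) < cand - current:
            beta, current = proposal, cand
        samples.append(beta)
    return np.array(samples)

draws = metropolis(X, y)
posterior_mean = draws[2500:].mean(axis=0)   # discard burn-in draws
```

Unlike the MLE and MAP point estimates, the retained draws describe the full posterior distribution of β, so any downstream quantity, such as a predicted CTR, inherits a credible interval rather than a single number.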

Experiment

To illustrate the differences between the three approaches outlined above, we trained and tested three model specifications on a set of 2.5 million historical impressions spanning 23 campaigns and over 10,000 publishers.

Each model was trained on a random 75% subsample of the dataset, and then tested on the remaining 25% using the average per-impression log-loss and an R-squared statistic computed from the average true and predicted CTR of each campaign. The baseline case represents a “market share” model that predicts the average CTR from the training set for every impression in the test set.
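The two test metrics can be sketched as follows. The helper names are hypothetical; the R² here compares per-campaign average true CTR against per-campaign average predicted CTR, matching the description above:

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-12):
    """Average per-impression log-loss (lower is better)."""
    p = np.clip(p_pred, eps, 1 - eps)   # guard against log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def campaign_r2(y_true, p_pred, campaign_ids):
    """R-squared between per-campaign average true and predicted CTR."""
    ids = np.unique(campaign_ids)
    actual = np.array([y_true[campaign_ids == c].mean() for c in ids])
    pred = np.array([p_pred[campaign_ids == c].mean() for c in ids])
    ss_res = np.sum((actual - pred) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Note that a model predicting one constant CTR for every impression, like the baseline, yields an R² of at most zero under this definition, which is consistent with the baseline row in the results below.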

Results

The results of the analysis are given below.

Model                Avg. Predicted CTR   Avg. Error (%)   R-squared   Log-loss
Baseline             0.099200              5.01            -0.02       0.312732
Naive MLE            0.098258              4.01             0.53       0.298476
MAP                  0.095153              0.71             0.67       0.221826
Bayesian Inversion   0.094219             -0.26             0.81       0.211702
Actual               0.094467              0.00             1.00       0.000000

Summary

Among the three estimation approaches, the Naive MLE model results in the least accurate predictions, though it still beats the baseline. Model fit metrics improve significantly once state dependence is included in model estimation. In particular, the MAP estimate, which incorporates a shrinkage prior on the regression coefficients as a form of regularization, results in a significantly better model that generalizes to the test set better than MLE.

The most accurate estimate, however, is the Bayesian Inversion model. The high accuracy of this estimate can be credited to the fact that the model is aware of both state dependence and random effects.

This analysis indicates that choice of model specification and estimation methodology can have a significant impact on the accuracy and robustness of the prediction. This, in turn, impacts campaign performance and the velocity of achieving optimal results.

