Timeseries charts can be quite powerful for depicting changes of a metric over time. For instance, this chart shows how revenue has grown over time:

From a cursory glance, it appears that revenue has really skyrocketed after our mobile launch on 12/24/14, but how can we be sure of the impact? In this blog, we dissect and quantify the effect of a single event on time-series data.

## Timeseries Data

A time-series is a collection of measurements for one metric over time. Timeseries data comes in a variety of shapes and sizes. For instance, the height of one person collected over time is expected to increase with each data point dependent on the previous data points; whereas stock prices for Apple are expected to be dynamic and hard to predict.

If the past is a good predictor of the future, we can model time-series data using an autoregressive model, which means tomorrow’s value depends linearly on today’s value plus some imperfectly predictable term.

Autoregressive models can be used to model revenue since, as an example, tomorrow’s revenue is probably very similar to today’s revenue with some noise added in. What complicates this analysis is the impact of a singular event on revenue.

## Mobile Launch Scenario

Revisiting our earlier example, it’s really hard to precisely measure the impact that our mobile launch has had on revenue. However, we can actually build out an autoregressive model to measure this impact and then use the data to calculate the effect of the launch on revenue.

To measure the impact of a particular event, we will use a dummy variable D to indicate whether or not the mobile launch has happened yet. D will take on the value 1 when the mobile launch has already happened or 0 when it hasn’t. Thus, to generate the complete dataset for this series, we use the following in Postgres/Redshift SQL:

``````with revenue as
(
select d
date
, sum(price) as dailyrev
, case when date >= '2014-12-24'
then 1
else 0
end as D
from purchases
group by 1, 3
)``````

We first build an AR(1) model – an auto-regressive model which uses the first lag of the metrics to predict its value. As with most time-series models, we will be using the log of returns, where the return r is calculated as the proportional change in our metric x:

Doing the same in SQL, we create the CTE:

``````with
...
, log_revenue as
(
select date, log(dailyrev) as logrev
, log(lag(dailyrev) over (order by date))
as laglogrev
, D
from revenue
)``````

Putting this together with our dummy variable, we get the following AR(1) model:

## Calculating Coefficients for the Regression

Now that we have the model, we want to calculate the coefficients for all the beta values by running a regression analysis. To do this, we can use the ordinary least squares method to estimate these values:

Within SQL, we can create a few more CTEs — one for doing all the sigma calculations and one each for each beta:

``````with
...
, sigmas as
(
select
sum(D*D)-sum(D)*sum(D)/count(*) as SigSqD
, sum(laglogrev*logrev)-sum(laglogrev)*sum(logrev)
/count(*) as Sigy1y
, sum(laglogrev*D)-sum(laglogrev)*sum(D)/count(*) as Sigy1D
, sum(D*logrev)-sum(D)*sum(logrev)/count(*) as SigDy
, sum(laglogrev*laglogrev)-sum(laglogrev)*sum(laglogrev)
/count(*) as SigSqy1
, pow(sum(laglogrev*D)-sum(laglogrev)*sum(D)/count(*),2)
as SqSigy1D from log_revenue
)
, beta1 as
(
select
(SigSqD*Sigy1y-Sigy1D*SigDy)/(SigSqy1*SigSqD-SqSigy1D)
as beta1
from
sigmas
)
, beta2 as
(
select (SigSqy1*SigDy-Sigy1D*Sigy1y)/(SigSqy1*SigSqD-SqSigy1D)
as beta2
from
sigmas
)
, beta0 as
(
select
avg(logrev)-(select beta1 from beta1) * avg(laglogrev)
- (select beta2 from beta2)*avg(D)
as beta0
from
log_revenue
)``````

Now that we have the finalized regression coefficients, we can draw a nice trendline through the data:

``````with
...
, finalized_regression as
(
select
log_revenue.*
, beta1
, beta2
, beta0
from
log_revenue
, beta0
, beta1
, beta2
)
, predicted_revenue as (
select
finalized_regression.date
, finalized_regression.logrev as log_revenue
, (beta0 + beta1*log_revenue.laglogrev + beta2*log_revenue.D)
as predicted_log_revenue
from finalized_regression
join log_revenue using (date)
)
select *
from predicted_revenue``````

Looks like the trendline is a great fit!

## Translating an Event into \$\$\$

The beta values from our above regression turned out to be:

```beta0 = 2.609
beta1 = 0.724
beta2 = 0.553```

So what do these numbers tell us about our mobile launch on 12/24/14? The key number we’re looking at is beta2, which is the coefficient in front of the dummy variable we created to indicate post-launch metrics. Since we were examining the logarithm of revenue returns, we can simply take the exponential to find its impact:

e^(0.553*1) = 1.738x

From the above analysis, launching the mobile version of the game produced 74% higher daily revenue than pre-launch! The day after our launch, our revenue is \$25k higher according to our model.

Tags: