Last active: February 17, 2017 04:10
Mode Analytics Case Study: Validating A/B Test Results
Validating A/B Test Results for Yammer
Possible Causes of Increased Messages in the Treatment Group
1. The metric may need to be redefined
2. Poor calculations
3. Users were not assigned randomly, which would make the test set-up faulty by being biased
4. A confounding factor that is hard to detect but has an effect on the test results
1. The metric needs to be redefined.
Was the test run correctly?
Messages sent may not be the best metric for explaining the rise in the treatment group's results. The advice is to use different metrics, such as Average Logins and/or Average Days Engaged. If both of these metrics also show an increase in the treatment group over the control group, then the increase cannot be explained by the number of messages sent alone.
Queries for Average Logins and Average Days Engaged follow:
a. Average Logins: refer to attached file #1 for the SQL code.
b. Average Days Engaged: refer to attached file #1 for the SQL code.
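As a quick illustration of how the two alternate metrics differ, the hypothetical Python sketch below (not part of the original case study; the event rows are made up) counts logins per user and distinct days engaged per user:

```python
from datetime import date

# Hypothetical event rows (user_id, event_date, event_name); not real Yammer data.
events = [
    (1, date(2014, 6, 1), "login"),
    (1, date(2014, 6, 1), "send_message"),
    (1, date(2014, 6, 3), "login"),
    (2, date(2014, 6, 2), "login"),
]

def logins_per_user(events):
    # Analogous to COUNT(CASE WHEN event_name = 'login' ...) grouped by user.
    counts = {}
    for user_id, _, name in events:
        counts.setdefault(user_id, 0)
        if name == "login":
            counts[user_id] += 1
    return counts

def days_engaged_per_user(events):
    # Analogous to counting distinct active days per user.
    days = {}
    for user_id, day, _ in events:
        days.setdefault(user_id, set()).add(day)
    return {u: len(d) for u, d in days.items()}
```

Averaging each dictionary's values within the control and treatment groups would give Average Logins and Average Days Engaged.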
2. Poor calculations.
The results could have been a consequence of poor calculations. There are many ways to run an A/B test, and opinions differ on how to conduct one: how to define the sample size, whether to use a one-tailed or two-tailed test, and what assumptions to make about the distributions. In the end, however, these choices do not account for the observed difference.
How the users were grouped could also be reconsidered. New and old users were grouped together, which could produce a faulty measurement: if messages are tallied, a new user will almost always have fewer messages than an old user. Separating new and old users is therefore advised. Doing so would also make it possible to measure a novelty effect; for example, does introducing a new feature to an old user cause increased engagement?
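Those calculation choices can be spot-checked outside SQL. The sketch below (a minimal Python version with made-up summary numbers, assuming the same unpooled t-statistic and two-tailed normal-approximation p-value that the SQL query computes) shows how the final statistics are derived:

```python
import math

def t_stat(mean_t, var_t, n_t, mean_c, var_c, n_c):
    # (treatment mean - control mean) / sqrt(var_t/n_t + var_c/n_c),
    # the same formula as the query's t_stat column.
    return (mean_t - mean_c) / math.sqrt(var_t / n_t + var_c / n_c)

def two_tailed_p(t):
    # p = 2 * (1 - Phi(|t|)), with the normal CDF computed via the error function.
    phi = 0.5 * (1 + math.erf(abs(t) / math.sqrt(2)))
    return 2 * (1 - phi)

# Made-up example: treatment mean 3.2 (var 6.5, n=1000) vs control mean 2.9 (var 6.0, n=1000).
t = t_stat(3.2, 6.5, 1000, 2.9, 6.0, 1000)
p = two_tailed_p(t)
```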
3. Users treated incorrectly.
Isolating new users in the treatment group caused a problem: it made the engagement results extremely low. Instead, new users could have been mixed with old users in the treatment group, which could have produced higher measurements in the treatment group.
Treatments by Month Activated:
Refer to attached file #2 for SQL code.
When the new-user data was removed and only the existing-user data was kept to calculate average messages sent, the difference between the control and treatment groups was significantly smaller than the results gathered when the groups mixed new and existing users.
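A minimal sketch of the cohort-separated comparison, in hypothetical Python with made-up numbers (the real separation comes from the activation-month query in attached file #2):

```python
# Hypothetical user rows (activation_month, experiment_group, messages_sent).
users = [
    ("2014-05", "control_group", 10), ("2014-05", "control_group", 8),
    ("2014-05", "test_group", 12),    ("2014-05", "test_group", 14),
    ("2014-06", "control_group", 1),  ("2014-06", "test_group", 2),
]

def avg_messages_by_cohort(users):
    # Average messages per (activation month, experiment group), so new and
    # existing users are compared within their own cohorts.
    sums, counts = {}, {}
    for month, group, msgs in users:
        key = (month, group)
        sums[key] = sums.get(key, 0) + msgs
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}
```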
4. Confounding factors.
There could be multiple problems behind the test results, so it is advised to explore all possible causes. Problems can also be related, so a comprehensive perspective and a full understanding of the experiment's results help when advising different teams and presenting findings.
Other relationships worth checking are users on different devices and users of different user types. When the groups are separated by these new labels, the test results may differ in ways worth exploring before drawing one's own conclusions, leading to a more informed decision.
AVERAGE LOGINS & AVERAGE DAYS ENGAGED
--The first block of code, from SELECT to FROM, sets up the columns of the output table, including the new statistics columns.
SELECT c.experiment, | |
c.experiment_group, | |
c.users, | |
c.total_treated_users, | |
ROUND(c.users/c.total_treated_users,4) AS treatment_percent, | |
c.total, | |
ROUND(c.average,4)::FLOAT AS average, | |
ROUND(c.average - c.control_average,4) AS rate_difference, | |
ROUND((c.average - c.control_average)/c.control_average,4) AS rate_lift, | |
ROUND(c.stdev,4) AS stdev, | |
ROUND((c.average - c.control_average) / | |
SQRT((c.variance/c.users) + (c.control_variance/c.control_users)) | |
,4) AS t_stat, | |
(1 - COALESCE(nd.value,1))*2 AS p_value | |
FROM ( | |
--This SELECT uses window functions to copy the control group's stats onto every row and to total the treated users.
SELECT *, | |
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.users ELSE NULL END) OVER () AS control_users, | |
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.average ELSE NULL END) OVER () AS control_average, | |
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.total ELSE NULL END) OVER () AS control_total, | |
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.variance ELSE NULL END) OVER () AS control_variance, | |
MAX(CASE WHEN b.experiment_group = 'control_group' THEN b.stdev ELSE NULL END) OVER () AS control_stdev, | |
SUM(b.users) OVER () AS total_treated_users | |
FROM ( | |
--This SELECT aggregates the per-user metric into per-group stats: count, average, total, standard deviation, and variance.
SELECT a.experiment, | |
a.experiment_group, | |
COUNT(a.user_id) AS users, | |
AVG(a.metric) AS average, | |
SUM(a.metric) AS total, | |
STDDEV(a.metric) AS stdev, | |
VARIANCE(a.metric) AS variance | |
FROM ( | |
--This SELECT computes the per-user metric, with a second use of CASE to count only login events.
SELECT ex.experiment, | |
ex.experiment_group, | |
ex.occurred_at AS treatment_start, | |
u.user_id, | |
u.activated_at, | |
COUNT(CASE WHEN e.event_name = 'login' THEN e.user_id ELSE NULL END) AS metric | |
FROM (SELECT user_id, | |
experiment, | |
experiment_group, | |
occurred_at | |
FROM tutorial.yammer_experiments | |
WHERE experiment = 'publisher_update' | |
) ex | |
--Here, tables with common columns are joined together to combine experiment, user, and event data.
JOIN tutorial.yammer_users u | |
ON u.user_id = ex.user_id | |
JOIN tutorial.yammer_events e | |
ON e.user_id = ex.user_id | |
AND e.occurred_at >= ex.occurred_at | |
AND e.occurred_at < '2014-07-01' | |
AND e.event_type = 'engagement' | |
GROUP BY 1,2,3,4,5 | |
) a | |
GROUP BY 1,2 | |
) b | |
) c | |
--Here, the normal-distribution lookup table is joined in to convert the t-statistic into a p-value and other relevant stats.
LEFT JOIN benn.normal_distribution nd
ON nd.score = ABS(ROUND((c.average - c.control_average)/SQRT((c.variance/c.users) + (c.control_variance/c.control_users)),3))
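The final LEFT JOIN looks up the absolute, rounded t-statistic in a normal-distribution table and converts it to a two-tailed p-value. A small Python stand-in for `benn.normal_distribution` (an assumption, since the real table's contents are not shown here) illustrates the same logic:

```python
import math

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Stand-in lookup table: score (|t| rounded to 3 decimals) -> cumulative probability.
lookup = {round(z / 1000, 3): phi(z / 1000) for z in range(0, 4001)}

def p_value_from_lookup(t):
    # Mirrors: LEFT JOIN ... ON nd.score = ABS(ROUND(t_stat, 3)),
    # then (1 - COALESCE(nd.value, 1)) * 2. Scores beyond the table
    # fall back to 1, giving p = 0, just as the COALESCE does.
    value = lookup.get(round(abs(t), 3), 1)
    return (1 - value) * 2
```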
TREATMENTS BY ACTIVATION MONTH
--This block of code, from SELECT to FROM, sets up columns that separate the control group and the treatment group by activation month.
SELECT DATE_TRUNC('month',u.activated_at) AS month_activated, | |
COUNT(CASE WHEN e.experiment_group = 'control_group' THEN u.user_id ELSE NULL END) AS control_users, | |
COUNT(CASE WHEN e.experiment_group = 'test_group' THEN u.user_id ELSE NULL END) AS test_users | |
FROM tutorial.yammer_experiments e | |
--Joining the experiments table to the users table on user_id.
JOIN tutorial.yammer_users u | |
ON u.user_id = e.user_id | |
GROUP BY 1 | |
ORDER BY 1