
New notebook: causal inference with MMM's #1032

Draft · wants to merge 17 commits into base: main
Conversation


@drbenvincent commented Sep 13, 2024

Description

This PR adds a new docs page which demonstrates the ability to do causal inference with MMMs. In particular, we showcase an example where we run a campaign with elevated media spend for a period of time. Our goal is to estimate the causal impact of the campaign. This is illustrated with simulated data in order to run a parameter-recovery exercise: because we know the true causal impact of the campaign, we can evaluate whether the estimated causal impact is close to it.

Overall, this notebook showcases the ability to conduct causal inference with MMMs in pymc-marketing.

Related Issue

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--1032.org.readthedocs.build/en/1032/

@drbenvincent added the docs (Improvements or additions to documentation) and MMM labels Sep 13, 2024

@drbenvincent marked this pull request as draft September 13, 2024 16:52
@drbenvincent changed the title from "initial and crude proof of concept notebook" to "New notebook: causal inference with MMM's" Sep 13, 2024

juanitorduz commented Sep 13, 2024

@drbenvincent What if we do a train/test split (before and after the intervention) and generate predictions and counterfactuals as in https://www.pymc-marketing.io/en/latest/notebooks/mmm/mmm_time_slice_cross_validation.html? This is how we use it in practice. WDYT?

I fear that if we train on the whole data set, we will leak information (for example, when inferring the parameters of the trend component).
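Concretely, what I have in mind (a toy pandas sketch; the column names and intervention date here are made up):

```python
import pandas as pd

# Toy weekly dataset; in practice this would be the MMM's input data
dates = pd.date_range("2024-01-01", periods=20, freq="W-MON")
df = pd.DataFrame({"date": dates, "spend": 100.0, "sales": 500.0})

intervention_date = pd.Timestamp("2024-03-04")  # hypothetical

# Fit the MMM on the pre-intervention period only...
train = df[df["date"] < intervention_date]
# ...then generate predictions/counterfactuals for the post-period,
# so no post-intervention information leaks into the fitted parameters
test = df[df["date"] >= intervention_date]
```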


drbenvincent commented Sep 13, 2024

I will think about it. There are at least 2 scenarios:

  • Using it prior to an experiment, to help with planning. For example, guiding you on the level of spend increase required for different minimum detectable effects.
  • Using it after the experiment, to analyse the causal impact of a campaign. In this case I don't think there's a data-leakage issue. Or are you thinking about a case where there's an unobserved confounding variable which changed at the same time as the intervention?

EDIT: I'm not sure I understand the issue with data leakage in this particular scenario. But it's been a long week.

I'm also planning on talking about (though maybe not including a worked example of) what you'd do in real-world situations when you don't have counterfactual spends. That could be approached by using CausalPy and synthetic control to predict what the media spend would have been in the absence of the campaign. Though it might be the case that a company knows what they would have spent in the absence of a campaign.

@juanitorduz (Collaborator)

> I'm not sure I understand the issue with data leakage in this particular scenario. But it's been a long week.

For example, if you train on the whole data set, then you are already using information from the post-intervention period to infer the parameters used to generate counterfactuals.

Now that I think about it, according to https://smithamilli.com/blog/causal-ladder/ (and similar references), your approach is indeed at the last level: counterfactuals (and fitting on the whole data is OK, I think). Whereas how I would use it (for scenario planning) is at level 2: interventions. This could actually be a nice discussion to have in the notebook (if we want to).


drbenvincent commented Sep 13, 2024

I understand the point now - you could argue that training on the full data may overestimate the trend. But I think that any increase in the outcome should be accounted for by the media spend plus its transformations.

Good point about counterfactual versus interventions. I think it is worth talking about that in the notebook.


review-notebook-app bot commented Sep 14, 2024


cetagostini commented on 2024-09-14T00:09:53Z
----------------------------------------------------------------

I like it, but to be in line with the language usually used in causal inference, I would say that we add an intervention to the synthetic dataset.

Also, I would talk about the intervention, giving the example where we increase our budget by 40% from period A to period B.
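For example, something like this (toy sketch; the dates and the window are hypothetical):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=12, freq="W-MON")
df = pd.DataFrame({"date": dates, "spend": 100.0})

# Hypothetical campaign window ("period A" to "period B")
in_campaign = df["date"].between("2024-02-05", "2024-03-04")

# Apply the intervention: a 40% budget uplift during the window
df["spend_intervention"] = df["spend"] * np.where(in_campaign, 1.40, 1.0)
```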


drbenvincent commented on 2024-09-16T14:59:45Z
----------------------------------------------------------------

Have slightly modified this text. Let's see how it all reads once the larger scale structure of the notebook has settled down a bit.

cetagostini commented on 2024-09-16T20:08:57Z
----------------------------------------------------------------

🚀


review-notebook-app bot commented Sep 14, 2024


cetagostini commented on 2024-09-14T00:09:53Z
----------------------------------------------------------------

Just to be more descriptive, I would perhaps call it something like simulate_intervention instead of generate_actual_dataset?


drbenvincent commented on 2024-09-16T14:57:46Z
----------------------------------------------------------------

Done in an upcoming commit


review-notebook-app bot commented Sep 14, 2024


cetagostini commented on 2024-09-14T00:09:54Z
----------------------------------------------------------------

Line #6.        Based on the counterfactual dataset, apply the intervention on the specified date

You mentioned counterfactual here; it may be nice to talk about what a counterfactual versus a factual scenario is, and what we mean by causal impact.
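i.e. that the causal impact is the factual outcome minus the counterfactual outcome. A toy numpy sketch of the definitions (not the notebook's actual code; all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, campaign = 20, slice(10, 20)         # campaign runs in the second half
baseline = 500 + rng.normal(0, 1, n)    # what sales would have been anyway
true_effect = np.zeros(n)
true_effect[campaign] = 40.0            # known lift, since data is simulated

y_counterfactual = baseline               # scenario without the intervention
y_factual = baseline + true_effect        # scenario with the intervention

impact = y_factual - y_counterfactual     # causal impact per period
cumulative_impact = impact.cumsum()       # total impact over time
```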



review-notebook-app bot commented Sep 14, 2024


cetagostini commented on 2024-09-14T00:09:55Z
----------------------------------------------------------------

Here we start talking about counterfactuals without any previous introduction; I would add a paragraph before this section to explain it in detail.

Mentioning that we have a dataset with the real intervention, but because we know the strength of the real intervention, we can create another time series where it did not occur and estimate the true effect of the 40% increase in spending.

I think all the information is there, but I would make it clearer and with a little more storytelling.


drbenvincent commented on 2024-09-16T15:01:29Z
----------------------------------------------------------------

Yep - so this initial commit was very preliminary - I've expanded on the text and narrative of the notebook much more now

cetagostini commented on 2024-09-16T20:09:48Z
----------------------------------------------------------------

🚀


review-notebook-app bot commented Sep 14, 2024


cetagostini commented on 2024-09-14T00:09:55Z
----------------------------------------------------------------

Are these priors doing anything? We could just leave the defaults we use. Otherwise, we could extend this and show how changing the priors changes the estimated impact. (?)


drbenvincent commented on 2024-09-16T15:01:46Z
----------------------------------------------------------------

Good point. I've removed these and we now use default priors

@drbenvincent

Just to say... this notebook is going to get substantial modification / TLC. But the input so far is already great - looks like this topic is interesting to people 🙂

@drbenvincent

Very excited about the plans I have for this post. Though changes will have to wait until Monday. Watch this space...

@wd60622 (Contributor) left a comment

Very cool.

Do you think we can do away with the large functions and duplication in the notebook? It seems the API should expose:

  • parameter freezing via pm.do instead of redefining the forward_pass
    • Someone can initialize the model without data (that defines the structure) and call something like: params = {"intercept": 1, ...}; y = mmm.generate_target(X, params=params, target_scaler=...). This opens the door to X_counterfactual = some_modification(X); y_counterfactual = mmm.generate_target(X_counterfactual, params=params, ...)
  • The counterfactual comparison plot. We just need two targets, right? Which we can store in xarray to have the date dim
    • plot_counterfactual(y, y_counterfactual, along="date", intervention_date="...")

WDYT?

@drbenvincent

Yep - so this is the first, very crude implementation. I'm trying to move away from generating synthetic data with separate numpy code and instead do it with the DAG, sampling from the prior. I didn't know about mmm.generate_target, so this is very useful. I'll see how far I get with that approach and maybe ask for some guidance. I know how to do it with base pymc, but I'm still unfamiliar with much of the pymc-marketing functionality/methods.

And I'll move towards getting counterfactual samples into the idata.predictions field. Is that what you're suggesting?

This notebook is going to change quite a lot before I flag as ready for review 👍🏻


wd60622 commented Sep 14, 2024

> Yep - so this is the first very crude implementation. I'm trying to move away from generating synthetic data from separate numpy code and doing it with the DAG, sampling for the prior. I didn't know about mmm.generate_target so this is very useful. I'll see how far I get with that approach and maybe ask for some guidance. I know how to do it with base pymc, but I'm still unfamiliar with much of the pymc-marketing functionality/methods.

That doesn't exist; those were just some ideas for an API. I think we should make whatever is useful to us.

> And I'll move towards getting counterfactual samples into the idata.predictions field. Is that what you're suggesting?

The idata for the instance won't exist until it has been fit. However, xarray objects could be generated, similar to the sample_prior / sample_curve workflow for many of the components, as they rely on pymc functions.

> This notebook is going to change quite a lot before I flag as ready for review 👍🏻

👍


@drbenvincent

Sorry, it took me a moment to understand that this was a proposal, @wd60622.

> • parameter freezing via pm.do instead of redefining the forward_pass
>
>   • Someone can initialize model without data (that defines the structure) and call something like: params = {"intercept": 1, ...}; y = mmm.generate_target(X, params=params, target_scaler=...). This opens door to X_counterfactual = some_modification(X); y_counterfactual = mmm.generate_target(X_counterfactual, params=params, ...)

Yep - I will have a go at defining these functions (quickly and crudely) in the notebook. Then they can be built out either in this PR or a different one.

> • The counterfactual comparison plot. We just need two targets, right? Which we can store in xarray to have the date dim
>
>   • plot_counterfactual(y, y_counterfactual, along="date", intervention_date="...")

Yes, but I think we would need a way to set the predictions=True flag in the kwargs of pm.sample_posterior_predictive. Is that possible now, or would that be new functionality?

I have to shift focus for a day or so, but I'll see what I can do code-wise in this notebook to inspire some new functionality.



wd60622 commented Sep 17, 2024

Feel free to open some issues if you don't touch on these in this PR.

> Yes, but I think we would need a way to set the predictions=True flag in the kwargs of pm.sample_posterior_predictive. Is that possible now, or would that be new functionality.

> I have to shift focus for a day or so, but I'll see what I can do code-wise in this notebook to inspire some new functionality.

The sample_posterior_predictive method allows for passing sample_posterior_predictive kwargs.
My idea was to just pass the xarray.DataArray objects to a function.
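i.e. something like this minimal matplotlib sketch; the name and signature are the hypothetical ones from the earlier suggestion, and the data is made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripting
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr


def plot_counterfactual(y, y_counterfactual, along="date", intervention_date=None, ax=None):
    """Overlay factual and counterfactual targets (both xarray.DataArray)."""
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(y[along], y, label="factual")
    ax.plot(y_counterfactual[along], y_counterfactual, label="counterfactual")
    if intervention_date is not None:
        ax.axvline(pd.Timestamp(intervention_date), ls="--", color="k")
    ax.legend()
    return ax


# Toy factual/counterfactual targets with a shared date dim
dates = pd.date_range("2024-01-01", periods=20, freq="W-MON")
y = xr.DataArray(np.linspace(500, 560, 20), coords={"date": dates}, dims="date")
y_cf = xr.DataArray(np.full(20, 500.0), coords={"date": dates}, dims="date")
ax = plot_counterfactual(y, y_cf, intervention_date="2024-03-04")
```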
