
Change priors for TVPs #775

Draft · wants to merge 47 commits into base: main

Conversation


@bwengals bwengals commented Jun 23, 2024

Description

This PR changes the TVP HSGP prior parameters for the scale and lengthscale to default to the PC prior, with the option to switch to an Inverse Gamma.

Posting this PR as a draft since I got pretty hung up on how best to integrate this into PyMC-Marketing, so I'm hoping to get some guidance on that.

I also reran the results in the TVP example notebook. The first sine-wave example looks OK to me, but the GP just reverts to its mean (which looks to be estimated a bit off). The second example with the linear trend looks much better: the PC prior now allows the data to choose a very long lengthscale. The third example shouldn't change much, since a short lengthscale will work best there. I do think it'd be nice to include an example where the GP can really shine, where the TVP varies irregularly with a moderate frequency.
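For intuition on why the PC prior admits much longer lengthscales, here is a small numerical sketch. It assumes the common 1-D construction where `ell ** (-1/2)` gets an Exponential prior calibrated so that `p(lengthscale < lower) = alpha`; the PR's exact parameterization may differ.

```python
import numpy as np

# Sketch (not the PR's code): sample the 1-D PC prior on a GP lengthscale
# by putting an Exponential prior on ell**(-1/2).  The rate formula is an
# assumption derived from the tail condition p(lengthscale < lower) = alpha.
rng = np.random.default_rng(0)
lower, alpha = 1.0, 0.1                            # p(ell < 1.0) = 0.1
lam = -np.sqrt(lower) * np.log(alpha)              # rate so the tail condition holds
u = rng.exponential(scale=1.0 / lam, size=200_000)  # u = ell**(-1/2)
ell = u ** -2

# Polynomial upper tail: a sizeable share of prior mass sits on very long
# lengthscales, which is what lets the linear-trend example pick ell >> 1.
print(round(np.mean(ell < lower), 3))   # close to alpha = 0.1
print(round(np.mean(ell > 100), 3))     # roughly a fifth of the mass
```

Under this construction, about a fifth of the prior mass sits above `ell = 100`, so nothing in the prior fights a near-linear trend.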

Related Issue

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--775.org.readthedocs.build/en/775/

wd60622 and others added 30 commits April 2, 2024 15:07
* add scaling into lift test method

* test scaling methods

* test change of MMM likelihood

* link up beta_channel parameters

* change name to sigma

* reorganize and edit

* closes #406

* address the feedback in docstrings

* add more to docstring

* format the docstring

* be verbose for future devs

* be explicit for the column max values

* incorporate the feedback

* hide cells and add to intro

* add conclusion

* switch to header 2

* run notebook through

* move the types together

* separate model estimated from empirical

* fix typos
* drop python 3.9

* try python 3.12

* undo try python 3.12
* improve nb

* rm warnings and add link to lifetimes quickstart

* address comments

* feedback part 3

* remove warnings manually
* add more info to the notebook

* hide plots code

* fix plot y labels

* fix plot outputs and remove model build

* improve final note probability plots

* address comments

* use quickstart dataset

* feedback part 3

* remove warnings manually

* feedback part 4
* improve mmm docs init

* add more code examples to docstrings

* minor improvements

* typo

* better phrasing

* add thomas suggestion
* move fixtures to conftest

* docstrings and moved set_model_fit to conftest

* fixed pandas quickstart warnings

* revert to MockModel and add ParetoNBD support

* quickstart edit for issue 609

* notebook edit
remove ruff E501 ignore
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.3.5 → v0.3.7](astral-sh/ruff-pre-commit@v0.3.5...v0.3.7)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Juan Orduz <juanitorduz@gmail.com>
* fix potential bugs

* minor improvements

* remove rule for nb

* fix test

* improve tests syntax

* use stacklevel 2 for warnings

* use stacklevel 1 for warnings as they are used in public methods

* ignore B904

* undo change

* ricardos feedback

* use fit_posterior

* Update pymc_marketing/clv/models/gamma_gamma.py

Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>

* last fix XD

---------

Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>
* notebook opening and imports

* model definition markdown

* Data Load Notebook section

* WIP model fitting section

* added notebook to docs directory

* notebook edits and graph code

* ppc section and nb cleanup

* demz sampling and WIP plotting

* WIP predictive plots

* WIP heatmap plots

* predictive plots

* WIP covariates and nbqa-ruff edits

* covariate section

* plot additions

* fig sizes

* remove model file
…book (#651)

* add spaces, increase indentation, and fix number order

* explicit with 6
* Creating plot waterfall

Co-Authored-By: Carlos Trujillo <59846724+cetagostini@users.noreply.github.com>

* requested changes

* pre-commit

---------

Co-authored-by: Carlos Trujillo <59846724+cetagostini@users.noreply.github.com>
Databricks should have a lower-case b.
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.4.1 → v0.4.2](astral-sh/ruff-pre-commit@v0.4.1...v0.4.2)
- [github.com/pre-commit/mirrors-mypy: v1.9.0 → v1.10.0](pre-commit/mirrors-mypy@v1.9.0...v1.10.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* support for negative values and dates (if used)

* fix terrible spelling

* test dates in coords

* cover numpy objects

* consolidate the tests

@bwengals bwengals changed the title first commit Change priors for TVPs Jun 23, 2024
@wd60622 wd60622 added enhancement New feature or request MMM labels Jun 23, 2024
Contributor

wd60622 commented Jun 23, 2024

Carlos made some recent changes, so there will have to be a rebase. Not only is there the intercept TVP kwargs (now a config), but also a media TVP config.

return pm.Deterministic(f"{name}", 1 / ell_inv)


def approx_hsgp_hyperparams(x, x_center, lengthscale_range: list[float], cov_func: str):
Contributor

This is taken directly from PyMC, right? Can it be imported instead? If not, note its origins.
Maybe create an issue to remove it at some point once the PyMC version guarantees its existence.

Collaborator

@juanitorduz juanitorduz left a comment

Thanks @bwengals! This is super nice, and I definitely appreciate your input on better GP priors. I left some comments that are more questions :)

It seems there is a merge conflict with a notebook, so feel free to open a new PR or rebase (whatever works best for you). I like the PC prior, and we just need to define a nice fallback mechanism. We do not expect our users to be experts on GPs, so that is a challenge. Once we crack that, we simply add a couple of tests and merge :)

Comment on lines +23 to +24
One dimensional PC prior for GP lengthscales, parameterized by tail probability:
p(lengthscale < lower) = alpha.
Collaborator

This looks very nice! Would it be possible to get a bit more info about the nature of this prior and its benefits? This can help us (and our users) who are not familiar with it :)
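For readers unfamiliar with it: the PC (penalized complexity) prior shrinks toward the base model of an infinitely long lengthscale (a constant function), and is calibrated directly from the tail statement in the docstring above. A sketch of one standard 1-D construction follows; the PR's exact code may differ.

```python
import numpy as np

# Assumption: ell**(-1/2) ~ Exponential(rate=lam), with
# lam = -sqrt(lower) * log(alpha).  The Exponential penalizes deviation
# from the base model ell -> infinity, and lam is set so that
# p(lengthscale < lower) = alpha exactly.
rng = np.random.default_rng(42)

def sample_pc_lengthscale(lower, alpha, size):
    lam = -np.sqrt(lower) * np.log(alpha)
    u = rng.exponential(scale=1.0 / lam, size=size)  # u = ell**(-1/2)
    return u ** -2

for lower, alpha in [(0.5, 0.05), (2.0, 0.1), (5.0, 0.2)]:
    ell = sample_pc_lengthscale(lower, alpha, 100_000)
    print(lower, alpha, round(np.mean(ell < lower), 3))  # ~ alpha each time
```

The single interpretable knob, "how much prior mass sits below a lengthscale the user considers implausibly short", is what makes it attractive for non-expert users.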

Comment on lines +175 to +176
# Note: this function sometimes fails, how to handle?
# If not using PC prior, should user set ls_mu and ls_sigma instead?
Collaborator

Do we have any simple heuristics for a reasonable fallback mechanism?

Author

None that I know of. I think @ulfaslak told me he wrote something to fix this, though?

}
cov_func = eta**2 * cov_funcs[cov_func.lower()](input_dim=1, ls=ls)

gp = pm.gp.HSGP(m=[m], L=[L], cov_func=cov_func, drop_first=True)
Collaborator

shall we pass drop_first from the function signature here?

Author

I got hung up on the best default. There can be identifiability issues between the intercept and a GP, particularly for smaller intercepts (relative to the GP) and at longer lengthscales. This is evident in the HSGP: when c gets a bit larger, the first basis vector becomes more and more constant. For a default, I think the best option is either to keep the intercept parameter and set drop_first=True, or to have drop_first=False and no intercept parameter.

I'm not sure which would be better, though; it depends on what the data "usually" looks like. If the intercept for the TVP is around 1000 and the GP part oscillates between -1 and 1, there won't be identifiability issues regardless of lengthscale. But if the intercept is around 0.2 and the GP oscillates between -1 and 1, it will be a problem.
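The flattening of the first basis vector can be checked directly. This sketch uses the standard HSGP eigenfunction formula on [-L, L] with L = c * S; scaling conventions in the PR's code may differ.

```python
import numpy as np

# Sketch of the identifiability point above: the first HSGP basis
# function phi_1(x) = sqrt(1/L) * sin(pi * (x + L) / (2 * L)) becomes
# nearly constant over the data range as c grows, so it competes with a
# separate intercept term (hence drop_first=True).
x = np.linspace(-1.0, 1.0, 200)  # data scaled so the half-range S = 1

def rel_variation(c):
    L = c * 1.0
    phi1 = np.sqrt(1.0 / L) * np.sin(np.pi * (x + L) / (2.0 * L))
    # relative variation of the first basis vector over the data range
    return phi1.std() / np.abs(phi1).mean()

print(round(rel_variation(1.5), 3))  # noticeably non-constant
print(round(rel_variation(5.0), 3))  # nearly constant
```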

phi, sqrt_psd = gp.prior_linearized(Xs=X[:, None] - X_mid)
hsgp_coefs = pm.Normal(f"{name}_hsgp_coefs", dims=hsgp_dims)
f = phi @ (hsgp_coefs * sqrt_psd).T
centered_f = f - f.mean(axis=0) + 1
Collaborator

Do we want this to oscillate around 1.0 as before?

Author

I took out f - f.mean(axis=0) + 1 since I think it will make predicting into the future difficult. A better way to zero-mean this is to de-mean the basis phi directly. Also, if z ~ Normal(0, 1), then exp(z) has a median of 1, so the multiplier still oscillates around 1.0.
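Both points are easy to check numerically; the shapes and names below are illustrative, not the PR's code.

```python
import numpy as np

# (1) De-meaning the basis columns of phi makes f = phi @ w zero-mean
#     over the observed rows for *any* coefficients w, so the centering
#     no longer depends on f itself and stays consistent at predict time.
# (2) If z ~ Normal(0, 1), exp(z) has median exp(0) = 1, so a multiplier
#     built as exp(latent) still oscillates around 1.
rng = np.random.default_rng(1)
phi = rng.normal(size=(100, 8))          # stand-in for an HSGP basis
phi_c = phi - phi.mean(axis=0)           # de-mean each basis column once
w = rng.normal(size=8)
f = phi_c @ w
print(abs(f.mean()) < 1e-12)             # zero-mean by construction

z = rng.normal(size=200_000)
print(round(np.median(np.exp(z)), 2))    # close to 1.0
```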

@@ -146,7 +257,7 @@ def create_time_varying_intercept(
)
return pm.Deterministic(
name="intercept",
var=intercept_base * multiplier,
var=pt.exp(intercept_base + multiplier),
Collaborator

I like this parametrization much better!

Contributor

@cetagostini cetagostini Jul 2, 2024

Hey team, can we deep-dive a bit more about this implementation?

In the original implementation, the mean of β_t would be `*_beta` (since the mean of the multiplier is 1), and the variance of β_t would then depend on the variance of the multiplier.

Because of this, the initial implementation is more straightforward and interpretable (personal opinion), mostly around the response curves.

  1. What would be the benefit of this parameterization if it is not more interpretable?
  2. Being exponential, it forces the final parameter to be positive. Depending on which parameter we want to be time-dependent, this may or may not be desirable.
  3. I think that if `*_base + latent` is normally distributed there should be no problem; the result is a LogNormal distribution. But if it is Gamma or HalfNormal, the result would be an even more skewed distribution. This might be feasible because the config controls the base distributions, but have we thought about how this transformation would apply to different arbitrary distributions?

A couple of questions that may be very obvious and I am a little slow to see the answers! 😅

cc: @bwengals @juanitorduz @wd60622 @ulfaslak
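On points 2 and 3: with Normal base and latent terms, the new parameterization yields a LogNormal time-varying parameter, so it is always positive and right-skewed, with the median controlled by the base. A sketch with made-up values (all names and numbers are illustrative, not the PR's):

```python
import numpy as np

# exp(base + latent) with Normal base and latent is LogNormal:
# strictly positive, median exp(mu_base), and mean pulled above the
# median by the latent variance (the skew discussed in point 3).
rng = np.random.default_rng(7)
mu_base, sigma_latent = 0.5, 0.3
base = rng.normal(mu_base, 0.05, size=500_000)
latent = rng.normal(0.0, sigma_latent, size=500_000)
tvp = np.exp(base + latent)

print((tvp > 0).all())               # always positive
print(round(np.median(tvp), 2))      # ~ exp(0.5), about 1.65
print(np.mean(tvp) > np.median(tvp)) # right-skewed: mean > median
```

With a Gamma or HalfNormal base the same transform would exponentiate an already-skewed variable, which is the extra-skew concern raised above.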

Comment on lines +239 to +246
# if model_config["intercept_tvp_kwargs"]["L"] is None:
# model_config["intercept_tvp_kwargs"]["L"] = (
# time_index_mid + DAYS_IN_YEAR / time_resolution
# )
# if model_config["intercept_tvp_kwargs"]["ls_mu"] is None:
# model_config["intercept_tvp_kwargs"]["ls_mu"] = (
# DAYS_IN_YEAR / time_resolution * 2
# )
Collaborator

Is this to be deleted?

Are we using the DAYS_IN_YEAR at all? 🤔
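For what it's worth, the commented-out defaults amount to the following arithmetic; the concrete values here (weekly data, a two-year index) are assumptions for illustration only.

```python
# Sketch of the commented-out defaults above, with assumed inputs:
# time_resolution = days per time step, time_index_mid = index midpoint.
DAYS_IN_YEAR = 365.25
time_resolution = 7    # assumption: weekly data
time_index_mid = 52    # assumption: midpoint of a ~2-year weekly index

# HSGP boundary: half the data range plus one year of padding, in steps
L = time_index_mid + DAYS_IN_YEAR / time_resolution
# lengthscale prior location: roughly two years, in steps
ls_mu = DAYS_IN_YEAR / time_resolution * 2

print(round(L, 1), round(ls_mu, 1))
```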

@bwengals
Author

Carlos made some recent changes, so there will have to be a rebase. Not only is there the intercept TVP kwargs (now a config), but also a media TVP config.

Gotcha thanks @wd60622. I'll rebase and check out those updates. This should benefit that as well.


review-notebook-app bot commented Jun 25, 2024

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-06-25T07:58:52Z
----------------------------------------------------------------

Line #12.    sys.path.insert(0, "../../../../")

why do you need this :) ?


@juanitorduz
Collaborator

@bwengals now that we have released 0.7.0 after some refactoring, I think this is a great time to work on these improvements :) Would you mind rebasing or (maybe even better, to avoid fixing conflicts) creating a new PR to work on the integration? We can also split this into smaller PRs to work in shorter iterations :)

@wd60622
Contributor

wd60622 commented Jul 11, 2024

I will take this over using main branch. I will get to it when I can
