
Relax check_X_y for DecayEstimator #580

Merged · 5 commits into koaning:main · Oct 13, 2023

Conversation

FBruzzesi (Collaborator)

Description

Relaxes the check for X in DecayEstimator by delegating the actual checks to the given estimator. This fixes #573 and allows estimators that do not need X features.

Remark: as discussed in the issue, it may be worth removing the check entirely.
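For illustration, a minimal sketch of the relaxed behaviour (the helper name `fit_with_decay` and its exact weighting are mine, not the PR's code; it only shows the idea of deriving the sample count without validating X ourselves):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

def fit_with_decay(model, X, y, decay=0.999):
    # Derive the sample count from y and let the wrapped estimator run its
    # own input checks inside fit(); X is passed through untouched.
    n_samples = np.asarray(y).shape[0]
    weights = decay ** np.arange(n_samples)[::-1]  # w_{t-1} = decay * w_{t}
    return clone(model).fit(X, y, sample_weight=weights)

fitted = fit_with_decay(LinearRegression(), [[1.0], [2.0], [3.0]], [1.0, 2.0, 3.0])
```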

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines (flake8)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (also to the readme.md)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added tests to check whether the new feature adheres to the sklearn convention
  • New and existing unit tests pass locally with my changes

koaning (Owner) commented Oct 5, 2023

I'm also OK with removing the entire check, mainly because the internal method should already be doing that, and this way we'll need to make fewer assumptions.

FBruzzesi (Collaborator, Author)

> I'm also OK with removing the entire check, mainly because the internal method should already be doing that, and this way we'll need to make fewer assumptions.

To be completely pedantic, the check_X_y function call checks and converts X and y to arrays. This then allows access to the .shape attribute to know how many samples there are and to compute the decay.

Now both approaches feel wrong, each with its own shortcomings:

  • Keeping the check may lead to errors, or to results that differ from the estimator's own check (e.g. if the estimator allows multioutput while our check doesn't).
  • Removing the check means we cannot access .shape or len() in the general case (I noticed this by running the unit tests; see the illustration below), and I would be unsure how to proceed.

Opinions?
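To make the trade-off concrete, a small illustration using scikit-learn's public check_X_y (nothing here is specific to this PR):

```python
from sklearn.utils import check_X_y

# With the check: list-likes are converted to numpy arrays, so .shape is safe.
X, y = check_X_y([[1.0], [2.0], [3.0]], [0, 1, 0])
print(X.shape)  # (3, 1)

# Without the check: X stays whatever the caller passed in. A generator,
# for instance, has neither .shape nor len(), so the sample count is unknown.
X_raw = ([x] for x in [1.0, 2.0, 3.0])
print(hasattr(X_raw, "shape"), hasattr(X_raw, "__len__"))  # False False
```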

koaning (Owner) commented Oct 10, 2023

> Removing the check means we cannot access .shape or len() in the general case (I noticed this by running the unit tests), and I would be unsure how to proceed.

Ah, I forgot about that. That's a fair point. I guess the simplest solution is to offer a flag to the end-user and to defer the responsibility of applying the check to them? That said, I think most of the time X will be .shape-able or len()-able, so the default could be True. Or am I missing something?
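A sketch of what that flag could look like (the class body and flag wiring here are assumptions at this point in the thread; the final parameter name is settled in the review below):

```python
import numpy as np
from sklearn.base import clone
from sklearn.utils import check_X_y as _check_X_y

class DecayEstimatorSketch:
    # Hypothetical flag-based variant: the end-user decides whether the
    # strict check (and array conversion) runs at all.
    def __init__(self, model, decay=0.999, check_X_y=True):
        self.model = model
        self.decay = decay
        self.check_X_y = check_X_y

    def fit(self, X, y):
        if self.check_X_y:                 # opt-in strict validation
            X, y = _check_X_y(X, y)
        n_samples = np.asarray(y).shape[0]
        weights = self.decay ** np.arange(n_samples)[::-1]
        self.estimator_ = clone(self.model).fit(X, y, sample_weight=weights)
        return self
```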

FBruzzesi (Collaborator, Author)

> I guess the simplest solution is to offer a flag to the end-user and to defer the responsibility of applying the check to them?

That's reasonable!

> That said, I think most of the time X will be .shape-able or len()-able, so the default could be True.

With such a default, the scikit-learn tests would fail. I will add a big warning in the docstring 😁

koaning (Owner) commented Oct 10, 2023

> With such a default, the scikit-learn tests would fail. I will add a big warning in the docstring 😁

There are a lot of components in this library that fail the standard checks. That's OK; it's not ideal, but given how "non-standard" some of these ideas are, it's fine to turn a test off sometimes. Adding a warning to the docstring is fair though!

Review comment on the changed `__init__` signature:

```diff
     The DecayEstimator will use exponential decay to weight the parameters.

     w_{t-1} = decay * w_{t}
     """

-    def __init__(self, model, decay: float = 0.999, decay_func="exponential"):
+    def __init__(
+        self, model, decay: float = 0.999, decay_func="exponential", check_X_y=False
```

koaning (Owner)
This is a bit of a nitpick.

Suggested change:

```diff
-        self, model, decay: float = 0.999, decay_func="exponential", check_X_y=False
+        self, model, decay: float = 0.999, decay_func="exponential", check_input=False
```

My reasoning here is that some estimators may only have a check_X; this is the case when the component is a scikit-learn transformer. But it'd be nice to have a consistent name for the internal check, so maybe this name change?

Once that change is in/discussed I think we can hit merge :)
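For reference, a usage sketch of the renamed parameter (assuming the merged signature matches the suggestion above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklego.meta import DecayEstimator

X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.arange(10, dtype=float)

# check_input=False skips the strict check_X_y call and delegates all input
# validation to the wrapped estimator, which is what this PR enables.
model = DecayEstimator(LinearRegression(), decay=0.999, check_input=False)
model.fit(X, y)
```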

FBruzzesi (Collaborator, Author)

That's very reasonable 👌 let me adjust that!

koaning (Owner) left a comment

Left one comment related to the name of a variable.

koaning (Owner) left a comment

LGTM!

koaning merged commit f377999 into koaning:main on Oct 13, 2023 · 7 checks passed
FBruzzesi deleted the patch/573-decay-estimator-check branch on Oct 13, 2023 at 09:47

Successfully merging this pull request may close these issues:

  • [BUG] Cannot use GroupedPredictor with DecayEstimator for fitting 1-feature dataframe (#573)