docs: PCA and UMAP OutlierDetection docstring examples #654

Merged
merged 15 commits into from
Jul 18, 2024

anopsy
Contributor

@anopsy anopsy commented Apr 27, 2024

Docs

Added usage examples to:
decomposition.umap_reconstruction.UMAPOutlierDetection
decomposition.pca_reconstruction.PCAOutlierDetection

Fixes #652 and #653

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

@anopsy
Contributor Author

anopsy commented Apr 27, 2024

image
Wondering what happened here

Collaborator

@FBruzzesi FBruzzesi left a comment

Just wondering, how did you come up with these array values?
Wouldn't it make sense to use randomly generated arrays (with a fixed seed) and then tweak a few values to make them outliers? The point is: currently I cannot spot the outliers just by looking at the arrays, and everything seems a bit magical.
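
As a rough sketch of that suggestion (not code from this PR), the snippet below draws seeded random data that lies close to a plane, overwrites one row so it visibly breaks the pattern, and flags it by PCA reconstruction error. It uses plain NumPy rather than the library's detector classes; the helper `reconstruction_error`, the Euclidean error and the cut-off of 1.0 are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible

# 200 rows close to the plane x3 = x1 + x2 inside 3-D space.
latent = rng.normal(size=(200, 2))
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
X = latent @ W + 0.01 * rng.normal(size=(200, 3))

# Planted outlier: the third value should be near 1 + 1 = 2, not -2.
X[0] = [1.0, 1.0, -2.0]


def reconstruction_error(X, n_components):
    """Hypothetical helper: per-row Euclidean PCA reconstruction error."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    reconstructed = Xc @ components.T @ components
    return np.linalg.norm(Xc - reconstructed, axis=1)


errors = reconstruction_error(X, n_components=2)
flagged = np.flatnonzero(errors > 1.0)  # assumed absolute cut-off
print(flagged)
```

With data built this way, a reader can see why row 0 is unusual (its third value contradicts the first two) without inspecting every number.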

On the failing GitHub Action: I get the sense that the macos-latest runner is now on macOS 14, and that this is breaking most of the GitHub Actions jobs.
I can't do a deep dive today, but I will try in the coming week.

Anyway, it is certainly unrelated to the docstring changes.

@anopsy
Contributor Author

anopsy commented Apr 28, 2024

Oh gee, there are even more failed tests now.😮

About examples
I copied the approach with the arrays from the scikit-learn docs on PCA (see screenshot)
image

and I tinkered with the values to check how the detectors behave for different array values, n_components and thresholds. I tried to add values that would clearly be outliers by quantiles, for example [-100, 99, -99], but the PCA/UMAPOutlierDetectors "classified" them as inliers. I also saw in the User Guide that the values classified as outliers by the PCA/UMAPOutlierDetectors don't look like quantile-based outliers, so you can't spot them just by looking. If that makes any sense.

image
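
One possible reason for the behaviour described above, sketched with plain NumPy (not the library's own implementation): PCA is not robust, so a single extreme row can pull the first principal component towards itself and then be reconstructed almost perfectly. The relative L2 error used here is an assumption for illustration; the detector's exact formula may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 ordinary rows plus one row that is extreme in every column.
X = rng.normal(size=(50, 3))
X = np.vstack([X, [-100.0, 99.0, -99.0]])

# PCA with one component, fitted on data that includes the extreme row.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
first = Vt[:1]
reconstructed = Xc @ first.T @ first

# Relative reconstruction error per row (assumed L2 form).
relative_error = (
    np.linalg.norm(Xc - reconstructed, axis=1) / np.linalg.norm(Xc, axis=1)
)

print("extreme row:", relative_error[-1])
print("median of ordinary rows:", np.median(relative_error[:-1]))
```

In this sketch the extreme row gets a much smaller relative error than the ordinary rows, so a reconstruction-based detector would treat it as an inlier even though it is an outlier by quantiles.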

@FBruzzesi
Collaborator

Yep, the doc page is using the iris dataset, which I would not expect to have any particular outliers.
We have one obvious example in the test suite.

@koaning thoughts on this? In my opinion, it could be worth changing the dataset in the user guide as well. It seems a bit confusing.

@anopsy
Contributor Author

anopsy commented Apr 30, 2024

Sure, I'll do it the way it's done in the test suite.

@anopsy
Contributor Author

anopsy commented Jul 17, 2024

Yep the doc page is using iris dataset, which I would not expect to have any particular outlier. We have one obvious example in the test suite.

Hey Francesco, I'm back at it. I've been thinking about the examples for the PCA and UMAP detectors. What bugs me is this: if I use the obvious example, which is a 10-d array, how can I show the resulting outliers? Should I print the 10-d output?
I mean, the numbers in the arrays I used may seem arbitrary, but at least we can show the outliers in a simple one-line output.
Let me know wdyt
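
One way to keep the output to a single line even for 10-d input might be to print the predicted labels or the flagged row indices instead of the rows themselves. The sketch below uses a hypothetical `labels` array and assumes the scikit-learn convention of 1 for inliers and -1 for outliers.

```python
import numpy as np

# Hypothetical predictions for 8 rows of a 10-d dataset
# (assumed convention: 1 = inlier, -1 = outlier).
labels = np.array([1, 1, 1, -1, 1, 1, 1, 1])

# Indices of the rows flagged as outliers, regardless of how wide each row is.
outlier_rows = np.flatnonzero(labels == -1)
print(outlier_rows)  # [3]
```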

@FBruzzesi FBruzzesi changed the title from "Added usage examples" to "docs: PCA and UMAP OutlierDetection docstring examples" Jul 18, 2024
Collaborator

@FBruzzesi FBruzzesi left a comment

Let's ship it ✨

@FBruzzesi FBruzzesi merged commit a7c498c into koaning:main Jul 18, 2024
17 checks passed

Successfully merging this pull request may close these issues.

decomposition.pca_reconstruction.PCAOutlierDetection
2 participants