-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preparing v0.5.0 release #62
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #62 +/- ##
=======================================
Coverage 99.73% 99.73%
=======================================
Files 18 18
Lines 1525 1528 +3
=======================================
+ Hits 1521 1524 +3
Misses 4 4
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok by me. Elkan algorithm is a bit different way of calculating kmeans (eg see https://github.com/jjcordano/elkans_kmeans) so a slight change is acceptable IMO. I would vote go ahead, and if an issue appears in the future, we deal with it.
The options for the kmeans algorithm in new scikit-learn version are |
Elkan vs Lloyd - the choice is about speed (elkan faster) and ram (elkan uses a bit more i think). i do not think we need to be perfectly identical, is ok to switch to elkan. But we still need to understand the amira integration fail, right? |
In the amira integration test, we try to build 3 PRGs:
if we compare before and after, both sites can spell CACTGGG... and AACTGGG... but only the second one can spell CGCTGGG... and I can find CGCTGGG... in the MSAs... so it seems the new version is actually doing the right thing. The second difference is the second version having two more SNPs in a region than the first one, representing 4 alleles instead of 1:
The 1st allele, common to both versions, is the most common in the MSA (genotypes I can continue this analysis, if you want, going to the 3rd, 4th, and so on differences on |
looks good to me, am ok to merge |
Main changes:
scikit-learn v0.24.2
now requires building, and I am not sure why... CI failed with python 3.8, 3.9, 3.10, 3.11. It was also failing for the same reason to build the docker image withpython:3.10-slim
, but somehow worked forpython:3.8-slim
. I tried several ways to fix this, but I didn't manage. This version ofscikit-learn
is not even that old, from April 2021. Anyway, due to these issues, I thought it would be a good time to upgrade it to the stable versions>=1.0.0
, so this is one of the main changes in this PR. We then upgraded numpy to"^1.24.4"
and scikit-learn to"^1.3.0"
. The issue is then that unit tests were getting stuck, and we could not run them. Fixing this required to upgradepytest
and run it with--forked
, but then it was too slow to run the tests. Thus I've addedpytest-xdist = "^3.3.1"
to the dependencies, and we now runpytest
also with-n 4
(i.e. 4 threads) so that is fast again;force_overwrite
tests, 3 tests were redundant;elkan
as thekmeans
algorithm in the updatedscikit-learn
version as tests were failing with the other algorithm. Weirdly theelkan kmeans
changes the PRG/result for a single test case, the amira one. So I am a bit reluctant and wondering if this should be a prerelease until we test it more on real data...This release properly handles
N
s in the MSA, and in the denovo sequences. If fixes breaking issues reported by @Danderson123 inamira
.