Filter update #67

frederik-sandfort1 · 2024-08-20T13:59:01Z

Closes #66 and #61.

Additionally implements a (jsonable) DescriptorsFilter.

Also cleaned up the code and removed unnecessary lines without introducing a breaking change.

c-w-feldmann

This was my first read-through. I will check again in more detail tomorrow.

molpipeline/abstract_pipeline_elements/mol2mol/filter.py

molpipeline/mol2mol/filter.py

c-w-feldmann · 2024-08-21T17:45:16Z

molpipeline/mol2mol/filter.py

+            FilterCatalog.SmartsMatcher(smarts, smarts)
+            for i, smarts in enumerate(self.patterns)
+        ]
+        rdkit_filter = FilterCatalog.FilterCatalog()


We have to check if this works in parallel.

Added a test, but I am not sure whether this is what you want. Please have a look.

Ah, sorry, this was more of a reminder for myself. Some RDKit objects cannot be pickled properly. I'll have a look at your test.

@c-w-feldmann did you check this one again?
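A minimal sketch of the kind of check discussed above, using only the standard library. The helper name `survives_pickle` is hypothetical and not part of MolPipeline; the lambda merely stands in for an object that the standard `pickle` module cannot serialize. Which serializer the parallel backend actually uses is not stated in this thread, so a real test should go through that backend.

```python
import pickle
from typing import Any


def survives_pickle(obj: Any) -> bool:
    """Return True if obj can be pickled and unpickled without raising."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:
        return False
    return True


# Plain strings (e.g. SMARTS patterns) round-trip without problems.
patterns = ["c1ccccc1", "[OH]"]

# Stand-in for a problematic object: the standard pickle module
# cannot serialize a lambda.
unpicklable = lambda mol: True

print(survives_pickle(patterns))
print(survives_pickle(unpicklable))
```

One common way to avoid the problem is to store only the pattern strings as attributes and build the matcher objects when they are needed.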

tests/test_elements/test_mol2mol/test_mol2mol_filter.py

molpipeline/mol2mol/filter.py

frederik-sandfort1 · 2024-08-22T15:20:24Z

@c-w-feldmann @JochenSiegWork

I have one mypy issue that I do not understand.

Also three questions:

1. Should we combine the SMARTS and SMILES filters within a single class, e.g. with an option usesmiles?
2. Should we check the input patterns for valid SMARTS/SMILES?
3. Should we apply the same logic to ElementFilter?

c-w-feldmann

I think my head is exploding, but besides the small additions in the docstring, there is nothing I can complain about. @JochenSiegWork, your turn.

molpipeline/abstract_pipeline_elements/mol2mol/filter.py

c-w-feldmann · 2024-09-10T14:01:21Z

molpipeline/abstract_pipeline_elements/mol2mol/filter.py

+
+    @property
+    def filter_elements(self) -> Mapping[str, tuple[Optional[int], Optional[int]]]:
+        """Get filter elements as dict."""


Maybe add a note that this was implemented to keep backwards compatibility.

I am not sure that this is true. I implemented this because I wanted to name the descriptors "descriptors" in the init of the DescriptorFilter and the patterns "patterns" in the PatternsFilters. But I could easily get rid of this. As the classes did not exist before, there is no real backward compatibility to keep. What is your opinion? Is the naming not so important? Should we always go with filter_elements? That would let me remove quite some code and move get_params and set_params to the base class. @c-w-feldmann @JochenSiegWork, your opinion?

I like that you can access the patterns, descriptor names, etc. I also think "filter_elements" is a suitable name for that property.

I discussed with @JochenSiegWork that it might make sense to implement this generically with just filter_elements and to include a good description in the docstring.

I implemented this generically now. Please check.
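The generic approach agreed on above can be sketched roughly as follows. This is not the code from the PR: `ToyBaseFilter`, `ToyPatternsFilter`, and the rule that a bare list of names means "at least one occurrence" are assumptions made for illustration. Only the `filter_elements` property name, its mapping type, and the `mode` parameter come from the diff hunks in this thread.

```python
from collections.abc import Mapping, Sequence
from typing import Optional, Union

# (min_count, max_count); None means "no bound on this side".
CountRange = tuple[Optional[int], Optional[int]]


class ToyBaseFilter:
    """Hypothetical base class: every criterion lives in filter_elements."""

    def __init__(
        self,
        filter_elements: Union[Mapping[str, CountRange], Sequence[str]],
        mode: str = "any",
    ) -> None:
        if mode not in ("any", "all"):
            raise ValueError(f"mode must be 'any' or 'all', got {mode!r}")
        self.mode = mode
        self.filter_elements = filter_elements  # goes through the setter

    @property
    def filter_elements(self) -> Mapping[str, CountRange]:
        """Get filter elements as dict."""
        return self._filter_elements

    @filter_elements.setter
    def filter_elements(self, value) -> None:
        if isinstance(value, Mapping):
            self._filter_elements = dict(value)
        else:
            # Assumption: a bare sequence means "at least one occurrence each".
            self._filter_elements = {name: (1, None) for name in value}

    def get_params(self) -> dict:
        """Return the constructor parameters, shared by all subclasses."""
        return {"filter_elements": dict(self.filter_elements), "mode": self.mode}

    def set_params(self, **params) -> "ToyBaseFilter":
        """Set known parameters and return self."""
        for key in ("filter_elements", "mode"):
            if key in params:
                setattr(self, key, params[key])
        return self


class ToyPatternsFilter(ToyBaseFilter):
    """Subclasses only add matching logic; parameter handling is inherited."""


smarts_filter = ToyPatternsFilter(["c1ccccc1", "[OH]"], mode="all")
clone = ToyPatternsFilter(**smarts_filter.get_params())
```

With a single attribute name, the parameter handling is written once in the base class instead of once per filter, which is the code reduction mentioned in the discussion.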

JochenSiegWork

almost there :)

molpipeline/abstract_pipeline_elements/mol2mol/filter.py

molpipeline/utils/value_conversions.py

JochenSiegWork · 2024-09-11T09:00:52Z

molpipeline/mol2mol/filter.py

-        if not unique_elements.issubset(self.allowed_element_numbers):
-            forbidden_elements = unique_elements - self.allowed_element_numbers
+        to_process_value = (
+            Chem.AddHs(value) if 1 in self.allowed_element_numbers else value


Implicitly modifying the molecule does not seem right. I understand that this can make sense when you are interested in exact element counts in the molecules. However, the filter should just filter molecules and not implicitly modify them before filtering. I think the most consistent way would be to let the user explicitly add hydrogens before the filter, which would result in an additional pipeline element. However, I think a compromise would also be fine by adding a flag add_hs to the ElementFilter's constructor to let the user choose.

@c-w-feldmann what do you think?

Here I disagree. If the user adds element number 1 to allowed_elements, I am pretty sure they would like the hydrogens to be counted. The returned molecules do NOT have the added hydrogens, because Chem.AddHs creates a new mol object.

You're absolutely right, and I understand your point. There are three aspects I am unhappy with:

1. It's not explicit but implicit, and I advocate the "Explicit is Better Than Implicit" principle. However, this might also be solved with good, explicit docstrings that you can add.

2. The molecules might already have been prepared with hydrogens added. Calling Chem.AddHs again adds additional computation for every molecule.

3. I am not sure about our use cases, but for docking, for example, users typically prepare the protonation states of their molecules with some other tool beforehand. In that case, re-adding hydrogens might mess with the prepared molecules. Is something similar also possible in our use cases?

Good point. Could we check whether hydrogens are already there? Alternatively, should we log a warning if hydrogens are requested but the count is 0, and tell people to use, e.g., an AddHs pipeline element? (I am not sure whether this exists.)

The three of us should talk about this together. This is a more conceptual decision we need to make.
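To make the proposed compromise concrete, here is a toy sketch of an explicit `add_hs` flag. It deliberately does not use RDKit: `ToyMol`, `with_explicit_hs`, and `ToyElementFilter` are hypothetical stand-ins, and the real ElementFilter may look quite different. The sketch only illustrates the two points raised above: counting happens on a copy, so the input molecule stays unmodified, and the user opts in explicitly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyMol:
    """Hypothetical molecule: explicit atoms plus a count of implicit hydrogens."""

    atomic_numbers: tuple[int, ...]
    implicit_hs: int = 0


def with_explicit_hs(mol: ToyMol) -> ToyMol:
    """Return a NEW ToyMol with hydrogens made explicit; the input is untouched."""
    return ToyMol(mol.atomic_numbers + (1,) * mol.implicit_hs, 0)


class ToyElementFilter:
    """Check element counts against allowed (min, max) ranges."""

    def __init__(
        self,
        allowed: dict[int, tuple[Optional[int], Optional[int]]],
        add_hs: bool = False,
    ) -> None:
        self.allowed = allowed
        self.add_hs = add_hs  # explicit opt-in instead of implicit behaviour

    def passes(self, mol: ToyMol) -> bool:
        # Count on a copy when requested; never modify the input molecule.
        to_check = with_explicit_hs(mol) if self.add_hs else mol
        counts = Counter(to_check.atomic_numbers)
        if any(number not in self.allowed for number in counts):
            return False
        for number, (lower, upper) in self.allowed.items():
            count = counts.get(number, 0)
            if lower is not None and count < lower:
                return False
            if upper is not None and count > upper:
                return False
        return True


# Ethanol with implicit hydrogens: two carbons, one oxygen, six hydrogens.
ethanol = ToyMol(atomic_numbers=(6, 6, 8), implicit_hs=6)
allowed = {6: (None, None), 8: (None, None), 1: (6, 6)}

implicit_result = ToyElementFilter(allowed).passes(ethanol)
explicit_result = ToyElementFilter(allowed, add_hs=True).passes(ethanol)
```

Without the flag, the hydrogen count is 0 and the molecule is rejected; this is the situation in which the warning suggested above would be emitted.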

molpipeline/mol2mol/filter.py

tests/test_elements/test_mol2mol/test_mol2mol_filter.py

frederik-sandfort1

Resolved most of the comments and replied to some. Please have a look.

molpipeline/abstract_pipeline_elements/mol2mol/filter.py

frederik-sandfort1 · 2024-09-11T12:31:33Z

molpipeline/abstract_pipeline_elements/mol2mol/filter.py

frederik-sandfort1 · 2024-09-11T12:59:00Z

molpipeline/mol2mol/filter.py

molpipeline/utils/value_conversions.py

tests/test_elements/test_mol2mol/test_mol2mol_filter.py

frederik-sandfort1 · 2024-09-12T05:36:16Z

@c-w-feldmann @JochenSiegWork I added an additional filter (ComplexFilter), which can be initialized with multiple mol-to-mol elements and uses the keep-matches logic.

I also had a look at auto2mol. I was a bit puzzled about serializability: there you also just set the elements as attributes, without any get_params / set_params functionality. Please have a look at the implementation, especially regarding serializability.

JochenSiegWork

just two small comments

molpipeline/utils/value_conversions.py

molpipeline/abstract_pipeline_elements/mol2mol/filter.py

molpipeline/utils/value_conversions.py

JochenSiegWork · 2024-09-12T08:01:31Z

molpipeline/abstract_pipeline_elements/mol2mol/filter.py

tests/test_elements/test_mol2mol/test_mol2mol_filter.py

JochenSiegWork · 2024-09-12T10:25:37Z

> @c-w-feldmann @JochenSiegWork
>
> I have one mypy issue, which I do not understand.
>
> Also three questions:
>
> should we combine smarts and smiles filter within a single class? option usesmiles?

Strong yes. I think we should combine the SMARTS and SMILES filters within a single class, because every SMILES is a valid SMARTS. Effectively, a list of SMILES is a list of SMARTS.

> should we check the input patterns for valid smarts/smiles?

No. But since you now pre-compute the molecules for the SMARTS/SMILES, this is done automatically during the construction of the filter. So you have already included this feature.

> should we apply the same logic to ElementFilter?

We talked about this. We can do so in the future because it should fit, but we don't have any real use case right now. So postpone.

JochenSiegWork · 2024-09-13T08:23:38Z

molpipeline/abstract_pipeline_elements/mol2mol/filter.py

+        params["mode"] = self.mode
+        if deep:
+            params["filter_elements"] = {
+                element: (count_tuple[0], count_tuple[1])


@c-w-feldmann You are more familiar with scikit-learn's conventions for set_params/get_params. Is this already considered a "deep" copy? As I understand the code, the element can also be a full-fledged object, e.g. in the ComplexFilter. The current code here copies just the reference to the element, not the object itself.
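The distinction raised here can be shown with plain Python objects. In this sketch, `ToyElement` is a hypothetical stand-in for a filter element that is a full object; it is not a MolPipeline class. The dict comprehension mirrors the pattern in the diff hunk above and is compared with `copy.deepcopy`.

```python
import copy


class ToyElement:
    """Hypothetical stand-in for a filter element that is a full object."""

    def __init__(self, name: str) -> None:
        self.name = name


element = ToyElement("ring")
filter_elements = {element: (1, None)}

# Same pattern as in the reviewed snippet: a new dict with new tuples,
# but the keys are still references to the original element objects.
rebuilt = {el: (counts[0], counts[1]) for el, counts in filter_elements.items()}

# copy.deepcopy also copies the key objects themselves.
deep = copy.deepcopy(filter_elements)

rebuilt_key = next(iter(rebuilt))
deep_key = next(iter(deep))

print(rebuilt_key is element)  # the comprehension shares the element
print(deep_key is element)     # deepcopy created an independent element
```

Whether sharing the reference is acceptable depends on whether callers may mutate the returned elements. For string keys such as SMARTS patterns the question does not arise, because strings are immutable.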

JochenSiegWork · 2024-09-13T08:27:11Z

molpipeline/mol2mol/filter.py

JochenSiegWork · 2024-09-13T08:30:22Z

molpipeline/mol2mol/filter.py

-    def __int__(
+class SmilesFilter(_BasePatternsFilter):
+    """Filter to keep or remove molecules based on SMILES patterns.
+


If we keep the SmilesFilter, then add a comment to the docstring here: "In contrast to the SmartsFilter, which can also match SMILES, the SmilesFilter checks kekulized bonds for aromaticity and then sets them to aromatic, while the SmartsFilter detects alternating single and double bonds."

frederik-sandfort1 added 4 commits August 19, 2024 17:19

remove unnecessary inits and refactor

fa27bfd

include smarts filter, smiles filter, descriptors filter

b706268

Fix wrong typing that caused thousands of type ignores

476d65a

linting and fix element number test

f14b71a

frederik-sandfort1 self-assigned this Aug 20, 2024

frederik-sandfort1 requested review from c-w-feldmann and JochenSiegWork August 20, 2024 13:59

frederik-sandfort1 added 2 commits August 21, 2024 15:45

Merge branch 'main' into filter_update

e3f5d2d

reset name typing

c352144

c-w-feldmann reviewed Aug 21, 2024

frederik-sandfort1 added 4 commits August 22, 2024 11:56

Christians first review

5c95f81

more changes

16088db

linting

b2ca26d

pylint

81ffb7c

c-w-feldmann reviewed Aug 22, 2024

c-w-feldmann self-requested a review August 22, 2024 11:43

rewrite filter logic (#71)

9fed198

frederik-sandfort1 marked this pull request as draft August 22, 2024 12:55

Combine filters with one base logic

f49cb70

frederik-sandfort1 marked this pull request as ready for review August 22, 2024 15:19

c-w-feldmann and others added 3 commits August 22, 2024 19:34

change dict to Mapping

91feed1

Merge branch 'main' into filter_update

1d70f17

isort

93e6183

c-w-feldmann requested changes Sep 10, 2024

JochenSiegWork requested changes Sep 11, 2024

Include comments

cd18310

frederik-sandfort1 commented Sep 12, 2024

linting

c0427ab

linting and ComplexFilter

cfdfd83

JochenSiegWork requested changes Sep 12, 2024

frederik-sandfort1 added 2 commits September 12, 2024 16:18

typing, tests, complex filter naming

b843657

finalize filter refactoring

a93344c

JochenSiegWork requested changes Sep 13, 2024
