Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of the canopy model #288

Draft
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

davidorme
Copy link
Collaborator

@davidorme davidorme commented Sep 18, 2024

Description

This PR is to review, adapt and merge in @AmyOctoCat's code on the canopy object from #231. The main areas are:

  • The t_model_functions and canopy_functions modules have now settled down to which functions belong where.

  • Thecanopy_functions module has now been extended to include the canopy vertical structure functions from WIP - 230 create a draft canopy class #231, including the relative canopy radius, stem crown area and stem leaf area at given heights, along with a solver function to find the height at which a canopy layer closes. This includes:

    • Rethinking the workflow for functions to avoid repetitive calculation.
    • Removing Community object as an argument and replacing it with explicit arguments for the community properties for that function. This makes the core functions more flexible.
    • Adding a bunch of validation for input shapes of canopy function arguments, but making it optional for a speed-up within solvers and classes where the shapes can be assumed correct.
  • The Canopy object has been reworked to use this new functionality to get to the same end point as in WIP - 230 create a draft canopy class #231.

I've also added unit tests for the canopy functions and a really simple test of the Canopy object.

Things to do later:

  • Update the docstrings with better docs on the equations and then update the canopy.md document to use this code.
  • Maybe drop pandas?
  • More functionality needed in the Canopy object to use it to generate absorbed radiation per stem per layer and hence generate productivity estimates.

Fixes #286 (issue)

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@davidorme davidorme linked an issue Sep 18, 2024 that may be closed by this pull request
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 98.07692% with 2 lines in your changes missing coverage. Please review.

Project coverage is 95.35%. Comparing base (1f315ba) to head (d4f2426).
Report is 85 commits behind head on develop.

Files with missing lines Patch % Lines
pyrealm/demography/canopy.py 96.87% 1 Missing ⚠️
pyrealm/demography/canopy_functions.py 98.27% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #288      +/-   ##
===========================================
+ Coverage    95.29%   95.35%   +0.06%     
===========================================
  Files           28       34       +6     
  Lines         1720     2176     +456     
===========================================
+ Hits          1639     2075     +436     
- Misses          81      101      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

for layer in np.arange(self.n_layers - 1):
target_area = (layer + 1) * community.cell_area * (1 - canopy_gap_fraction)

# TODO - the solution here is predictably closer to the upper bracket, might
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of putting a TODO in the code, creating an issue has a higher likelihood of this being seen and solved.

validate: Boolean flag to suppress argument validation.
"""

# TODO - could this merge with the stem crown area function? A lot of overlap, so
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, should be an issue

Comment on lines +387 to +411
# def calculate_total_canopy_A_cp(z: float, f_g: float, community: Community) -> float:
# """Calculate total leaf area at a given height.

# :param f_g:
# :param community:
# :param z: Height above ground.
# :return: Total leaf area in the canopy at a given height.
# """
# A_cp_for_individuals = calculate_projected_leaf_area_for_individuals(
# z, f_g, community
# )

# A_cp_for_cohorts = A_cp_for_individuals * community.cohort_number_of_individuals

# return A_cp_for_cohorts.sum()


# def calculate_gpp(cell_ppfd: NDArray, lue: NDArray) -> float:
# """Estimate the gross primary productivity.

# Not sure where to place this - need an array of LUE that matches to the

# """

# return 100
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're placeholders for the next steps in this milestone, so easier for me to leave them there for now.



def test_calculate_heights():
"""Tests happy path for calculation of heights of tree from diameter."""

from pyrealm.demography.t_model_functions import calculate_heights
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't know how I feel about moving imports into functions. I prefer them to be on the top of the file, for better overview. Is there a difference in performance if you draw the imports in here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same below)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That style was advice from the RSE team at Imperial. The argument is that having the imports for the package being tested inside the functions isolates the actual importatation as part of the test. So, if stuff goes wrong with an import, a specific related test fails, rather than a whole module of tests (or the whole test suite).

I definitely see what you mean about having an overview of which code is getting tested but I buy their argument and we've seen it be useful in practice.

Copy link
Collaborator

@omarjamil omarjamil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is a well-structured and thoroughly documented piece of scientific code. The use of numpy, type hinting, and input validation contributes to its robustness. My suggestions are for further improvement of the code, but not critical.

self,
community: Community,
canopy_gap_fraction: float,
layer_tolerance: float = 0.001,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth outlining in the docstring this default value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've actually got an issue about this (#270) - @j-emberton pointed out that the documentation of defaults and optional values isn't consistent throughout the codebase.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that this stops me fixing this specific instance 😄

stem_height=community.cohort_data["stem_height"].to_numpy(),
m=community.cohort_data["m"].to_numpy(),
n=community.cohort_data["n"].to_numpy(),
validate=False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate is set to False in these function calls. Should there be an option to set this to true via this class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should doc this better. The validation is intended to help people use the standalone functions correctly, but here the Canopy.__init__ should guarantee the correct inputs, and so the validation is turned off to improve run time by avoiding repeated validation within the root solver call.

If it's broken, that's a developer issue with the __init__ code, rather than something where a user could usefully turn validation back on.

self.stem_relative_radius: NDArray[np.float32] = (
calculate_relative_canopy_radius_at_z(
z=self.layer_heights,
stem_height=community.cohort_data["stem_height"].to_numpy(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same arrays are being converted to numpy in different calls. From performance point of view it might worth doing a conversion once and using those arrays.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This is the issue we discussed yesterday about using pandas.DataFrame for the cohort_data attribute of Community. I think the answer is to change that attribute - we're never making use of the pandas functionality outside of creating the cohort_data, so it would be much cleaner as a dictionary of np.arrays.



def solve_community_projected_canopy_area(
z: float,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes a really long argument list can be a bit cumbersome so worth considering dataclass or namedtuple for such cases. However, that comes with different overheads so a bit of a style choice.

Copy link
Collaborator Author

@davidorme davidorme Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific module has a bunch of extremely similar but not identical signatures. We could bundle everything up in one dataclass (and indeed @AmyOctoCat had these using Community for exactly that reason), but I'd like to retain the flexibility of using named arguments directly.

One reason for that is that I want to build some user facing helper classes to generate canopy data outside of a Community object, so it's probably easier to keep them like this, but I will be looking at that next.

raise ValueError("Invalid shape for the z value.")


def calculate_relative_canopy_radius_at_z(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the inputs array can be reshaped to be broadcastable, then using numpy functions instead of python built-ins e.g. np.power instead of **can help with vectorization and improve performance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think about (and document) what shapes are sensible here. I believe that M**n is faster if n is a scalar but np.power(M, n) is faster when n is an array? Could get either in using this function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think np.power will be faster if you are applying the operation to an array even if the exponent is a scaler. And you can further performance gains if you can apply np.vectorize to functions. Though I feel like I need to back up this with some testing!

@omarjamil
Copy link
Collaborator

I should add that I am happy to defer to James and Marion in terms of what they consider to important changes to make as they have better code and project knowledge than me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

Implementation of the Canopy model
4 participants