Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance issue in 'allDependencies' #668

Merged
merged 1 commit into from
Nov 9, 2023
Merged

Conversation

thomashoneyman
Copy link
Member

In #667 I added an implementation to prune unused dependencies from legacy packages. Unfortunately, while the implementation was tested in the small, it was never tested on a package with a sizable graph. On larger packages the inefficiencies in the implementation would cause the process to run out of memory.

This implementation fixes the issue. We switch to ST so we can keep a mutable map around to track modules we've already visited and push all unvisited dependencies to an array for later processing. Once we go through the full pending array without refilling it we're done and can exit.

@thomashoneyman
Copy link
Member Author

Tested on this package run in the API:
purescript/registry#410 (comment)

Copy link
Member

@f-f f-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, but I wonder why we don't have similar issues in Spago? I run the graph code against our work codebase, which is much bigger than the stuff the Registry goes through.

@thomashoneyman
Copy link
Member Author

The major difference is that Spago's implementation is all folds, but I was using bind via the call to traverse. I think this is a reasonable place to use ST — at least the way it's used in the registry, which is a bit different from what Spago is doing — but I can explore a different approach if you aren't happy with the use of ST.

Comment on lines +97 to +106
ST.while (map (_ > 0) (Array.ST.length pending)) do
Array.ST.shift pending >>= case _ of
Nothing -> pure unit
Just (ModuleName mod) -> Object.ST.peek mod visited >>= case _ of
Nothing -> do
void $ Object.ST.poke mod unit visited
case Map.lookup (ModuleName mod) graph of
Nothing -> pure unit
Just { depends } -> void $ Array.ST.pushAll depends pending
Just _ -> pure unit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to writing a recursive function with an explicit List stack and accumulator, FWIW.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to write it that way if you have an example somewhere. I'm curious what the generated code will look like vs. the ST version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an opinion. One is obviously using mutation, so I would expect this to be significantly faster. I just don't use ST in core backend-optimizer because it's not very portable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case since we don't need portability in the registry and speed does matter (this is part of the pipeline) I would prefer to stay with ST. The generated code looks quite good.

@thomashoneyman
Copy link
Member Author

thomashoneyman commented Nov 9, 2023

@f-f could you take another look at this? it's still blocking publishing.

@f-f
Copy link
Member

f-f commented Nov 9, 2023

Ah yes sorry I had the tab open and got distracted with other things - my preference is that we keep the lib portable if we can. We explicitely depend on some node libs though, so I guess this is alright as well?

@thomashoneyman
Copy link
Member Author

Yeah, we already explicitly depend on some NPM libraries (ssh2), and Node builtins (the crypto module), so if we were to make the library portable we'd need to move that stuff out altogether and at that time could also replace the ST code. But we'd only need to do any of that if someone came along wanting to use the registry-lib with an alternate backend.

@thomashoneyman thomashoneyman merged commit f686858 into master Nov 9, 2023
15 checks passed
@thomashoneyman thomashoneyman deleted the trh/memory branch November 9, 2023 14:27
@f-f
Copy link
Member

f-f commented Nov 9, 2023

I assume we'll want to use the Scheme backend with Spago (which depends on the registry-lib) once that takes off, but we'll sorry about it then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants