Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust: Replace keep_fingerprints logic with unconditional cargo clean #63

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ijc
Copy link

@ijc ijc commented Aug 16, 2024

Due to Earthly's layer cache code added with COPY (even with --keep-ts) can
end up with timestamps (mtime) corresponding to the point of creation of the
cache entry, not the current time.

However on a following build the target mount cache may contain builds from
other branches, with different code for those dependencies, which have a newer
mtime. In this case cargo will think it can use the cached dependency
instead of rebuilding (because the code appears older than the cached entry
under target).

Avoid this by using cargo clean to remove the build artifacts for any local
crate. This should become unnecessary with rust-lang/cargo#14136

This replaces the old behaviour of removing the fingerprints directory. Using
cargo clean uses a proper cargo API rather than relying on implementation
details like where the fingerprints live and what the consequence removing them
is. It may also keep the cached data smaller since it removes the build
artifacts which will likely never be reused due to the lack of fingerprint.

Note that the previous fingerprint cleaning was subject to a race where a
different parallel build could reintroduce some fingerprints between DO +REMOVE_SOURCE_FINGERPRINTS and the RUN ... cargo $args. For that reason the
calls to cargo clean here are made within the same RUN command so that the
target cache remains locked.

By switching to cargo metadata the requirement for tomljson is removed.


This is an alternative to #62.

I also included a couple of QOL changes which I think are useful.

@ijc ijc requested a review from a team as a code owner August 16, 2024 09:41
@ijc
Copy link
Author

ijc commented Aug 16, 2024

The downside of this change is that local crates must always be fully rebuilt. ie. caching is now only effective for 3rd party dependencies.

@idelvall
Copy link
Member

Interesting idea Ian. This is related to https://github.com/ijc/earthly-lib/tree/rust-avoid-caching-local-path-dependencies/rust#keep_fingerprints-false.
Wondering if we should always remove the package fingerprints instead.
What do you think?

@ijc
Copy link
Author

ijc commented Aug 16, 2024

Hrm, yes, there is some overlap for sure.

My take would be that it is better to use formal APIs (cargo clean) rather than relying on implementation details like where the fingerprints live and what the consequence removing them is.

I would expect this also produces a space saving since I think removing the fingerprint doesn't remove the actual build artifacts, they will never be reused due to the lack of fingerprint so it is wasteful to leave them in the cache? Whereas clean will remove the actual binary blobs too.

rust/Earthfile Outdated
cargo $args; \
if [ "$EARTHLY_DO_NOT_CACHE_PATH_BASED_DEPENDENCIES" = "true" ]; then \
cargo metadata --format-version=1 --no-deps | jq -r '.packages[] | select(.id | startswith("path+file://")) | .name' | xargs -I{} cargo clean -p {}; \
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too complicated and doesn't match "path" deps like it claims, it's just matching all local crates in an complicated way (since it is matching the crate location not something to do with the dep tree). It could be replaced with .packages[].name which is exactly the same due to --no-deps.

I think the path based dependencies thing is a red-herring, since we'd want to not cache any local crate since even the crate being built could suffer the false target cache hit problem, not only the things it depends on.

If the principal here is one we want to go fortward with then I'll update to simplify and rename/reword things.

@ijc ijc force-pushed the rust-avoid-caching-local-path-dependencies branch from 919b55d to 8362c22 Compare August 16, 2024 10:23
@idelvall
Copy link
Member

idelvall commented Aug 19, 2024

My take would be that it is better to use formal APIs (cargo clean) rather than relying on implementation details like where the fingerprints live and what the consequence removing them is.

Agree

I would expect this also produces a space saving since I think removing the fingerprint doesn't remove the actual build artifacts, they will never be reused due to the lack of fingerprint so it is wasteful to leave them in the cache? Whereas clean will remove the actual binary blobs too.

clean sounds like the way to go, for sure

I think the path based dependencies thing is a red-herring, since we'd want to not cache any local crate since even the crate being built could suffer the false target cache hit problem, not only the things it depends on.

What if we remove the fingerprints option and we always avoid caching path-based deps?

@ijc
Copy link
Author

ijc commented Aug 19, 2024

What if we remove the fingerprints option and we always avoid caching path-based deps?

I think that would be fine.

I'd probably leave an option to disable the clean still, but have it do the clean by default.

I'm not sure, but I think with this the --keep-ts stuff may no longer be needed, unless there is some other reason why it would be useful.

@idelvall
Copy link
Member

I'm not sure, but I think with this the --keep-ts stuff may no longer be needed, unless there is some other reason why it would be useful.

I don't see a reason anymore neither

@ijc
Copy link
Author

ijc commented Aug 19, 2024

I'd probably leave an option to disable the clean still, but have it do the clean by default.

Actually, in trying to write a doc update as part of this I realised I couldn't really put into succinct words the circumstances where this option would be useful -- it's a pretty subtle case and my suspicion is that any attempt to use it would more than likely be wrong. Maybe I'll just make it unconditional.

This avoids producing:
```
+lint | no files found within ./target matching the provided output regexp
+lint | find: '/tmp/earthly/lib/rust': No such file or directory
```
when there is no output, because `copy-output.sh` only creates that path `if [
-n \"\$1\" ]`.

Apply the same condition to the `rename-output.sh` end of things.
The logs otherwise just contain:
```
--> RUN set -e; cargo $args; cargo sweep -r -t $EARTHLY_SWEEP_DAYS; cargo sweep -r -i; $EARTHLY_FUNCTIONS_HOME/copy-output.sh "$output";
```

which doesn't show what is actually being run.
Due to Earthly's layer cache code added with `COPY` (even with `--keep-ts`) can
end up with timestamps (`mtime`) corresponding to the point of creation of the
cache entry, not the current time.

However on a following build the `target` mount cache may contain builds from
other branches, with different code for those dependencies, which have a newer
`mtime`. In this case `cargo` will think it can use the cached dependency
instead of rebuilding (because the code appears older than the cached entry
under `target`).

Avoid this by using `cargo clean` to remove the build artifacts for any local
crate. This should become unnecessary with rust-lang/cargo#14136

This replaces the old behaviour of removing the fingerprints directory. Using
`cargo clean` uses a proper cargo API rather than relying on implementation
details like where the fingerprints live and what the consequence removing them
is. It may also keep the cached data smaller since it removes the build
artifacts which will likely never be reused due to the lack of fingerprint.

Note that the previous fingerprint cleaning was subject to a race where a
different parallel build could reintroduce some fingerprints between `DO
+REMOVE_SOURCE_FINGERPRINTS` and the `RUN ... cargo $args`. For that reason the
calls to `cargo clean` here are made within the same `RUN` command so that the
target cache remains locked.

By switching to `cargo metadata` the requirement for `tomljson` is removed.
@xv-ian-c xv-ian-c force-pushed the rust-avoid-caching-local-path-dependencies branch from 8362c22 to d24d25c Compare August 19, 2024 12:58
@ijc ijc changed the title Rust avoid caching local path dependencies rust: Replace keep_fingerprints logic with unconditional cargo clean Aug 19, 2024
echo "+CARGO: copying output"; \
$EARTHLY_FUNCTIONS_HOME/copy-output.sh "$output"; \
echo "+CARGO: removing local crates from target cache"; \
cargo metadata --format-version=1 --no-deps | /tmp/jq -r '.packages[].name' | xargs -I{} cargo clean -p {};
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to cargo clean needs to match the profile used by cargo $args e.g. needs to include --release or --profile=foo as needed. Or I need to figure out a way to say "all profiles".

I'll have a think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find a way to get a list of interesting profiles to clean. Other than "ask the user to supply it" or something ugly involving inspecting $args I can't currently see a way to make this approach work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is further complicated by tools such as cargo llvm-cov which overrides target dir (to target/llvm-cov-target/).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants