Capturing the command-line arguments used on each translation unit #34

Open
woodruffw opened this issue Apr 8, 2020 · 9 comments

Comments

@woodruffw
Collaborator

Hi there,

Does gllvm support capturing the command-line arguments (not underlying driver arguments) used on each translation unit?

For example, if I had the following runs:

gclang -o foo.o -flag1 -flag2 -flag3 foo.c
gclang -o bar.o -flag1 -flag2 -flag4 bar.c
gclang -o bar -lwhatever foo.o bar.o

I'd like the following mapping stored in a section somewhere in bar:

foo.o = clang -o foo.o -flag1 -flag2 -flag3 foo.c
bar.o = clang -o bar.o -flag1 -flag2 -flag4 bar.c
bar = clang -o bar -lwhatever foo.o bar.o
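To make the requested mapping concrete, here is a minimal Python sketch. It is not gllvm code: `record_invocation` and the rule of keying each record by the operand of `-o` are assumptions made for this illustration only.

```python
import shlex

def record_invocation(argv, real_tool="clang"):
    """Return (output, command) for one wrapper invocation.

    Hypothetical helper: keys the record by the operand of a separate
    -o flag and swaps the wrapper name (argv[0]) for the underlying tool.
    """
    # Fallback when -o is absent. This is the link-step default; a -c
    # compile would default to <source>.o, which this sketch ignores.
    output = "a.out"
    for i, arg in enumerate(argv):
        if arg == "-o" and i + 1 < len(argv):
            output = argv[i + 1]
    command = shlex.join([real_tool] + list(argv[1:]))
    return output, command

# The three runs from the example above.
runs = [
    "gclang -o foo.o -flag1 -flag2 -flag3 foo.c",
    "gclang -o bar.o -flag1 -flag2 -flag4 bar.c",
    "gclang -o bar -lwhatever foo.o bar.o",
]
mapping = dict(record_invocation(shlex.split(run)) for run in runs)
for output, command in mapping.items():
    print(f"{output} = {command}")
```

The sketch records the wrapper's own argv, which is the command-line view asked for here, rather than the expanded driver arguments.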

I'm aware that I can approximate this at the clang/LLVM level with -grecord-gcc-switches or -frecord-command-line, but was curious if I could do the same at the gllvm level.

This is something I could try to contribute, if there's interest.

@ianamason
Member

So you want to create another section that contains the commands used to generate the compilation unit?

I guess that is possible, but it isn't in the code. It shouldn't be too hard; after all, that is how the bitcode is "recorded".

There has been occasional mumbling about needing something like this.

@woodruffw
Collaborator Author

> So you want to create another section that contains the commands used to generate the compilation unit?

Yep, exactly. And yeah, I figure I could reuse the current section techniques/code to stash it.

I'll look into it a bit.
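One way the "stash it in a section" idea could look, sketched in Python. This is an illustration under stated assumptions, not gllvm's implementation: the section name `.cmdline_record` and the helper `add_section_command` are hypothetical. The only real interface used is GNU objcopy's `--add-section name=file` option, and the sketch only builds the command instead of running it.

```python
import shlex

SECTION = ".cmdline_record"  # hypothetical section name, not used by gllvm

def add_section_command(obj_path, argv, payload_path):
    """Build the section payload and the objcopy argv that would embed it.

    The caller is expected to write `payload` to `payload_path` and then
    run the returned command; neither step happens here.
    """
    payload = (shlex.join(argv) + "\n").encode("utf-8")
    objcopy = ["objcopy", "--add-section", f"{SECTION}={payload_path}", obj_path]
    return payload, objcopy

payload, cmd = add_section_command(
    "foo.o",
    ["clang", "-o", "foo.o", "-flag1", "foo.c"],
    "foo.o.cmd",
)
print(payload)
print(shlex.join(cmd))
```

This assumes an ELF toolchain with objcopy available; other object formats would need a different mechanism.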

@ianamason
Member

And I guess we'd want an additional switch to get-bc that dumps the information out to a file, like the manifest switch does.

Sounds reasonable. I remember @HassenSaidi complained that we lost the necessary information to relink the bitcode.

@HassenSaidi

@ianamason @woodruffw: I did run into this issue in the past. I had more complex scenarios involving changes to the .o files between their creation and the linking. Imagine, for instance, changing the name of a symbol between generating the .o file and linking it. So to do this properly, the section containing the commands should be generated by tracking all file changes during the build process.

@ianamason
Member

So you gave up on this and created that: https://github.com/trailofbits/blight.
I'll leave this here, so others can follow.

@ianamason
Member

@woodruffw I took a look at blight, but I haven't tried it out. Is there any black magic other than creating a directory containing your wrappers and sticking that directory at the front of the PATH? What happens when build systems do bad things like call hard-coded paths to tools? Like /usr/bat/shit/crazy/clang?

@woodruffw
Collaborator Author

@ianamason we use two techniques:

  1. Most build systems respect CC, CXX, etc., so we simply point those to blight-cc, blight-c++, etc.
  2. If that doesn't work (e.g. if a build hardcodes clang++ instead of using $(CC)), we do the $PATH trick you mentioned. In that case, /tmp/.../clang++ becomes a shim around blight-c++.
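The two techniques above can be sketched roughly as follows. This is not blight's implementation: only the names `blight-cc` and `blight-c++` come from the comment, while `SHIMS`, `wrapped_env`, and the shim script contents are hypothetical.

```python
import os
import stat
import tempfile

# Tool name a build might hardcode -> wrapper to run instead.
# Hypothetical table for this sketch.
SHIMS = {"clang": "blight-cc", "clang++": "blight-c++"}

def wrapped_env(base_env, shim_dir):
    """Return a copy of base_env with CC/CXX set and shim_dir first on PATH."""
    # Technique 2: executable shims named after the hardcoded tools.
    for name, wrapper in SHIMS.items():
        shim = os.path.join(shim_dir, name)
        with open(shim, "w", encoding="utf-8") as fh:
            fh.write(f'#!/bin/sh\nexec {wrapper} "$@"\n')
        mode = os.stat(shim).st_mode
        os.chmod(shim, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    env = dict(base_env)
    # Technique 1: most build systems respect CC and CXX.
    env["CC"] = "blight-cc"
    env["CXX"] = "blight-c++"
    old_path = env.get("PATH", "")
    env["PATH"] = shim_dir + (os.pathsep + old_path if old_path else "")
    return env

shim_dir = tempfile.mkdtemp(prefix="shims-")
env = wrapped_env(os.environ, shim_dir)
print(env["CC"], env["CXX"], env["PATH"].split(os.pathsep)[0])
```

A build would then be launched with this environment, for example `subprocess.run(["make"], env=env)`. Note that a real wrapper has to resolve the underlying compiler from outside the shim directory, otherwise it would re-invoke its own shim.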

That leaves the worst case, i.e. a fully qualified path like /usr/bat/shit/crazy/clang. We don't handle those at all at the moment, since we (experimentally) haven't run into too many real-world builds that actually do that. However, we could in theory handle those by tracing the child process's exec* family calls and looking for things that look like build tools. I believe that's what tools like bear do.
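The "looking for things that look like build tools" step might be approximated as below. This sketch does no tracing itself: it assumes some tracer (not shown) already reports each exec'd argv, and the `BUILD_TOOLS` set and both function names are hypothetical.

```python
import os

# Hypothetical set of basenames treated as build tools.
BUILD_TOOLS = {"cc", "c++", "gcc", "g++", "clang", "clang++", "ld", "ar", "as"}

def looks_like_build_tool(exec_path):
    """Heuristic: match on the basename, so fully qualified paths are caught too."""
    return os.path.basename(exec_path) in BUILD_TOOLS

def filter_build_execs(exec_events):
    """Keep only the argv lists whose program looks like a build tool.

    exec_events: iterable of argv lists, assumed to come from an exec tracer.
    """
    return [argv for argv in exec_events if argv and looks_like_build_tool(argv[0])]

events = [
    ["/usr/bin/make", "all"],
    ["/opt/toolchain/bin/clang", "-c", "foo.c"],
    ["/bin/sh", "-c", "echo done"],
]
hits = filter_build_execs(events)
print(hits)
```

An exact basename match misses versioned names such as `clang-15`, so a real filter would likely need a looser pattern.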

@ianamason
Member

Thanks! I thought I saw a discussion that cmake doesn't respect AR, is that right?

@woodruffw
Collaborator Author

That sounds right, although I'm not 100% sure -- I know they have their own CMAKE_AR variable instead, but I'm not sure if that's the sole variable or whether it just takes precedence.
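Setting the question of the `AR` environment variable aside, one way to avoid depending on it is to pass the CMake variables explicitly at configure time. A minimal sketch, where `cmake_configure_args` and the archiver wrapper path are hypothetical; `CMAKE_C_COMPILER`, `CMAKE_CXX_COMPILER`, and `CMAKE_AR` are CMake's own variable names:

```python
def cmake_configure_args(source_dir, build_dir, cc, cxx, ar):
    """Build a cmake configure command that sets the tools explicitly."""
    return [
        "cmake", "-S", source_dir, "-B", build_dir,
        f"-DCMAKE_C_COMPILER={cc}",
        f"-DCMAKE_CXX_COMPILER={cxx}",
        f"-DCMAKE_AR={ar}",
    ]

args = cmake_configure_args(
    ".", "build",
    cc="blight-cc",
    cxx="blight-c++",
    ar="/path/to/ar-wrapper",  # placeholder path, assumed for illustration
)
print(" ".join(args))
```

Whether CMake also consults `AR` from the environment is left open here, as in the comment above.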

3 participants