
Question about evaluation input format #27

Open
yellow-binary-tree opened this issue May 11, 2024 · 1 comment

Comments

@yellow-binary-tree

In SEED-Bench-2/model/InternLM_Xcomposer_VL_interface.py, all of the choices are appended to the model input for InternLM_Xcomposer_VL, and the choice letters ("A.", "B.", "C.", "D.") are used as the labels for computing the loss. For all of the other models (instructblip, qwen_vl, llava_v2), the interface code adds only the question to the model input, and the text of each choice is scored independently as the label for computing the loss.

Why do you use different input formats for different models? Does this have a large impact on accuracy?
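For readers unfamiliar with the distinction, the two label formats described above can be sketched as follows. This is a minimal illustration, not code from the repository; `score` is a hypothetical stand-in for the model's average per-token log-likelihood of a label given a prompt, and the question, choices, and numbers are made up.

```python
question = "What is shown in the image?"
choices = ["a cat", "a dog", "a bird", "a fish"]
letters = ["A", "B", "C", "D"]

# Format 1 (InternLM_Xcomposer_VL style): all choices appear in the
# prompt; the candidate labels are the option letters themselves.
prompt_all = question + "\n" + "\n".join(
    f"{letter}. {choice}" for letter, choice in zip(letters, choices))
labels_letters = [f"{letter}." for letter in letters]

# Format 2 (instructblip / qwen_vl / llava_v2 style): the prompt holds
# only the question; each choice's text is scored independently.
prompt_q_only = question
labels_text = choices

def pick(score, prompt, labels):
    """Return the label whose continuation the model finds most likely."""
    return max(labels, key=lambda label: score(prompt, label))

# Usage with a dummy scorer that just looks up fixed log-likelihoods:
dummy = {"A.": -1.0, "B.": -0.5, "C.": -2.0, "D.": -3.0}
pred = pick(lambda p, label: dummy[label], prompt_all, labels_letters)
# pred == "B."
```

Both formats pick the argmax-likelihood candidate; they differ only in what the prompt contains and what span of text is scored.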

@Bohao-Lee
Collaborator

Thank you for your interest in our work. For qwen_vl, the official code evaluates with PPL over the A/B/C/D letters. For LLaVA 1.5, the official code evaluates with the generate method, so I modified the corresponding code to use the PPL evaluation method instead. For InternLM_Xcomposer_VL, the official evaluation code already uses the PPL method, so I provided its evaluation code based on their implementation.
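For context on the PPL evaluation method mentioned above: each candidate answer is scored by the perplexity the model assigns to its tokens, and the candidate with the lowest perplexity wins. A minimal sketch, assuming the model exposes per-token log-probabilities for the label span; the log-probability values below are illustrative, not real model outputs.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from the per-token log-probabilities of a label span."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs for two candidate answers:
scores = {
    "a cat": [-0.2, -0.3],
    "a dog": [-1.5, -2.0],
}

# Lowest perplexity (i.e. highest likelihood) is the prediction.
best = min(scores, key=lambda choice: perplexity(scores[choice]))
# best == "a cat"
```

A generate-based evaluation would instead decode free-form text and match it against the options, which is why the two methods can require different input formats.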
