Multi-SWE-Bench Integration #7644

Open
neubig opened this issue Apr 1, 2025 · 2 comments

Comments

@neubig
Contributor

neubig commented Apr 1, 2025

What problem or use case are you trying to solve?

It would be good to know OpenHands's performance on Java and other major languages.

Describe the UX of the solution you'd like

Multi-SWE-Bench makes this possible: https://multi-swe-bench.github.io/#/
We should incorporate it into our evaluation harness.

@neubig neubig added the enhancement New feature or request label Apr 1, 2025
@JohnsterID

JohnsterID commented Apr 1, 2025

I see ByteDance on the charts using OpenHands: https://github.com/multi-swe-bench/MopenHands

@neubig
Contributor Author

neubig commented Apr 2, 2025

Yeah, so I expect that OpenHands will do a reasonable job at this, but we need to have benchmarks in our eval harness in order to iterate on them as OpenHands improves.
