Yeah, I expect OpenHands will do a reasonable job at this, but we need these benchmarks in our eval harness so that we can keep measuring against them as OpenHands improves.
What problem or use case are you trying to solve?
It would be good to know how OpenHands performs on Java and other major programming languages.
Describe the UX of the solution you'd like
Multi-SWE-Bench makes this possible: https://multi-swe-bench.github.io/#/
We should incorporate it into our evaluation harness.