@@ -1,13 +1,6 @@
  <template>
  <Header></Header>
  <section class="main-container">
- <!-- <div class="content-wrapper" style="display: flex; justify-content: center; align-items: center;">
- <div style="background-color: black; padding: 1.5em 1em; color: white; border-radius: 1em; text-align: center; width: 82%;">
- 📣 [08/2024] We’ve released the JAVA version of <a rel="noopener noreferrer" target="_blank" style="color:var(--dark_accent_color)" href="https://www.swebench.com">SWE-bench</a>!
- Check it out on <a target="_blank" rel="noopener noreferrer" style="color:var(--dark_accent_color)" href="https://huggingface.co/datasets/Daoguang/multi-swe-bench">Hugging Face</a>.
- For more details, see our <a target="_blank" rel="noopener noreferrer" style="color:var(--dark_accent_color)" href="https://arxiv.org/abs/2408.14354">paper</a>.
- </div>
- </div> -->
  <div class="content-wrapper">
  <div class="content-box" v-if="leaderboard">
  <h2 class="text-title">Leaderboard</h2>
@@ -115,9 +108,8 @@
  </div>

  <p class="text-content">
- - The <span style="color:var(--dark_accent_color);"><b>% Resolved</b></span> metric is the percentage of instances
- (<b>500</b> for Python, <b>128</b> for Java, <b>224</b> for TypeScript, <b>356</b> for JavaScript, <b>428</b> for Go, <b>239</b> for Rust, <b>128</b> for C, <b>129</b> for C++) <i>solved</i> by the model.
- <b>Overall</b> represents all instances, while <b>Easy</b>, <b>Medium</b>, and <b>Hard</b> denote instances at different difficulty levels.
+ - <span style="color:var(--dark_accent_color);"><b>% Resolved</b></span> denotes the proportion of successfully solved instances per language (Python: <b>500</b>, Java: <b>128</b>, TypeScript: <b>224</b>, JavaScript: <b>356</b>, Go: <b>428</b>, Rust: <b>239</b>, C: <b>128</b>, C++: <b>129</b>).
+ <b>Overall</b> includes all instances for each language, while <b>Easy</b>, <b>Medium</b>, and <b>Hard</b> correspond to subsets categorized by difficulty level.
  <br>
  - <span style="color:var(--dark_accent_color);"><b>✅ Checked</b></span> indicates that we, the Multi-SWE-bench team, received access to the system and
  were able to reproduce the patch generations.
@@ -127,7 +119,7 @@
  <br>
  <br>

- If you'd like to submit to the leaderboard, please check <router-link to="/submit">this</router-link> page.
+ If you'd like to submit to the leaderboard, please check <router-link to="/submit">this page</router-link>.
  All submissions are Pass@1, do not use
  <code style="color:black;background-color:#ddd;border-radius: 0.25em">hints_text</code>,
  and are in the unassisted setting.