Commit 0955e40

Merge remote-tracking branch 'origin/main'
2 parents: ab794d4 + a7c6e83

File tree

2 files changed

+16
-17
lines changed


src/About.vue

Lines changed: 13 additions & 6 deletions
@@ -3,23 +3,30 @@
   <div class="content-box">
     <h2 class="text-title">About</h2>
     <p class="text-content">
-      Multi-SWE-bench is a dataset that tests LLMs' capability to solve GitHub issues automatically.
-      The dataset collects 1,632 Issue-Pull Request pairs from 39 popular repositories across seven widely used programming languages: Java, TypeScript, JavaScript, Go, Rust, C, and C++.
+      Multi-SWE-Bench is a benchmark for evaluating the issue-resolving capabilities of LLMs across multiple programming languages.
+      The dataset consists of 1,632 issue-resolving tasks spanning 7 programming languages: Java, TypeScript, JavaScript, Go, Rust, C, and C++.
       Evaluation is performed by verifying the project's built-in test suite results, using post-PR behavior as the reference solution.
-      Read more about Multi-SWE-bench in our <a href="https://arxiv.org/abs/2310.06770" target="_blank">paper</a>!
+      Read more about Multi-SWE-bench in our <a href="https://arxiv.org/abs/xxx" target="_blank">paper</a>!
     </p>
     <h3 class="text-title">Citation</h3>
-    If you found our <a href="https://multi-swe-bench.github.io">Multi-SWE-bench</a> helpful for your work, please cite as follows:
+    If you found the <a href="https://multi-swe-bench.github.io">Multi-SWE-bench</a> and <a href="https://www.swebench.com">SWE-bench</a> helpful for your work, please cite as follows:
     <p class="text-content">
-      <pre id="citation">
-      <code>@misc{zan2025multiswebench,
+      <pre id="citation"><code>@misc{zan2025multiswebench,
       title={Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving},
       author={Xxx},
       year={2025},
       eprint={2503.17315},
       archivePrefix={arXiv},
       primaryClass={cs.SE},
       url={xxx},
+}</code></pre>
+      <br/>
+      <pre id="citation"><code>@inproceedings{jimenez2024swebench,
+      title={SWE-bench: Can Language Models Resolve Real-world Github Issues?},
+      author={Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan},
+      booktitle={The Twelfth International Conference on Learning Representations},
+      year={2024},
+      url={https://openreview.net/forum?id=VTF8yNQM66}
 }</code></pre>
       <br/>
       <b>Disclaimer:</b> Multi-SWE-bench is for research purposes only. Models

src/Home.vue

Lines changed: 3 additions & 11 deletions
@@ -1,13 +1,6 @@
 <template>
   <Header></Header>
   <section class="main-container">
-    <!-- <div class="content-wrapper" style="display: flex; justify-content: center; align-items: center;">
-      <div style="background-color: black; padding: 1.5em 1em; color: white; border-radius: 1em; text-align: center; width: 82%;">
-        📣 [08/2024] We’ve released the JAVA version of <a rel="noopener noreferrer" target="_blank" style="color:var(--dark_accent_color)" href="https://www.swebench.com">SWE-bench</a>!
-        Check it out on <a target="_blank" rel="noopener noreferrer" style="color:var(--dark_accent_color)" href="https://huggingface.co/datasets/Daoguang/multi-swe-bench">Hugging Face</a>.
-        For more details, see our <a target="_blank" rel="noopener noreferrer" style="color:var(--dark_accent_color)" href="https://arxiv.org/abs/2408.14354">paper</a>.
-      </div>
-    </div> -->
     <div class="content-wrapper">
       <div class="content-box" v-if="leaderboard">
         <h2 class="text-title">Leaderboard</h2>
@@ -115,9 +108,8 @@
       </div>

       <p class="text-content">
-        - The <span style="color:var(--dark_accent_color);"><b>% Resolved</b></span> metric is the percentage of instances
-        (<b>500</b> for Python, <b>128</b> for Java, <b>224</b> for TypeScript, <b>356</b> for JavaScript, <b>428</b> for Go, <b>239</b> for Rust, <b>128</b> for C, <b>129</b> for C++) <i>solved</i> by the model.
-        <b>Overall</b> represents all instances, while <b>Easy</b>, <b>Medium</b>, and <b>Hard</b> denote instances at different difficulty levels.
+        - <span style="color:var(--dark_accent_color);"><b>% Resolved</b></span> denotes the proportion of successfully solved instances per language (Python: <b>500</b>, Java: <b>128</b>, TypeScript: <b>224</b>, JavaScript: <b>356</b>, Go: <b>428</b>, Rust: <b>239</b>, C: <b>128</b>, C++: <b>129</b>).
+        <b>Overall</b> includes all instances for each language, while <b>Easy</b>, <b>Medium</b>, and <b>Hard</b> correspond to subsets categorized by difficulty level.
         <br>
         - <span style="color:var(--dark_accent_color);"><b>✅ Checked</b></span> indicates that we, the Multi-SWE-bench team, received access to the system and
         were able to reproduce the patch generations.
@@ -127,7 +119,7 @@
         <br>
         <br>

-        If you'd like to submit to the leaderboard, please check <router-link to="/submit">this</router-link> page.
+        If you'd like to submit to the leaderboard, please check <router-link to="/submit">this page</router-link>.
         All submissions are Pass@1, do not use
         <code style="color:black;background-color:#ddd;border-radius: 0.25em">hints_text</code>,
         and are in the unassisted setting.
