Commit 0955e40

Merge remote-tracking branch 'origin/main'
2 parents: ab794d4 + a7c6e83

File tree

2 files changed

+16
-17
lines changed


src/About.vue

Lines changed: 13 additions & 6 deletions
@@ -3,23 +3,30 @@
   <div class="content-box">
     <h2 class="text-title">About</h2>
     <p class="text-content">
-      Multi-SWE-bench is a dataset that tests LLMs' capability to solve GitHub issues automatically.
-      The dataset collects 1,632 Issue-Pull Request pairs from 39 popular repositories across seven widely used programming languages: Java, TypeScript, JavaScript, Go, Rust, C, and C++.
+      Multi-SWE-Bench is a benchmark for evaluating the issue-resolving capabilities of LLMs across multiple programming languages.
+      The dataset consists of 1,632 issue-resolving tasks spanning 7 programming languages: Java, TypeScript, JavaScript, Go, Rust, C, and C++.
       Evaluation is performed by verifying the project's built-in test suite results, using post-PR behavior as the reference solution.
-      Read more about Multi-SWE-bench in our <a href="https://arxiv.org/abs/2310.06770" target="_blank">paper</a>!
+      Read more about Multi-SWE-bench in our <a href="https://arxiv.org/abs/xxx" target="_blank">paper</a>!
     </p>
     <h3 class="text-title">Citation</h3>
-    If you found our <a href="https://multi-swe-bench.github.io">Multi-SWE-bench</a> helpful for your work, please cite as follows:
+    If you found the <a href="https://multi-swe-bench.github.io">Multi-SWE-bench</a> and <a href="https://www.swebench.com">SWE-bench</a> helpful for your work, please cite as follows:
     <p class="text-content">
-      <pre id="citation">
-      <code>@misc{zan2025multiswebench,
+      <pre id="citation"><code>@misc{zan2025multiswebench,
       title={Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving},
       author={Xxx},
       year={2025},
       eprint={2503.17315},
       archivePrefix={arXiv},
       primaryClass={cs.SE},
       url={xxx},
+}</code></pre>
+      <br/>
+      <pre id="citation"><code>@inproceedings{jimenez2024swebench,
+      title={SWE-bench: Can Language Models Resolve Real-world Github Issues?},
+      author={Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan},
+      booktitle={The Twelfth International Conference on Learning Representations},
+      year={2024},
+      url={https://openreview.net/forum?id=VTF8yNQM66}
 }</code></pre>
       <br/>
       <b>Disclaimer:</b> Multi-SWE-bench is for research purposes only. Models

src/Home.vue

Lines changed: 3 additions & 11 deletions
@@ -1,13 +1,6 @@
 <template>
   <Header></Header>
   <section class="main-container">
-    <!-- <div class="content-wrapper" style="display: flex; justify-content: center; align-items: center;">
-      <div style="background-color: black; padding: 1.5em 1em; color: white; border-radius: 1em; text-align: center; width: 82%;">
-        📣 [08/2024] We’ve released the JAVA version of <a rel="noopener noreferrer" target="_blank" style="color:var(--dark_accent_color)" href="https://www.swebench.com">SWE-bench</a>!
-        Check it out on <a target="_blank" rel="noopener noreferrer" style="color:var(--dark_accent_color)" href="https://huggingface.co/datasets/Daoguang/multi-swe-bench">Hugging Face</a>.
-        For more details, see our <a target="_blank" rel="noopener noreferrer" style="color:var(--dark_accent_color)" href="https://arxiv.org/abs/2408.14354">paper</a>.
-      </div>
-    </div> -->
     <div class="content-wrapper">
       <div class="content-box" v-if="leaderboard">
         <h2 class="text-title">Leaderboard</h2>
@@ -115,9 +108,8 @@
       </div>

       <p class="text-content">
-        - The <span style="color:var(--dark_accent_color);"><b>% Resolved</b></span> metric is the percentage of instances
-        (<b>500</b> for Python, <b>128</b> for Java, <b>224</b> for TypeScript, <b>356</b> for JavaScript, <b>428</b> for Go, <b>239</b> for Rust, <b>128</b> for C, <b>129</b> for C++) <i>solved</i> by the model.
-        <b>Overall</b> represents all instances, while <b>Easy</b>, <b>Medium</b>, and <b>Hard</b> denote instances at different difficulty levels.
+        - <span style="color:var(--dark_accent_color);"><b>% Resolved</b></span> denotes the proportion of successfully solved instances per language (Python: <b>500</b>, Java: <b>128</b>, TypeScript: <b>224</b>, JavaScript: <b>356</b>, Go: <b>428</b>, Rust: <b>239</b>, C: <b>128</b>, C++: <b>129</b>).
+        <b>Overall</b> includes all instances for each language, while <b>Easy</b>, <b>Medium</b>, and <b>Hard</b> correspond to subsets categorized by difficulty level.
         <br>
         - <span style="color:var(--dark_accent_color);"><b>✅ Checked</b></span> indicates that we, the Multi-SWE-bench team, received access to the system and
         were able to reproduce the patch generations.
@@ -127,7 +119,7 @@
         <br>
         <br>

-        If you'd like to submit to the leaderboard, please check <router-link to="/submit">this</router-link> page.
+        If you'd like to submit to the leaderboard, please check <router-link to="/submit">this page</router-link>.
         All submissions are Pass@1, do not use
         <code style="color:black;background-color:#ddd;border-radius: 0.25em">hints_text</code>,
         and are in the unassisted setting.
