[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar #18627

Merged: 7 commits, May 23, 2025

10 changes: 7 additions & 3 deletions docs/.nav.yml
@@ -9,8 +9,12 @@ nav:
- getting_started/examples/offline_inference
- getting_started/examples/online_serving
- getting_started/examples/other
- Roadmap: https://roadmap.vllm.ai
- Releases: https://github.com/vllm-project/vllm/releases
- User Guide: serving/offline_inference.md
- Developer Guide: contributing/overview.md
- API Reference: api/README.md
- News:
- Roadmap: https://roadmap.vllm.ai
- Releases: https://github.com/vllm-project/vllm/releases
- User Guide:
- Inference and Serving:
- serving/offline_inference.md
@@ -38,7 +42,7 @@ nav:
- contributing/overview.md
- glob: contributing/*
flatten_single_child_sections: true
- contributing/model
- Model Implementation: contributing/model
- Design Documents:
- V0: design
- V1: design/v1
4 changes: 2 additions & 2 deletions docs/contributing/model/tests.md
@@ -33,14 +33,14 @@ These tests compare the model outputs of vLLM against [HF Transformers](https://

#### Generative models

For [generative models][generative-models], there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:
Comment from @DarkLight1337 (Member, Author), May 23, 2025:

I had to change these because they were misinterpreted as relative links to the sub-headers on the current page

For [generative models](../../models/generative_models.md), there are two levels of correctness tests, as defined in <gh-file:tests/models/utils.py>:

- Exact correctness (`check_outputs_equal`): The text outputted by vLLM should exactly match the text outputted by HF.
- Logprobs similarity (`check_logprobs_close`): The logprobs outputted by vLLM should be in the top-k logprobs outputted by HF, and vice versa.
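
The two checks can be pictured with a minimal sketch (illustrative only; the signatures are simplified and not the actual `tests/models/utils.py` API, which takes full runner outputs and handles top-k bookkeeping omitted here):

```python
def check_outputs_equal(vllm_texts: list[str], hf_texts: list[str]) -> None:
    # Exact correctness: every generated text must match character-for-character.
    for vllm_text, hf_text in zip(vllm_texts, hf_texts):
        assert vllm_text == hf_text


def check_logprobs_close(vllm_steps, hf_steps) -> None:
    # Logprobs similarity: at each step, the token sampled by one framework
    # must appear in the other framework's top-k candidates, and vice versa.
    # Each element is assumed to be (sampled_token_id, set_of_top_k_token_ids).
    for (vllm_token, vllm_topk), (hf_token, hf_topk) in zip(vllm_steps, hf_steps):
        assert vllm_token in hf_topk
        assert hf_token in vllm_topk
```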

#### Pooling models

For [pooling models][pooling-models], we simply check the cosine similarity, as defined in <gh-file:tests/models/embedding/utils.py>.
For [pooling models](../../models/pooling_models.md), we simply check the cosine similarity, as defined in <gh-file:tests/models/utils.py>.
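
For reference, a cosine-similarity check of this kind reduces to a few lines; this is a hedged sketch with made-up embedding values, not the helper referenced above:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


# Two nearly identical embeddings should score close to 1.0.
vllm_embedding = [0.12, -0.34, 0.56]
hf_embedding = [0.12, -0.33, 0.57]
assert cosine_similarity(vllm_embedding, hf_embedding) > 0.99  # illustrative threshold
```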

[](){ #mm-processing-tests }

2 changes: 1 addition & 1 deletion docs/features/spec_decode.md
@@ -170,7 +170,7 @@ A variety of speculative models of this type are available on HF hub:
## Speculating using EAGLE based draft models

The following code configures vLLM to use speculative decoding where proposals are generated by
an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](<gh-file:examples/offline_inference/eagle.py>).
an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](gh-file:examples/offline_inference/eagle.py).

```python
from vllm import LLM, SamplingParams
6 changes: 3 additions & 3 deletions docs/models/supported_models.md
@@ -3,7 +3,7 @@ title: Supported Models
---
[](){ #supported-models }

vLLM supports [generative](generative-models) and [pooling](pooling-models) models across various tasks.
vLLM supports [generative](./generative_models.md) and [pooling](./pooling_models.md) models across various tasks.
If a model supports more than one task, you can set the task via the `--task` argument.

For each task, we list the model architectures that have been implemented in vLLM.
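
As a hedged illustration of task selection (the model name, `embed` task value, and output attribute access are assumptions based on the current Python API; `task="embed"` mirrors the `--task embed` CLI flag):

```python
from vllm import LLM

# Hypothetical example: run a dual-purpose model as a pooling (embedding) model.
llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")
(output,) = llm.embed(["vLLM supports generative and pooling models."])
print(len(output.outputs.embedding))  # embedding dimensionality
```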
@@ -376,7 +376,7 @@ Specified using `--task generate`.

### Pooling Models

See [this page](pooling-models) for more information on how to use pooling models.
See [this page](./pooling_models.md) for more information on how to use pooling models.

!!! warning
Since some model architectures support both generative and pooling tasks,
@@ -628,7 +628,7 @@ Specified using `--task generate`.

### Pooling Models

See [this page](pooling-models) for more information on how to use pooling models.
See [this page](./pooling_models.md) for more information on how to use pooling models.

!!! warning
Since some model architectures support both generative and pooling tasks,
2 changes: 1 addition & 1 deletion docs/serving/openai_compatible_server.md
@@ -5,7 +5,7 @@ title: OpenAI-Compatible Server

vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.
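
For example, once a server is running, any OpenAI-compatible client can talk to it; a minimal sketch using the official `openai` Python package, where the base URL, model name, and API key are assumptions matching the example command below:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")
completion = client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello from vLLM."}],
)
print(completion.choices[0].message.content)
```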

In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
In your terminal, you can [install](../getting_started/installation/README.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)

```bash
vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
@@ -1,4 +1,4 @@
# Seed Parameter Behavior in vLLM
# Seed Parameter Behavior

## Overview
