
Commit 3750881

chore: Fix typos in docs and examples (#36524)
Fix typos in docs and examples

Signed-off-by: co63oc <[email protected]>
1 parent 84f0186 commit 3750881

38 files changed, +50 -50 lines changed

awesome-transformers.md

+1 -1

@@ -47,7 +47,7 @@ Keywords: LLMs, Large Language Models, Agents, Chains
 
 ## [LlamaIndex](https://github.com/run-llama/llama_index)
 
-[LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLM's with external data. It provides various kinds of indices and retreival mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
+[LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLM's with external data. It provides various kinds of indices and retrieval mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
 
 Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation

docs/source/en/add_new_model.md

+4 -4

@@ -57,7 +57,7 @@ There is never more than two levels of abstraction for any model to keep the cod
 
 Other important functions like the forward method are defined in the `modeling.py` file.
 
-Specific model heads (for example, sequence classification or language modeling) should call the base model in the forward pass rather than inherting from it to keep abstraction low.
+Specific model heads (for example, sequence classification or language modeling) should call the base model in the forward pass rather than inheriting from it to keep abstraction low.
 
 New models require a configuration, for example `BrandNewLlamaConfig`, that is stored as an attribute of [`PreTrainedModel`].

@@ -233,7 +233,7 @@ If you run into issues, you'll need to choose one of the following debugging str
 This strategy relies on breaking the original model into smaller sub-components, such as when the code can be easily run in eager mode. While more difficult, there are some advantages to this approach.
 
 1. It is easier later to compare the original model to your implementation. You can automatically verify that each individual component matches its corresponding component in the Transformers' implementation. This is better than relying on a visual comparison based on print statements.
-2. It is easier to port individal components instead of the entire model.
+2. It is easier to port individual components instead of the entire model.
 3. It is easier for understanding how a model works by breaking it up into smaller parts.
 4. It is easier to prevent regressions at a later stage when you change your code thanks to component-by-component tests.

@@ -328,7 +328,7 @@ def _init_weights(self, module):
 
 The initialization scheme can look different if you need to adapt it to your model. For example, [`Wav2Vec2ForPreTraining`] initializes [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) in its last two linear layers.
 
-The `_is_hf_initialized` flag makes sure the submodule is only initialized once. Setting `module.project_q` and `module.project_hid` to `True` ensures the custom initialization is not overriden later. The `_init_weights` function won't be applied to these modules.
+The `_is_hf_initialized` flag makes sure the submodule is only initialized once. Setting `module.project_q` and `module.project_hid` to `True` ensures the custom initialization is not overridden later. The `_init_weights` function won't be applied to these modules.
 
 ```py
 def _init_weights(self, module):
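
The block above is cut off by the diff context. It most likely continues along these lines; this is a sketch reconstructed from the pattern the paragraph describes (reset the two projection layers, then flag them with `_is_hf_initialized`), not necessarily the verbatim Wav2Vec2 code:

```py
def _init_weights(self, module):
    """Initialize the weights."""
    if isinstance(module, Wav2Vec2ForPreTraining):
        # custom initialization for the last two linear layers
        module.project_hid.reset_parameters()
        module.project_q.reset_parameters()
        # flag them so the generic initialization below never touches them
        module.project_hid._is_hf_initialized = True
        module.project_q._is_hf_initialized = True
    elif isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()
```
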
@@ -457,7 +457,7 @@ Don't be discouraged if your forward pass isn't identical with the output from t
 Your output should have a precision of *1e-3*. Ensure the output shapes and output values are identical. Common reasons for why the outputs aren't identical include:
 
 - Some layers were not added (activation layer or a residual connection).
-- The word embedding matix is not tied.
+- The word embedding matrix is not tied.
 - The wrong positional embeddings are used because the original implementation includes an offset.
 - Dropout is applied during the forward pass. Fix this error by making sure `model.training` is `False` and passing `self.training` to [torch.nn.functional.dropout](https://pytorch.org/docs/stable/nn.functional.html?highlight=dropout#torch.nn.functional.dropout).
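
A quick way to verify that tolerance is `torch.allclose`. A minimal sketch with stand-in tensors (a real check would compare the original and ported model outputs):

```python
import torch

# Hypothetical tensors standing in for the original and ported model logits.
original_logits = torch.randn(1, 7, 50257)
ported_logits = original_logits + 1e-5 * torch.randn_like(original_logits)

# Shapes must match exactly; values should agree to ~1e-3 as the guide requires.
assert original_logits.shape == ported_logits.shape
print(torch.max(torch.abs(original_logits - ported_logits)))  # inspect the worst deviation
assert torch.allclose(original_logits, ported_logits, atol=1e-3)
```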

docs/source/en/agents.md

+1 -1

@@ -159,7 +159,7 @@ Here are a few examples using notional tools:
 ---
 {examples}
 
-Above example were using notional tools that might not exist for you. You only have acces to those tools:
+Above example were using notional tools that might not exist for you. You only have access to those tools:
 <<tool_names>>
 You also can perform computations in the python code you generate.

docs/source/en/deepspeed.md

+2 -2

@@ -840,7 +840,7 @@ Unless you have a lot of free CPU memory, fp32 weights shouldn't be saved during
 <hfoptions id="save">
 <hfoption id="offline">
 
-DeepSpeed provies a [zero_to_fp32.py](https://github.com/microsoft/DeepSpeed/blob/91829476a8fd4d0d9268c03c1d56795d20a51c12/deepspeed/utils/zero_to_fp32.py#L14) script at the top-level checkpoint folder for extracting weights at any point. This is a standalone script and you don't need a config file or [`Trainer`].
+DeepSpeed provides a [zero_to_fp32.py](https://github.com/microsoft/DeepSpeed/blob/91829476a8fd4d0d9268c03c1d56795d20a51c12/deepspeed/utils/zero_to_fp32.py#L14) script at the top-level checkpoint folder for extracting weights at any point. This is a standalone script and you don't need a config file or [`Trainer`].
 
 For example, if your checkpoint folder looks like the one shown below, then you can run the following command to create and consolidate the fp32 weights from multiple GPUs into a single `pytorch_model.bin` file. The script automatically discovers the subfolder `global_step1` which contains the checkpoint.
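
Recent DeepSpeed versions also expose the same conversion as a Python API. A sketch, assuming a ZeRO checkpoint folder named `checkpoint_dir` (check your DeepSpeed version for availability):

```python
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Consolidates the sharded ZeRO checkpoint into a single fp32 state dict on CPU;
# the tag (e.g. "global_step1") is discovered automatically when not given.
state_dict = get_fp32_state_dict_from_zero_checkpoint("checkpoint_dir")
# model.load_state_dict(state_dict)  # then load the consolidated weights into your model
```
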
@@ -942,7 +942,7 @@ import deepspeed
 ds_config = {...}
 # must run before instantiating the model to detect zero 3
 dschf = HfDeepSpeedConfig(ds_config) # keep this object alive
-# randomly intialize model weights
+# randomly initialize model weights
 config = AutoConfig.from_pretrained("openai-community/gpt2")
 model = AutoModel.from_config(config)
 engine = deepspeed.initialize(model=model, config_params=ds_config, ...)

docs/source/en/generation_features.md

+1 -1

@@ -50,7 +50,7 @@ The `streamer` parameter is compatible with any class with a [`~TextStreamer.put
 
 Watermarking is useful for detecting whether text is generated. The [watermarking strategy](https://hf.co/papers/2306.04634) in Transformers randomly "colors" a subset of the tokens green. When green tokens are generated, they have a small bias added to their logits, and a higher probability of being generated. You can detect generated text by comparing the proportion of green tokens to the amount of green tokens typically found in human-generated text.
 
-Watermarking is supported for any generative model in Transformers and doesn't require an extra classfication model to detect the watermarked text.
+Watermarking is supported for any generative model in Transformers and doesn't require an extra classification model to detect the watermarked text.
 
 Create a [`WatermarkingConfig`] with the bias value to add to the logits and watermarking algorithm. The example below uses the `"selfhash"` algorithm, where the green token selection only depends on the current token. Pass the [`WatermarkingConfig`] to [`~GenerationMixin.generate`].
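
That step looks roughly like the sketch below, assuming a small causal LM such as `openai-community/gpt2` for illustration; the `bias` value and `"selfhash"` scheme mirror the surrounding text:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, WatermarkingConfig

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# "selfhash" makes the green-token selection depend only on the current token
watermarking_config = WatermarkingConfig(bias=2.5, seeding_scheme="selfhash")
inputs = tokenizer("Watermarked text often reads", return_tensors="pt")
out = model.generate(**inputs, watermarking_config=watermarking_config, do_sample=True, max_new_tokens=20)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```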

docs/source/en/llm_tutorial.md

+1 -1

@@ -87,7 +87,7 @@ You can customize [`~GenerationMixin.generate`] by overriding the parameters and
 model.generate(**inputs, num_beams=4, do_sample=True)
 ```
 
-[`~GenerationMixin.generate`] can also be extended with external libraries or custom code. The `logits_processor` parameter accepts custom [`LogitsProcessor`] instances for manupulating the next token probability distribution. `stopping_criteria` supports custom [`StoppingCriteria`] to stop text generation. Check out the [logits-processor-zoo](https://github.com/NVIDIA/logits-processor-zoo) for more examples of external [`~GenerationMixin.generate`]-compatible extensions.
+[`~GenerationMixin.generate`] can also be extended with external libraries or custom code. The `logits_processor` parameter accepts custom [`LogitsProcessor`] instances for manipulating the next token probability distribution. `stopping_criteria` supports custom [`StoppingCriteria`] to stop text generation. Check out the [logits-processor-zoo](https://github.com/NVIDIA/logits-processor-zoo) for more examples of external [`~GenerationMixin.generate`]-compatible extensions.
 
 Refer to the [Generation strategies](./generation_strategies) guide to learn more about search, sampling, and decoding strategies.
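
As an illustration of the `stopping_criteria` hook, here is a minimal custom `StoppingCriteria` sketch; the model choice and stop string are arbitrary:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    def __init__(self, tokenizer, stop_string):
        self.tokenizer = tokenizer
        self.stop_string = stop_string

    def __call__(self, input_ids, scores, **kwargs):
        # return True to stop; decodes only the first sequence for simplicity
        return self.stop_string in self.tokenizer.decode(input_ids[0])

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
inputs = tokenizer("The quick brown fox", return_tensors="pt")
criteria = StoppingCriteriaList([StopOnSubstring(tokenizer, ".")])
out = model.generate(**inputs, max_new_tokens=50, stopping_criteria=criteria)
print(tokenizer.decode(out[0]))
```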

docs/source/en/model_doc/speech_to_text.md

+1 -1

@@ -74,7 +74,7 @@ be installed as follows: `apt install libsndfile1-dev`
 For multilingual speech translation models, `eos_token_id` is used as the `decoder_start_token_id` and
 the target language id is forced as the first generated token. To force the target language id as the first
 generated token, pass the `forced_bos_token_id` parameter to the `generate()` method. The following
-example shows how to transate English speech to French text using the *facebook/s2t-medium-mustc-multilingual-st*
+example shows how to translate English speech to French text using the *facebook/s2t-medium-mustc-multilingual-st*
 checkpoint.
 
 ```python
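
The example that follows in the original document is cut off by the diff context here. A hedged reconstruction of the idea (the waveform is a dummy placeholder, and `lang_code_to_id` is the tokenizer attribute the docs use for language ids):

```python
import torch
from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-medium-mustc-multilingual-st")
processor = Speech2TextProcessor.from_pretrained("facebook/s2t-medium-mustc-multilingual-st")

waveform = torch.zeros(16_000)  # one second of silence as a stand-in for real 16 kHz speech
inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
generated_ids = model.generate(
    inputs["input_features"],
    attention_mask=inputs["attention_mask"],
    forced_bos_token_id=processor.tokenizer.lang_code_to_id["fr"],  # force French as the first token
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```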

docs/source/en/model_doc/tvp.md

+1 -1

@@ -111,7 +111,7 @@ def decode(container, sampling_rate, num_frames, clip_idx, num_clips, target_fps
     Returns:
         frames (tensor): decoded frames from the video.
     '''
-    assert clip_idx >= -2, "Not a valied clip_idx {}".format(clip_idx)
+    assert clip_idx >= -2, "Not a valid clip_idx {}".format(clip_idx)
     frames, fps = pyav_decode(container, sampling_rate, num_frames, clip_idx, num_clips, target_fps)
     clip_size = sampling_rate * num_frames / target_fps * fps
     index = np.linspace(0, clip_size - 1, num_frames)

docs/source/en/modular_transformers.md

+2 -2

@@ -355,7 +355,7 @@ class Olmo2Model(OlmoModel):
         )
 ```
 
-You only need to change the *type* of the `self.norm` attribute to use `RMSNorm` isntead of `LayerNorm`. This change doesn't affect the logic in the forward method (layer name and usage is identical to the parent class), so you don't need to overwrite it. The linter automatically unravels it.
+You only need to change the *type* of the `self.norm` attribute to use `RMSNorm` instead of `LayerNorm`. This change doesn't affect the logic in the forward method (layer name and usage is identical to the parent class), so you don't need to overwrite it. The linter automatically unravels it.
 
 ### Model head

@@ -374,7 +374,7 @@ The logic is identical to `OlmoForCausalLM` which means you don't need to make a
 
 The [modeling_olmo2.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/olmo2/modeling_olmo2.py) generated by the linter also contains some classes (`Olmo2MLP`, `Olmo2RotaryEmbedding`, `Olmo2PreTrainedModel`) that weren't explicitly defined in `modular_olmo2.py`.
 
-Classes that are a dependency of an inherited class but aren't explicitly defined are automatically added as a part of depdendency tracing. This is similar to how some functions were added to the `Attention` class without drrectly importing them.
+Classes that are a dependency of an inherited class but aren't explicitly defined are automatically added as a part of dependency tracing. This is similar to how some functions were added to the `Attention` class without directly importing them.
 
 For example, `OlmoDecoderLayer` has an attribute defined as `self.mlp = OlmoMLP(config)`. This class was never explicitly redefined in `Olmo2MLP`, so the linter automatically created a `Olmo2MLP` class similar to `OlmoMLP`. It is identical to the code below if it was explicitly written in `modular_olmo2.py`.
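
The override pattern generalizes beyond Olmo2. A toy sketch in plain PyTorch (not the actual Olmo code) of changing only an attribute's type while inheriting the forward method unchanged:

```python
import torch
import torch.nn as nn

class ParentModel(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x):
        return self.norm(x)

class ChildModel(ParentModel):
    def __init__(self, hidden_size=64):
        super().__init__(hidden_size)
        # only the *type* of the attribute changes; the inherited forward
        # keeps working because the attribute name is unchanged
        self.norm = nn.RMSNorm(hidden_size)  # nn.RMSNorm requires torch>=2.4

print(ChildModel()(torch.randn(2, 64)).shape)
```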

docs/source/en/perf_hardware.md

+1 -1

@@ -29,7 +29,7 @@ It is important the PSU has stable voltage otherwise it may not be able to suppl
 
 ## Cooling
 
-An overheated GPU throttles its performance and can even shutdown if it's too hot to prevent damage. Keeping the GPU temperature low, anywhere between 158 - 167F, is essential for delivering full perfomance and maintaining its lifespan. Once temperatures reach 183 - 194F, the GPU may begin to throttle performance.
+An overheated GPU throttles its performance and can even shutdown if it's too hot to prevent damage. Keeping the GPU temperature low, anywhere between 158 - 167F, is essential for delivering full performance and maintaining its lifespan. Once temperatures reach 183 - 194F, the GPU may begin to throttle performance.
 
 ## Multi-GPU connectivity

docs/source/en/perf_train_gpu_many.md

+1 -1

@@ -33,7 +33,7 @@ Use the [Model Memory Calculator](https://huggingface.co/spaces/hf-accelerate/mo
 
 ## Data parallelism
 
-Data parallelism evenly distributes data across multiple GPUs. Each GPU holds a copy of the model and concurrently proccesses their portion of the data. At the end, the results from each GPU are synchronized and combined.
+Data parallelism evenly distributes data across multiple GPUs. Each GPU holds a copy of the model and concurrently processes their portion of the data. At the end, the results from each GPU are synchronized and combined.
 
 Data parallelism significantly reduces training time by processing data in parallel, and it is scalable to the number of GPUs available. However, synchronizing results from each GPU can add overhead.
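
As a toy illustration of the concept (not a training recipe), `torch.nn.DataParallel` replicates a module on each visible GPU and splits the batch across the replicas:

```python
import torch
import torch.nn as nn

# Each GPU replica processes a slice of the 64-sample batch; results are
# gathered back on the default device. DistributedDataParallel is preferred
# for real training; this sketch also falls back to CPU when no GPU exists.
model = nn.Linear(512, 10)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # splits the input batch across GPUs
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

batch = torch.randn(64, 512, device=device)
output = model(batch)
print(output.shape)  # (64, 10), combined from all replicas
```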

docs/source/en/pipeline_tutorial.md

+1 -1

@@ -24,7 +24,7 @@ Tailor the [`Pipeline`] to your task with task specific parameters such as addin
 
 Transformers has two pipeline classes, a generic [`Pipeline`] and many individual task-specific pipelines like [`TextGenerationPipeline`] or [`VisualQuestionAnsweringPipeline`]. Load these individual pipelines by setting the task identifier in the `task` parameter in [`Pipeline`]. You can find the task identifier for each pipeline in their API documentation.
 
-Each task is configured to use a default pretrained model and preprocessor, but this can be overriden with the `model` parameter if you want to use a different model.
+Each task is configured to use a default pretrained model and preprocessor, but this can be overridden with the `model` parameter if you want to use a different model.
 
 For example, to use the [`TextGenerationPipeline`] with [Gemma 2](./model_doc/gemma2), set `task="text-generation"` and `model="google/gemma-2-2b"`.
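
Put together, that example is a two-liner (the prompt text is arbitrary):

```python
from transformers import pipeline

# the task identifier selects TextGenerationPipeline; `model` overrides the default checkpoint
generator = pipeline(task="text-generation", model="google/gemma-2-2b")
print(generator("The secret to a good pipeline is", max_new_tokens=20)[0]["generated_text"])
```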

docs/source/en/testing.md

+1 -1

@@ -220,7 +220,7 @@ Just run the following line to automatically test every docstring example in the
 ```bash
 pytest --doctest-modules <path_to_file_or_dir>
 ```
-If the file has a markdown extention, you should add the `--doctest-glob="*.md"` argument.
+If the file has a markdown extension, you should add the `--doctest-glob="*.md"` argument.
 
 ### Run only modified tests
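
The same invocation can also be driven from Python via `pytest.main`, assuming pytest is installed; a sketch:

```python
import pytest

# --doctest-glob is needed because doctest collection skips .md files by default
exit_code = pytest.main(["--doctest-modules", "--doctest-glob=*.md", "docs/source/en/testing.md"])
```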

docs/source/zh/agents.md

+1 -1

@@ -233,7 +233,7 @@ Here are a few examples using notional tools:
 ---
 {examples}
 
-Above example were using notional tools that might not exist for you. You only have acces to those tools:
+Above example were using notional tools that might not exist for you. You only have access to those tools:
 <<tool_names>>
 You also can perform computations in the python code you generate.

examples/flax/speech-recognition/run_flax_speech_recognition_seq2seq.py

+1 -1

@@ -265,7 +265,7 @@ class FlaxDataCollatorSpeechSeq2SeqWithPadding:
     Data collator that will dynamically pad the inputs received.
     Args:
         processor ([`Wav2Vec2Processor`])
-            The processor used for proccessing the data.
+            The processor used for processing the data.
         decoder_start_token_id (:obj: `int`)
             The begin-of-sentence of the decoder.
         input_padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):

examples/pytorch/speech-pretraining/run_wav2vec2_pretraining_no_trainer.py

+2 -2

@@ -296,7 +296,7 @@ class DataCollatorForWav2Vec2Pretraining:
             The Wav2Vec2 model used for pretraining. The data collator needs to have access
             to config and ``_get_feat_extract_output_lengths`` function for correct padding.
         feature_extractor (:class:`~transformers.Wav2Vec2FeatureExtractor`):
-            The processor used for proccessing the data.
+            The processor used for processing the data.
         padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
             Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
             among:
@@ -445,7 +445,7 @@ def main():
     accelerator.wait_for_everyone()
 
     # 1. Download and create train, validation dataset
-    # We load all dataset configuration and datset split pairs passed in
+    # We load all dataset configuration and dataset split pairs passed in
     # ``args.dataset_config_names`` and ``args.dataset_split_names``
     datasets_splits = []
     for dataset_config_name, train_split_name in zip(args.dataset_config_names, args.dataset_split_names):

examples/pytorch/speech-recognition/run_speech_recognition_ctc.py

+1 -1

@@ -292,7 +292,7 @@ class DataCollatorCTCWithPadding:
     Data collator that will dynamically pad the inputs received.
     Args:
         processor (:class:`~transformers.AutoProcessor`)
-            The processor used for proccessing the data.
+            The processor used for processing the data.
         padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
             Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
             among:

examples/pytorch/speech-recognition/run_speech_recognition_ctc_adapter.py

+2 -2

@@ -275,7 +275,7 @@ class DataCollatorCTCWithPadding:
     Data collator that will dynamically pad the inputs received.
     Args:
         processor (:class:`~transformers.AutoProcessor`)
-            The processor used for proccessing the data.
+            The processor used for processing the data.
         padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`True`):
             Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
             among:
@@ -559,7 +559,7 @@ def remove_special_characters(batch):
     )
 
     # if we doing adapter language training, save
-    # vocab with adpter language
+    # vocab with adapter language
     if data_args.target_language is not None:
         vocab_dict[data_args.target_language] = lang_dict

examples/pytorch/text-classification/run_classification.py

+1 -1

@@ -429,7 +429,7 @@ def main():
     if is_regression:
         label_list = None
         num_labels = 1
-        # regession requires float as label type, let's cast it if needed
+        # regression requires float as label type, let's cast it if needed
         for split in raw_datasets.keys():
             if raw_datasets[split].features["label"].dtype not in ["float32", "float64"]:
                 logger.warning(
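
The cast itself uses the 🤗 Datasets API. A self-contained sketch with toy data (the script operates on `raw_datasets` rather than the names below):

```python
from datasets import Dataset, Value

ds = Dataset.from_dict({"text": ["good", "bad"], "label": [1, 0]})  # labels start as int64
if ds.features["label"].dtype not in ["float32", "float64"]:
    ds = ds.cast_column("label", Value("float32"))  # regression requires float labels
print(ds.features["label"])
```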

examples/pytorch/text-generation/README.md

+1 -1

@@ -19,7 +19,7 @@ limitations under the License.
 Based on the script [`run_generation.py`](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-generation/run_generation.py).
 
 Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, GPT-J, Transformer-XL, XLNet, CTRL, BLOOM, LLAMA, OPT.
-A similar script is used for our official demo [Write With Transfomer](https://transformer.huggingface.co), where you
+A similar script is used for our official demo [Write With Transformer](https://transformer.huggingco.co), where you
 can try out the different models available in the library.
 
 Example usage:

examples/pytorch/token-classification/README.md

+1 -1

@@ -19,7 +19,7 @@ limitations under the License.
 ## PyTorch version
 
 Fine-tuning the library models for token classification task such as Named Entity Recognition (NER), Parts-of-speech
-tagging (POS) or phrase extraction (CHUNKS). The main scrip `run_ner.py` leverages the 🤗 Datasets library and the Trainer API. You can easily
+tagging (POS) or phrase extraction (CHUNKS). The main script `run_ner.py` leverages the 🤗 Datasets library and the Trainer API. You can easily
 customize it to your needs if you need extra processing on your datasets.
 
 It will either run on a datasets hosted on our [hub](https://huggingface.co/datasets) or with your own text files for
