- [LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLM's with external data. It provides various kinds of indices and retreival mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
+ [LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLM's with external data. It provides various kinds of indices and retrieval mechanisms to perform different LLM tasks and obtain knowledge-augmented results.

Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation

docs/source/en/add_new_model.md (+4 -4)

@@ -57,7 +57,7 @@ There is never more than two levels of abstraction for any model to keep the code

Other important functions like the forward method are defined in the `modeling.py` file.

- Specific model heads (for example, sequence classification or language modeling) should call the base model in the forward pass rather than inherting from it to keep abstraction low.
+ Specific model heads (for example, sequence classification or language modeling) should call the base model in the forward pass rather than inheriting from it to keep abstraction low.

New models require a configuration, for example `BrandNewLlamaConfig`, that is stored as an attribute of [`PreTrainedModel`].
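
To make the "call, don't inherit" pattern above concrete, here is a minimal sketch written against the existing Llama classes (the head class itself is hypothetical, and a real head would also handle labels, loss, and the full set of forward arguments):

```py
import torch.nn as nn
from transformers import LlamaModel, LlamaPreTrainedModel

class MyLlamaForSequenceClassification(LlamaPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.model = LlamaModel(config)  # base model is an attribute, not a parent class
        self.score = nn.Linear(config.hidden_size, config.num_labels, bias=False)

    def forward(self, input_ids, attention_mask=None):
        # call the base model in the forward pass instead of inheriting from it
        hidden_states = self.model(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.score(hidden_states[:, -1, :])  # classify from the last token
```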

@@ -233,7 +233,7 @@ If you run into issues, you'll need to choose one of the following debugging strategies.

This strategy relies on breaking the original model into smaller sub-components, such as when the code can be easily run in eager mode. While more difficult, there are some advantages to this approach.

1. It is easier later to compare the original model to your implementation. You can automatically verify that each individual component matches its corresponding component in the Transformers' implementation. This is better than relying on a visual comparison based on print statements.
- 2. It is easier to port individal components instead of the entire model.
+ 2. It is easier to port individual components instead of the entire model.
3. It is easier for understanding how a model works by breaking it up into smaller parts.
4. It is easier to prevent regressions at a later stage when you change your code thanks to component-by-component tests.
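
The automated component check mentioned in point 1 can be as small as the following sketch (the two linear layers are hypothetical stand-ins for a sub-module of the original model and its ported counterpart):

```py
import torch
from torch import nn

# stand-ins for one sub-module of the original model and its ported counterpart
original = nn.Linear(8, 8)
ported = nn.Linear(8, 8)
ported.load_state_dict(original.state_dict())  # port the weights

x = torch.randn(2, 8)
torch.testing.assert_close(ported(x), original(x), rtol=1e-3, atol=1e-3)
```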

The initialization scheme can look different if you need to adapt it to your model. For example, [`Wav2Vec2ForPreTraining`] initializes [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) in its last two linear layers.

- The `_is_hf_initialized` flag makes sure the submodule is only initialized once. Setting `module.project_q` and `module.project_hid` to `True` ensures the custom initialization is not overriden later. The `_init_weights` function won't be applied to these modules.
+ The `_is_hf_initialized` flag makes sure the submodule is only initialized once. Setting `module.project_q` and `module.project_hid` to `True` ensures the custom initialization is not overridden later. The `_init_weights` function won't be applied to these modules.

```py
def _init_weights(self, module):
    ...
```
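
The code block above is cut off in this view. Based on the surrounding prose (the `project_q`/`project_hid` layers of [`Wav2Vec2ForPreTraining`] and the `_is_hf_initialized` flag), the override is along these lines — a hedged reconstruction, not the verbatim docs snippet:

```py
def _init_weights(self, module):
    """Initialize the weights (method of the pretrained model class)."""
    if isinstance(module, Wav2Vec2ForPreTraining):
        module.project_hid.reset_parameters()
        module.project_q.reset_parameters()
        # mark the submodules as initialized so _init_weights skips them later
        module.project_hid._is_hf_initialized = True
        module.project_q._is_hf_initialized = True
    elif isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()
```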

@@ -457,7 +457,7 @@ Don't be discouraged if your forward pass isn't identical with the output from the

Your output should have a precision of *1e-3*. Ensure the output shapes and output values are identical. Common reasons for why the outputs aren't identical include:

- Some layers were not added (activation layer or a residual connection).
- - The word embedding matix is not tied.
+ - The word embedding matrix is not tied.
- The wrong positional embeddings are used because the original implementation includes an offset.
- Dropout is applied during the forward pass. Fix this error by making sure `model.training` is `False` and passing `self.training` to [torch.nn.functional.dropout](https://pytorch.org/docs/stable/nn.functional.html?highlight=dropout#torch.nn.functional.dropout).
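
A quick sanity check for the dropout item (minimal sketch):

```py
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
model.eval()  # sets model.training to False, turning dropout into a no-op

x = torch.ones(1, 4)
assert torch.equal(model(x), model(x))  # outputs are deterministic in eval mode
```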

docs/source/en/deepspeed.md (+2 -2)

@@ -840,7 +840,7 @@ Unless you have a lot of free CPU memory, fp32 weights shouldn't be saved during

<hfoptions id="save">
<hfoption id="offline">

- DeepSpeed provies a [zero_to_fp32.py](https://github.com/microsoft/DeepSpeed/blob/91829476a8fd4d0d9268c03c1d56795d20a51c12/deepspeed/utils/zero_to_fp32.py#L14) script at the top-level checkpoint folder for extracting weights at any point. This is a standalone script and you don't need a config file or [`Trainer`].
+ DeepSpeed provides a [zero_to_fp32.py](https://github.com/microsoft/DeepSpeed/blob/91829476a8fd4d0d9268c03c1d56795d20a51c12/deepspeed/utils/zero_to_fp32.py#L14) script at the top-level checkpoint folder for extracting weights at any point. This is a standalone script and you don't need a config file or [`Trainer`].
For example, if your checkpoint folder looks like the one shown below, then you can run the following command to create and consolidate the fp32 weights from multiple GPUs into a single `pytorch_model.bin` file. The script automatically discovers the subfolder `global_step1` which contains the checkpoint.
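(The excerpt cuts off before the command itself; it is typically along the lines of `python zero_to_fp32.py . pytorch_model.bin`, run from inside the checkpoint folder — hedged from the script's usual CLI.)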

@@ -942,7 +942,7 @@ import deepspeed

ds_config = {...}
# must run before instantiating the model to detect zero 3
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
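
For context, the non-[`Trainer`] integration this snippet belongs to continues roughly as follows — a hedged sketch, with the ZeRO-3 config dict elided and an arbitrary example checkpoint:

```py
import deepspeed
from transformers import AutoModel
from transformers.integrations import HfDeepSpeedConfig

ds_config = {...}  # full ZeRO-3 config dict elided
# must run before instantiating the model to detect zero 3
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
model = AutoModel.from_pretrained("openai-community/gpt2")
engine, *_ = deepspeed.initialize(model=model, config_params=ds_config)
```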

docs/source/en/generation_features.md (+1 -1)

@@ -50,7 +50,7 @@ The `streamer` parameter is compatible with any class with a [`~TextStreamer.put`]

Watermarking is useful for detecting whether text is generated. The [watermarking strategy](https://hf.co/papers/2306.04634) in Transformers randomly "colors" a subset of the tokens green. When green tokens are generated, they have a small bias added to their logits, and a higher probability of being generated. You can detect generated text by comparing the proportion of green tokens to the amount of green tokens typically found in human-generated text.

- Watermarking is supported for any generative model in Transformers and doesn't require an extra classfication model to detect the watermarked text.
+ Watermarking is supported for any generative model in Transformers and doesn't require an extra classification model to detect the watermarked text.
Create a [`WatermarkingConfig`] with the bias value to add to the logits and watermarking algorithm. The example below uses the `"selfhash"` algorithm, where the green token selection only depends on the current token. Pass the [`WatermarkingConfig`] to [`~GenerationMixin.generate`].
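
A minimal sketch of that flow (the checkpoint and prompt are arbitrary examples):

```py
from transformers import AutoModelForCausalLM, AutoTokenizer, WatermarkingConfig

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Once upon a time,", return_tensors="pt")

# bias is added to "green" token logits; "selfhash" keys the coloring on the current token
watermarking_config = WatermarkingConfig(bias=2.5, seeding_scheme="selfhash")
out = model.generate(**inputs, watermarking_config=watermarking_config, max_new_tokens=20)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```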

- [`~GenerationMixin.generate`] can also be extended with external libraries or custom code. The `logits_processor` parameter accepts custom [`LogitsProcessor`] instances for manupulating the next token probability distribution. `stopping_criteria` supports custom [`StoppingCriteria`] to stop text generation. Check out the [logits-processor-zoo](https://github.com/NVIDIA/logits-processor-zoo) for more examples of external [`~GenerationMixin.generate`]-compatible extensions.
+ [`~GenerationMixin.generate`] can also be extended with external libraries or custom code. The `logits_processor` parameter accepts custom [`LogitsProcessor`] instances for manipulating the next token probability distribution. `stopping_criteria` supports custom [`StoppingCriteria`] to stop text generation. Check out the [logits-processor-zoo](https://github.com/NVIDIA/logits-processor-zoo) for more examples of external [`~GenerationMixin.generate`]-compatible extensions.
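
As an illustration of the `logits_processor` hook, a custom processor can be as small as this sketch (the `TemperatureScaler` class is a made-up example):

```py
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class TemperatureScaler(LogitsProcessor):
    """Divide the next-token logits by a fixed temperature."""
    def __init__(self, temperature: float):
        self.temperature = temperature

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        return scores / self.temperature

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
inputs = tokenizer("Hello,", return_tensors="pt")
out = model.generate(
    **inputs,
    logits_processor=LogitsProcessorList([TemperatureScaler(0.7)]),
    do_sample=True,
    max_new_tokens=10,
)
```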
Refer to the [Generation strategies](./generation_strategies) guide to learn more about search, sampling, and decoding strategies.

docs/source/en/modular_transformers.md (+2 -2)

@@ -355,7 +355,7 @@ class Olmo2Model(OlmoModel):

- You only need to change the *type* of the `self.norm` attribute to use `RMSNorm`isntead of `LayerNorm`. This change doesn't affect the logic in the forward method (layer name and usage is identical to the parent class), so you don't need to overwrite it. The linter automatically unravels it.
+ You only need to change the *type* of the `self.norm` attribute to use `RMSNorm`instead of `LayerNorm`. This change doesn't affect the logic in the forward method (layer name and usage is identical to the parent class), so you don't need to overwrite it. The linter automatically unravels it.

### Model head

@@ -374,7 +374,7 @@ The logic is identical to `OlmoForCausalLM` which means you don't need to make any

The [modeling_olmo2.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/olmo2/modeling_olmo2.py) generated by the linter also contains some classes (`Olmo2MLP`, `Olmo2RotaryEmbedding`, `Olmo2PreTrainedModel`) that weren't explicitly defined in `modular_olmo2.py`.

- Classes that are a dependency of an inherited class but aren't explicitly defined are automatically added as a part of depdendency tracing. This is similar to how some functions were added to the `Attention` class without drrectly importing them.
+ Classes that are a dependency of an inherited class but aren't explicitly defined are automatically added as a part of dependency tracing. This is similar to how some functions were added to the `Attention` class without directly importing them.
For example, `OlmoDecoderLayer` has an attribute defined as `self.mlp = OlmoMLP(config)`. This class was never explicitly redefined in `Olmo2MLP`, so the linter automatically created a `Olmo2MLP` class similar to `OlmoMLP`. It is identical to the code below if it was explicitly written in `modular_olmo2.py`.
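
That auto-generated class is the standard gated-MLP block used across Llama-style models, along these lines — a hedged reconstruction of the generated code, assuming the usual `ACT2FN` activation mapping:

```py
class Olmo2MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.hidden_size = config.hidden_size
        self.intermediate_size = config.intermediate_size
        self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
        self.act_fn = ACT2FN[config.hidden_act]

    def forward(self, x):
        # gated MLP: down_proj(act(gate_proj(x)) * up_proj(x))
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
```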

docs/source/en/perf_hardware.md (+1 -1)

@@ -29,7 +29,7 @@ It is important the PSU has stable voltage otherwise it may not be able to supply

## Cooling

- An overheated GPU throttles its performance and can even shutdown if it's too hot to prevent damage. Keeping the GPU temperature low, anywhere between 158 - 167F, is essential for delivering full perfomance and maintaining its lifespan. Once temperatures reach 183 - 194F, the GPU may begin to throttle performance.
+ An overheated GPU throttles its performance and can even shutdown if it's too hot to prevent damage. Keeping the GPU temperature low, anywhere between 158 - 167F, is essential for delivering full performance and maintaining its lifespan. Once temperatures reach 183 - 194F, the GPU may begin to throttle performance.
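
As a quick way to monitor this, `nvidia-smi --query-gpu=temperature.gpu --format=csv` reports the current temperature of each NVIDIA GPU.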

docs/source/en/perf_train_gpu_many.md (+1 -1)

@@ -33,7 +33,7 @@ Use the [Model Memory Calculator](https://huggingface.co/spaces/hf-accelerate/mo

## Data parallelism

- Data parallelism evenly distributes data across multiple GPUs. Each GPU holds a copy of the model and concurrently proccesses their portion of the data. At the end, the results from each GPU are synchronized and combined.
+ Data parallelism evenly distributes data across multiple GPUs. Each GPU holds a copy of the model and concurrently processes their portion of the data. At the end, the results from each GPU are synchronized and combined.
Data parallelism significantly reduces training time by processing data in parallel, and it is scalable to the number of GPUs available. However, synchronizing results from each GPU can add overhead.
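
A minimal sketch of this pattern with PyTorch's DistributedDataParallel (assumes a `torchrun --nproc_per_node=2 script.py` style launch, which sets up the process-group environment variables):

```py
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = nn.Linear(10, 10).cuda()   # every rank holds a full copy of the model
ddp_model = DDP(model, device_ids=[local_rank])

x = torch.randn(8, 10).cuda()      # each rank processes its own shard of the data
loss = ddp_model(x).sum()
loss.backward()                    # gradients are synchronized across ranks here
```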

docs/source/en/pipeline_tutorial.md (+1 -1)

@@ -24,7 +24,7 @@ Tailor the [`Pipeline`] to your task with task specific parameters such as adding

Transformers has two pipeline classes, a generic [`Pipeline`] and many individual task-specific pipelines like [`TextGenerationPipeline`] or [`VisualQuestionAnsweringPipeline`]. Load these individual pipelines by setting the task identifier in the `task` parameter in [`Pipeline`]. You can find the task identifier for each pipeline in their API documentation.

- Each task is configured to use a default pretrained model and preprocessor, but this can be overriden with the `model` parameter if you want to use a different model.
+ Each task is configured to use a default pretrained model and preprocessor, but this can be overridden with the `model` parameter if you want to use a different model.
For example, to use the [`TextGenerationPipeline`] with [Gemma 2](./model_doc/gemma2), set `task="text-generation"` and `model="google/gemma-2-2b"`.
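
In code, that example looks like this (the prompt is arbitrary, and the gated Gemma 2 checkpoint requires accepting its license on the Hub first):

```py
from transformers import pipeline

generator = pipeline(task="text-generation", model="google/gemma-2-2b")
generator("The secret to baking a good cake is")
```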

examples/pytorch/text-generation/README.md (+1 -1)

@@ -19,7 +19,7 @@ limitations under the License.

Based on the script [`run_generation.py`](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-generation/run_generation.py).

Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, GPT-J, Transformer-XL, XLNet, CTRL, BLOOM, LLAMA, OPT.
- A similar script is used for our official demo [Write With Transfomer](https://transformer.huggingface.co), where you
+ A similar script is used for our official demo [Write With Transformer](https://transformer.huggingface.co), where you
can try out the different models available in the library.
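
Typical usage of that script is along the lines of `python run_generation.py --model_type=gpt2 --model_name_or_path=openai-community/gpt2` (flag names hedged from the script's argparse options; run it with `--help` for the full list).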