Commit 74e7fc3

Author: SeanNaren

Address code review
1 parent e9230d0 commit 74e7fc3

File tree

1 file changed (+1, −1 lines)


docs/source/advanced/advanced_gpu.rst

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ But for **fine-tuning** a model, you can reach 10 to 20 Billion parameter models
 When Shouldn't I use an Optimized Distributed Plugin?
 """""""""""""""""""""""""""""""""""""""""""""""""""""
 
-Sharding techniques help when model sizes are fairly large; roughly 500M+ parameters is where we've seen benefits. However, in cases where your model is small (ResNet50 of around 80M Parameters) it may be best to stick to normal distributed training, unless you are using unusually large batch sizes or inputs.
+Sharding techniques help when model sizes are fairly large; roughly 500M+ parameters is where we've seen benefits. However, in cases where your model is small (ResNet50 of around 80M Parameters) it may be best to stick to ordinary distributed training, unless you are using unusually large batch sizes or inputs.
 
 ----------
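The paragraph's rule of thumb (shard at roughly 500M+ parameters, ordinary distributed training for smaller models like an ~80M-parameter ResNet50) can be sketched as a small helper. This is illustrative only and not part of the commit: `choose_strategy` is a hypothetical function, and the strategy names `"ddp"` and `"ddp_sharded"` are assumptions modeled on Lightning's strategy strings of this era.

```python
# Hypothetical sketch of the docs' rule of thumb; not part of this commit.
# Strategy names ("ddp", "ddp_sharded") are assumptions for illustration.

SHARDING_THRESHOLD = 500_000_000  # ~500M parameters, per the docs' guidance


def choose_strategy(num_parameters: int) -> str:
    """Pick a distributed strategy name based on model size.

    Sharding pays off for roughly 500M+ parameter models; smaller
    models (e.g. ResNet50, ~80M parameters) usually do better with
    ordinary distributed data parallel, unless batch sizes or inputs
    are unusually large.
    """
    return "ddp_sharded" if num_parameters >= SHARDING_THRESHOLD else "ddp"


if __name__ == "__main__":
    print(choose_strategy(80_000_000))     # ResNet50-sized model -> "ddp"
    print(choose_strategy(1_500_000_000))  # 1.5B-parameter model -> "ddp_sharded"
```

In practice the parameter count would come from the model itself, e.g. `sum(p.numel() for p in model.parameters())` in PyTorch.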
