Add llama4 #37307

Merged: 254 commits, merged Apr 5, 2025
Commits: 254 (the diff view shows changes from 239 commits)
9a75c63
remove one of the last deps
ArthurZucker Mar 13, 2025
e3c52a2
update fast image processor after refactor
yonigozlan Mar 13, 2025
1854fc9
styling
ArthurZucker Mar 13, 2025
660dc8c
more quality of life improvements
ArthurZucker Mar 13, 2025
2defa9c
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ArthurZucker Mar 13, 2025
0cf2e77
nit
ArthurZucker Mar 13, 2025
693fc47
update
ArthurZucker Mar 13, 2025
8da4b6e
cleanups
ArthurZucker Mar 13, 2025
ba7a8aa
some cleanups
ArthurZucker Mar 19, 2025
db2821e
vllm updates
ArthurZucker Mar 19, 2025
6c04e10
update fake image token
ArthurZucker Mar 21, 2025
5e9d84f
[convert] Fix typo
pcuenca Mar 25, 2025
aa595de
[convert] Strip extraneous bytes from shards
pcuenca Mar 25, 2025
507857d
[convert] Minor fixes
pcuenca Mar 25, 2025
d9e3f86
[convert] Use num_experts
pcuenca Mar 25, 2025
5bebf97
multi-image fixes in modeling + processor
molbap Mar 25, 2025
671c37b
fixup size
molbap Mar 25, 2025
972c465
128 experts
pcuenca Mar 25, 2025
1be3ddc
Use default rope
pcuenca Mar 26, 2025
347a762
Merge branch 'final-version' into fixes_cleanups
molbap Mar 26, 2025
b06a26b
Unfuse mlp
pcuenca Mar 26, 2025
52787d5
simplify a lot inputs embeds merging
molbap Mar 26, 2025
9c0ef18
Merge branch 'fixes_cleanups' of github.com:huggingface/new-model-add…
molbap Mar 26, 2025
03e9939
remove .item() :eyes:
molbap Mar 26, 2025
ddf7adc
fix from review
molbap Mar 26, 2025
82004d9
Merge pull request #5 from huggingface/fixes_cleanups
molbap Mar 26, 2025
ca0cd0e
Merge branch 'final-version' into moe-128
pcuenca Mar 28, 2025
54be1a0
Address feedback
pcuenca Mar 28, 2025
b38318d
Use None "default" for rope_scaling. Add eot.
pcuenca Mar 30, 2025
ed00fb3
set seed
ArthurZucker Mar 31, 2025
b5373e2
Merge branch 'main' of github.com:huggingface/new-model-addition-meta…
ArthurZucker Mar 31, 2025
fb748af
return aspect ratios and bug fixes
youngkent Mar 28, 2025
189a103
Moe 128 rebased (#8)
liuzijing2014 Mar 31, 2025
24d4599
un-comment write_tokenizer from converting script
liuzijing2014 Mar 31, 2025
7352034
remove un-used imports
liuzijing2014 Apr 1, 2025
ca64ae5
[llama4] Pop aspect_ratios from image processor output in Llama4Proce…
jmswen Apr 1, 2025
3bf26c2
Merge pull request #11 from huggingface/remove-aspect-ratios
jmswen Apr 1, 2025
4a1fec8
Merge remote-tracking branch 'origin/final-version' into moe-128
pcuenca Apr 1, 2025
4af4c77
Fix parameter_count name
pcuenca Apr 1, 2025
b077bb5
Update src/transformers/models/llama4/configuration_llama4.py
ArthurZucker Apr 1, 2025
c487c62
Merge pull request #4 from huggingface/moe-128
ArthurZucker Apr 1, 2025
55a17c5
nit
ArthurZucker Apr 1, 2025
90d5876
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ArthurZucker Apr 1, 2025
e53363d
Add changes for no_rope, moe_layers, chunked attention. Just need to …
ArthurZucker Apr 1, 2025
5b8dd83
Update src/transformers/models/llama4/image_processing_llama4_fast.py
ArthurZucker Apr 1, 2025
87abef5
Merge pull request #13 from huggingface/meta_vllm
ArthurZucker Apr 1, 2025
71385f1
nit
ArthurZucker Apr 1, 2025
0c3f25a
Merge branch 'main' of github.com:huggingface/new-model-addition-meta…
ArthurZucker Apr 1, 2025
ec85fa3
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ArthurZucker Apr 1, 2025
c358a1b
fix post merge with main
ArthurZucker Apr 1, 2025
0c3dc0c
support flex attention
ArthurZucker Apr 1, 2025
1f4072b
Merge branch 'final-version' into norope
ArthurZucker Apr 1, 2025
d728d06
fixes
ArthurZucker Apr 1, 2025
31d88f1
fix
MekkCyber Apr 1, 2025
c338736
add layer
MekkCyber Apr 1, 2025
6529cad
small updates
ArthurZucker Apr 1, 2025
558c096
rebase and delete llm_compressor
MekkCyber Apr 1, 2025
7251716
nit
MekkCyber Apr 1, 2025
5be1b28
[llama4/mm] Add back <|image|> token that delimits global tile
jmswen Apr 1, 2025
6f63da6
Merge pull request #16 from huggingface/global-tile
jmswen Apr 1, 2025
f4f9fbc
[llama4/mm] Fix Llama 4 image processing unit tests
jmswen Apr 1, 2025
2ad69a4
add explicit dtype
jmswen Apr 1, 2025
0a9da1b
sdpa works
ArthurZucker Apr 1, 2025
21eb873
Merge pull request #17 from huggingface/tests
jmswen Apr 1, 2025
4047e86
Merge pull request #15 from huggingface/fix_quantization
MekkCyber Apr 1, 2025
6da9409
comment todo small
ArthurZucker Apr 1, 2025
233c7df
fix model loading
liuzijing2014 Apr 2, 2025
cd4a2da
Merge pull request #18 from huggingface/meta/fix-model-loading
ArthurZucker Apr 2, 2025
fa75c34
revert
MekkCyber Apr 2, 2025
9679739
nits
ArthurZucker Apr 2, 2025
eb677fa
Merge pull request #19 from huggingface/reverting_quantization_fix
ArthurZucker Apr 2, 2025
b61c859
small fix for TP on 1 node
molbap Apr 2, 2025
822f296
Read new params from config
pcuenca Apr 2, 2025
a417896
Add <|eom|>
pcuenca Apr 2, 2025
37391a3
lol don't know how this got here
pcuenca Apr 2, 2025
fe240a6
adding fp8
MekkCyber Apr 2, 2025
ef31789
Save processor, fix chat template
pcuenca Apr 2, 2025
afcc7ec
style
pcuenca Apr 2, 2025
ce5d1ea
Add boi/eoi tokens
pcuenca Apr 2, 2025
da1e691
fixes for now flex seems to work :)
ArthurZucker Apr 2, 2025
7a2afb3
updates
ArthurZucker Apr 2, 2025
85cf8b9
nits
ArthurZucker Apr 2, 2025
ab268fb
updates
ArthurZucker Apr 2, 2025
f418d06
missking keys
MekkCyber Apr 2, 2025
2133277
add context parallel
ArthurZucker Apr 2, 2025
c29469c
update
ArthurZucker Apr 2, 2025
8b0a8c9
update
MekkCyber Apr 2, 2025
e472a4e
fix
ArthurZucker Apr 2, 2025
2f8d05b
nits
ArthurZucker Apr 2, 2025
196d87e
add worldsize and make eager attn work for vision
mht-sharma Apr 2, 2025
ef479fa
Merge pull request #23 from huggingface/minor_tgi_fix
ArthurZucker Apr 2, 2025
1245170
Ignore new key present in base models
pcuenca Apr 2, 2025
ddf8993
add tp_plan
MekkCyber Apr 2, 2025
b98cde8
fix nope
liuzijing2014 Apr 2, 2025
b25084b
minor fix
liuzijing2014 Apr 2, 2025
0f5b27b
Merge pull request #26 from huggingface/meta/fix-nope
ArthurZucker Apr 2, 2025
99ec54b
Clean up Llama4 vision model
sarckk Apr 2, 2025
0a10252
Merge pull request #28 from huggingface/cleanup-mllama4
ArthurZucker Apr 3, 2025
90e8e2c
current updates
ArthurZucker Apr 3, 2025
5e87ba9
add support for `attn_temperature_tuning`
ArthurZucker Apr 3, 2025
9e2e0f9
add floor scale
ArthurZucker Apr 3, 2025
5b1721b
add missing attn scales
ArthurZucker Apr 3, 2025
c06da80
push what works, dirty trick for the device synch
ArthurZucker Apr 3, 2025
29f55d2
oups
ArthurZucker Apr 3, 2025
cf83f0b
Fix pad_token_id
pcuenca Apr 3, 2025
06413dc
fix causallml loading
SunMarc Apr 3, 2025
ed6cba8
rm
SunMarc Apr 3, 2025
6d564d0
Merge pull request #20 from huggingface/conversion-fixes
ArthurZucker Apr 3, 2025
ff1df03
fix tied-weights
SunMarc Apr 3, 2025
6decf84
fix sdpa
molbap Apr 3, 2025
ba2e464
Merge branch 'norope' of github.com:huggingface/new-model-addition-me…
molbap Apr 3, 2025
4eabf8f
Merge pull request #32 from huggingface/remove-warning
ArthurZucker Apr 3, 2025
7a00169
push current version
ArthurZucker Apr 3, 2025
a820dbe
Merge branch 'norope' of github.com:huggingface/new-model-addition-me…
ArthurZucker Apr 3, 2025
24dbcad
should work with both short and long
ArthurZucker Apr 3, 2025
f2bbb4b
add compressed_tensos & fix fbgemm tp
MekkCyber Apr 3, 2025
aeaad13
Fix flex impl
drisspg Apr 4, 2025
96066e0
style
ArthurZucker Apr 4, 2025
eb535ee
chunking
Cyrilvallez Apr 3, 2025
60a58cb
Merge branch 'final-version' into norope
ArthurZucker Apr 4, 2025
e19af4b
try to revert the potentially breaking change
ArthurZucker Apr 4, 2025
eb167f2
fix auto factory
ArthurZucker Apr 4, 2025
7f8941d
fix shapes in general
Cyrilvallez Apr 4, 2025
30cacf7
rm processing
SunMarc Apr 4, 2025
99f2297
Merge pull request #30 from huggingface/fix-causal-lm-loading
ArthurZucker Apr 4, 2025
7990c78
commit cache utils cleanup
ArthurZucker Apr 4, 2025
c7d4c88
Fix context length
pcuenca Apr 4, 2025
efb4577
fix
MekkCyber Apr 4, 2025
9f9974b
Merge branch 'final-version' into add_fbgemm
MekkCyber Apr 4, 2025
174eda3
allocate
MekkCyber Apr 4, 2025
bdfb573
update tp_plan
MekkCyber Apr 4, 2025
aa8daba
Merge pull request #21 from huggingface/add_fbgemm
MekkCyber Apr 4, 2025
05cc59e
fix SDPA!
ArthurZucker Apr 4, 2025
dcb29eb
Add support for sparse `Llama4TextMoe` layer from the kernel hub
danieldk Apr 4, 2025
61626d0
cleanup
ArthurZucker Apr 4, 2025
373a472
better merge
ArthurZucker Apr 4, 2025
d7d09a1
Merge branch 'norope' of github.com:huggingface/new-model-addition-me…
ArthurZucker Apr 4, 2025
64c2133
update
ArthurZucker Apr 4, 2025
85b3c7a
still broken fixing now
ArthurZucker Apr 4, 2025
bfc8049
nits
ArthurZucker Apr 4, 2025
5da0832
revert print
ArthurZucker Apr 4, 2025
bc44b2b
Write max_position_embeddings and max_model_length
pcuenca Apr 4, 2025
1a76267
Update modeling_llama4.py
Cyrilvallez Apr 4, 2025
fd0f273
Save attention_chunk_size
pcuenca Apr 4, 2025
f03660a
Sync eos terminators
pcuenca Apr 4, 2025
3612b9c
Read initializer_range
pcuenca Apr 4, 2025
f781885
style
pcuenca Apr 4, 2025
206c8ae
remove `dict`
pcuenca Apr 4, 2025
51f7cd2
fix
ArthurZucker Apr 4, 2025
cb58cea
eager should use `chunked_attention_mask`
ArthurZucker Apr 4, 2025
7414235
revert
ArthurZucker Apr 4, 2025
04b302a
fixup
ArthurZucker Apr 4, 2025
a515579
Merge pull request #14 from huggingface/norope
ArthurZucker Apr 4, 2025
a9045fc
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ArthurZucker Apr 4, 2025
ccda19f
Merge pull request #36 from huggingface/sparse-llama4-moe
ArthurZucker Apr 4, 2025
598dded
Merge branch 'final-version' into fix-context-length
ArthurZucker Apr 4, 2025
ec7656a
Merge pull request #35 from huggingface/fix-context-length
ArthurZucker Apr 4, 2025
fcee23d
fix config
SunMarc Apr 4, 2025
6ca6f66
Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"
LysandreJik Apr 4, 2025
535030a
Fix typo and remove warning with compiled flex and chunked prefill
Cyrilvallez Apr 4, 2025
a43e056
Fix MoE vs FF (#41)
pcuenca Apr 4, 2025
f5dd6fb
fix
MekkCyber Apr 4, 2025
7c03c7e
Use correct no_rope_layers if provided one is empty list
sarckk Apr 4, 2025
6a8b9f6
Merge pull request #46 from huggingface/keep-nrope-layers-fix
yeqcharlotte Apr 4, 2025
7bda11f
update tests
MekkCyber Apr 4, 2025
e547b10
fix
MekkCyber Apr 4, 2025
0130b2d
skipping some tests
MekkCyber Apr 4, 2025
93022de
fix fp8 loading
liuzijing2014 Apr 5, 2025
45cf582
fix text geneartion pipeline
liuzijing2014 Apr 5, 2025
a3e8267
eager needs 4D mask
ArthurZucker Apr 5, 2025
6ab0682
fix
SunMarc Apr 5, 2025
fd150bb
Merge pull request #50 from huggingface/fix-eager
ArthurZucker Apr 5, 2025
ef8dbe2
Some cleanup
LysandreJik Apr 5, 2025
c38bf3a
fix
LysandreJik Apr 5, 2025
141da65
update
MekkCyber Apr 5, 2025
66c36a4
fix
MekkCyber Apr 5, 2025
9b2e35d
replace correctly module
SunMarc Apr 5, 2025
ce91d95
patch
MekkCyber Apr 5, 2025
2374ff7
modulelist
MekkCyber Apr 5, 2025
61f45af
update
MekkCyber Apr 5, 2025
a471b10
update
MekkCyber Apr 5, 2025
4c4bc81
clean up
MekkCyber Apr 5, 2025
f642d32
Don't move to `cuda:0` in distributed mode
pcuenca Apr 5, 2025
3d58f8e
restrict to compressed tensors for now
SunMarc Apr 5, 2025
8dbf7cb
rm print
SunMarc Apr 5, 2025
48b4f56
Docs!
LysandreJik Apr 5, 2025
46b0815
Fixes
LysandreJik Apr 5, 2025
0849d32
Update docs/source/en/model_doc/llama4.md
LysandreJik Apr 5, 2025
f7756b4
Fixes
LysandreJik Apr 5, 2025
27364da
cuda graph fix
mht-sharma Apr 5, 2025
b239675
Merge pull request #38 from huggingface/smol-fix
ArthurZucker Apr 5, 2025
aeec2dc
Merge pull request #49 from huggingface/fix-quantization
ArthurZucker Apr 5, 2025
8578252
Merge pull request #53 from huggingface/l4-docs
LysandreJik Apr 5, 2025
eb9e4af
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ArthurZucker Apr 5, 2025
ad839d3
revert some stuff
ArthurZucker Apr 5, 2025
9f03f05
fixup
ArthurZucker Apr 5, 2025
83282a1
styling
ArthurZucker Apr 5, 2025
fb495fd
Merge pull request #44 from huggingface/fix_style
ArthurZucker Apr 5, 2025
2902839
Merge pull request #54 from huggingface/fix-tp-pipeline
ArthurZucker Apr 5, 2025
3eab443
Update src/transformers/models/llama4/modeling_llama4.py
mht-sharma Apr 5, 2025
688dc5c
Merge branch 'final-version' into code-quality
ArthurZucker Apr 5, 2025
695c1e7
fixup
ArthurZucker Apr 5, 2025
54785ef
Merge branch 'code-quality' of github.com:huggingface/new-model-addit…
ArthurZucker Apr 5, 2025
26b5674
commit licence, cleanup here and there and style
ArthurZucker Apr 5, 2025
c53e259
more styling changes
ArthurZucker Apr 5, 2025
f87c237
Merge pull request #51 from huggingface/code-quality
ArthurZucker Apr 5, 2025
7d5d5f0
Merge pull request #55 from huggingface/tgi_cuda_graph_fix
ArthurZucker Apr 5, 2025
1895d02
fix dummies
ArthurZucker Apr 5, 2025
931dad9
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ArthurZucker Apr 5, 2025
ed669a3
fix and clean docstrings
ArthurZucker Apr 5, 2025
7f292e1
remove comment
ArthurZucker Apr 5, 2025
b97451e
Merge branch 'main' of github.com:huggingface/new-model-addition-meta…
ArthurZucker Apr 5, 2025
34f6e9e
remove warning
ArthurZucker Apr 5, 2025
bac11b5
Only fast image processor is supported
LysandreJik Apr 5, 2025
d73aea8
nit
LysandreJik Apr 5, 2025
ab8bbad
trigger CI
ydshieh Apr 5, 2025
6c6e901
fix issue with flex encoder
ArthurZucker Apr 5, 2025
4994729
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ArthurZucker Apr 5, 2025
5b96e5d
Merge pull request #58 from huggingface/only-fast-image-processor
ArthurZucker Apr 5, 2025
5ce5746
fix dynamic cache
ArthurZucker Apr 5, 2025
555c4ee
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ArthurZucker Apr 5, 2025
6ba8ef7
Code quality
LysandreJik Apr 5, 2025
ecaa1a7
Code quality
LysandreJik Apr 5, 2025
0c8624b
fix more tests for now
ArthurZucker Apr 5, 2025
8167ac4
Code quality
LysandreJik Apr 5, 2025
71521af
Code quality
LysandreJik Apr 5, 2025
949b1b7
Nuke bunch of failing stuff
ArthurZucker Apr 5, 2025
b878647
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ArthurZucker Apr 5, 2025
cbb6e59
Code quality
LysandreJik Apr 5, 2025
8c50934
Code quality
LysandreJik Apr 5, 2025
44a90c0
cleanup removal of slow image processor
ArthurZucker Apr 5, 2025
99b6bc8
ruff fix fast image processor
ArthurZucker Apr 5, 2025
7c471ea
fix
LysandreJik Apr 5, 2025
538ba2b
fix styling
ArthurZucker Apr 5, 2025
50a8daa
git push Merge branch 'final-version' of github.com:huggingface/new-m…
ArthurZucker Apr 5, 2025
07eaf8c
Docs
LysandreJik Apr 5, 2025
8b39d94
Repo consistency
LysandreJik Apr 5, 2025
3736b90
Repo consistency
LysandreJik Apr 5, 2025
9274653
fix sliding window issue
ArthurZucker Apr 5, 2025
22a33e3
git push Merge branch 'add-llama4' of github.com:huggingface/transfor…
ArthurZucker Apr 5, 2025
748d622
separate llama cache
ArthurZucker Apr 5, 2025
6a777c0
styling
ArthurZucker Apr 5, 2025
457f3c6
Repo consistency
LysandreJik Apr 5, 2025
1226014
Repo consistency
LysandreJik Apr 5, 2025
ac54e8f
push waht works
ArthurZucker Apr 5, 2025
69e9470
Merge branch 'add-llama4' of github.com:huggingface/transformers into…
ArthurZucker Apr 5, 2025
8f08b70
L4 Repo consistency
LysandreJik Apr 5, 2025
e9769f0
Docs
LysandreJik Apr 5, 2025
2ec5fbe
fix last last alst alst alst alstsaltlsltlaslt
ArthurZucker Apr 5, 2025
9bfae24
Merge branch 'add-llama4' of github.com:huggingface/transformers into…
ArthurZucker Apr 5, 2025
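Several commits in this log touch Llama 4's attention setup: "chunking", "eager should use `chunked_attention_mask`", "Save attention_chunk_size", "add support for `attn_temperature_tuning`", "add floor scale", and "Use correct no_rope_layers if provided one is empty list". The sketch below is a minimal, dependency-free illustration of those three ideas under stated assumptions. It is not the code merged in this PR. The function names are hypothetical. The temperature formula and the defaults (floor_scale=8192, attn_scale=0.1, and a NoPE layer every 4th layer) reflect our reading of Llama4TextConfig and modeling_llama4.py and should be checked against the merged source.

```python
import math

# Hypothetical helpers illustrating Llama 4 attention ideas; not the transformers API.


def chunked_causal_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    A key is visible only if it is causal (k <= q) AND lies in the same
    fixed-size chunk as the query (q // chunk_size == k // chunk_size),
    so attention never crosses a chunk boundary.
    """
    return [
        [k <= q and q // chunk_size == k // chunk_size for k in range(seq_len)]
        for q in range(seq_len)
    ]


def attn_temperature_scales(
    positions: list[int], floor_scale: float = 8192.0, attn_scale: float = 0.1
) -> list[float]:
    """Per-position multiplier for query states (assumed formula).

    The scale is exactly 1.0 until (position + 1) reaches floor_scale, then
    grows logarithmically, stepping up once every floor_scale tokens.
    """
    return [
        math.log(math.floor((p + 1.0) / floor_scale) + 1.0) * attn_scale + 1.0
        for p in positions
    ]


def default_no_rope_layers(num_hidden_layers: int, interval: int = 4) -> list[int]:
    """Per-layer flags: 1 = layer applies RoPE, 0 = NoPE layer.

    Every `interval`-th layer (counting from 1) skips rotary embeddings.
    """
    return [int((i + 1) % interval != 0) for i in range(num_hidden_layers)]


if __name__ == "__main__":
    for row in chunked_causal_mask(6, chunk_size=3):
        print("".join("x" if visible else "." for visible in row))
    print(attn_temperature_scales([0, 8191, 16383]))
    print(default_no_rope_layers(8))  # [1, 1, 1, 0, 1, 1, 1, 0]
```

For example, with a chunk size of 3, position 4 can attend only to positions 3 and 4. The real model builds these masks as tensors (and, per the commits above, has eager, SDPA and flex attention paths); nested lists are used here only to keep the sketch self-contained.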
docs/source/en/_toctree.yml (2 additions, 0 deletions)

@@ -507,6 +507,8 @@
   title: Llama2
 - local: model_doc/llama3
   title: Llama3
+- local: model_doc/llama4
+  title: Llama4
 - local: model_doc/longformer
   title: Longformer
 - local: model_doc/longt5