
Fix to loading VLMs after transformers bump #1068

Closed
merveenoyan wants to merge 5 commits

Conversation

merveenoyan (Contributor)

No description provided.

@merveenoyan (Contributor, Author)

@aymeric-roucher @albertvillanova tested with the RAG example, SmolLM, and SmolVLM.

@albertvillanova (Member) left a comment

Thanks!

The CI is red: some tests are not passing.

Comment on lines 685 to 686:

    api = HfApi()
    pipeline_tag = api.model_info(model_id).pipeline_tag

@albertvillanova (Member)

We usually use the functions from the root of the package, so we use a single API instance:

    huggingface_hub.model_info
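
For reference, a minimal sketch of the suggested call shape using the root-level helper (the model id below is just an example):

    import huggingface_hub

    # Example id only; any Hub model id works here.
    model_id = "HuggingFaceTB/SmolVLM-Instruct"

    # The root-level function delegates to a shared default HfApi instance,
    # so no HfApi() needs to be constructed at the call site.
    pipeline_tag = huggingface_hub.model_info(model_id).pipeline_tag
    print(pipeline_tag)  # e.g. "image-text-to-text" for a VLM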

@merveenoyan (Contributor, Author)

Pushed a fix to handle models when a path is passed (and not necessarily loaded from the HF cache).

@merveenoyan (Contributor, Author) commented Mar 24, 2025

@albertvillanova the mocked test model doesn't have a model card, hence the error. I'm not sure where/how this is mocked, though; can you give me a ref?
Basically, if a model doesn't have a filled model card (which is always true for transformers) we can't infer whether it's a VLM or an LLM. If we want to bypass this, we can check the config for one of the VLM suffixes (ForVision2Seq, ForConditionalGeneration, or ForImageTextToText). I can do either of these.
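
A minimal sketch of that config-based check, assuming the architectures field of config.json ends with one of the listed suffixes (the helper name is mine, not from the PR):

    from transformers import AutoConfig

    VLM_SUFFIXES = ("ForVision2Seq", "ForConditionalGeneration", "ForImageTextToText")

    def looks_like_vlm(model_id_or_path: str) -> bool:
        # config.json always exists for a transformers model, even when the
        # model card is empty, so this also covers mocked or local models.
        config = AutoConfig.from_pretrained(model_id_or_path)
        architectures = getattr(config, "architectures", None) or []
        # Note: ForConditionalGeneration also appears on some text-to-text
        # models (e.g. T5), so this suffix check is a heuristic, not a proof.
        return any(arch.endswith(VLM_SUFFIXES) for arch in architectures)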

@albertvillanova (Member) left a comment

@merveenoyan, some thoughts:

  • It is true that we did not consider that model_id could be a local path, meaning the Hub-centric model_info approach can't be used in that case. We may need a different way to handle this.
  • This logic might already be implemented somewhere in transformers, so we should check if we can reuse it instead of reimplementing it.
  • If all the logic becomes too complex, it might be worth extracting it into a dedicated method for better maintainability.
  • Regarding the tests, we can address them once the implementation is finalized, so we clearly know which patches need to be added.

@merveenoyan (Contributor, Author)

@albertvillanova

  • A model may exist locally yet not be on the Hub; in that case checking the pipeline tag would fail, hence I added it myself for local loading.
  • We have to check the pipeline tag to infer which model class to load, hence we cannot use transformers to do it. It's different for local and remote models, hence the two ways of doing it.

Thus this seems to be the most feasible way of doing it imo; I don't think it's too complex.
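
A minimal sketch of the two-path approach described above; the function name is mine, AutoModelForImageTextToText needs a recent transformers release, and the actual PR code may differ:

    import os

    import huggingface_hub
    from transformers import AutoConfig, AutoModelForCausalLM, AutoModelForImageTextToText

    def load_model(model_id: str):
        """Use Hub metadata when available; fall back to config.json for local paths."""
        if os.path.isdir(model_id):
            # Local path: the model may not exist on the Hub at all, so skip model_info
            # and peek at config.json instead (same suffix check as sketched earlier).
            architectures = getattr(AutoConfig.from_pretrained(model_id), "architectures", None) or []
            is_vlm = any(a.endswith(("ForImageTextToText", "ForVision2Seq")) for a in architectures)
        else:
            is_vlm = huggingface_hub.model_info(model_id).pipeline_tag == "image-text-to-text"
        cls = AutoModelForImageTextToText if is_vlm else AutoModelForCausalLM
        return cls.from_pretrained(model_id)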

@albertvillanova (Member)

@merveenoyan I think I may not have explained myself clearly. What I meant regarding local models (not on the Hub) is:

  • When a user trains a model locally using transformers and then pushes it to the Hub, I assume that pipeline_tag might be inferred automatically by transformers. That's why I suggested that the inferring logic may already exist within transformers, and we could potentially reuse it.
  • If that is not the case, and a model can be pushed to the Hub without pipeline_tag being inferred by transformers, then I assume the Hub itself has an inference mechanism in place. In that case, this logic might be accessible via huggingface-hub, meaning we wouldn’t need to reimplement it.

Let me know what you think!

@sysradium (Contributor)

I agree that maybe some parts of transformers could be reused. For example, given a pipeline tag you can infer the auto_model from https://github.com/huggingface/transformers/blob/348f3285c5114159d2ff4933b4b8ae36866d01a7/utils/update_metadata.py#L63

Maybe that would make the code more resilient to changes 🤔
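
A hedged sketch of what reusing such a table could look like; the dict below is a hand-written illustration, not the actual mapping in transformers' utils/update_metadata.py:

    from transformers import (
        AutoModelForCausalLM,
        AutoModelForImageTextToText,
        AutoModelForVision2Seq,
    )

    # Illustrative pipeline-tag -> auto-class table in the spirit of update_metadata.py;
    # the real table in transformers is larger and structured differently.
    PIPELINE_TAG_TO_AUTO_CLASS = {
        "text-generation": AutoModelForCausalLM,
        "image-text-to-text": AutoModelForImageTextToText,
        "image-to-text": AutoModelForVision2Seq,
    }

    def auto_class_for(pipeline_tag: str, default=AutoModelForCausalLM):
        return PIPELINE_TAG_TO_AUTO_CLASS.get(pipeline_tag, default)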

@merveenoyan (Contributor, Author) commented Mar 28, 2025

@albertvillanova for transformers, if a model card doesn't exist, config.json always exists (you can't load a model in transformers otherwise), and from it you can get one of three classes for vision LMs (two of them are deprecated): AutoModelForVision2Seq, AutoModelForImageTextToText, and AutoModelForConditionalGeneration. If you feel like it, I can use these instead (it looked a bit too hackish, but it's fool-proof). V2S and ConditionalGeneration are deprecated in favor of IT2T; all the models from now on will have IT2T, AFAIK.

@merveenoyan (Contributor, Author)

Closing this in favor of hotfix #1070, but it would be better to get out of the try/except.

merveenoyan closed this Apr 9, 2025