
Fix to loading VLMs after transformers bump #1068

Closed
merveenoyan wants to merge 5 commits

Conversation

merveenoyan (Contributor)

No description provided.

@merveenoyan (Contributor, Author)

@aymeric-roucher @albertvillanova tested with the RAG example, SmolLM, and SmolVLM.

@albertvillanova (Member) left a comment

Thanks!

The CI is red: some tests are not passing.

Comment on lines 685 to 686:

    api = HfApi()
    pipeline_tag = api.model_info(model_id).pipeline_tag

@albertvillanova (Member)

We usually use the functions from the root of the package, so we use a single API instance:

    huggingface_hub.model_info
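
For reference, a minimal sketch of the suggested call shape using the root-level helper (the model id below is just an example):

    import huggingface_hub

    # Example id only; any Hub model id works here.
    model_id = "HuggingFaceTB/SmolVLM-Instruct"

    # The root-level function delegates to a shared default HfApi instance,
    # so no HfApi() needs to be constructed at the call site.
    pipeline_tag = huggingface_hub.model_info(model_id).pipeline_tag
    print(pipeline_tag)  # e.g. "image-text-to-text" for a VLM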

@merveenoyan (Contributor, Author)

Pushed a fix to handle models when a path is passed (and not necessarily loaded from the HF cache).

@merveenoyan (Contributor, Author) commented Mar 24, 2025

@albertvillanova the mocked test model doesn't have a model card, hence the error. I'm not sure where/how this is mocked, though; can you give me a ref?
Basically, if a model doesn't have a filled model card (which is always true for transformers) we can't infer whether it's a VLM or an LLM. If we want to bypass this, we can check the config for one of the VLM suffixes (ForVision2Seq, ForConditionalGeneration, or ForImageTextToText). I can do either of these.
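
A minimal sketch of that config-based check, assuming the architectures field of config.json ends with one of the listed suffixes (the helper name is mine, not from the PR):

    from transformers import AutoConfig

    VLM_SUFFIXES = ("ForVision2Seq", "ForConditionalGeneration", "ForImageTextToText")

    def looks_like_vlm(model_id_or_path: str) -> bool:
        # config.json always exists for a transformers model, even when the
        # model card is empty, so this also covers mocked or local models.
        config = AutoConfig.from_pretrained(model_id_or_path)
        architectures = getattr(config, "architectures", None) or []
        # Note: ForConditionalGeneration also appears on some text-to-text
        # models (e.g. T5), so this suffix check is a heuristic, not a proof.
        return any(arch.endswith(VLM_SUFFIXES) for arch in architectures)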

@albertvillanova (Member) left a comment

@merveenoyan, some thoughts:

  • It is true that we did not consider that model_id could be a local path, meaning the Hub-centric model_info approach can't be used in that case. We may need a different way to handle this.
  • This logic might already be implemented somewhere in transformers, so we should check if we can reuse it instead of reimplementing it.
  • If all the logic becomes too complex, it might be worth extracting it into a dedicated method for better maintainability.
  • Regarding the tests, we can address them once the implementation is finalized, so we clearly know which patches need to be added.

@merveenoyan (Contributor, Author)

@albertvillanova

  • A model may exist locally yet not be on the Hub; in that case checking the pipeline tag would fail, hence I added it myself for local loading.
  • We have to check the pipeline tag to infer which model class to load, hence we cannot use transformers to do it. It's different for local and remote models, hence the two ways of doing it.

Thus this seems to be the most feasible way of doing it imo; I don't think it's too complex.
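
A minimal sketch of the two-path approach described above; the function name is mine, AutoModelForImageTextToText needs a recent transformers release, and the actual PR code may differ:

    import os

    import huggingface_hub
    from transformers import AutoConfig, AutoModelForCausalLM, AutoModelForImageTextToText

    def load_model(model_id: str):
        """Use Hub metadata when available; fall back to config.json for local paths."""
        if os.path.isdir(model_id):
            # Local path: the model may not exist on the Hub at all, so skip model_info
            # and peek at config.json instead (same suffix check as sketched earlier).
            architectures = getattr(AutoConfig.from_pretrained(model_id), "architectures", None) or []
            is_vlm = any(a.endswith(("ForImageTextToText", "ForVision2Seq")) for a in architectures)
        else:
            is_vlm = huggingface_hub.model_info(model_id).pipeline_tag == "image-text-to-text"
        cls = AutoModelForImageTextToText if is_vlm else AutoModelForCausalLM
        return cls.from_pretrained(model_id)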

@albertvillanova (Member)

@merveenoyan I think I may not have explained myself clearly. What I meant regarding local models (not on the Hub) is:

  • When a user trains a model locally using transformers and then pushes it to the Hub, I assume that pipeline_tag might be inferred automatically by transformers. That's why I suggested that the inferring logic may already exist within transformers, and we could potentially reuse it.
  • If that is not the case, and a model can be pushed to the Hub without pipeline_tag being inferred by transformers, then I assume the Hub itself has an inference mechanism in place. In that case, this logic might be accessible via huggingface-hub, meaning we wouldn’t need to reimplement it.

Let me know what you think!

@sysradium (Contributor)

I agree that maybe some parts of transformers could be reused. For example, given a pipeline tag you can infer the auto_model from https://github.com/huggingface/transformers/blob/348f3285c5114159d2ff4933b4b8ae36866d01a7/utils/update_metadata.py#L63

Maybe that would make the code more resilient to changes 🤔
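
A hedged sketch of what reusing such a table could look like; the dict below is a hand-written illustration, not the actual mapping in transformers' utils/update_metadata.py:

    from transformers import (
        AutoModelForCausalLM,
        AutoModelForImageTextToText,
        AutoModelForVision2Seq,
    )

    # Illustrative pipeline-tag -> auto-class table in the spirit of update_metadata.py;
    # the real table in transformers is larger and structured differently.
    PIPELINE_TAG_TO_AUTO_CLASS = {
        "text-generation": AutoModelForCausalLM,
        "image-text-to-text": AutoModelForImageTextToText,
        "image-to-text": AutoModelForVision2Seq,
    }

    def auto_class_for(pipeline_tag: str, default=AutoModelForCausalLM):
        return PIPELINE_TAG_TO_AUTO_CLASS.get(pipeline_tag, default)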

@merveenoyan (Contributor, Author) commented Mar 28, 2025

@albertvillanova for transformers, if a model card doesn't exist, config.json always exists (you can't load a model in transformers otherwise), and from it you can get one of three classes for vision LMs (two of them are deprecated): AutoModelForVision2Seq, AutoModelForImageTextToText, and AutoModelForConditionalGeneration. If you feel like it, I can use these instead (it looked a bit too hackish, but it's fool-proof). V2S and ConditionalGeneration are deprecated in favor of IT2T; all the models from now on will have IT2T, AFAIK.

@merveenoyan (Contributor, Author)

Closing this in favor of hotfix #1070, but it would be better to get out of the try/except.

merveenoyan closed this Apr 9, 2025