
[Contributions Welcome] Add Fast Image Processors #36978


Open
43 of 69 tasks
yonigozlan opened this issue Mar 25, 2025 · 53 comments · May be fixed by #37168, #37481, #37804, #38502 or #37210
Labels
contributions-welcome · Good First Issue · Good Second Issue · Processing · Vision

Comments

@yonigozlan
Member

yonigozlan commented Mar 25, 2025

Community contributions: Add Fast Image Processors

Fast image processors have been rolling out progressively for a while. Now that the BaseImageProcessorFast, from which all fast image processors inherit, is in a more stable state, I'm opening this issue to encourage contributors to add fast image processors for models that still only have a "slow" image processor.

How to implement a Fast Image Processor

The core principle of fast image processors is to use torch and torchvision functions for image transformations instead of PIL or numpy. Among other performance benefits, this enables processing images on GPU, significantly improving inference speed.

Another key difference compared to slow image processors is that, unlike BaseImageProcessor, which provides only a minimal skeleton, BaseImageProcessorFast includes all the fundamental functionality needed for a basic image processor. This allows optimizations made in BaseImageProcessorFast to propagate to its subclasses. Additionally, most of the repetitive logic for image loading and argument handling is managed within BaseImageProcessorFast. Except in rare cases, subclasses do not need to handle image loading, conversion, or retrieving arguments from class attributes in the call/preprocess function; all of this is handled in BaseImageProcessorFast.

Getting Started

Run the following command:

transformers-cli add-fast-image-processor --model-name model_name

where model_name is the name of the model (as found in its folder under transformers/src/transformers/models) for which you're adding the fast image processor.

This command will handle all necessary imports and generate a basic fast image processor, which will look similar to this example for Beit:

# coding=utf-8
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fast Image processor class for Beit."""

from ...image_processing_utils_fast import BASE_IMAGE_PROCESSOR_FAST_DOCSTRING, BaseImageProcessorFast
from ...image_utils import IMAGENET_STANDARD_MEAN, IMAGENET_STANDARD_STD, PILImageResampling
from ...utils import add_start_docstrings


@add_start_docstrings(
    "Constructs a fast Beit image processor.",
    BASE_IMAGE_PROCESSOR_FAST_DOCSTRING,
)
class BeitImageProcessorFast(BaseImageProcessorFast):
    # This generated class can be used as a starting point for the fast image processor.
    # if the image processor is only used for simple augmentations, such as resizing, center cropping, rescaling, or normalizing,
    # only the default values should be set in the class.
    # If the image processor requires more complex augmentations, methods from BaseImageProcessorFast can be overridden.
    # In most cases, only the `_preprocess` method should be overridden.

    # For an example of a fast image processor requiring more complex augmentations, see `LlavaNextImageProcessorFast`.

    # Default values should be checked against the slow image processor
    # None values left after checking can be removed
    resample = PILImageResampling.BICUBIC
    image_mean = IMAGENET_STANDARD_MEAN
    image_std = IMAGENET_STANDARD_STD
    size = {"height": 256, "width": 256}
    default_to_square = None
    crop_size = {"height": 224, "width": 224}
    do_resize = True
    do_center_crop = True
    do_rescale = True
    do_normalize = True
    do_convert_rgb = None


__all__ = ["BeitImageProcessorFast"]

As explained in the generated file, if the image processor only performs basic augmentations such as resizing, center cropping, rescaling, and normalizing, the generated file might be sufficient for a working fast image processor. The class attributes, such as resample and image_mean, are automatically parsed from the slow image processor when running the script above. However, you should verify their correctness and check for any missing or incorrectly assigned values.

Customizing the Image Processor

If the image processor requires additional functionalities beyond the basic augmentations, you will need to override the _preprocess function in BaseImageProcessorFast. Check the _preprocess implementation in BaseImageProcessorFast for reference. Notably, it leverages group_images_by_shape and reorder_images to enable batch processing, significantly increasing processing speed, particularly on GPUs. If you create new image processing functions, ensure they support batch processing by utilizing group_images_by_shape and reorder_images where possible.
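To make the batching pattern concrete, here is a simplified, self-contained sketch of the idea behind group_images_by_shape and reorder_images. The actual helpers in transformers.image_processing_utils_fast have richer signatures; this is only an illustration of the technique, with a trivial rescale standing in for real processing:

```python
from collections import defaultdict

import torch


def group_images_by_shape(images):
    """Bucket images by shape so each bucket can be stacked and processed as one batch."""
    grouped = defaultdict(list)
    index = []  # remembers where each image went: (shape, position in its bucket)
    for img in images:
        shape = tuple(img.shape)
        index.append((shape, len(grouped[shape])))
        grouped[shape].append(img)
    # Stack each bucket into a single (N, C, H, W) tensor for batched ops.
    return {shape: torch.stack(imgs) for shape, imgs in grouped.items()}, index


def reorder_images(processed, index):
    """Restore the original ordering after per-bucket batched processing."""
    return [processed[shape][pos] for shape, pos in index]


# Example: rescale every image with one batched op per shape bucket.
images = [torch.ones(3, 2, 2), torch.ones(3, 4, 4), torch.ones(3, 2, 2)]
grouped, index = group_images_by_shape(images)
processed = {shape: batch * 0.5 for shape, batch in grouped.items()}
out = reorder_images(processed, index)
```

On GPU this matters because each shape bucket becomes a single kernel launch instead of one launch per image.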

If your image processor requires additional kwargs not present in DefaultFastImageProcessorKwargs, you must create a ModelNameFastImageProcessorKwargs class that inherits from DefaultFastImageProcessorKwargs and defines the new kwargs. Additionally, you should document the added kwargs in the class and the preprocess function using add_start_docstrings. (This documentation process may be simplified soon, but for now it is necessary to generate correct documentation.)
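The kwargs pattern can be sketched as follows. DefaultFastImageProcessorKwargs is redefined locally here as a stand-in so the snippet is self-contained, and do_pad/pad_size are hypothetical model-specific kwargs:

```python
from typing import Optional, TypedDict


class DefaultFastImageProcessorKwargs(TypedDict, total=False):
    # Local stand-in for the class of the same name in
    # transformers.image_processing_utils_fast (abridged).
    do_resize: Optional[bool]
    do_rescale: Optional[bool]


class MyModelFastImageProcessorKwargs(DefaultFastImageProcessorKwargs, total=False):
    # Hypothetical model-specific kwargs added on top of the defaults.
    do_pad: Optional[bool]
    pad_size: Optional[dict]


# All keys are optional (total=False), so callers pass only what they need.
kwargs: MyModelFastImageProcessorKwargs = {"do_resize": True, "do_pad": True}
```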

For an example of handling custom kwargs and documentation, refer to LlavaNextImageProcessorFast.

Important Notes

  • In nearly all cases, _preprocess is the only function in BaseImageProcessorFast that needs to be overridden.
  • The _preprocess function does not require default values for its arguments, as they are automatically derived from class attributes if not explicitly provided.
  • Even if PIL images or numpy arrays are passed to the image processor, the images argument in _preprocess will always be a list of tensors, with the channel dimension first.
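The second note above can be illustrated with a toy stand-in class (TinyFastProcessor is hypothetical, not a transformers API): kwargs that are not passed explicitly fall back to class attributes.

```python
class TinyFastProcessor:
    # Class-level defaults, analogous to the attributes set on a fast image processor.
    do_resize = True
    size = {"height": 224, "width": 224}

    def preprocess(self, images, **kwargs):
        # Any kwarg not provided by the caller falls back to the class attribute.
        for name in ("do_resize", "size"):
            kwargs.setdefault(name, getattr(self, name))
        return kwargs  # a real processor would forward these to _preprocess


proc = TinyFastProcessor()
resolved = proc.preprocess([], size={"height": 256, "width": 256})
```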

Handling Edge Cases

  • Nested Images: If images are provided as nested lists (e.g., [[image1, image2], [image3]]), they will be flattened to [image1, image2, image3] by default before being passed to _preprocess. This behavior can be modified by overriding _prepare_images_structure, though flattening is generally recommended.
  • Formatting Custom Kwargs: If any custom kwargs require formatting before _preprocess, override _further_process_kwargs.
  • Validating Custom Kwargs: If additional validation is needed for custom kwargs or existing ones, override _validate_preprocess_kwargs.
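The default flattening behavior for nested images can be sketched as follows (flatten_nested_images is an illustrative helper, not the actual _prepare_images_structure implementation):

```python
def flatten_nested_images(images):
    """Flatten one level of nesting: [[img1, img2], [img3]] -> [img1, img2, img3]."""
    if images and isinstance(images[0], (list, tuple)):
        return [img for group in images for img in group]
    # Already flat: return a shallow copy unchanged.
    return list(images)
```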

Testing

If the model already has a test_image_processing_model_name.py file under transformers/tests/models/model_name, the script you ran earlier should have imported the fast image processor into that file and added it as the fast_image_processing_class attribute of the ModelNameImageProcessingTest class.
However, this is not enough to get all the tests to run on the fast image processor. In every test function under ModelNameImageProcessingTest, you need to replace image_processing = self.image_processing_class(**self.image_processor_dict) with a loop over self.image_processor_list.

For example, the test_image_processor_properties test in test_image_processing_beit.py which looks like this:

    def test_image_processor_properties(self):
        image_processing = self.image_processing_class(**self.image_processor_dict)
        self.assertTrue(hasattr(image_processing, "do_resize"))
        self.assertTrue(hasattr(image_processing, "size"))
        self.assertTrue(hasattr(image_processing, "do_center_crop"))
        self.assertTrue(hasattr(image_processing, "center_crop"))
        self.assertTrue(hasattr(image_processing, "do_normalize"))
        self.assertTrue(hasattr(image_processing, "image_mean"))
        self.assertTrue(hasattr(image_processing, "image_std"))
        self.assertTrue(hasattr(image_processing, "do_reduce_labels"))

should be changed to this:

    def test_image_processor_properties(self):
        for image_processing_class in self.image_processor_list:
            image_processing = image_processing_class(**self.image_processor_dict)
            self.assertTrue(hasattr(image_processing, "do_resize"))
            self.assertTrue(hasattr(image_processing, "size"))
            self.assertTrue(hasattr(image_processing, "do_center_crop"))
            self.assertTrue(hasattr(image_processing, "center_crop"))
            self.assertTrue(hasattr(image_processing, "do_normalize"))
            self.assertTrue(hasattr(image_processing, "image_mean"))
            self.assertTrue(hasattr(image_processing, "image_std"))
            self.assertTrue(hasattr(image_processing, "do_reduce_labels"))

In the case where no image processing test file is present, now is a great time to add one! You can have a look at the CLIP image processing test file to use as a simple starting point.

Don't hesitate to add model-specific tests if you feel like there are some non-standard image processing techniques in the processor :).

To run the tests, use this command:

RUN_SLOW=1 python -m pytest tests/models/model_name/test_image_processing_model_name.py

Choosing an Image Processor to Implement

The difficulty of implementing a fast image processor varies by model. If this is your first issue, consider starting with an easier one!

Happy coding!

Here is the list of fast image processors left to implement:

@yonigozlan added the contributions-welcome, Good First Issue, Good Second Issue, Vision, and Processing labels Mar 25, 2025
@MinJu-Ha
Contributor

Hey! I'd like to work on this issue with MobileViT 😊

@edgarriba

@yonigozlan Have you considered adopting kornia for that? We have been curating algorithms (700+ ops) for several years, covering image processing and low-level vision using exclusively PyTorch.

@Knight7561

I would love to pick one and start contributing. Good task for this week!

@yonigozlan
Member Author

yonigozlan commented Mar 26, 2025

@edgarriba I love kornia! But for image processors at inference time, it might be a bit overkill since 90% of the time, we only need a mix of resizing, normalizing, padding, and cropping, combined with some model-specific logic. I’ve found that torch/torchvision functional transforms usually cover these needs well, and pipelines like kornia ImageSequential or torchvision Compose aren’t always a good fit because some models require additional processing steps or custom logic in between. We also wanted to avoid adding an extra dependency to Transformers for fast image processors.

That said, I do think kornia could be valuable down the line, especially for batch processing. I'm still exploring how to optimize batch processing performance on both GPU and CPU, and kornia likely handles this more efficiently than our current approach.

@edgarriba

The core of the library is purely functional; that has always been the scope. The top layers you mention were added later, purely for convenience in the case of augmentations, but a big part of the library has been designed as free functions for exactly the purposes you mention. In any case, we are always open to collaborations and improvements.

@goravaa

goravaa commented Mar 26, 2025

Hi! I'd like to start with YOLOS 🚀

@capnmav77

capnmav77 commented Mar 27, 2025

Hi! I'd like to work on Segformer 😊. Any thoughts on this? It's my first contribution.
Here's my draft PR, thank you!

@mariorch22

I'd like to start with mllama

@zshn25
Contributor

zshn25 commented Mar 27, 2025

I would like to start with EfficientNet, but some tests don't pass. It would be great if someone could have a look:

#37055

@Yann-CV
Contributor

Yann-CV commented Mar 28, 2025

@zshn25 Sadly, I have also worked on EfficientNet... I managed to fix the tests, so you can have a look. Anyway, I will let the maintainers decide which pull request to keep.

@JaiJoshi123

Hi! I'd like to work on this issue with ImageGPT 🤗

@samrae7

samrae7 commented Mar 29, 2025

Hi. I would like to do this for ZoeDepth if that's ok?

@RaghavPrabhakar66
Contributor

Hi, I would like to work on LayoutLMv3.

@henrikm11
Contributor

henrikm11 commented Apr 10, 2025

Working on ViTMatte, should be able to create the PR sometime this weekend.

UPDATE: This will take a bit longer, since it appears that the original preprocessing may have a bug when the input format is ChannelDimension.FIRST, which makes it hard to compare the performance on torch.Tensor...

@arkhamHack
Contributor

@yonigozlan Hi, I would like to work on Superpoint; will raise a PR soon.

@Kim-Ju-won
Contributor

Kim-Ju-won commented Apr 14, 2025

@yonigozlan Hi, I would like to work on TVP; will raise a PR soon! Thanks

@olccihyeon

@yonigozlan Hi, I would like to work on instructblipvideo; will raise a PR! Thank you

@Rishik00

Hi @yonigozlan I'd like to work on the image processor for mobilenet. Will raise a PR! Thanks

@NahieliV
Contributor

Hi! @yonigozlan Here is the PR for Nougat #37661.

@arkhamHack arkhamHack linked a pull request Apr 26, 2025 that will close this issue
@Shoumik-Gandre

@yonigozlan
I need help with OneFormer: it has a kwarg called max_size and I am unsure how to handle this scenario.
The LlavaNextImageProcessorFast example does not shed any light on it.

@Kim-Ju-won
Contributor

Kim-Ju-won commented May 5, 2025

Hi @yonigozlan @zucchini-nlp,
I've been working on a fast processor for the TVP model. However, after reviewing issue #37611, I noticed that the InstructBLIP model was removed from the list. Based on the issue discussion, I understand that fast image processors are not needed for video-only models.

That said, I also see that the TVP and VideoMAE models — which are video-only — are still included on the list, and a PR for VideoMAE has been opened, though it may not be merged based on recent discussions.

Would it be appropriate to open a draft PR for the TVP model?
I’d like to check in and get your thoughts before opening PR.

Thank you!

@henrikm11
Contributor

henrikm11 commented May 6, 2025

I can also do ZoeDepth sometime soon as it seems that's up for grabs again, may take a week or two though.
Update: Been busier than expected, but I am almost there, PR will be raised very soon.

@jgyasu

jgyasu commented May 7, 2025

Hi @yonigozlan , I would love to work on vivit! Will raise a PR soon :)

@aryanchauhan31

Hi! I'd like to work on glpn. Let me know if it's still available!

@Ajaykashela

Ajaykashela commented May 20, 2025

> Hi! I'd like to work on glpn. Let me know if it's still available!

Heya @aryanchauhan31, I was originally working on GLPN, but due to other commitments, I haven't been able to dedicate enough time to it. Please feel free to take it over. If I can be of any help, please feel free to reach out.

@aryanchauhan31

> Hi! I'd like to work on glpn. Let me know if it's still available!
>
> Heya @aryanchauhan31, I was originally working on GLPN, but due to other commitments, I haven't been able to dedicate enough time to it. Please feel free to take it over. If I can be of any help please feel free to reach out.

Thanks. Sure, I'll let you know.

@AnimeshMaheshwari22

Hi. I'd like to work on VitPose

@Ishubhammohole

Hi @yonigozlan 👋,

I'd like to contribute by implementing the Fast Image Processor for Pix2Struct.
This seems like a valuable opportunity to contribute to a multimodal model, and I’m excited to dive in.

Please let me know if this model is still unassigned or if there’s anything specific I should be aware of before getting started.

Thanks!
— Shubham
