[Contributions Welcome] Add Fast Image Processors #36978
Comments
Hey! I'd like to work on this issue with MobileViT 😊
@yonigozlan have you considered adopting
I would love to pick one and start contributing. Good task for this week..!
@edgarriba I love kornia! But for image processors at inference time, it might be a bit overkill since 90% of the time, we only need a mix of resizing, normalizing, padding, and cropping, combined with some model-specific logic. I’ve found that That said, I do think kornia could be valuable down the line, especially for batch processing. I'm still exploring how to optimize batch processing performance on both GPU and CPU, and kornia likely handles this more efficiently than our current approach.
The core of the library is pure functional -- that has been always the scope. The top layers you mention were added later just for commodity just for the case of augmentations but big part of the library has been designed as free functions for the exact purposes you mention. In any case, we are always open to collaborations and improvements.
Hi! I'd like to start with YOLOS 🚀
Hi! i'd like to work on Segformer 😊 , any thoughts on this , it's my first contribution
I'd like to start with mllama
I would like to start with EfficientNet but some tests don't pass. Would be great if someone could have a look
@zshn25 sad I have also worked on Efficientnet... I succeeded to fix the tests so you can have a look . anyway, I will let the maintainers decide which pull request to keep.
Hi! I'd like to work on this issue with ImageGPT 🤗
Hi. I would like to do this for ZoeDepth if that's ok?
Hi, I would like to work on
Working on ViTMatte, should be able to create the PR sometime this weekend. UPDATE: This will take a bit longer since it appears that the original preprocessing may have a bug if the input format is ChannelDimension.First which makes it hard to compare the performance on torch.Tensor....
@yonigozlan hi i would like to work on Superpoint, will raise pr soon.
@yonigozlan Hi, I would work on TVP, will raise pr soon! Thanks
@yonigozlan Hi, I would work on instructblipvideo, will raise pr! thank you
Hi @yonigozlan I'd like to work on the image processor for mobilenet. Will raise a PR! Thanks
Hi! @yonigozlan Here is the PR for Nougat #37661.
@yonigozlan
Hi @yonigozlan @zucchini-nlp, That said, I also see that the TVP and VideoMAE models — which are video-only — are still included on the list, and a PR for VideoMAE has been opened, though it may not be merged based on recent discussions. Would it be appropriate to open a draft PR for the TVP model? Thank you!
I can also do ZoeDepth sometime soon as it seems that's up for grabs again, may take a week or two though.
Hi @yonigozlan , I would love to work on
Hi! I'd like to work on glpn. Let me know if it's still available!
Heya @aryanchauhan31 , I was originally working on GLPN, but due to other commitments, I haven't been able to dedicate enough time to it. Please feel free to take it over. If I can be of any help please feel free to reach out.
Thanks. Sure i'll let you know
Hi. I'd like to work on VitPose
Hi @yonigozlan 👋, I'd like to contribute by implementing the Fast Image Processor for Pix2Struct. Please let me know if this model is still unassigned or if there’s anything specific I should be aware of before getting started. Thanks!
Community contributions: Add Fast Image Processors
Fast image processors have been rolling out progressively for a while. Now that the BaseImageProcessorFast, from which all fast image processors inherit, is in a more stable state, I'm opening this issue to encourage contributors to add fast image processors for models that still only have a "slow" image processor.
How to implement a Fast Image Processor
The core principle of fast image processors is to use torch and torchvision functions for image transformations instead of PIL or numpy. Among other performance benefits, this enables processing images on GPU, significantly improving inference speed.
Another key difference compared to slow image processors is that, unlike BaseImageProcessor, which provides only a minimal skeleton, BaseImageProcessorFast includes all the fundamental functionalities needed for a basic image processor. This allows optimizations made in BaseImageProcessorFast to propagate to its inherited classes. Additionally, most repetitive logic for image loading and argument handling is managed within BaseImageProcessorFast. Except in rare cases, inherited classes do not need to handle image loading, conversion, or retrieving arguments from class attributes in the call/preprocess function; this is all handled in BaseImageProcessorFast.
Getting Started
Run the following command:
where model_name is the name of the model (as found in its folder under transformers/src/transformers/models) for which you're adding the fast image processor.
This command will handle all necessary imports and generate a basic fast image processor, which will look similar to this example for Beit:
As explained in the generated file, if the image processor only performs basic augmentations such as resizing, center cropping, rescaling, and normalizing, the generated file might be sufficient for a working fast image processor. The class attributes, such as resample and image_mean, are automatically parsed from the slow image processor when running the script above. However, you should verify their correctness and check for any missing or incorrectly assigned values.
Customizing the Image Processor
If the image processor requires additional functionalities beyond the basic augmentations, you will need to override the _preprocess function in BaseImageProcessorFast. Check the _preprocess implementation in BaseImageProcessorFast for reference. Notably, it leverages group_images_by_shape and reorder_images to enable batch processing, significantly increasing processing speed, particularly on GPUs. If you create new image processing functions, ensure they support batch processing by utilizing group_images_by_shape and reorder_images where possible.
If your image processor requires additional kwargs not present in DefaultFastImageProcessorKwargs, you must create a ModelNameFastImageProcessorKwargs class that inherits from DefaultFastImageProcessorKwargs and defines the new kwargs. Additionally, you should document the added kwargs in the class and the preprocess function using add_start_docstrings. (This documentation process may be simplified soon, but is necessary for now to get correct documentation.)
For an example of handling custom kwargs and documentation, refer to LlavaNextImageProcessorFast.
Important Notes
- _preprocess is the only function in BaseImageProcessorFast that needs to be overridden.
- The _preprocess function does not require default values for its arguments, as they are automatically derived from class attributes if not explicitly provided.
- Even when PIL images or numpy arrays are passed to the image processor, the images argument in _preprocess will always be a list of tensors, with the channel dimension first.
Handling Edge Cases
- If images are passed as nested lists (e.g. [[image1, image2], [image3]]), they will be flattened to [image1, image2, image3] by default before being passed to _preprocess. This behavior can be modified by overriding _prepare_images_structure, though flattening is generally recommended.
- To process kwargs further before they reach _preprocess, override _further_process_kwargs.
- To customize kwarg validation, override _validate_preprocess_kwargs.
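To make the notes and edge cases above more concrete, here is a toy model of the call flow, written under stated assumptions. SketchFastProcessor and PaddedSketchProcessor are hypothetical classes, not BaseImageProcessorFast; the valid_kwargs tuple, the pad_to kwarg, and the hook signatures are assumptions made for this example, and images are plain tuples of numbers rather than tensors. The hook names mirror the ones mentioned in the text, but the real methods may take different arguments.

```python
class SketchFastProcessor:
    """Toy stand-in for a fast image processor base class (not the real one)."""

    # Class attributes act as defaults for the preprocess arguments.
    valid_kwargs = ("do_rescale", "rescale_factor")
    do_rescale = True
    rescale_factor = 0.5

    def preprocess(self, images, **kwargs):
        # Any argument not passed explicitly falls back to the class attribute.
        for name in self.valid_kwargs:
            if kwargs.get(name) is None:
                kwargs[name] = getattr(self, name)
        images = self._prepare_images_structure(images)
        kwargs = self._further_process_kwargs(**kwargs)
        self._validate_preprocess_kwargs(**kwargs)
        return self._preprocess(images, **kwargs)

    def _prepare_images_structure(self, images):
        # Flatten [[a, b], [c]] into [a, b, c]; flat input passes through.
        flat = []
        for item in images:
            flat.extend(item if isinstance(item, list) else [item])
        return flat

    def _further_process_kwargs(self, **kwargs):
        return kwargs

    def _validate_preprocess_kwargs(self, do_rescale, rescale_factor, **kwargs):
        if do_rescale and rescale_factor is None:
            raise ValueError("rescale_factor is required when do_rescale is True")

    def _preprocess(self, images, do_rescale, rescale_factor, **kwargs):
        # No default values needed: preprocess() always supplies them.
        if do_rescale:
            images = [tuple(p * rescale_factor for p in image) for image in images]
        return images


class PaddedSketchProcessor(SketchFastProcessor):
    """Hypothetical subclass adding one custom kwarg, pad_to."""

    valid_kwargs = SketchFastProcessor.valid_kwargs + ("pad_to",)
    pad_to = 3

    def _further_process_kwargs(self, pad_to, **kwargs):
        # Format the custom kwarg before it reaches _preprocess.
        kwargs["pad_to"] = int(pad_to)
        return kwargs

    def _validate_preprocess_kwargs(self, pad_to, **kwargs):
        super()._validate_preprocess_kwargs(**kwargs)
        if pad_to < 0:
            raise ValueError("pad_to must be non-negative")

    def _preprocess(self, images, pad_to, **kwargs):
        images = super()._preprocess(images, **kwargs)
        # Right-pad every image with zeros up to pad_to values.
        return [image + (0.0,) * max(0, pad_to - len(image)) for image in images]


processor = PaddedSketchProcessor()
nested_out = processor.preprocess([[(2.0, 4.0), (6.0,)], [(8.0, 10.0, 12.0)]])
override_out = processor.preprocess([(2.0,)], do_rescale=False, pad_to="2")
```

In this sketch the subclass never repeats default values in _preprocess, and the nested input is flattened before any processing happens, which matches the behavior the notes describe.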
Testing
In the case where the model already has a test_image_processing_model_name.py file under transformers/tests/models/model_name, the script run earlier should have imported the fast image processor into the file and added it as a fast_image_processing_class class attribute to the ModelNameImageProcessingTest class.
However, this is not enough to get all the tests to run on the fast image processor. For all the test functions under ModelNameImageProcessingTest, you need to replace image_processing = self.image_processing_class(**self.image_processor_dict) with a loop over self.image_processor_list.
For example, the test_image_processor_properties test in test_image_processing_beit.py, which looks like this:
should be changed to this:
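The pattern can be sketched in a self-contained way as follows. This is not the Beit test file: SlowSketchProcessor, FastSketchProcessor, and SketchImageProcessingTest are hypothetical stand-ins, and the image_processor_list property here is a simplified assumption about what the real test mixin provides. The single-class line being replaced is kept as a comment for comparison.

```python
import unittest


class SlowSketchProcessor:
    """Hypothetical stand-in for a slow image processor."""

    def __init__(self, do_resize=True, size=None):
        self.do_resize = do_resize
        self.size = size or {"height": 18, "width": 18}


class FastSketchProcessor(SlowSketchProcessor):
    """Hypothetical stand-in for the matching fast image processor."""


class SketchImageProcessingTest(unittest.TestCase):
    image_processing_class = SlowSketchProcessor
    fast_image_processing_class = FastSketchProcessor
    image_processor_dict = {"do_resize": True, "size": {"height": 20, "width": 20}}

    @property
    def image_processor_list(self):
        # Every processor class that is actually available for this model.
        classes = [self.image_processing_class, self.fast_image_processing_class]
        return [cls for cls in classes if cls is not None]

    def test_image_processor_properties(self):
        # Before:
        #   image_processing = self.image_processing_class(**self.image_processor_dict)
        # After: run the same assertions against every processor class.
        for image_processing_class in self.image_processor_list:
            image_processing = image_processing_class(**self.image_processor_dict)
            self.assertTrue(hasattr(image_processing, "do_resize"))
            self.assertTrue(hasattr(image_processing, "size"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SketchImageProcessingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The body of the test stays the same; only the construction of image_processing moves inside the loop, so the slow and fast processors are held to identical assertions.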
In the case where no image processing test file is present, now is a great time to add one! You can have a look at the CLIP image processing test file to use as a simple starting point.
Don't hesitate to add model-specific tests if you feel like there are some non-standard image processing techniques in the processor :).
To run the tests, use this command:
Choosing an Image Processor to Implement
The difficulty of implementing a fast image processor varies by model. If this is your first issue, consider starting with an easier one!
Happy coding!
Here is the list of fast image processors left to implement:
- Deta (deprecated)
- EfficientFormer (deprecated)
- TVLT (deprecated)
- ViT hybrid (deprecated)