# Evaluation scripts for VLM tasks

This folder contains popular third-party benchmarks for evaluating VLM accuracy.

The following instructions show how to evaluate VLMs (including those with a Model Optimizer quantized LLM) with these benchmarks, as well as their TensorRT-LLM deployments.

## GQA

GQA is a dataset for real-world visual reasoning and compositional question answering. When the benchmark completes, the model's accuracy is reported as a percentage.

First, log in to your Hugging Face account with your token:

```bash
huggingface-cli login
```
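
Alternatively, in non-interactive environments (e.g. CI) you can pass the token directly; `HF_TOKEN` below is a placeholder for your access token:

```bash
# Non-interactive login; HF_TOKEN is a placeholder for your access token.
huggingface-cli login --token "$HF_TOKEN"
```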

### Baseline

```bash
bash gqa.sh --hf_model <dir_hf_model>
```

### Quantized (simulated)

```bash
# MODELOPT_QUANT_CFG: choose from [INT8_SMOOTHQUANT_CFG|FP8_DEFAULT_CFG|INT4_AWQ_CFG|W4A8_AWQ_BETA_CFG]
bash gqa.sh --hf_model <dir_hf_model> --quant_cfg MODELOPT_QUANT_CFG
```
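
For example, to run the evaluation with simulated FP8 quantization, pick one of the configs listed above:

```bash
# FP8_DEFAULT_CFG is one of the supported quantization configs listed above.
bash gqa.sh --hf_model <dir_hf_model> --quant_cfg FP8_DEFAULT_CFG
```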

### Evaluate the TensorRT-LLM engine

The TensorRT-LLM engines can be built following this guide.

```bash
bash gqa.sh --hf_model <dir_hf_model> --visual_engine <dir_visual_engine> --llm_engine <dir_llm_engine>
```

If you encounter out-of-memory (OOM) errors during evaluation, try lowering the `--kv_cache_free_gpu_memory_fraction` argument (default: 0.8) to reduce the GPU memory reserved for the KV cache:

```bash
bash gqa.sh --hf_model <dir_hf_model> --visual_engine <dir_visual_engine> --llm_engine <dir_llm_engine> --kv_cache_free_gpu_memory_fraction 0.5
```
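
If OOM failures persist, a minimal sketch like the following steps the fraction down until a run succeeds. It assumes `gqa.sh` exits with a non-zero status on failure; the `<dir_...>` arguments are placeholders as above:

```bash
# Sketch: retry with progressively smaller KV-cache fractions.
# Assumes gqa.sh exits non-zero on OOM; <dir_...> are placeholders.
for frac in 0.8 0.5 0.3; do
  if bash gqa.sh --hf_model <dir_hf_model> \
        --visual_engine <dir_visual_engine> \
        --llm_engine <dir_llm_engine> \
        --kv_cache_free_gpu_memory_fraction "$frac"; then
    echo "Evaluation succeeded with kv_cache_free_gpu_memory_fraction=$frac"
    break
  fi
done
```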