Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for Dataset Name
V-STaR is a spatio-temporal reasoning benchmark for Video-LLMs. It evaluates how well a Video-LLM answers questions posed explicitly in the context of “when”, “where”, and “what”. GitHub repository: V-STaR
Dataset Details
Comprehensive Dimensions: We evaluate Video-LLM’s spatio-temporal reasoning ability in answering questions explicitly in the context of “when”, “where”, and “what”.
Human Alignment: We conducted… See the full description on the dataset page: https://huggingface.co/datasets/Cade921/vstar_sub.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
If you like our project, please give us a star ⭐ on GitHub for the latest updates.
VideoRefer-Bench is a comprehensive benchmark for evaluating a model's object-level video understanding capabilities. It consists of two sub-benchmarks: VideoRefer-Bench-D and VideoRefer-Bench-Q.
VideoRefer-Bench-D
The benchmark is designed to evaluate the description generation performance… See the full description on the dataset page: https://huggingface.co/datasets/DAMO-NLP-SG/VideoRefer-Bench.