RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives

 

Chirag Parikh, Deepti Rawat, Rakshitha R. T., Tathagata Ghosh, Ravi Kiran Sarvadevabhatla

iHub-Data|CVIT|IIIT Hyderabad

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

 

[Paper] [arxiv] [Dataset] [Code] [Checkpoints]

 

 

What is RoadSocial

RoadSocial is a large-scale, diverse VideoQA dataset tailored for generic road event understanding from social media narratives. Unlike existing datasets, it captures the global complexity of road events across varied geographies, camera viewpoints (CCTV, handheld, drone) and rich social discourse. RoadSocial highlights:

  • 14M frames, 414K social comments
  • 13.2K videos (7.9K minutes)
  • 674 unique video tags (100K+ in total)
  • 260K high-quality socially-informed QA pairs
  • Scalable QA generation pipeline using social video narratives
  • 12 challenging video QA tasks for generic road event understanding
  • New tasks to test robustness of Video LLMs to hallucination
  • Improves generic road event understanding capability of Video LLMs
  • Critical insights into zero-shot capabilities of 18 Video LLMs
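To put the headline figures in perspective, the short sketch below derives a few per-video averages from them. The inputs are the rounded numbers from the list above, so the results are only approximate; the variable names are illustrative and not part of any RoadSocial tooling.

```python
# Rounded headline figures taken from the highlights list.
videos = 13_200        # 13.2K videos
minutes = 7_900        # 7.9K minutes of footage
frames = 14_000_000    # 14M frames
comments = 414_000     # 414K social comments
qa_pairs = 260_000     # 260K QA pairs

total_seconds = minutes * 60

# Derived averages (approximate, because the inputs are rounded).
avg_clip_seconds = total_seconds / videos
avg_fps = frames / total_seconds
qa_per_video = qa_pairs / videos
comments_per_video = comments / videos

print(f"average clip length : {avg_clip_seconds:.1f} s")
print(f"average frame rate  : {avg_fps:.1f} fps")
print(f"QA pairs per video  : {qa_per_video:.1f}")
print(f"comments per video  : {comments_per_video:.1f}")
```

Under these rounded inputs, a typical clip is roughly 36 seconds long and carries on the order of 20 QA pairs and 31 comments.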

 

Dataset Examples

 

 

Dataset Statistics

 

 

VideoQA Leaderboard

GPT-3.5 score (scale: 0-100) is reported for all tasks except Temporal Grounding (TG). Overall scores are reported for ALL QA tasks, Road-event Tasks (RT), Generic QAs, and Specific QAs.

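Task categories: Factual (F), Complex (C), Imaginative (I), Hallucination (H). The task columns below are grouped under these categories in that order.
Task abbreviations: WR, KE, VP, DS, WY, CQ, TG (Temporal Grounding), AD, IN, CF, AV, IC.
# Model Params WR KE VP DS WY CQ TG AD IN CF AV IC Overall(ALL) Overall(RT) Overall(Generic) Overall(Specific)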
1 Dolphin 9B 61.3 34.5 67.8 35.8 25.2 37.2 0.01 49.8 39.1 45.5 71.8 21.3 40.8 42.5 29.8 46.5
2 GPT-4o - 77.0 66.6 84.3 70.2 70.8 72.1 7.8 77.7 76.4 77.0 90.0 67.6 69.8 70.0 69.5 74.4
3 Gemini-1.5-Pro - 77.7 56.7 85.4 61.9 61.4 60.1 18.6 72.1 70.2 75.7 72.3 48.7 63.4 64.7 60.1 68.3
4 InternVL2 76B 72.4 51.3 81.4 57.1 59.0 62.1 1.07 70.5 67.0 69.2 58.6 27.6 56.4 59.1 55.5 65.1
5 Qwen2-VL 72B 76.6 56.6 85.1 60.2 64.0 67.6 0.01 71.9 72.4 71.6 37.0 40.2 58.6 60.3 58.3 68.8
6 LLaVA-Video 72B 75.8 52.4 76.8 52.4 55.0 52.2 9.94 68.3 63.7 64.9 83.5 24.7 56.7 59.6 51.1 63.3
7 LLaVA-OV 72B 75.1 54.1 78.7 53.0 53.3 54.1 3.99 67.8 61.9 63.1 45.1 19.9 52.5 55.5 51.8 63.0
8 VITA 8x7B 66.6 52.1 71.6 48.1 55.6 56.3 2.27 66.7 66.0 62.4 56.3 22.0 52.2 54.9 49.8 60.4
9 Tarsier 34B 73.7 58.1 78.2 58.2 59.0 58.8 0.32 71.6 71.1 67.4 83.2 82.3 63.5 61.8 58.4 66.1
10 ARIA 25.3B 75.4 53.1 86.2 58.4 56.9 70.2 8.96 75.1 74.7 74.0 86.4 29.2 62.4 65.4 56.7 68.5
11 InternVL2 8B 67.7 51.7 78.0 55.7 59.3 60.9 0.77 66.7 66.8 70.0 68.1 26.1 56.0 58.7 53.7 64.0
12 Mini-CPM-V 2.6 8B 77.7 57.6 80.6 55.0 50.5 57.5 0.4 61.6 52.3 59.3 73.5 30.0 54.7 56.9 51.0 62.0
13 IXC-2.5 7B 78.5 58.7 85.4 61.7 65.3 68.5 0.69 73.9 75.6 75.7 85.8 29.2 63.3 66.4 60.7 70.3
14 Tarsier 7B 69.9 54.7 72.3 52.0 53.4 55.2 0.11 69.5 69.3 63.5 79.1 67.3 58.9 58.1 54.0 61.7
15 LongVU 7B 73.0 53.0 76.3 51.1 50.2 55.0 0.84 59.7 55.8 58.2 48.9 32.7 51.2 52.9 47.7 59.7
16 Qwen2-VL 7B 75.5 52.8 76.1 52.7 57.7 56.4 0.59 69.2 71.6 65.9 37.5 39.6 54.6 56.0 52.6 63.9
17 LLaVA-Video 7B 74.6 50.1 76.7 52.1 50.1 50.3 1.43 60.4 53.8 58.7 61.8 23.5 51.1 53.6 47.6 59.7
18 LLaVA-OV 7B 73.4 51.2 77.2 50.7 51.7 51.2 0.97 62.8 55.4 58.6 45.4 21.1 50.0 52.6 48.4 59.8
19 LLaVA-OV ft. 7B 80.9 64.1 85.7 64.1 68.7 65.1 4.49 74.2 70.9 71.7 95.4 87.6 69.4 67.8 65.1 69.7
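For readers who want to work with the leaderboard programmatically, the following is a minimal sketch of parsing its whitespace-separated rows and ranking models by the Overall (ALL) score. The `Entry` class and `parse_row` helper are hypothetical names introduced for illustration, not part of the RoadSocial code release; the three sample rows are copied from the table above.

```python
from dataclasses import dataclass

# Column order as it appears in the leaderboard header.
TASKS = ["WR", "KE", "VP", "DS", "WY", "CQ", "TG", "AD", "IN", "CF", "AV", "IC"]
OVERALL = ["ALL", "RT", "Generic", "Specific"]
N_SCORES = len(TASKS) + len(OVERALL)  # 16 numeric columns per row


@dataclass
class Entry:  # hypothetical container, for illustration only
    rank: int
    model: str
    params: str
    scores: dict


def parse_row(line: str) -> Entry:
    """Parse one whitespace-separated leaderboard row."""
    tokens = line.split()
    # The last 16 tokens are scores, the token before them is the parameter
    # count ("-" when not reported), and everything between the row index
    # and the parameter count is the model name (names may contain spaces).
    values = [float(t) for t in tokens[-N_SCORES:]]
    params = tokens[-N_SCORES - 1]
    model = " ".join(tokens[1:-N_SCORES - 1])
    return Entry(int(tokens[0]), model, params, dict(zip(TASKS + OVERALL, values)))


ROWS = [
    "2 GPT-4o - 77.0 66.6 84.3 70.2 70.8 72.1 7.8 77.7 76.4 77.0 90.0 67.6 69.8 70.0 69.5 74.4",
    "12 Mini-CPM-V 2.6 8B 77.7 57.6 80.6 55.0 50.5 57.5 0.4 61.6 52.3 59.3 73.5 30.0 54.7 56.9 51.0 62.0",
    "19 LLaVA-OV ft. 7B 80.9 64.1 85.7 64.1 68.7 65.1 4.49 74.2 70.9 71.7 95.4 87.6 69.4 67.8 65.1 69.7",
]

entries = [parse_row(row) for row in ROWS]
ranked = sorted(entries, key=lambda e: e.scores["ALL"], reverse=True)
for e in ranked:
    print(f"{e.model:<16} {e.params:<4} ALL={e.scores['ALL']:.1f} TG={e.scores['TG']:.2f}")
```

Splitting from the right keeps the parser robust to model names with embedded spaces, such as "Mini-CPM-V 2.6" and "LLaVA-OV ft.".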

Submit your results by email.

 

Video Presentation

 
References: InternVL2 [4]; MM-AU [5]; VITA [6]; BDD-X [8]; LLaVA-OV [10]; ARIA [11]; Dolphin [13]; DRAMA [15]; LingoQA [16]; GPT-4o [18]; Rank2Tell [24]; LongVU [25]; DriveLM [26]; ROAD [27]; Gemini-1.5-Pro [28]; Tarsier [30]; Qwen2-VL [31]; SUTD-TrafficQA [32]; BDD-OIA [33]; Mini-CPM-V [35]; IXC-2.5 [36]; LLaVA-Video [37]

 

Citation

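@misc{parikh2025roadsocialdiversevideoqadataset,
title = {RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives},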
author = {Chirag Parikh and Deepti Rawat and Rakshitha R. T. and Tathagata Ghosh and Ravi Kiran Sarvadevabhatla},
year = {2025},
eprint = {2503.21459},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2503.21459},