Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback
[Paper] [CVPR Paper] [Dataset (WIP)] [Demo]
Abstract
We introduce Sketchtopia, a large-scale dataset and AI framework designed to explore goal-driven, multimodal communication through asynchronous interactions in a Pictionary-inspired setup. Sketchtopia captures natural human interactions, including freehand sketches, open-ended guesses, and iconic feedback gestures, showcasing the complex dynamics of cooperative communication under constraints. It features over 20K gameplay sessions from 916 players, capturing 263K sketches, 10K erases, 56K guesses and 19.4K iconic feedback signals.
We introduce multimodal foundational agents with capabilities for generative sketching, guess generation and asynchronous communication. Our dataset also includes 800 human-agent sessions for benchmarking the agents. We introduce novel metrics to characterize collaborative success, responsiveness to feedback and inter-agent asynchronous communication. Sketchtopia pushes the boundaries of multimodal AI, establishing a new benchmark for studying asynchronous, goal-oriented interactions between humans and AI agents.
Key Contributions
Rich Dataset
Large-scale, multimodal data capturing real-world asynchronous sketching dynamics.
Foundational Agents
DRAWBOT & GUESSBOT designed for asynchronous interaction.
New Metrics
AAO, FRS and MATS for evaluating agent interaction.
Dataset Highlights: Multimodal & Asynchronous
20K+
Sessions
Rich collection capturing diverse human Pictionary gameplay.
263K+
Sketches
Massive corpus of iterative freehand drawings for visual communication.
56K+
Open-ended Guesses
Natural language guesses reflecting understanding of visual cues.
19K+
Iconic Feedback
Non-verbal cues (👍👎❓) guiding the collaborative process asynchronously.
916
Players
Data from a diverse participant group ensuring robust analysis.
800
Human-Agent Sessions
Valuable data from humans interacting with our agents.
Sketchtopia Agents
ACTIONDECIDER: The Asynchronous Controller
The ActionDecider is the core component that enables asynchronous communication. It acts as a lightweight controller, continuously monitoring the game state (sketches, guesses, feedback) and deciding when agents should act and what action they should take. This allows for fluid, human-like interaction without the constraints of turn-taking, mirroring real-world communication dynamics.
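The controller idea described above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in chosen for illustration: the `GameState` fields, the `decide` function, the `min_gap` threshold and the hand-written rules. The page does not say how ActionDecider is implemented, and the real component may well be a learned model rather than a rule set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    IDLE = "idle"
    DRAW = "draw"
    GUESS = "guess"


@dataclass
class GameState:
    """Hypothetical snapshot of a session; field names are illustrative only."""
    now: float                     # current time in seconds
    last_stroke_time: float        # when the canvas last changed
    last_guess_time: float         # when the guesser last guessed
    last_feedback: Optional[str]   # "up", "down", "confused", or None
    strokes_since_last_guess: int  # new visual evidence since the last guess


def decide(role: str, state: GameState, min_gap: float = 2.0) -> Action:
    """Toy policy: decide whether an agent acts now or stays idle."""
    if role == "drawer":
        # Negative or confused feedback prompts more drawing right away.
        if state.last_feedback in ("down", "confused"):
            return Action.DRAW
        # Otherwise draw only if the canvas has been quiet for a while.
        if state.now - state.last_stroke_time >= min_gap:
            return Action.DRAW
        return Action.IDLE
    if role == "guesser":
        # Guess only when there is new evidence and the last guess is not too recent.
        has_new_evidence = state.strokes_since_last_guess > 0
        rested = state.now - state.last_guess_time >= min_gap
        return Action.GUESS if has_new_evidence and rested else Action.IDLE
    raise ValueError(f"unknown role: {role}")
```

Because each role is evaluated independently against the shared state, both agents may act at the same moment or both stay idle, which is the property that distinguishes this setup from strict turn-taking.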
DRAWBOT: The Sketcher
DRAWBOT visually communicates the target word through asynchronous sketching, leveraging state-of-the-art generative models fine-tuned for iterative refinement based on communication context.
- Generates sketches from target concepts, adapting to canvas state.
- Refines drawings in response to iconic feedback signals (👍 👎 ❓).
- Operates asynchronously, deciding when to draw or stay idle.
GUESSBOT: The Guesser
GUESSBOT interprets sketches and generates intelligent guesses, using a retrieval-based framework informed by historical interaction data.
- Applies vision models to the sketch canvas content.
- Generates relevant guesses using efficient retrieval and filtering.
- Acts asynchronously, deciding when new information warrants a guess.
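A retrieval-and-filter step of the kind listed above might look like the following sketch. It assumes the canvas has already been encoded into an embedding vector and that candidate words carry embeddings in the same space. The function names, the `word_bank` structure, the cosine-similarity ranking and the `min_score` cut-off are assumptions made for illustration, not details taken from GUESSBOT.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_guesses(
    canvas_embedding: Sequence[float],
    word_bank: Dict[str, Sequence[float]],  # hypothetical: word -> embedding
    already_guessed: Set[str],
    top_k: int = 3,
    min_score: float = 0.2,
) -> List[str]:
    """Rank candidate words by similarity to the canvas, skipping earlier guesses."""
    scored = [
        (cosine(canvas_embedding, emb), word)
        for word, emb in word_bank.items()
        if word not in already_guessed          # filter: never repeat a guess
    ]
    scored = [(s, w) for s, w in scored if s >= min_score]  # filter: drop weak matches
    scored.sort(key=lambda sw: sw[0], reverse=True)
    return [w for _, w in scored[:top_k]]
```

With a toy bank `{"cat": [1, 0], "dog": [0.9, 0.1], "car": [0, 1]}`, a canvas embedding of `[1, 0]` and `"cat"` already guessed, the function returns `["dog"]`: `"car"` scores 0 and falls below the cut-off.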
Evaluating Agent Performance
AAO
Asynchronous Action Overlap
Measures concurrent actions between agents. AAO values close to those of human players suggest more natural, human-like interaction dynamics.
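The exact AAO formula is not given on this page. As one plausible reading of "concurrent actions", the sketch below treats each agent's actions as time intervals and reports the time both agents are active divided by the time either is active. The interval representation and the `overlap_ratio` name are assumptions for illustration only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one action, in seconds


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse any that touch or overlap."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def overlap_ratio(a: List[Interval], b: List[Interval]) -> float:
    """Time both agents are active divided by time at least one is active."""
    a, b = merge(a), merge(b)
    i = j = 0
    both = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            both += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    union = total(a) + total(b) - both
    return both / union if union else 0.0
```

For a drawer active during `[(0, 4), (6, 8)]` and a guesser active during `[(3, 7)]`, the two overlap for 2 seconds out of 8 seconds of combined activity, so `overlap_ratio` returns 0.25. Under this reading, an agent pair would be judged by how close its value is to the value measured on human sessions.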
FRS
Feedback Responsiveness Score
Quantifies how effectively agents adapt to feedback (👍👎) and move towards the goal.
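The page does not define how FRS is computed. Purely as an illustration of the idea, the sketch below assumes each feedback event is recorded with a progress score (for example, similarity to the target) measured before and after the agent's next action. The event format, the "up"/"down" labels and the scoring rule are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical record: (feedback, progress_before, progress_after),
# where progress is any measure of closeness to the goal in [0, 1].
FeedbackEvent = Tuple[str, float, float]


def feedback_responsiveness(events: List[FeedbackEvent]) -> float:
    """Fraction of feedback events followed by an appropriate change in progress."""
    if not events:
        return 0.0
    responsive = 0
    for feedback, before, after in events:
        if feedback == "down" and after > before:
            responsive += 1   # corrected course after negative feedback
        elif feedback == "up" and after >= before:
            responsive += 1   # kept or improved progress after positive feedback
    return responsive / len(events)
```

Given the events `("down", 0.2, 0.5)`, `("down", 0.5, 0.4)`, `("up", 0.6, 0.8)` and `("up", 0.8, 0.7)`, two of the four count as responsive, so the score is 0.5.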
MATS
Multimodal Action Timing Similarity
Compares agent action timing patterns with human interactions to assess the naturalness of pacing.
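One simple way to compare timing patterns, offered here only as an assumption since the MATS definition is not stated on this page, is to histogram the gaps between consecutive actions for the agent and for humans and take the histogram intersection. The bin width, bin count and function names below are illustrative choices.

```python
from typing import List, Sequence


def gaps(timestamps: Sequence[float]) -> List[float]:
    """Time elapsed between consecutive actions."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def histogram(values: Sequence[float], bin_width: float, n_bins: int) -> List[float]:
    """Normalised histogram; values beyond the last bin are clamped into it."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v // bin_width), n_bins - 1)] += 1
    n = len(values)
    return [c / n for c in counts] if n else [0.0] * n_bins


def timing_similarity(
    agent_times: Sequence[float],
    human_times: Sequence[float],
    bin_width: float = 1.0,
    n_bins: int = 10,
) -> float:
    """Histogram intersection of inter-action gaps: 1.0 identical, 0.0 disjoint."""
    p = histogram(gaps(agent_times), bin_width, n_bins)
    q = histogram(gaps(human_times), bin_width, n_bins)
    return sum(min(x, y) for x, y in zip(p, q))
```

An agent acting at a steady one-second pace matches a human with the same pace perfectly (score 1.0), while an agent that acts only every five seconds shares no bins with that human and scores 0.0.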
Example Sessions
Resources
Interactive Demo - Coming Soon!
Stay tuned for a live demo where you can experience Sketchtopia agents interacting.
In the meantime, explore the Dataset.
Citation
@inproceedings{khan2025sketchtopia,
author = {Sainithin Artham and Avijit Dasgupta and Shankar Gangisetty and C. V. Jawahar},
title = {Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
series = {},
volume = {},
pages = {},
publisher = {},
year = {2025},
url = {https://sketchtopia25.github.io/},
}
Intellectual Property Notice
This work is the subject of a patent application filed in [India / under PCT] and is protected under applicable intellectual property laws. All rights to the underlying technology, including the AI agents for drawing and guessing in a Pictionary-like setting, are reserved. The system is currently under active research and development. Any use, reproduction, or commercial exploitation of this work or its components without prior written consent is prohibited.
Patent Application Status: Patent Pending.


