Towards Generating and Evaluating Multi-modal Design Documents

Talk Abstract:

Design documents are ubiquitous, spanning everything from online advertisements and billboards to the packaging of everyday consumer products. For researchers, these documents constitute inherently multimodal artifacts that seamlessly integrate images, text, and graphical elements. In this talk, the speaker will present two AAAI 2026 works that investigate how design documents can be generated layer by layer using a multimodal large language model, as well as how their quality can be evaluated and improved through an agentic evaluation framework.

Speaker Bio:

Joseph K J is a Research Scientist at Adobe Research. His research interests span multimodal learning, computer vision, and intelligent content creation. He has authored numerous publications at leading venues, including CVPR, ICCV, ECCV, AAAI, and WACV. He received his PhD from the Indian Institute of Technology Hyderabad, where he was advised by Prof. Vineeth Balasubramanian. During his PhD, he also interned at Google Research and Mohamed bin Zayed University of Artificial Intelligence (MBZUAI).