The Beginning: Research at CVIT

Following his undergraduate research journey, Rohan has joined Staqu Technologies as a Research Engineer in their core R&D team and is excited to continue applying his research mindset to industry-grade problems.

During his undergraduate years, Rohan worked as a researcher at CVIT under the guidance of Prof. Ravi Kiran. This mentorship was the foundation of what would eventually become an award-winning research journey.

What started as a personal exploration of document analysis quickly became a serious academic endeavor. With no open-source datasets or starter code available, the early phase of the project demanded building everything from scratch — including the dataset and codebase for experimentation. Through consistent effort and Ravi sir’s mentorship, the initial groundwork was laid, and the vision for a novel solution began to take shape.


Collaboration and Growth

While the initial stretch of research was carried out solo, the project evolved into a team effort in August 2024 when Jyothi joined as a co-author. She played a crucial role in extending the dataset, focusing on data augmentation strategies, and conducting experiments — all under the continued guidance of Prof. Ravi.

Together, the team built upon the foundational work, validating ideas through extensive experimentation. By January 2025, their new method had achieved state-of-the-art performance, and that breakthrough became the backbone of TexTAR, the paper that would go on to win the Best Student Paper Award.

 

The Role of a Mentor

“A big part of this journey was the guidance of Prof. Ravi Kiran, who played a key role not just in shaping the research, but also in shaping how we approached it. His mentorship struck a balance between structure and freedom, giving us room to explore, but always with direction,” said Rohan.

“Our weekly meetings, held at 9 AM sharp every Monday, taught us the value of discipline and staying consistent. Even something as simple as being on time became part of our research mindset.”

“One piece of advice from Ravi sir really stuck with me,” Rohan says:

“Try to reduce as many experiments as possible by using your intuitions — ML101.”

“This changed the way I thought about research. Instead of just running experiments blindly, I started thinking more critically about why something might work, and that made all the difference,” said Rohan.

 

A Real-World Problem, A Real Impact

At the heart of TexTAR was a desire to make document recognition systems more robust, flexible, and applicable across domains and languages. One real-world use case that inspired this direction was archival document digitization — particularly in preserving not just the text but the style and structure of historical documents. This is especially relevant for multilingual contexts and underrepresented scripts.

 

TexTAR: Textual Attribute Recognition in Multi-domain and Multi-lingual Document Images

Understanding text in documents isn’t just about reading characters — it’s also about recognizing how that text is styled. Attributes like bold, italic, underline, and strikeout carry important semantic and visual cues. Accurately detecting these is crucial for tasks like document layout understanding, archival digitization, and semantic parsing.

However, existing methods often fall short — either due to high computational cost or poor adaptability, especially in noisy, multilingual settings.

To address these challenges, the paper introduces TexTAR, a multi-task, context-aware Transformer designed specifically for Textual Attribute Recognition (TAR). TexTAR brings together three contributions:

    • A context-aware data selection pipeline that improves the model’s understanding of visual and semantic context.
    • A novel 2D RoPE (Rotary Position Embedding) mechanism that better captures the spatial layout of text, a key factor in accurately detecting attributes.
    • The creation of MMTAD, a diverse multilingual dataset annotated with text attributes from real-world documents — covering various scripts, fonts, and layout styles.
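To give a feel for the 2D RoPE idea, here is a minimal NumPy sketch of one common "axial" variant: half of each feature vector is rotated according to a token's x-coordinate and the other half according to its y-coordinate. This is an illustration under stated assumptions, not TexTAR's actual formulation; the function names rope_1d and rope_2d, the frequency base of 10000, and the half-and-half feature split are all hypothetical choices made for this example.

```python
import numpy as np


def rope_1d(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x (n, d) by angles pos * freq.

    Illustrative helper (not from the paper); d must be even.
    """
    x = np.asarray(x, dtype=float)
    pos = np.asarray(pos, dtype=float)
    n, d = x.shape
    assert d % 2 == 0, "feature dimension must be even"
    # One rotation frequency per feature pair, as in standard 1D RoPE.
    freqs = base ** (-np.arange(0, d, 2) / d)       # shape (d/2,)
    angles = pos[:, None] * freqs[None, :]          # shape (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def rope_2d(x, xs, ys):
    """Axial 2D RoPE sketch: first half of features encodes xs, second half ys.

    Assumed split for illustration only; d must be divisible by 4.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    assert d % 4 == 0, "feature dimension must be divisible by 4"
    half = d // 2
    return np.concatenate(
        [rope_1d(x[:, :half], xs), rope_1d(x[:, half:], ys)], axis=1
    )


# Usage: the attention score between a query and a key depends only on
# their relative 2D offset, not on where the pair sits on the page.
rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))
score_a = rope_2d(q, [3.0], [5.0]) @ rope_2d(k, [1.0], [2.0]).T
score_b = rope_2d(q, [13.0], [9.0]) @ rope_2d(k, [11.0], [6.0]).T
print(np.allclose(score_a, score_b))
```

Because each rotation preserves vector length and the query-key dot product depends only on the coordinate differences, a scheme like this lets attention reason about how far apart two text regions are horizontally and vertically, which is the kind of layout signal the bullet above describes.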

Through extensive evaluations, TexTAR demonstrated state-of-the-art performance, outperforming existing methods on both accuracy and robustness across diverse languages and domains.

This work not only advances the technical field but also brings us closer to building more intelligent, human-like document understanding systems.

 

In Her Own Words: Swaroopa Jyothi’s Path to ICDAR 2025


For Swaroopa Jyothi Jinka, a third-year M.S. student at IIIT Hyderabad, the ICDAR 2025 experience was both unexpected and unforgettable. She completed her undergraduate studies at CBIT, Hyderabad in 2023 and joined the MS by Research program the same year. Since January 2024, she has been part of the CVIT lab, where her research has focused on document image analysis — starting with textual attribute recognition and currently exploring synthetic question–answer generation in the document domain.

While she joined the project midway, she quickly became a key contributor. By the time she came on board, Rohan had already built much of the foundation. But thanks to clear communication and shared commitment, the two developed a strong working rhythm.

“Rohan explained the entire project clearly when I joined, and from there, we always divided the work equally. Whether it was running experiments or writing the paper, we kept things parallel and worked really well as a team,” Jyothi shared.

What truly surprised Jyothi was the recognition their work received.

“To be honest, I didn’t expect we’d win Best Student Paper at ICDAR. But it gave me so much confidence in my research direction. It reminded me to focus on problems that matter to the community — and to think long-term, not just about short-term results.”

She recalls the paper writing phase as one of the most intense — and rewarding — parts of the process. With Prof. Ravi Kiran’s guidance, the team spent over 1.5 months solely focused on writing and refining.

“Sir shared many helpful blogs about how to write a good research paper. We went through multiple review cycles — making sure the storyline was informative and easy to follow. He was very particular about the diagrams too, which really helped improve the overall presentation.”

Representing the paper at ICDAR 2025 in China was another highlight. Jyothi’s presentation attracted a lot of attention — with more than 10 questions from an engaged audience. Though the official Q&A was only 5 minutes, hers went on for over 13.

 


“People were really curious about the work, and asked interesting, detailed questions. After my talk, one of the session chairs came up to me personally and said, ‘It was a great presentation.’ That meant a lot.”

Beyond the conference halls, the experience of visiting China was just as enriching. Despite language barriers, the hospitality stood out.

“People were kind and helpful wherever we went. The monuments were beautiful, and on the second day of the conference, we had a cruise dinner — I met researchers working on similar topics and also many fellow Indians in the same space.”

Reflecting on the award and the journey, Jyothi emphasized how much the experience has shaped her.

“It taught me that my work can make a real impact. It also showed me how important clarity, discipline, and teamwork are in research.”

And none of it, she says, would have been possible without her advisor.

“Prof. Ravi was always there — especially during moments when progress was slow or I felt stuck. His encouragement helped me stay focused and find new ways to approach problems.”

“He was very particular about discipline — meetings were always on time, and he made sure we aimed to complete the paper a week ahead of the actual deadline. At the time it felt strict, but looking back, it really helped us polish and refine our work.”

What stood out most to her was how he didn’t just guide the research technically, but also taught them how to tell a compelling research story — how to think clearly, present thoughtfully, and approach the entire research process with seriousness and purpose.

She shared how he personally guided her through the process of building a compelling presentation, structuring the slides effectively, and — most importantly — speaking with clarity and confidence. His attention to detail and consistent feedback gave her the tools to communicate their research impactfully. Presenting at such a prestigious platform can be daunting, but thanks to this mentorship, Jyothi delivered with assurance and poise — earning appreciation from peers and experts alike.

The response from the academic community was overwhelmingly positive. Attendees engaged with the work deeply, offering valuable feedback and discussing the broader applications of the approach — especially its potential to generalize across languages and domains.

There was specific appreciation for the focus on practical challenges in multilingual document understanding and the fact that the system performed robustly even without relying on large-scale annotated data — a major step toward real-world applicability.

Winning the Best Student Paper Award at ICDAR 2025 is a remarkable achievement — but it’s also a glimpse into what’s possible when curiosity, discipline, mentorship, and persistence come together. We congratulate Rohan, Jyothi and Prof. Ravi Kiran on this well-deserved recognition, and look forward to seeing what they build next.