Bhavan Jasani — Applied Scientist

About

I’m an Applied Scientist who brings research to production, specializing in computer vision and multimodal machine learning. My work focuses on two areas:

- Multimodal learning across images, text, video, audio, and structured data
- Synthetic data generation and annotation, with humans and AI in the loop, to overcome scarce or hard-to-label data

Areas of expertise: Foundation Model Post-Training, Multimodal Learning, Synthetic Data Generation, Visually Rich Document Understanding, Visual Grounding, Chart Reasoning & Visual Question Answering, and Multi-node Distributed Training.

Looking ahead: I’m excited to apply my AI skills to physical robotics (humanoid robots, self-driving cars) and healthcare (genomics, drug discovery), areas where research can have real societal impact.

Experience

Amazon AWS AI Labs

Applied Scientist (Computer Vision Research)

Sep 2019 – Present
Carnegie Mellon University, Robotics Institute

Research Assistant (Multi-modal Emotion Recognition)

Oct 2017 – Aug 2019
Nanyang Technological University, Singapore

Research Assistant (Hardware-efficient Computer Vision)

Jan 2016 – May 2017

Selected Publications

Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA

Conference on Computer Vision and Pattern Recognition (CVPR), 2024

B Jasani*, Z Li*, P Tang, S Ghadar

Project Page | PDF | arXiv
YORO: Lightweight End-to-End Visual Grounding

European Conference on Computer Vision (ECCV) Workshops, 2022

CH Ho, S Appalaraju, B Jasani, R Manmatha, N Vasconcelos

PDF
DocFormer: End-to-End Transformer for Document Understanding

International Conference on Computer Vision (ICCV), 2021

S Appalaraju, B Jasani, BU Kota, Y Xie, R Manmatha

PDF
End-to-End Visual Question Answering on Document Images

Amazon Machine Learning Conference (AMLC), 2021

B Jasani*, Y Xie*, R Manmatha

PDF
Exploiting Spatial Layout in Document Question Answering using Transformers

Amazon Machine Learning Conference (AMLC), 2021

Y Xie, B Pang, Y Zhang, B Jasani, V Mahadevan, R Manmatha

PDF
Are We Asking the Right Questions in MovieQA?

International Conference on Computer Vision (ICCV) Workshops, 2019

[Spotlight oral presentation]

B Jasani, R Girdhar, D Ramanan

Project Page | PDF
Skeleton-based Zero Shot Action Recognition in Joint Pose-Language Semantic Space

arXiv, 2019

B Jasani, A Mazagonwalla

PDF
Automatic detection of human affective behavior in dyadic conversations

CMU RI Technical Report (Master's Thesis), 2019

B Jasani

PDF
Learning sampling policies for domain adaptation

arXiv, 2019

Y Patel*, K Chitta*, B Jasani*

PDF
Data-path unrolling with logic folding for area-time-efficient FPGA-based FAST corner detector

Journal of Real-Time Image Processing, 2019

SK Lam, T Lim, M Wu, B Cao, B Jasani

PDF
Threshold-Guided Design and Optimization for Harris Corner Detector Architecture

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Journal, 2017

B Jasani, SK Lam, PK Meher, M Wu

PDF
Area-time efficient FAST corner detector using data-path transposition

IEEE Transactions on Circuits and Systems II: Express Briefs, Journal, 2017

SK Lam, T Lim, M Wu, B Cao, B Jasani

PDF
Accelerating Feature Detectors For Real-Time Vision-Based Applications

Bachelor's Thesis, 2016

B Jasani

PDF

Patents

Global Prompts with Linear Adapter Tuning for Regression-Free Model Update

US Patent 85,779,920 — filed 2023

B. Jasani, P. Tan, P. Zhu, R. Manmatha, V. Mahadevan, Y. Xie
Document Visual Question Answering with Multimodal Transformer Encoder–Decoder Models

US Patent 85,528,792 — filed 2022

B. Jasani, N. Sankaran, P. Zhu, R. Manmatha, Y. Xie

Academic Services

Program Committee
- International Conference on Document Analysis and Recognition (ICDAR), 2025
- Transferring and Adapting Source Knowledge in Computer Vision (TASK-CV) Workshop, ICCV 2019
Reviewer
- Conference on Computer Vision and Pattern Recognition (CVPR)
- International Conference on Computer Vision (ICCV)
- European Conference on Computer Vision (ECCV)
- Amazon Computer Vision Conference (ACVC)
- Amazon Research Awards
- Book chapters – “Data Augmentation with Python”, Packt Publishing, 2023

Curriculum Vitae

Download my latest CV:

Download PDF

Fun Stuff

I enjoy partner dancing (fusion and salsa) and have a deep passion for aviation; I’m working toward my private pilot license. In my free time, I also love attending meditation retreats and going biking.

Applied Scientist · Computer Vision · Multimodal AI · Synthetic Data
