Bhavan Jasani

Applied Scientist · Computer Vision · Multimodal AI · Synthetic Data

I build practical machine learning systems at the intersection of computer vision, natural language processing, and reasoning.

About

I’m an Applied Scientist who brings research to production, specializing in computer vision and multimodal machine learning. My work focuses on two areas:

  1. Multimodal learning across images, text, video, audio, and structured data
  2. Synthetic data generation and annotation, with humans and AI in the loop, to overcome scarce or hard-to-label data

Areas of expertise: Foundation Model Post-Training, Multimodal Learning, Synthetic Data Generation, Visually Rich Document Understanding, Visual Grounding, Chart Reasoning & Visual Question Answering, and Multi-node Distributed Training.

Looking ahead: I’m excited to apply my AI skills to physical robotics (humanoid robots, self-driving cars) and healthcare (genomics, drug discovery), areas where research can have real societal impact.

Experience

  1. Amazon AWS AI Labs

    Amazon AWS AI Labs

    Applied Scientist (Computer Vision Research)

    Sep 2019 – Present

  2. Carnegie Mellon University

    Carnegie Mellon University, Robotics Institute

    Research Assistant (Multi-modal Emotion Recognition)

    Oct 2017 – Aug 2019

  3. Nanyang Technological University

    Nanyang Technological University, Singapore

    Research Assistant (Hardware-efficient Computer Vision)

    Jan 2016 – May 2017

Selected Publications

  1. Chart VQA

    Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA

    Conference on Computer Vision and Pattern Recognition (CVPR), 2024

    B Jasani*, Z Li*, P Tang, S Ghadar

    Project Page | PDF | arXiv

  2. YORO

    YORO: Lightweight End-to-End Visual Grounding

    European Conference on Computer Vision (ECCV) Workshops, 2022

    CH Ho, S Appalaraju, B Jasani, R Manmatha, N Vasconcelos

    PDF

  3. DocFormer

    DocFormer: End-to-End Transformer for Document Understanding

    International Conference on Computer Vision (ICCV), 2021

    S Appalaraju, B Jasani, BU Kota, Y Xie, R Manmatha

    PDF

  4. AMLC2021VQA_1

    End-to-End Visual Question Answering on Document Images

    Amazon Machine Learning Conference (AMLC), 2021

    B Jasani*, Y Xie*, R Manmatha

    PDF

  5. AMLC2021VQA_2

    Exploiting Spatial Layout in Document Question Answering using Transformers

    Amazon Machine Learning Conference (AMLC), 2021

    Y Xie, B Pang, Y Zhang, B Jasani, V Mahadevan, R Manmatha

    PDF

  6. MovieQA

    Are We Asking the Right Questions in MovieQA?

    International Conference on Computer Vision (ICCV) Workshops, 2019

    [Spotlight oral presentation]

    B Jasani, R Girdhar, D Ramanan

    Project Page | PDF

  7. Pose Action Recognition

    Skeleton-based Zero Shot Action Recognition in Joint Pose-Language Semantic Space

    arXiv, 2019

    B Jasani, A Mazagonwalla

    PDF

  8. CMU Thesis

    Automatic detection of human affective behavior in dyadic conversations

    CMU RI Technical Report (Master's Thesis), 2019

    B Jasani

    PDF

  9. Learning Sampling Policies

    Learning sampling policies for domain adaptation

    arXiv, 2019

    Y Patel*, K Chitta*, B Jasani*

    PDF

  10. JRTIP FAST Corner

    Data-path unrolling with logic folding for area-time-efficient FPGA-based FAST corner detector

    Journal of Real-Time Image Processing, 2019

    SK Lam, T Lim, M Wu, B Cao, B Jasani

    PDF

  11. T-CSVT Harris Corner

    Threshold-Guided Design and Optimization for Harris Corner Detector Architecture

    IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Journal, 2017

    B Jasani, SK Lam, PK Meher, M Wu

    PDF

  12. Area-time FAST Corner

    Area-time efficient FAST corner detector using data-path transposition

    IEEE Transactions on Circuits and Systems II: Express Briefs, Journal, 2017

    SK Lam, T Lim, M Wu, B Cao, B Jasani

    PDF

  13. Bachelor Thesis

    Accelerating Feature Detectors For Real-Time Vision-Based Applications

    Bachelor's Thesis, 2016

    B Jasani

    PDF

Patents

  1. Global Prompts with Linear Adapter Tuning for Regression-Free Model Update

    US Patent 85,779,920 — filed 2023

    B. Jasani, P. Tan, P. Zhu, R. Manmatha, V. Mahadevan, Y. Xie

  2. Document Visual Question Answering with Multimodal Transformer Encoder–Decoder Models

    US Patent 85,528,792 — filed 2022

    B. Jasani, N. Sankaran, P. Zhu, R. Manmatha, Y. Xie

Academic Services

  1. Program Committee

    • International Conference on Document Analysis and Recognition (ICDAR), 2025
    • Transferring and Adapting Source Knowledge in Computer Vision (TASK-CV) Workshop, ICCV 2019
  2. Reviewer

    • Conference on Computer Vision and Pattern Recognition (CVPR)
    • International Conference on Computer Vision (ICCV)
    • European Conference on Computer Vision (ECCV)
    • Amazon Computer Vision Conference (ACVC)
    • Amazon Research Awards
    • Book chapters – “Data Augmentation with Python”, Packt Publishing, 2023

Curriculum Vitae

Download my latest CV:

Fun Stuff

I enjoy partner dancing (Fusion and Salsa), have a deep passion for aviation, and am working toward my private pilot license. In my free time, I also love attending meditation retreats and biking.