Bhavan Jasani

Applied Scientist · Computer Vision · Multimodal AI · Synthetic Data

I build practical machine learning systems at the intersection of computer vision, natural language processing, and reasoning.

About

I’m an Applied Scientist who brings research to production, specializing in computer vision and multimodal machine learning. My work focuses on two areas:

  1. Multimodal learning across images, text, video, audio, and structured data
  2. Synthetic data generation and annotation, with humans and AI in the loop, to overcome scarce or hard-to-label data

Areas of expertise: multimodal machine learning, synthetic data generation, document intelligence (layout-aware transformers), visual grounding, chart reasoning and visual question answering, and scalable training and inference.

I’m increasingly motivated to apply my AI skills to healthcare, genomics, and drug discovery, with the broader goal of contributing to research and products that have real clinical and societal impact.

Experience

  1. Amazon AWS AI Labs

    Amazon AWS AI Labs

    Applied Scientist II (Computer Vision Research)

    Sep 2019 – Present

  2. Carnegie Mellon University

    Carnegie Mellon University, Robotics Institute

    Research Assistant (Multi-modal Emotion Recognition)

    Oct 2017 – Aug 2019

  3. Nanyang Technological University

    Nanyang Technological University, Singapore

    Research Assistant (Hardware-efficient Computer Vision)

    Jan 2016 – May 2017

Selected Publications

  1. Chart VQA

    Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA

    Conference on Computer Vision and Pattern Recognition (CVPR), 2024

    B Jasani*, Z Li*, P Tang, S Ghadar

    PDF

  2. YORO

    YORO: Lightweight End-to-End Visual Grounding

    European Conference on Computer Vision (ECCV) Workshops, 2022

    CH Ho, S Appalaraju, B Jasani, R Manmatha, N Vasconcelos

    PDF

  3. DocFormer

    DocFormer: End-to-End Transformer for Document Understanding

    International Conference on Computer Vision (ICCV), 2021

    S Appalaraju, B Jasani, BU Kota, Y Xie, R Manmatha

    PDF

  4. AMLC2021VQA_1

    End-to-End Visual Question Answering on Document Images

    Amazon Machine Learning Conference (AMLC), 2021

    B Jasani*, Y Xie*, R Manmatha

    PDF

  5. AMLC2021VQA_2

    Exploiting Spatial Layout in Document Question Answering using Transformers

    Amazon Machine Learning Conference (AMLC), 2021

    Y Xie, B Pang, Y Zhang, B Jasani, V Mahadevan, R Manmatha

    PDF

  6. MovieQA

    Are We Asking the Right Questions in MovieQA?

    International Conference on Computer Vision (ICCV) Workshops, 2019

    [Spotlight oral presentation]

    B Jasani, R Girdhar, D Ramanan

    PDF

  7. Pose Action Recognition

    Skeleton-based Zero Shot Action Recognition in Joint Pose-Language Semantic Space

    arXiv, 2019

    B Jasani, A Mazagonwalla

    PDF

  8. CMU Thesis

    Automatic detection of human affective behavior in dyadic conversations

    CMU RI Technical Report (Master's Thesis), 2019

    B Jasani

    PDF

  9. Domain Adaptation

    Learning Sampling Policies for Domain Adaptation

    arXiv, 2019

    Y Patel*, K Chitta*, B Jasani*

    PDF

  10. JRTIP FAST Corner

    Data-path unrolling with logic folding for area-time-efficient FPGA-based FAST corner detector

    Journal of Real-Time Image Processing, 2019

    SK Lam, T Lim, M Wu, B Cao, B Jasani

    PDF

  11. T-CSVT Harris Corner

    Threshold-Guided Design and Optimization for Harris Corner Detector Architecture

    IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), Journal, 2017

    B Jasani, SK Lam, PK Meher, M Wu

    PDF

  12. TCAS-II FAST Corner

    Area-time efficient FAST corner detector using data-path transposition

    IEEE Transactions on Circuits and Systems II: Express Briefs, Journal, 2017

    SK Lam, T Lim, M Wu, B Cao, B Jasani

    PDF

  13. Bachelor Thesis

    Accelerating Feature Detectors For Real-Time Vision-Based Applications

    Bachelor's Thesis, 2016

    B Jasani

    PDF

Patents

  1. Global Prompts with Linear Adapter Tuning for Regression-Free Model Update

    US Patent 85,779,920 — filed 2023

    B. Jasani, P. Tan, P. Zhu, R. Manmatha, V. Mahadevan, Y. Xie

  2. Document Visual Question Answering with Multimodal Transformer Encoder–Decoder Models

    US Patent 85,528,792 — filed 2022

    B. Jasani, N. Sankaran, P. Zhu, R. Manmatha, Y. Xie

Academic Services

  1. Program Committee

    • International Conference on Document Analysis and Recognition (ICDAR), 2025
    • Transferring and Adapting Source Knowledge in Computer Vision (TASK-CV) Workshop, ICCV 2019

  2. Reviewer

    • Conference on Computer Vision and Pattern Recognition (CVPR)
    • International Conference on Computer Vision (ICCV)
    • European Conference on Computer Vision (ECCV)
    • Amazon Computer Vision Conference (ACVC)
    • Amazon Research Awards
    • Book chapters – “Data Augmentation with Python”, Packt Publishing, 2023

Curriculum Vitae

Download my latest CV (updated 2025):