Research

Research Direction

My current research direction centers on efficient AI inference, especially how model-level choices interact with compiler passes, runtime policies, and hardware realities.

  • Hardware-aware co-optimization of quantization and runtime scheduling
  • Efficient inference for embodied AI and vision agents on edge devices
  • Bridging the semantic gap between model abstractions and low-level execution

Research Overview

This page outlines the technical direction behind my current work: efficient inference, system optimization, and hardware-aware execution.

Memory Wall in Edge Inference

I am interested in how transformer-based perception modules suffer from memory bandwidth and locality bottlenecks on edge devices such as Jetson-class platforms.
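
The memory wall can be made concrete with a back-of-envelope roofline check: a single-token linear projection performs roughly one FLOP per byte of DRAM traffic, far below the balance point of an edge GPU. The sketch below is illustrative only; the fp16 layer shape, peak compute, and bandwidth figures are assumptions, not measurements of any specific device.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs per byte of DRAM traffic."""
    return flops / bytes_moved

# Toy single-token projection (decode step): (1 x d_model) @ (d_model x d_model), fp16.
d_model = 768
flops = 2 * d_model * d_model  # one multiply-add per weight
# Traffic: read activation + read weights + write output, 2 bytes each (fp16).
bytes_moved = 2 * (d_model + d_model * d_model + d_model)

ai = arithmetic_intensity(flops, bytes_moved)

# Assumed Jetson-class balance point: peak compute / DRAM bandwidth.
peak_flops = 1.0e12  # ~1 TFLOP/s fp16 (assumed)
bandwidth = 60.0e9   # ~60 GB/s (assumed)
ridge = peak_flops / bandwidth

print(f"intensity {ai:.2f} FLOP/B vs ridge {ridge:.1f} -> "
      + ("memory-bound" if ai < ridge else "compute-bound"))
```

Under these assumed numbers the kernel sits at about 1 FLOP/byte against a ridge near 17, so the projection is bandwidth-limited: exactly the regime where quantization and locality optimizations pay off.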

Quantization and Operator Fusion

Rather than treating quantization and fusion as isolated stages, I want to study them as a shared search space shaped by instruction-level and hardware-level constraints.
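
A toy example of why the joint view matters: if fused kernels exist only for certain precisions, a quantizer and a fusion pass cannot make optimal choices independently. The sketch below is a minimal illustration, not a real tuner; the latency model, the 8-bit-only fusion constraint, and the exhaustive search are all invented assumptions.

```python
from itertools import product

LAYERS = ["conv1", "conv2", "fc"]

def latency(bits: int, fused: bool) -> float:
    # Toy model: lower precision moves less data; fusion skips an
    # intermediate round-trip to DRAM, but (we assume) fused kernels
    # exist only for 8-bit on this hypothetical backend.
    base = 1.0 * bits / 8           # traffic scales with bitwidth
    if fused:
        if bits != 8:
            return float("inf")     # no fused kernel available
        base *= 0.6                 # fusion saves the intermediate write
    return base

def search():
    """Exhaustively score every per-layer (bitwidth, fused?) assignment."""
    best_cfg, best_cost = None, float("inf")
    for cfg in product(product((4, 8, 16), (False, True)), repeat=len(LAYERS)):
        cost = sum(latency(b, f) for b, f in cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return dict(zip(LAYERS, best_cfg)), best_cost

best, cost = search()
print(best, cost)
```

Even in this tiny model the constraints interact: 4-bit beats fused 8-bit per layer here, but shifting a single cost coefficient flips the answer, which is why the two decisions form one search space rather than two pipeline stages.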

Runtime Scheduling

I care about how runtime policies, especially CPU-GPU coordination and dynamic scheduling, affect deterministic inference behavior in resource-constrained systems.
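
One simple, deterministic policy in this space is earliest-finish-time dispatch across CPU and GPU. The sketch below is a hypothetical illustration; the task costs and the fixed host-to-device transfer overhead are assumed numbers, and real runtimes must also model queueing and concurrency.

```python
def schedule(tasks):
    """tasks: list of (name, cpu_ms, gpu_ms). Greedy earliest-finish dispatch."""
    free = {"cpu": 0.0, "gpu": 0.0}  # time each device becomes idle
    transfer_ms = 2.0                # assumed host->device copy cost
    placement = {}
    for name, cpu_ms, gpu_ms in tasks:
        finish_cpu = free["cpu"] + cpu_ms
        finish_gpu = free["gpu"] + transfer_ms + gpu_ms
        # Tie-break toward CPU so the policy is deterministic.
        dev = "cpu" if finish_cpu <= finish_gpu else "gpu"
        free[dev] = min(finish_cpu, finish_gpu)
        placement[name] = dev
    return placement, max(free.values())

placement, makespan = schedule([
    ("preprocess", 5.0, 4.0),   # cheap on both; transfer cost makes GPU lose
    ("backbone", 40.0, 6.0),    # clearly GPU-favored
    ("postprocess", 3.0, 3.0),
])
print(placement, makespan)
```

The fixed tie-break is the point: on constrained devices, a policy that always makes the same placement for the same inputs yields reproducible latency, whereas load-reactive heuristics trade determinism for average-case throughput.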

Current Agenda

  • Hardware-aware co-optimization of quantization and runtime scheduling
  • Efficient inference for embodied AI and vision agents on edge devices
  • Bridging the semantic gap between model abstractions and low-level execution

Outputs and Materials

  • One manuscript in preparation on multimodal video captioning
  • Research proposal on hardware-aware AI systems optimization
  • Project portfolio spanning CUDA, RAG systems, and multimodal inference